kmx git

Commit	Date	Message
359240b6	2022-02-11T17:56:05	diff: indicate when the file size is "valid" When we know the file size (because we're producing it from a working directory iterator, or an index with an up-to-date cache) then set a flag indicating as such. This removes the ambiguity about a 0 file size, which could indicate that a file exists and is 0 bytes, or that we haven't read it yet.
c19a3c7a	2022-02-07T11:22:04	odb: check for write failures
ceddeed8	2021-11-11T15:20:50	Merge pull request #6104 from libgit2/ethomson/path path: refactor utility path functions
95117d47	2021-10-31T09:45:46	path: separate git-specific path functions from util Introduce `git_fs_path`, which operates on generic filesystem paths. `git_path` will be kept for only git-specific path functionality (for example, checking for `.git` in a path).
81662d43	2021-11-08T14:48:45	Support checking for object existence without refresh Looking up a non-existent object currently always invokes `git_odb_refresh`. If looking up a large batch of objects, many of which may legitimately not exist, this will repeatedly refresh the ODB to no avail. Add a `git_odb_exists_ext` that accepts flags controlling the ODB lookup, and add a flag to suppress the refresh. This allows the user to control if and when they refresh (for instance, refreshing once before starting the batch).
f0e693b1	2021-09-07T17:53:49	str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
31ecaca2	2021-09-30T08:11:40	hash: hash functions operate on byte arrays not git_oids Separate the concerns of the hash functions from the git_oid functions. The git_oid structure will need to understand either SHA1 or SHA256; the hash functions should only deal with the appropriate one of these.
2a713da1	2021-09-29T21:31:17	hash: accept the algorithm in inputs
ea285904	2020-02-18T00:02:13	midx: Introduce git_odb_write_multi_pack_index() This change introduces git_odb_write_multi_pack_index(), which creates a `multi-pack-index` file from all the `.pack` files that have been loaded in the ODB. Fixes: #5399
231ca4fa	2021-02-23T19:33:34	Proof-of-concept for a more aggressive GIT_UNUSED() This adds a `-Wunused-result`-proof `GIT_UNUSED()`, just to demonstrate that it works. With this, sortedcache.h is now completely `GIT_WARN_UNUSED_RESULT`-annotated!
e5975f36	2021-07-30T11:37:12	tests: reset odb backend priority
cd460522	2020-04-20T22:16:52	odb: Implement option for overriding of default odb backend priority Introduce GIT_OPT_SET_ODB_LOOSE_PRIORITY and GIT_OPT_SET_ODB_PACKED_PRIORITY to allow overriding the default priority values for the default ODB backends. Libgit2 has historically assumed that most objects for long- running operations will be packed, therefore GIT_LOOSE_PRIORITY is set to 1 by default, and GIT_PACKED_PRIORITY to 2. When a client allows libgit2 to set the default backends, they can specify an override for the two priority values in order to change the order in which each ODB backend is accessed.
2370e491	2021-07-26T16:27:54	Merge pull request #5765 from lhchavez/cgraph-revwalks commit-graph: Use the commit-graph in revwalks
cf9196bd	2021-05-30T10:42:25	Tolerate readlink size less than st_size
31d9c24b	2021-05-06T16:32:14	filter: internal git_buf filter handling function Introduce `git_filter_list__convert_buf` which behaves like the old implementation of `git_filter_list__apply_data`, where it might move the input data buffer over into the output data buffer space for efficiency. This new implementation will do so in a more predictible way, always freeing the given input buffer (either moving it to the output buffer or filtering it into the output buffer first). Convert internal users to it.
25b75cd9	2021-03-10T07:06:15	commit-graph: Create `git_commit_graph` as an abstraction for the file This change does a medium-size refactor of the git_commit_graph_file and the interaction with the ODB. Now instead of the ODB owning a direct reference to the git_commit_graph_file, there will be an intermediate git_commit_graph. The main advantage of that is that now end users can explicitly set a git_commit_graph that is eagerly checked for errors, while still being able to lazily use the commit-graph in a regular ODB, if the file is present.
248606eb	2021-01-05T17:20:27	commit-graph: Use the commit-graph in revwalks This change makes revwalks a bit faster by using the `commit-graph` file (if present). This is thanks to the `commit-graph` allow much faster parsing of the commit information by requiring near-zero I/O (aside from reading a few dozen bytes off of a `mmap(2)`-ed file) for each commit, instead of having to read the ODB, inflate the commit, and parse it. This is done by modifying `git_commit_list_parse()` and letting it use the ODB-owned commit-graph file. Part of: #5757
4ae41f9c	2020-08-02T16:26:25	Make the odb race-free This change adds all the necessary locking to the odb to avoid races in the backends. Part of: #5592
931fd6b0	2020-04-05T17:29:13	odb: use GIT_ASSERT
cc1d7f5c	2020-08-01T17:47:20	Improve the support of atomics This change: * Starts using GCC's and clang's `__atomic_` intrinsics instead of the `__sync_` ones, since the former supercede the latter (and can be safely replaced by their equivalent `__atomic_` version with the sequentially consistent model). Makes `git_atomic64`'s value `volatile`. Otherwise, this will make ThreadSanitizer complain. * Adds ways to load the values from atomics. As it turns out, unsynchronized read are okay only in some architectures, but if we want to be correct (and make ThreadSanitizer happy), those loads should also be performed with the atomic builtins. * Fixes two ThreadSanitizer warnings, as a proof-of-concept that this works: - Avoid directly accessing `git_refcount`'s `owner` directly, and instead makes all callers go through the `GIT_REFCOUNT_*()` macros, which also use the atomic utilities. - Makes `pool_system_page_size()` race-free. Part of: #5592
3a197ea7	2020-06-27T12:33:32	Make the tests pass cleanly with MemorySanitizer This change: * Initializes a few variables that were being read before being initialized. * Includes https://github.com/madler/zlib/pull/393. As such, it only works reliably with `-DUSE_BUNDLED_ZLIB=ON`.
c6184f0c	2020-06-08T21:07:36	tree-wide: do not compile deprecated functions with hard deprecation When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h" header to be empty. As a result, no function declarations are made available to callers, but the implementations are still available to link against. This has the problem that function declarations also aren't visible to the implementations, meaning that the symbol's visibility will not be set up correctly. As a result, the resulting library may not expose those deprecated symbols at all on some platforms and thus cause linking errors. Fix the issue by conditionally compiling deprecated functions, only. While it becomes impossible to link against such a library in case one uses deprecated functions, distributors of libgit2 aren't expected to pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real" hard deprecation still makes sense in the context of CI to test we don't use deprecated symbols ourselves and in case a dependant uses libgit2 in a vendored way and knows it won't ever use any of the deprecated symbols anyway.
fb2198db	2019-06-23T16:23:59	futils_filesize: use `uint64_t` for object size Instead of using a signed type (`off_t`) use `uint64_t` for the maximum size of files.
bed9fc6b	2019-06-23T15:16:47	odb: use `git_object_size_t` for object size Instead of using a signed type (`off_t`) use a new `git_object_size_t` for the sizes of objects.
e54343a4	2019-06-29T09:17:32	fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.
658022c4	2019-07-18T13:53:41	configuration: cvar -> configmap `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.
9f723c97	2019-06-26T14:49:37	docs: fixups
5d92e547	2019-06-08T17:28:35	oid: `is_zero` instead of `iszero` The only function that is named `issomething` (without underscore) was `git_oid_iszero`. Rename it to `git_oid_is_zero` for consistency with the rest of the library.
459ac856	2019-02-23T18:42:53	odb: provide a free function for custom backends Custom backends can allocate memory when reading objects and providing them to libgit2. However, if an error occurs in the custom backend after the memory has been allocated for the custom object but before it's returned to libgit2, the custom backend has no way to free that memory and it must be leaked. Provide a free function that corresponds to the alloc function so that custom backends have an opportunity to free memory before they return an error.
790aae77	2019-02-23T18:40:43	odb: rename git_odb_backend_malloc for consistency The `git_odb_backend_malloc` name is a system function that is provided for custom ODB backends and allows them to allocate memory for an ODB object in the read callback. This is important so that libgit2 can later free the memory used by an ODB object that was read from the custom backend. However, the name _suggests_ that it actually allocates a `git_odb_backend`. It does not; rename it to make it clear that it actually allocates backend _data_.
a1ef995d	2019-02-21T10:33:30	indexer: use git_indexer_progress throughout Update internal usage of `git_transfer_progress` to `git_indexer_progreses`.
bbdcd450	2019-02-20T10:40:06	cache: fix misnaming of `git_cache_free` Functions that free a structure's contents but not the structure itself shall be named `dispose` in the libgit2 project, but the function `git_cache_free` does not follow this naming pattern. Fix this by renaming it to `git_cache_dispose` and adjusting all callers to make use of the new name.
6b3730d4	2019-02-16T19:55:30	Fix a memory leak in odb_otype_fast() This change frees a copy of a cached object in odb_otype_fast().
dd45539d	2019-02-16T22:06:58	Fix a _very_ improbable memory leak in git_odb_new() This change fixes a mostly theoretical memory leak in got_odb_new() that can only manifest if git_cache_init() fails due to running out of memory or not being able to acquire its lock.
c6cac733	2019-01-20T22:40:38	blob: validate that blob sizes fit in a size_t Our blob size is a `git_off_t`, which is a signed 64 bit int. This may be erroneously negative or larger than `SIZE_MAX`. Ensure that the blob size fits into a `size_t` before casting.
f673e232	2018-12-27T13:47:34	git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
f7416509	2019-01-20T20:15:31	Fix odb foreach to also close on positive error code In include/git2/odb.h it states that callback can also return positive value which should break looping. Implementations of git_odb_foreach() and pack_backend__foreach() did not respect that.
b2c2dc64	2019-01-19T01:36:40	Merge pull request #4940 from libgit2/ethomson/git_obj More `git_obj` to `git_object` updates
cd350852	2019-01-17T10:40:13	object_type: GIT_OBJECT_BAD is now GIT_OBJECT_INVALID We use the term "invalid" to refer to bad or malformed data, eg `GIT_REF_INVALID` and `GIT_EINVALIDSPEC`. Since we're changing the names of the `git_object_t`s in this release, update it to be `GIT_OBJECT_INVALID` instead of `BAD`.
7b453e7e	2019-01-05T22:12:48	Fix a bunch of warnings This change fixes a bunch of warnings that were discovered by compiling with `clang -target=i386-pc-linux-gnu`. It turned out that the intrinsics were not necessarily being used in all platforms! Especially in GCC, since it does not support __has_builtin. Some more warnings were gleaned from the Windows build, but I stopped when I saw that some third-party dependencies (e.g. zlib) have warnings of their own, so we might never be able to enable -Werror there.
168fe39b	2018-11-28T14:26:57	object_type: use new enumeration names Use the new object_type enumeration names within the codebase.
0fcd0563	2018-08-06T12:00:21	odb: fix use of wrong printf formatters The `git_odb_stream` members `declared_size` and `received_bytes` are both of the type `git_off_t`, which we usually defined to be a 64 bit signed integer. Thus, passing these members to "PRIdZ" formatters is not correct, as they are not guaranteed to accept big enough numbers. Instead, use the "PRId64" formatter, which is able to represent 64 bit signed integers.
ecf4f33a	2018-02-08T11:14:48	Convert usage of `git_buf_free` to new `git_buf_dispose`
a52b4c51	2018-03-23T09:59:46	odb: fix writing to fake write streams In commit 7ec7aa4a7 (odb: assert on logic errors when writing objects, 2018-02-01), the check for whether we are trying to overflowing the fake stream buffer was changed from returning an error to raising an assert. The conversion forgot though that the logic around `assert`s are basically inverted. Previously, if the statement stream->written + len > steram->size evaluated to true, we would return a `-1`. Now we are asserting that this statement is true, and in case it is not we will raise an error. So the conversion to the `assert` in fact changed the behaviour to the complete opposite intention. Fix the assert by inverting its condition again and add a regression test.
a43bcd2c	2018-02-09T17:31:50	odb: fix memory leaks due to not freeing hash context
619f61a8	2018-02-01T06:22:36	odb: error when we can't create object header Return an error to the caller when we can't create an object header for some reason (printf failure) instead of simply asserting.
7ec7aa4a	2018-02-01T05:54:57	odb: assert on logic errors when writing objects There's no recovery possible if we're so confused or corrupted that we're trying to overwrite our memory. Simply assert.
138e4c2b	2018-02-01T06:35:31	git_odb__hashfd: propagate error on failures
35ed256b	2018-02-01T05:11:05	git_odb__hashobj: provide errors messages on failures Provide error messages on hash failures: assert when given invalid input instead of failing with a user error; provide error messages on program errors.
59d99adc	2018-01-31T09:34:52	odb: check for alloc errors on hardcoded objects It's unlikely that we'll fail to allocate a single byte, but let's check for allocation failures for good measure. Untangle `-1` being a marker of not having found the hardcoded odb object; use that to reflect actual errors.
ef902864	2018-01-31T09:30:51	odb: error when we can't alloc an object At the moment, we're swallowing the allocation failure. We need to return the error to the caller.
97f9a5f0	2017-12-17T01:12:49	odb: provide length and type with streaming read The streaming read functionality should provide the length and the type of the object, like the normal read functionality does.
275f103d	2018-01-12T08:59:40	odb: reject reading and writing null OIDs The null OID (hash with all zeroes) indicates a missing object in upstream git and is thus not a valid object ID. Add defensive measurements to avoid writing such a hash to the object database in the very unlikely case where some data results in the null OID. Furthermore, add shortcuts when reading the null OID from the ODB to avoid ever returning an object when a faulty repository may contain the null OID.
0c7f49dd	2017-06-30T13:39:01	Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
cb3010c5	2017-06-12T12:56:40	odb_read_prefix: reset error in backends loop When looking for an object by prefix, we query all the backends so that we can ensure that there is no ambiguity. We need to reset the `error` value between backends; otherwise the first backend may find an object by prefix, but subsequent backends may not. If we do not reset the `error` value then it will remain at `GIT_ENOTFOUND` and `read_prefix_1` will fail, despite having actually found an object.
8d93a11c	2017-05-03T12:38:55	odb: fix printf formatter for git_off_t The fields `declared_size` and `received_bytes` of the `git_odb_stream` are both of type `git_off_t` which is defined as a signed integer. When passing these values to a printf-style string in `git_odb_stream__invalid_length`, though, we format these as PRIuZ, which is unsigned. Fix the issue by using PRIdZ instead, silencing warnings on macOS.
7776db51	2017-05-03T12:15:12	odb: shut up gcc warnings regarding uninitilized variables The `error` variable is used as a return value in the out-section of both `odb_read_1` and `read_prefix_1`. While the value will actually always be initialized inside of this section, GCC fails to realize this due to interactions with the `found` variable: if `found` is set, the error will always be initialized. If it is not, we return early without reaching the out-statements. Shut up the warnings by initializing the error variable, even though it is unnecessary.
e0973bc0	2017-04-28T14:05:15	odb: verify hashes in read_prefix_1 While the function reading an object from the complete OID already verifies OIDs, we do not yet do so for reading objects from a partial OID. Do so when strict OID verification is enabled.
14109620	2017-04-28T14:03:54	odb: improve error handling in read_prefix_1 The read_prefix_1 function has several return statements springled throughout the code. As we have to free memory upon getting an error, the free code has to be repeated at every single retrun -- which it is not, so we have a memory leak here. Refactor the code to use the typical `goto out` pattern, which will free data when an error has occurred. While we're at it, we can also improve the error message thrown when multiple ambiguous prefixes are found. It will now include the colliding prefixes.
35079f50	2017-04-21T07:31:56	odb: add option to turn off hash verification Verifying hashsums of objects we are reading from the ODB may be costly as we have to perform an additional hashsum calculation on the object. Especially when reading large objects, the penalty can be as high as 35%, as can be seen when executing the equivalent of `git cat-file` with and without verification enabled. To mitigate for this, we add a global option for libgit2 which enables the developer to turn off the verification, e.g. when he can be reasonably sure that the objects on disk won't be corrupted.
28a0741f	2017-04-10T09:30:08	odb: verify object hashes The upstream git.git project verifies objects when looking them up from disk. This avoids scenarios where objects have somehow become corrupt on disk, e.g. due to hardware failures or bit flips. While our mantra is usually to follow upstream behavior, we do not do so in this case, as we never check hashes of objects we have just read from disk. To fix this, we create a new error class `GIT_EMISMATCH` which denotes that we have looked up an object with a hashsum mismatch. `odb_read_1` will then, after having read the object from its backend, hash the object and compare the resulting hash to the expected hash. If hashes do not match, it will return an error. This obviously introduces another computation of checksums and could potentially impact performance. Note though that we usually perform I/O operations directly before doing this computation, and as such the actual overhead should be drowned out by I/O. Running our test suite seems to confirm this guess. On a Linux system with best-of-five timings, we had 21.592s with the check enabled and 21.590s with the ckeck disabled. Note though that our test suite mostly contains very small blobs only. It is expected that repositories with bigger blobs may notice an increased hit by this check. In addition to a new test, we also had to change the odb::backend::nonrefreshing test suite, which now triggers a hashsum mismatch when looking up the commit "deadbeef...". This is expected, as the fake backend allocated inside of the test will return an empty object for the OID "deadbeef...", which will obviously not hash back to "deadbeef..." again. We can simply adjust the hash to equal the hash of the empty object here to fix this test.
6fd6c678	2017-03-22T20:29:22	Merge pull request #4030 from libgit2/ethomson/fsync fsync all the things
52d03f37	2017-03-03T13:26:29	git_commit_create: freshen tree objects in commit Freshen the tree object that a commit points to during commit time.
1c04a96b	2017-02-28T12:29:29	Honor `core.fsyncObjectFiles`
909d5494	2016-12-29T12:25:15	giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
901434b0	2016-11-14T10:07:37	common: cast precision specifiers to int
becadafc	2016-08-05T19:30:56	odb: only provide the empty tree Only provide the empty tree internally, which matches git's behavior. If we provide the empty blob then any users trying to write it with libgit2 would omit it from actually landing in the odb, which appear to git proper as a broken repository (missing that object).
8f09a98e	2016-07-14T16:23:24	odb: freshen existing objects when writing When writing an object, we calculate its OID and see if it exists in the object database. If it does, we need to freshen the file that contains it.
20302aa4	2016-06-25T23:33:05	Merge pull request #3223 from ethomson/apply Reading patch files
2076d329	2016-06-09T22:50:53	fix error message SHA truncation in git_odb__error_notfound()
6a2d2f8a	2015-06-17T06:42:20	delta: move delta application to delta.c Move the delta application functions into `delta.c`, next to the similar delta creation functions. Make the `git__delta_apply` functions adhere to other naming and parameter style within the library.
1bbcb2b2	2016-03-09T17:47:53	odb: Try to lookup headers in all backends before passthrough
e78d2ac9	2016-03-09T16:41:08	odb: Refactor `git_odb_expand_ids`
4416aa77	2016-03-09T11:29:46	odb: Implement new helper to read types without refreshing
9a786650	2016-03-09T11:00:27	odb: Handle corner cases in `git_odb_expand_ids` The old implementation had two issues: 1. OIDs that were too short as to be ambiguous were not being handled properly. 2. If the last OID to expand in the array was missing from the ODB, we would leak a `GIT_ENOTFOUND` error code from the function.
62484f52	2016-03-08T14:09:55	git_odb_expand_ids: accept git_odb_expand_id array Take (and write to) an array of a struct, `git_odb_expand_id`.
4b1f0f79	2016-03-08T11:44:21	git_odb_expand_ids: rename func, return the type
6c04269c	2016-03-04T00:50:35	git_odb_exists_many_prefixes: query odb for multiple short ids Query the object database for multiple objects at a time, given their object ID (which may be abbreviated) and optional type.
e10144ae	2016-03-04T01:18:30	odb: improved not found error messages When looking up an abbreviated oid, show the actual (abbreviated) oid the caller passed instead of a full (but ambiguously truncated) oid.
a0a1b19a	2015-10-14T19:31:54	odb: Prioritize alternate backends For most real use cases, repositories with alternates use them as main object storage. Checking the alternate for objects before the main repository should result in measurable speedups. Because of this, we're changing the sorting algorithm to prioritize alternates in cases where two backends have the same priority. This means that the pack backend for the alternate will be checked before the pack backend for the main repository but both of them will be checked before any loose backends.
43820f20	2015-10-14T19:24:07	odb: Be smarter when refreshing backends In the current implementation of ODB backends, each backend is tasked with refreshing itself after a failed lookup. This is standard Git behavior: we want to e.g. reload the packfiles on disk in case they have changed and that's the reason we can't find the object we're looking for. This behavior, however, becomes pathological in repositories where multiple alternates have been loaded. Given that each alternate counts as a separate backend, a miss in the main repository (which can potentially be very frequent in cases where object storage comes from the alternate) will result in refreshing all its packfiles before we move on to the alternate backend where the object will most likely be found. To fix this, the code in `odb.c` has been refactored as to perform the refresh of all the backends externally, once we've verified that the object is nowhere to be found. If the refresh is successful, we then perform the lookup sequentially through all the backends, skipping the ones that we know for sure weren't refreshed (because they have no refresh API). The on-disk pack backend has been adjusted accordingly: it no longer performs refreshes internally.
d3b29fb9	2015-10-01T00:50:37	refdb and odb backends must provide `free` function As refdb and odb backends can be allocated by client code, libgit2 can’t know whether an alternative memory allocator was used, and thus should not try to call `git__free` on those objects. Instead, odb and refdb backend implementations must always provide their own `free` functions to ensure memory gets freed correctly.
e5f9df7b	2015-06-29T21:45:04	odb: cast to long long for printf
9f3c18e2	2015-06-02T08:36:15	Fixed build warnings on Xcode 6.1
a6f2ceaf	2015-05-13T12:11:55	Merge pull request #3118 from libgit2/cmn/stream-size odb: make the writestream's size a git_off_t
b0d7f329	2015-05-13T10:23:19	odb: reverse the default backend priorities We currently first look in the loose object dir and then in the packs for objects. When performing operations on recent history this has a higher likelihood of hitting, but when we deal with operations which look further back into the past, we start spending a large amount of time getting ENOTENT from `access`. Reversing the priorities means that long-running operations can get to their objects faster, as we can look at the index data we have in memory (or rather mapped) to figure out whether we have an object, which is faster than going out to the filesystem. The packed backend already implements an optimistic read algorithm by first looking at the packs we know about and only going out to disk to referesh if the object is not found which means that in the case where we do have the object (which will be in the majority for anything that traverses the graph) we can avoid going to to disk entirely to determine whether an object exists. Operations which look at recent history may take a slight impact, but these would be operations which look a lot less at object and thus take less time regardless.
77b339f7	2015-05-12T13:06:33	odb: make the writestream's size a git_off_t Restricting files to size_t is a silly limitation. The loose backend writes to a file directly, so there is no issue in using 63 bits for the size. We still assume that the header is going to fit in 64 bytes, which does mean quite a bit smaller files due to the run-length encoding, but it's still a much larger size than you would want Git to handle.
7dd22538	2015-05-11T10:19:25	centralizing all IO buffer size values
f1453c59	2015-02-12T12:19:37	Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
15d54fdd	2015-02-10T22:34:03	odb__hashlink: check st.st_size before casting
392702ee	2015-02-09T23:41:13	allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
c251f3bb	2014-12-08T16:05:47	win32: remember to cleanup our hash_ctx
e0156651	2014-11-21T13:50:46	odb: `git_odb_object` contents are never NULL This is a contract that we made in the library and that we need to uphold. The contents of a blob can never be NULL because several parts of the library (including the filter and attributes code) expect `git_blob_rawcontent` to always return a valid pointer.
e1ac0101	2014-11-08T14:40:53	odb: hardcode the empty blob and tree git hardocodes these as objects which exist regardless of whether they are in the odb and uses them in the shell interface as a way of expressing the lack of a blob or tree for one side of e.g. a diff. In the library we use each language's natural way of declaring a lack of value which makes a workaround like this unnecessary. Since git uses it, it does however mean each shell application would need to perform this check themselves. This makes it common work across a range of applications and an issue with compatibility with git, which fits right into what the library aims to provide. Thus we introduce the hard-coded empty blob and tree in the odb frontend. These hard-coded objects are checked for before going to the backends, but after the cache check, which means the second time they're used, they will be treated as normal cached objects instead of creating new ones.
530594c0	2014-05-23T05:53:41	odb: clear backend errors on successful read We go through the different backends in order, so it's not an error if at least one of the backends has the data we want.
bc91347b	2014-04-30T11:16:31	Fix remaining init_options inconsistencies There were a couple of "init_opts()" functions a few more cases of structure initialization that I somehow missed.
48e60ae7	2014-04-21T11:23:29	Don't redefine the same callback types, their signatures may change
3ab57816	2014-03-31T23:23:32	Merge pull request #2178 from libgit2/rb/fix-short-id Fix git_odb_short_id and git_odb_exists_prefix bugs
31a14982	2014-03-21T17:36:34	Fix wrong assertion Fixes issue #2196
89499078	2014-03-10T10:53:39	Fix a number of git_odb_exists_prefix bugs The git_odb_exists_prefix API was not dealing correctly when a later backend returned GIT_ENOTFOUND even if an earlier backend had found the object. Additionally, the unit tests were not properly exercising the API and had a couple mistakes in checking the results. Lastly, since the backends are not expected to behavior correctly unless all bytes of the short id are zero except for the prefix, this makes the ODB prefix APIs explicitly clear out the extra bytes so the user doesn't have to be as careful.

359240b6

2022-02-11T17:56:05

diff: indicate when the file size is "valid" When we know the file size (because we're producing it from a working directory iterator, or an index with an up-to-date cache) then set a flag indicating as such. This removes the ambiguity about a 0 file size, which could indicate that a file exists and is 0 bytes, or that we haven't read it yet.

c19a3c7a

2022-02-07T11:22:04

odb: check for write failures

ceddeed8

2021-11-11T15:20:50

Merge pull request #6104 from libgit2/ethomson/path path: refactor utility path functions

95117d47

2021-10-31T09:45:46

path: separate git-specific path functions from util Introduce `git_fs_path`, which operates on generic filesystem paths. `git_path` will be kept for only git-specific path functionality (for example, checking for `.git` in a path).

81662d43

2021-11-08T14:48:45

Support checking for object existence without refresh Looking up a non-existent object currently always invokes `git_odb_refresh`. If looking up a large batch of objects, many of which may legitimately not exist, this will repeatedly refresh the ODB to no avail. Add a `git_odb_exists_ext` that accepts flags controlling the ODB lookup, and add a flag to suppress the refresh. This allows the user to control if and when they refresh (for instance, refreshing once before starting the batch).

f0e693b1

2021-09-07T17:53:49

str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.

31ecaca2

2021-09-30T08:11:40

hash: hash functions operate on byte arrays not git_oids Separate the concerns of the hash functions from the git_oid functions. The git_oid structure will need to understand either SHA1 or SHA256; the hash functions should only deal with the appropriate one of these.

2a713da1

2021-09-29T21:31:17

hash: accept the algorithm in inputs

ea285904

2020-02-18T00:02:13

midx: Introduce git_odb_write_multi_pack_index() This change introduces git_odb_write_multi_pack_index(), which creates a `multi-pack-index` file from all the `.pack` files that have been loaded in the ODB. Fixes: #5399

231ca4fa

2021-02-23T19:33:34

Proof-of-concept for a more aggressive GIT_UNUSED() This adds a `-Wunused-result`-proof `GIT_UNUSED()`, just to demonstrate that it works. With this, sortedcache.h is now completely `GIT_WARN_UNUSED_RESULT`-annotated!

e5975f36

2021-07-30T11:37:12

tests: reset odb backend priority

cd460522

2020-04-20T22:16:52

odb: Implement option for overriding of default odb backend priority Introduce GIT_OPT_SET_ODB_LOOSE_PRIORITY and GIT_OPT_SET_ODB_PACKED_PRIORITY to allow overriding the default priority values for the default ODB backends. Libgit2 has historically assumed that most objects for long- running operations will be packed, therefore GIT_LOOSE_PRIORITY is set to 1 by default, and GIT_PACKED_PRIORITY to 2. When a client allows libgit2 to set the default backends, they can specify an override for the two priority values in order to change the order in which each ODB backend is accessed.

2370e491

2021-07-26T16:27:54

Merge pull request #5765 from lhchavez/cgraph-revwalks commit-graph: Use the commit-graph in revwalks

cf9196bd

2021-05-30T10:42:25

Tolerate readlink size less than st_size

31d9c24b

2021-05-06T16:32:14

filter: internal git_buf filter handling function Introduce `git_filter_list__convert_buf` which behaves like the old implementation of `git_filter_list__apply_data`, where it might move the input data buffer over into the output data buffer space for efficiency. This new implementation will do so in a more predictible way, always freeing the given input buffer (either moving it to the output buffer or filtering it into the output buffer first). Convert internal users to it.

25b75cd9

2021-03-10T07:06:15

commit-graph: Create `git_commit_graph` as an abstraction for the file This change does a medium-size refactor of the git_commit_graph_file and the interaction with the ODB. Now instead of the ODB owning a direct reference to the git_commit_graph_file, there will be an intermediate git_commit_graph. The main advantage of that is that now end users can explicitly set a git_commit_graph that is eagerly checked for errors, while still being able to lazily use the commit-graph in a regular ODB, if the file is present.

248606eb

2021-01-05T17:20:27

commit-graph: Use the commit-graph in revwalks This change makes revwalks a bit faster by using the `commit-graph` file (if present). This is thanks to the `commit-graph` allow much faster parsing of the commit information by requiring near-zero I/O (aside from reading a few dozen bytes off of a `mmap(2)`-ed file) for each commit, instead of having to read the ODB, inflate the commit, and parse it. This is done by modifying `git_commit_list_parse()` and letting it use the ODB-owned commit-graph file. Part of: #5757

4ae41f9c

2020-08-02T16:26:25

Make the odb race-free This change adds all the necessary locking to the odb to avoid races in the backends. Part of: #5592

931fd6b0

2020-04-05T17:29:13

odb: use GIT_ASSERT

cc1d7f5c

2020-08-01T17:47:20

Improve the support of atomics This change: * Starts using GCC's and clang's `__atomic_*` intrinsics instead of the `__sync_*` ones, since the former supercede the latter (and can be safely replaced by their equivalent `__atomic_*` version with the sequentially consistent model). * Makes `git_atomic64`'s value `volatile`. Otherwise, this will make ThreadSanitizer complain. * Adds ways to load the values from atomics. As it turns out, unsynchronized read are okay only in some architectures, but if we want to be correct (and make ThreadSanitizer happy), those loads should also be performed with the atomic builtins. * Fixes two ThreadSanitizer warnings, as a proof-of-concept that this works: - Avoid directly accessing `git_refcount`'s `owner` directly, and instead makes all callers go through the `GIT_REFCOUNT_*()` macros, which also use the atomic utilities. - Makes `pool_system_page_size()` race-free. Part of: #5592

3a197ea7

2020-06-27T12:33:32

Make the tests pass cleanly with MemorySanitizer This change: * Initializes a few variables that were being read before being initialized. * Includes https://github.com/madler/zlib/pull/393. As such, it only works reliably with `-DUSE_BUNDLED_ZLIB=ON`.

c6184f0c

2020-06-08T21:07:36

tree-wide: do not compile deprecated functions with hard deprecation When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h" header to be empty. As a result, no function declarations are made available to callers, but the implementations are still available to link against. This has the problem that function declarations also aren't visible to the implementations, meaning that the symbol's visibility will not be set up correctly. As a result, the resulting library may not expose those deprecated symbols at all on some platforms and thus cause linking errors. Fix the issue by conditionally compiling deprecated functions, only. While it becomes impossible to link against such a library in case one uses deprecated functions, distributors of libgit2 aren't expected to pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real" hard deprecation still makes sense in the context of CI to test we don't use deprecated symbols ourselves and in case a dependant uses libgit2 in a vendored way and knows it won't ever use any of the deprecated symbols anyway.

fb2198db

2019-06-23T16:23:59

futils_filesize: use `uint64_t` for object size Instead of using a signed type (`off_t`) use `uint64_t` for the maximum size of files.

bed9fc6b

2019-06-23T15:16:47

odb: use `git_object_size_t` for object size Instead of using a signed type (`off_t`) use a new `git_object_size_t` for the sizes of objects.

e54343a4

2019-06-29T09:17:32

fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.

658022c4

2019-07-18T13:53:41

configuration: cvar -> configmap `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.

9f723c97

2019-06-26T14:49:37

docs: fixups

5d92e547

2019-06-08T17:28:35

oid: `is_zero` instead of `iszero` The only function that is named `issomething` (without underscore) was `git_oid_iszero`. Rename it to `git_oid_is_zero` for consistency with the rest of the library.

459ac856

2019-02-23T18:42:53

odb: provide a free function for custom backends Custom backends can allocate memory when reading objects and providing them to libgit2. However, if an error occurs in the custom backend after the memory has been allocated for the custom object but before it's returned to libgit2, the custom backend has no way to free that memory and it must be leaked. Provide a free function that corresponds to the alloc function so that custom backends have an opportunity to free memory before they return an error.

790aae77

2019-02-23T18:40:43

odb: rename git_odb_backend_malloc for consistency The `git_odb_backend_malloc` name is a system function that is provided for custom ODB backends and allows them to allocate memory for an ODB object in the read callback. This is important so that libgit2 can later free the memory used by an ODB object that was read from the custom backend. However, the name _suggests_ that it actually allocates a `git_odb_backend`. It does not; rename it to make it clear that it actually allocates backend _data_.

a1ef995d

2019-02-21T10:33:30

indexer: use git_indexer_progress throughout Update internal usage of `git_transfer_progress` to `git_indexer_progreses`.

bbdcd450

2019-02-20T10:40:06

cache: fix misnaming of `git_cache_free` Functions that free a structure's contents but not the structure itself shall be named `dispose` in the libgit2 project, but the function `git_cache_free` does not follow this naming pattern. Fix this by renaming it to `git_cache_dispose` and adjusting all callers to make use of the new name.

6b3730d4

2019-02-16T19:55:30

Fix a memory leak in odb_otype_fast() This change frees a copy of a cached object in odb_otype_fast().

dd45539d

2019-02-16T22:06:58

Fix a _very_ improbable memory leak in git_odb_new() This change fixes a mostly theoretical memory leak in got_odb_new() that can only manifest if git_cache_init() fails due to running out of memory or not being able to acquire its lock.

c6cac733

2019-01-20T22:40:38

blob: validate that blob sizes fit in a size_t Our blob size is a `git_off_t`, which is a signed 64 bit int. This may be erroneously negative or larger than `SIZE_MAX`. Ensure that the blob size fits into a `size_t` before casting.

f673e232

2018-12-27T13:47:34

git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.

f7416509

2019-01-20T20:15:31

Fix odb foreach to also close on positive error code In include/git2/odb.h it states that callback can also return positive value which should break looping. Implementations of git_odb_foreach() and pack_backend__foreach() did not respect that.

b2c2dc64

2019-01-19T01:36:40

Merge pull request #4940 from libgit2/ethomson/git_obj More `git_obj` to `git_object` updates

cd350852

2019-01-17T10:40:13

object_type: GIT_OBJECT_BAD is now GIT_OBJECT_INVALID We use the term "invalid" to refer to bad or malformed data, eg `GIT_REF_INVALID` and `GIT_EINVALIDSPEC`. Since we're changing the names of the `git_object_t`s in this release, update it to be `GIT_OBJECT_INVALID` instead of `BAD`.

7b453e7e

2019-01-05T22:12:48

Fix a bunch of warnings This change fixes a bunch of warnings that were discovered by compiling with `clang -target=i386-pc-linux-gnu`. It turned out that the intrinsics were not necessarily being used in all platforms! Especially in GCC, since it does not support __has_builtin. Some more warnings were gleaned from the Windows build, but I stopped when I saw that some third-party dependencies (e.g. zlib) have warnings of their own, so we might never be able to enable -Werror there.

168fe39b

2018-11-28T14:26:57

object_type: use new enumeration names Use the new object_type enumeration names within the codebase.

0fcd0563

2018-08-06T12:00:21

odb: fix use of wrong printf formatters The `git_odb_stream` members `declared_size` and `received_bytes` are both of the type `git_off_t`, which we usually defined to be a 64 bit signed integer. Thus, passing these members to "PRIdZ" formatters is not correct, as they are not guaranteed to accept big enough numbers. Instead, use the "PRId64" formatter, which is able to represent 64 bit signed integers.

ecf4f33a

2018-02-08T11:14:48

Convert usage of `git_buf_free` to new `git_buf_dispose`

a52b4c51

2018-03-23T09:59:46

odb: fix writing to fake write streams In commit 7ec7aa4a7 (odb: assert on logic errors when writing objects, 2018-02-01), the check for whether we are trying to overflowing the fake stream buffer was changed from returning an error to raising an assert. The conversion forgot though that the logic around `assert`s are basically inverted. Previously, if the statement stream->written + len > steram->size evaluated to true, we would return a `-1`. Now we are asserting that this statement is true, and in case it is not we will raise an error. So the conversion to the `assert` in fact changed the behaviour to the complete opposite intention. Fix the assert by inverting its condition again and add a regression test.

a43bcd2c

2018-02-09T17:31:50

odb: fix memory leaks due to not freeing hash context

619f61a8

2018-02-01T06:22:36

odb: error when we can't create object header Return an error to the caller when we can't create an object header for some reason (printf failure) instead of simply asserting.

7ec7aa4a

2018-02-01T05:54:57

odb: assert on logic errors when writing objects There's no recovery possible if we're so confused or corrupted that we're trying to overwrite our memory. Simply assert.

138e4c2b

2018-02-01T06:35:31

git_odb__hashfd: propagate error on failures

35ed256b

2018-02-01T05:11:05

git_odb__hashobj: provide errors messages on failures Provide error messages on hash failures: assert when given invalid input instead of failing with a user error; provide error messages on program errors.

59d99adc

2018-01-31T09:34:52

odb: check for alloc errors on hardcoded objects It's unlikely that we'll fail to allocate a single byte, but let's check for allocation failures for good measure. Untangle `-1` being a marker of not having found the hardcoded odb object; use that to reflect actual errors.

ef902864

2018-01-31T09:30:51

odb: error when we can't alloc an object At the moment, we're swallowing the allocation failure. We need to return the error to the caller.

97f9a5f0

2017-12-17T01:12:49

odb: provide length and type with streaming read The streaming read functionality should provide the length and the type of the object, like the normal read functionality does.

275f103d

2018-01-12T08:59:40

odb: reject reading and writing null OIDs The null OID (hash with all zeroes) indicates a missing object in upstream git and is thus not a valid object ID. Add defensive measurements to avoid writing such a hash to the object database in the very unlikely case where some data results in the null OID. Furthermore, add shortcuts when reading the null OID from the ODB to avoid ever returning an object when a faulty repository may contain the null OID.

0c7f49dd

2017-06-30T13:39:01

Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.

cb3010c5

2017-06-12T12:56:40

odb_read_prefix: reset error in backends loop When looking for an object by prefix, we query all the backends so that we can ensure that there is no ambiguity. We need to reset the `error` value between backends; otherwise the first backend may find an object by prefix, but subsequent backends may not. If we do not reset the `error` value then it will remain at `GIT_ENOTFOUND` and `read_prefix_1` will fail, despite having actually found an object.

8d93a11c

2017-05-03T12:38:55

odb: fix printf formatter for git_off_t The fields `declared_size` and `received_bytes` of the `git_odb_stream` are both of type `git_off_t` which is defined as a signed integer. When passing these values to a printf-style string in `git_odb_stream__invalid_length`, though, we format these as PRIuZ, which is unsigned. Fix the issue by using PRIdZ instead, silencing warnings on macOS.

7776db51

2017-05-03T12:15:12

odb: shut up gcc warnings regarding uninitilized variables The `error` variable is used as a return value in the out-section of both `odb_read_1` and `read_prefix_1`. While the value will actually always be initialized inside of this section, GCC fails to realize this due to interactions with the `found` variable: if `found` is set, the error will always be initialized. If it is not, we return early without reaching the out-statements. Shut up the warnings by initializing the error variable, even though it is unnecessary.

e0973bc0

2017-04-28T14:05:15

odb: verify hashes in read_prefix_1 While the function reading an object from the complete OID already verifies OIDs, we do not yet do so for reading objects from a partial OID. Do so when strict OID verification is enabled.

14109620

2017-04-28T14:03:54

odb: improve error handling in read_prefix_1 The read_prefix_1 function has several return statements springled throughout the code. As we have to free memory upon getting an error, the free code has to be repeated at every single retrun -- which it is not, so we have a memory leak here. Refactor the code to use the typical `goto out` pattern, which will free data when an error has occurred. While we're at it, we can also improve the error message thrown when multiple ambiguous prefixes are found. It will now include the colliding prefixes.

35079f50

2017-04-21T07:31:56

odb: add option to turn off hash verification Verifying hashsums of objects we are reading from the ODB may be costly as we have to perform an additional hashsum calculation on the object. Especially when reading large objects, the penalty can be as high as 35%, as can be seen when executing the equivalent of `git cat-file` with and without verification enabled. To mitigate for this, we add a global option for libgit2 which enables the developer to turn off the verification, e.g. when he can be reasonably sure that the objects on disk won't be corrupted.

28a0741f

2017-04-10T09:30:08

odb: verify object hashes The upstream git.git project verifies objects when looking them up from disk. This avoids scenarios where objects have somehow become corrupt on disk, e.g. due to hardware failures or bit flips. While our mantra is usually to follow upstream behavior, we do not do so in this case, as we never check hashes of objects we have just read from disk. To fix this, we create a new error class `GIT_EMISMATCH` which denotes that we have looked up an object with a hashsum mismatch. `odb_read_1` will then, after having read the object from its backend, hash the object and compare the resulting hash to the expected hash. If hashes do not match, it will return an error. This obviously introduces another computation of checksums and could potentially impact performance. Note though that we usually perform I/O operations directly before doing this computation, and as such the actual overhead should be drowned out by I/O. Running our test suite seems to confirm this guess. On a Linux system with best-of-five timings, we had 21.592s with the check enabled and 21.590s with the ckeck disabled. Note though that our test suite mostly contains very small blobs only. It is expected that repositories with bigger blobs may notice an increased hit by this check. In addition to a new test, we also had to change the odb::backend::nonrefreshing test suite, which now triggers a hashsum mismatch when looking up the commit "deadbeef...". This is expected, as the fake backend allocated inside of the test will return an empty object for the OID "deadbeef...", which will obviously not hash back to "deadbeef..." again. We can simply adjust the hash to equal the hash of the empty object here to fix this test.

6fd6c678

2017-03-22T20:29:22

Merge pull request #4030 from libgit2/ethomson/fsync fsync all the things

52d03f37

2017-03-03T13:26:29

git_commit_create: freshen tree objects in commit Freshen the tree object that a commit points to during commit time.

1c04a96b

2017-02-28T12:29:29

Honor `core.fsyncObjectFiles`

909d5494

2016-12-29T12:25:15

giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one

901434b0

2016-11-14T10:07:37

common: cast precision specifiers to int

becadafc

2016-08-05T19:30:56

odb: only provide the empty tree Only provide the empty tree internally, which matches git's behavior. If we provide the empty blob then any users trying to write it with libgit2 would omit it from actually landing in the odb, which appear to git proper as a broken repository (missing that object).

8f09a98e

2016-07-14T16:23:24

odb: freshen existing objects when writing When writing an object, we calculate its OID and see if it exists in the object database. If it does, we need to freshen the file that contains it.

20302aa4

2016-06-25T23:33:05

Merge pull request #3223 from ethomson/apply Reading patch files

2076d329

2016-06-09T22:50:53

fix error message SHA truncation in git_odb__error_notfound()

6a2d2f8a

2015-06-17T06:42:20

delta: move delta application to delta.c Move the delta application functions into `delta.c`, next to the similar delta creation functions. Make the `git__delta_apply` functions adhere to other naming and parameter style within the library.

1bbcb2b2

2016-03-09T17:47:53

odb: Try to lookup headers in all backends before passthrough

e78d2ac9

2016-03-09T16:41:08

odb: Refactor `git_odb_expand_ids`

4416aa77

2016-03-09T11:29:46

odb: Implement new helper to read types without refreshing

9a786650

2016-03-09T11:00:27

odb: Handle corner cases in `git_odb_expand_ids` The old implementation had two issues: 1. OIDs that were too short as to be ambiguous were not being handled properly. 2. If the last OID to expand in the array was missing from the ODB, we would leak a `GIT_ENOTFOUND` error code from the function.

62484f52

2016-03-08T14:09:55

git_odb_expand_ids: accept git_odb_expand_id array Take (and write to) an array of a struct, `git_odb_expand_id`.

4b1f0f79

2016-03-08T11:44:21

git_odb_expand_ids: rename func, return the type

6c04269c

2016-03-04T00:50:35

git_odb_exists_many_prefixes: query odb for multiple short ids Query the object database for multiple objects at a time, given their object ID (which may be abbreviated) and optional type.

e10144ae

2016-03-04T01:18:30

odb: improved not found error messages When looking up an abbreviated oid, show the actual (abbreviated) oid the caller passed instead of a full (but ambiguously truncated) oid.

a0a1b19a

2015-10-14T19:31:54

odb: Prioritize alternate backends For most real use cases, repositories with alternates use them as main object storage. Checking the alternate for objects before the main repository should result in measurable speedups. Because of this, we're changing the sorting algorithm to prioritize alternates *in cases where two backends have the same priority*. This means that the pack backend for the alternate will be checked before the pack backend for the main repository *but* both of them will be checked before any loose backends.

43820f20

2015-10-14T19:24:07

odb: Be smarter when refreshing backends In the current implementation of ODB backends, each backend is tasked with refreshing itself after a failed lookup. This is standard Git behavior: we want to e.g. reload the packfiles on disk in case they have changed and that's the reason we can't find the object we're looking for. This behavior, however, becomes pathological in repositories where multiple alternates have been loaded. Given that each alternate counts as a separate backend, a miss in the main repository (which can potentially be very frequent in cases where object storage comes from the alternate) will result in refreshing all its packfiles before we move on to the alternate backend where the object will most likely be found. To fix this, the code in `odb.c` has been refactored as to perform the refresh of all the backends externally, once we've verified that the object is nowhere to be found. If the refresh is successful, we then perform the lookup sequentially through all the backends, skipping the ones that we know for sure weren't refreshed (because they have no refresh API). The on-disk pack backend has been adjusted accordingly: it no longer performs refreshes internally.

d3b29fb9

2015-10-01T00:50:37

refdb and odb backends must provide `free` function As refdb and odb backends can be allocated by client code, libgit2 can’t know whether an alternative memory allocator was used, and thus should not try to call `git__free` on those objects. Instead, odb and refdb backend implementations must always provide their own `free` functions to ensure memory gets freed correctly.

e5f9df7b

2015-06-29T21:45:04

odb: cast to long long for printf

9f3c18e2

2015-06-02T08:36:15

Fixed build warnings on Xcode 6.1

a6f2ceaf

2015-05-13T12:11:55

Merge pull request #3118 from libgit2/cmn/stream-size odb: make the writestream's size a git_off_t

b0d7f329

2015-05-13T10:23:19

odb: reverse the default backend priorities We currently first look in the loose object dir and then in the packs for objects. When performing operations on recent history this has a higher likelihood of hitting, but when we deal with operations which look further back into the past, we start spending a large amount of time getting ENOTENT from `access`. Reversing the priorities means that long-running operations can get to their objects faster, as we can look at the index data we have in memory (or rather mapped) to figure out whether we have an object, which is faster than going out to the filesystem. The packed backend already implements an optimistic read algorithm by first looking at the packs we know about and only going out to disk to referesh if the object is not found which means that in the case where we do have the object (which will be in the majority for anything that traverses the graph) we can avoid going to to disk entirely to determine whether an object exists. Operations which look at recent history may take a slight impact, but these would be operations which look a lot less at object and thus take less time regardless.

77b339f7

2015-05-12T13:06:33

odb: make the writestream's size a git_off_t Restricting files to size_t is a silly limitation. The loose backend writes to a file directly, so there is no issue in using 63 bits for the size. We still assume that the header is going to fit in 64 bytes, which does mean quite a bit smaller files due to the run-length encoding, but it's still a much larger size than you would want Git to handle.

7dd22538

2015-05-11T10:19:25

centralizing all IO buffer size values

f1453c59

2015-02-12T12:19:37

Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.

15d54fdd

2015-02-10T22:34:03

odb__hashlink: check st.st_size before casting

392702ee

2015-02-09T23:41:13

allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.

c251f3bb

2014-12-08T16:05:47

win32: remember to cleanup our hash_ctx

e0156651

2014-11-21T13:50:46

odb: `git_odb_object` contents are never NULL This is a contract that we made in the library and that we need to uphold. The contents of a blob can never be NULL because several parts of the library (including the filter and attributes code) expect `git_blob_rawcontent` to always return a valid pointer.

e1ac0101

2014-11-08T14:40:53

odb: hardcode the empty blob and tree git hardocodes these as objects which exist regardless of whether they are in the odb and uses them in the shell interface as a way of expressing the lack of a blob or tree for one side of e.g. a diff. In the library we use each language's natural way of declaring a lack of value which makes a workaround like this unnecessary. Since git uses it, it does however mean each shell application would need to perform this check themselves. This makes it common work across a range of applications and an issue with compatibility with git, which fits right into what the library aims to provide. Thus we introduce the hard-coded empty blob and tree in the odb frontend. These hard-coded objects are checked for before going to the backends, but after the cache check, which means the second time they're used, they will be treated as normal cached objects instead of creating new ones.

530594c0

2014-05-23T05:53:41

odb: clear backend errors on successful read We go through the different backends in order, so it's not an error if at least one of the backends has the data we want.

bc91347b

2014-04-30T11:16:31

Fix remaining init_options inconsistencies There were a couple of "init_opts()" functions a few more cases of structure initialization that I somehow missed.

48e60ae7

2014-04-21T11:23:29

Don't redefine the same callback types, their signatures may change

3ab57816

2014-03-31T23:23:32

Merge pull request #2178 from libgit2/rb/fix-short-id Fix git_odb_short_id and git_odb_exists_prefix bugs

31a14982

2014-03-21T17:36:34

Fix wrong assertion Fixes issue #2196

89499078

2014-03-10T10:53:39

Fix a number of git_odb_exists_prefix bugs The git_odb_exists_prefix API was not dealing correctly when a later backend returned GIT_ENOTFOUND even if an earlier backend had found the object. Additionally, the unit tests were not properly exercising the API and had a couple mistakes in checking the results. Lastly, since the backends are not expected to behavior correctly unless all bytes of the short id are zero except for the prefix, this makes the ODB prefix APIs explicitly clear out the extra bytes so the user doesn't have to be as careful.

thodg/libgit2/src/odb.c

src/odb.c

Log