src/odb.c


Log

Author Commit Date CI Message
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Edward Thomson cb3010c5 2017-06-12T12:56:40 odb_read_prefix: reset error in backends loop When looking for an object by prefix, we query all the backends so that we can ensure that there is no ambiguity. We need to reset the `error` value between backends; otherwise the first backend may find an object by prefix, but subsequent backends may not. If we do not reset the `error` value then it will remain at `GIT_ENOTFOUND` and `read_prefix_1` will fail, despite having actually found an object.
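The idea of not letting a later backend's miss hide an earlier hit can be sketched in isolation. This is a minimal illustration, not the code in `src/odb.c`: `ERR_NOTFOUND`, `lookup_fn` and `lookup_prefix_all` are hypothetical stand-ins, and the per-backend `error` is simply declared inside the loop so that it starts fresh on every iteration.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_NOTFOUND (-3)   /* hypothetical stand-in for GIT_ENOTFOUND */

/* Hypothetical backend lookup: 0 if the prefix is present, ERR_NOTFOUND if not. */
typedef int (*lookup_fn)(const char *prefix);

/* Query every backend (so that ambiguity could be detected) without letting
 * a miss in a later backend overwrite a hit from an earlier one. */
int lookup_prefix_all(lookup_fn *backends, size_t count, const char *prefix)
{
    int found = 0;
    size_t i;

    assert(backends != NULL || count == 0);

    for (i = 0; i < count; i++) {
        int error = backends[i](prefix);   /* fresh value for each backend */

        if (error == ERR_NOTFOUND)
            continue;                      /* a miss here is not fatal */
        if (error < 0)
            return error;                  /* a real failure aborts the loop */
        found = 1;
    }

    return found ? 0 : ERR_NOTFOUND;
}

/* Two toy backends used to exercise the loop. */
static int backend_hit(const char *prefix)  { (void)prefix; return 0; }
static int backend_miss(const char *prefix) { (void)prefix; return ERR_NOTFOUND; }
```

With `backend_hit` followed by `backend_miss`, the overall lookup still succeeds; only when every backend misses does the caller see `ERR_NOTFOUND`.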
Patrick Steinhardt 8d93a11c 2017-05-03T12:38:55 odb: fix printf formatter for git_off_t The fields `declared_size` and `received_bytes` of the `git_odb_stream` are both of type `git_off_t` which is defined as a signed integer. When passing these values to a printf-style string in `git_odb_stream__invalid_length`, though, we format these as PRIuZ, which is unsigned. Fix the issue by using PRIdZ instead, silencing warnings on macOS.
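`PRIuZ` and `PRIdZ` are libgit2's own macros. The same signed-versus-unsigned concern can be shown with the standard `<inttypes.h>` macros. In this sketch, `off64` is a hypothetical stand-in for `git_off_t`, assumed here to be a signed 64-bit integer, and `format_length_error` is an invented helper name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

typedef int64_t off64;   /* hypothetical stand-in for git_off_t (assumed signed 64-bit) */

/* Format two signed sizes with a signed conversion specifier, so that a
 * negative value is printed as negative instead of as a huge unsigned number. */
int format_length_error(char *buf, size_t len, off64 declared, off64 received)
{
    assert(buf != NULL);

    return snprintf(buf, len,
        "expected %" PRId64 " bytes, received %" PRId64,
        declared, received);
}
```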
Patrick Steinhardt 7776db51 2017-05-03T12:15:12 odb: shut up gcc warnings regarding uninitialized variables The `error` variable is used as a return value in the out-section of both `odb_read_1` and `read_prefix_1`. While the value will actually always be initialized inside of this section, GCC fails to realize this due to interactions with the `found` variable: if `found` is set, the error will always be initialized. If it is not, we return early without reaching the out-statements. Shut up the warnings by initializing the error variable, even though it is unnecessary.
Patrick Steinhardt e0973bc0 2017-04-28T14:05:15 odb: verify hashes in read_prefix_1 While the function reading an object from the complete OID already verifies OIDs, we do not yet do so for reading objects from a partial OID. Do so when strict OID verification is enabled.
Patrick Steinhardt 14109620 2017-04-28T14:03:54 odb: improve error handling in read_prefix_1 The read_prefix_1 function has several return statements sprinkled throughout the code. As we have to free memory upon getting an error, the free code has to be repeated at every single return -- which it is not, so we have a memory leak here. Refactor the code to use the typical `goto out` pattern, which will free data when an error has occurred. While we're at it, we can also improve the error message thrown when multiple ambiguous prefixes are found. It will now include the colliding prefixes.
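The `goto out` pattern mentioned above can be sketched on its own. This is a generic illustration, not the body of `read_prefix_1`; `copy_nonempty` is a hypothetical helper. Every failure path jumps to a single label where the buffer is released, so the cleanup is written once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `src` into a fresh buffer, rejecting empty
 * input. All error paths funnel through `out`, which frees the buffer. */
int copy_nonempty(char **out, const char *src)
{
    int error = 0;
    char *buf = NULL;
    size_t len;

    assert(out != NULL && src != NULL);

    *out = NULL;
    len = strlen(src);

    buf = malloc(len + 1);
    if (buf == NULL) {
        error = -1;
        goto out;
    }
    memcpy(buf, src, len + 1);

    if (len == 0) {
        error = -1;      /* buffer already allocated: must not leak it */
        goto out;
    }

    *out = buf;          /* success: ownership passes to the caller */

out:
    if (error)
        free(buf);       /* free(NULL) is a no-op, so this is always safe */
    return error;
}
```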
Patrick Steinhardt 35079f50 2017-04-21T07:31:56 odb: add option to turn off hash verification Verifying hashsums of objects we are reading from the ODB may be costly as we have to perform an additional hashsum calculation on the object. Especially when reading large objects, the penalty can be as high as 35%, as can be seen when executing the equivalent of `git cat-file` with and without verification enabled. To mitigate this, we add a global option for libgit2 which enables the developer to turn off the verification, e.g. when he can be reasonably sure that the objects on disk won't be corrupted.
Patrick Steinhardt 28a0741f 2017-04-10T09:30:08 odb: verify object hashes The upstream git.git project verifies objects when looking them up from disk. This avoids scenarios where objects have somehow become corrupt on disk, e.g. due to hardware failures or bit flips. While our mantra is usually to follow upstream behavior, we do not do so in this case, as we never check hashes of objects we have just read from disk. To fix this, we create a new error class `GIT_EMISMATCH` which denotes that we have looked up an object with a hashsum mismatch. `odb_read_1` will then, after having read the object from its backend, hash the object and compare the resulting hash to the expected hash. If hashes do not match, it will return an error. This obviously introduces another computation of checksums and could potentially impact performance. Note though that we usually perform I/O operations directly before doing this computation, and as such the actual overhead should be drowned out by I/O. Running our test suite seems to confirm this guess. On a Linux system with best-of-five timings, we had 21.592s with the check enabled and 21.590s with the check disabled. Note though that our test suite mostly contains very small blobs only. It is expected that repositories with bigger blobs may notice an increased hit by this check. In addition to a new test, we also had to change the odb::backend::nonrefreshing test suite, which now triggers a hashsum mismatch when looking up the commit "deadbeef...". This is expected, as the fake backend allocated inside of the test will return an empty object for the OID "deadbeef...", which will obviously not hash back to "deadbeef..." again. We can simply adjust the hash to equal the hash of the empty object here to fix this test.
Edward Thomson 6fd6c678 2017-03-22T20:29:22 Merge pull request #4030 from libgit2/ethomson/fsync fsync all the things
Edward Thomson 52d03f37 2017-03-03T13:26:29 git_commit_create: freshen tree objects in commit Freshen the tree object that a commit points to during commit time.
Edward Thomson 1c04a96b 2017-02-28T12:29:29 Honor `core.fsyncObjectFiles`
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Patrick Steinhardt 901434b0 2016-11-14T10:07:37 common: cast precision specifiers to int
Edward Thomson becadafc 2016-08-05T19:30:56 odb: only provide the empty tree Only provide the empty tree internally, which matches git's behavior. If we provide the empty blob then any users trying to write it with libgit2 would omit it from actually landing in the odb, which appears to git proper as a broken repository (missing that object).
Edward Thomson 8f09a98e 2016-07-14T16:23:24 odb: freshen existing objects when writing When writing an object, we calculate its OID and see if it exists in the object database. If it does, we need to freshen the file that contains it.
Edward Thomson 20302aa4 2016-06-25T23:33:05 Merge pull request #3223 from ethomson/apply Reading patch files
Sim Domingo 2076d329 2016-06-09T22:50:53 fix error message SHA truncation in git_odb__error_notfound()
Edward Thomson 6a2d2f8a 2015-06-17T06:42:20 delta: move delta application to delta.c Move the delta application functions into `delta.c`, next to the similar delta creation functions. Make the `git__delta_apply` functions adhere to other naming and parameter style within the library.
Vicent Marti 1bbcb2b2 2016-03-09T17:47:53 odb: Try to lookup headers in all backends before passthrough
Vicent Marti e78d2ac9 2016-03-09T16:41:08 odb: Refactor `git_odb_expand_ids`
Vicent Marti 4416aa77 2016-03-09T11:29:46 odb: Implement new helper to read types without refreshing
Vicent Marti 9a786650 2016-03-09T11:00:27 odb: Handle corner cases in `git_odb_expand_ids` The old implementation had two issues: 1. OIDs that were too short as to be ambiguous were not being handled properly. 2. If the last OID to expand in the array was missing from the ODB, we would leak a `GIT_ENOTFOUND` error code from the function.
Edward Thomson 62484f52 2016-03-08T14:09:55 git_odb_expand_ids: accept git_odb_expand_id array Take (and write to) an array of a struct, `git_odb_expand_id`.
Edward Thomson 4b1f0f79 2016-03-08T11:44:21 git_odb_expand_ids: rename func, return the type
Edward Thomson 6c04269c 2016-03-04T00:50:35 git_odb_exists_many_prefixes: query odb for multiple short ids Query the object database for multiple objects at a time, given their object ID (which may be abbreviated) and optional type.
Edward Thomson e10144ae 2016-03-04T01:18:30 odb: improved not found error messages When looking up an abbreviated oid, show the actual (abbreviated) oid the caller passed instead of a full (but ambiguously truncated) oid.
Vicent Marti a0a1b19a 2015-10-14T19:31:54 odb: Prioritize alternate backends For most real use cases, repositories with alternates use them as main object storage. Checking the alternate for objects before the main repository should result in measurable speedups. Because of this, we're changing the sorting algorithm to prioritize alternates *in cases where two backends have the same priority*. This means that the pack backend for the alternate will be checked before the pack backend for the main repository *but* both of them will be checked before any loose backends.
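The ordering rule described above (higher priority first, alternates first on a tie) can be sketched as a `qsort` comparator. The struct, field names and priority values below are assumptions for illustration and are not taken from `src/odb.c`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical backend record; field names are illustrative only. */
typedef struct {
    int priority;       /* higher value is queried first */
    int is_alternate;   /* 1 if the backend belongs to an alternate, else 0 */
} backend_entry;

/* Order by descending priority; on a tie, alternates sort first. */
static int backend_cmp(const void *a, const void *b)
{
    const backend_entry *x = a;
    const backend_entry *y = b;

    if (x->priority != y->priority)
        return (x->priority < y->priority) - (x->priority > y->priority);

    return (y->is_alternate != 0) - (x->is_alternate != 0);
}

void sort_backends(backend_entry *entries, size_t count)
{
    assert(entries != NULL || count == 0);
    qsort(entries, count, sizeof(*entries), backend_cmp);
}
```

Assuming pack backends carry priority 2 and loose backends priority 1, sorting a main repository plus one alternate yields: alternate pack, main pack, alternate loose, main loose.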
Vicent Marti 43820f20 2015-10-14T19:24:07 odb: Be smarter when refreshing backends In the current implementation of ODB backends, each backend is tasked with refreshing itself after a failed lookup. This is standard Git behavior: we want to e.g. reload the packfiles on disk in case they have changed and that's the reason we can't find the object we're looking for. This behavior, however, becomes pathological in repositories where multiple alternates have been loaded. Given that each alternate counts as a separate backend, a miss in the main repository (which can potentially be very frequent in cases where object storage comes from the alternate) will result in refreshing all its packfiles before we move on to the alternate backend where the object will most likely be found. To fix this, the code in `odb.c` has been refactored as to perform the refresh of all the backends externally, once we've verified that the object is nowhere to be found. If the refresh is successful, we then perform the lookup sequentially through all the backends, skipping the ones that we know for sure weren't refreshed (because they have no refresh API). The on-disk pack backend has been adjusted accordingly: it no longer performs refreshes internally.
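A rough model of the "refresh externally, once" flow follows. It is a toy under stated assumptions, not the libgit2 implementation: `toy_backend` pretends to hold the ids `0..known_max`, a refresh reveals ids up to `disk_max`, and `can_refresh == 0` models a backend without a refresh API.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_NOTFOUND (-3)   /* hypothetical stand-in for GIT_ENOTFOUND */

/* Toy backend: "holds" ids 0..known_max; a refresh reveals ids up to disk_max. */
typedef struct {
    int known_max;
    int disk_max;
    int can_refresh;     /* 0 models a backend with no refresh API */
    int refresh_calls;   /* counts how often this backend was refreshed */
} toy_backend;

static int toy_read(const toy_backend *b, int id)
{
    return (id >= 0 && id <= b->known_max) ? 0 : ERR_NOTFOUND;
}

/* One sequential pass over the backends; optionally skip those that
 * cannot have changed because they were never refreshed. */
static int read_pass(toy_backend *bs, size_t n, int id, int only_refreshed)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (only_refreshed && !bs[i].can_refresh)
            continue;
        if (toy_read(&bs[i], id) == 0)
            return 0;
    }
    return ERR_NOTFOUND;
}

int read_with_refresh(toy_backend *bs, size_t n, int id)
{
    size_t i;

    assert(bs != NULL || n == 0);

    /* First pass: no backend refreshes itself on a miss. */
    if (read_pass(bs, n, id, 0) == 0)
        return 0;

    /* The object is nowhere to be found: refresh once, from the outside. */
    for (i = 0; i < n; i++) {
        if (!bs[i].can_refresh)
            continue;
        bs[i].known_max = bs[i].disk_max;
        bs[i].refresh_calls++;
    }

    /* Second pass: only backends that were actually refreshed. */
    return read_pass(bs, n, id, 1);
}
```

The point of the structure is that a miss in the first backend no longer costs a refresh before the second backend is even consulted.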
Arthur Schreiber d3b29fb9 2015-10-01T00:50:37 refdb and odb backends must provide `free` function As refdb and odb backends can be allocated by client code, libgit2 can’t know whether an alternative memory allocator was used, and thus should not try to call `git__free` on those objects. Instead, odb and refdb backend implementations must always provide their own `free` functions to ensure memory gets freed correctly.
Edward Thomson e5f9df7b 2015-06-29T21:45:04 odb: cast to long long for printf
Pierre-Olivier Latour 9f3c18e2 2015-06-02T08:36:15 Fixed build warnings on Xcode 6.1
Edward Thomson a6f2ceaf 2015-05-13T12:11:55 Merge pull request #3118 from libgit2/cmn/stream-size odb: make the writestream's size a git_off_t
Carlos Martín Nieto b0d7f329 2015-05-13T10:23:19 odb: reverse the default backend priorities We currently first look in the loose object dir and then in the packs for objects. When performing operations on recent history this has a higher likelihood of hitting, but when we deal with operations which look further back into the past, we start spending a large amount of time getting ENOENT from `access`. Reversing the priorities means that long-running operations can get to their objects faster, as we can look at the index data we have in memory (or rather mapped) to figure out whether we have an object, which is faster than going out to the filesystem. The packed backend already implements an optimistic read algorithm by first looking at the packs we know about and only going out to disk to refresh if the object is not found which means that in the case where we do have the object (which will be in the majority for anything that traverses the graph) we can avoid going to disk entirely to determine whether an object exists. Operations which look at recent history may take a slight impact, but these would be operations which look a lot less at objects and thus take less time regardless.
Carlos Martín Nieto 77b339f7 2015-05-12T13:06:33 odb: make the writestream's size a git_off_t Restricting files to size_t is a silly limitation. The loose backend writes to a file directly, so there is no issue in using 63 bits for the size. We still assume that the header is going to fit in 64 bytes, which does mean quite a bit smaller files due to the run-length encoding, but it's still a much larger size than you would want Git to handle.
J Wyman 7dd22538 2015-05-11T10:19:25 centralizing all IO buffer size values
Edward Thomson f1453c59 2015-02-12T12:19:37 Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler intrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
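For illustration, overflow helpers with the same calling shape as the gcc/clang `__builtin_add_overflow` and `__builtin_mul_overflow` intrinsics could look like the portable sketch below: a nonzero return signals overflow, and the (wrapped) result is always written through a non-NULL out pointer. The function names are hypothetical and are not libgit2's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a + b overflows size_t, 0 otherwise. The wrapped result is
 * always stored in *out, which must not be NULL. */
int add_sizet_overflow(size_t *out, size_t a, size_t b)
{
    assert(out != NULL);

    *out = a + b;               /* unsigned arithmetic wraps, never traps */
    return a > SIZE_MAX - b;
}

/* Returns 1 if a * b overflows size_t, 0 otherwise. */
int mul_sizet_overflow(size_t *out, size_t a, size_t b)
{
    assert(out != NULL);

    *out = a * b;
    return b != 0 && a > SIZE_MAX / b;
}
```

Because the signature matches the intrinsics, a build could swap the function bodies for the compiler builtins on platforms that provide them, which is the substitution the commit message has in mind.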
Edward Thomson 15d54fdd 2015-02-10T22:34:03 odb__hashlink: check st.st_size before casting
Edward Thomson 392702ee 2015-02-09T23:41:13 allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
Edward Thomson c251f3bb 2014-12-08T16:05:47 win32: remember to cleanup our hash_ctx
Vicent Marti e0156651 2014-11-21T13:50:46 odb: `git_odb_object` contents are never NULL This is a contract that we made in the library and that we need to uphold. The contents of a blob can never be NULL because several parts of the library (including the filter and attributes code) expect `git_blob_rawcontent` to always return a valid pointer.
Carlos Martín Nieto e1ac0101 2014-11-08T14:40:53 odb: hardcode the empty blob and tree git hardcodes these as objects which exist regardless of whether they are in the odb and uses them in the shell interface as a way of expressing the lack of a blob or tree for one side of e.g. a diff. In the library we use each language's natural way of declaring a lack of value which makes a workaround like this unnecessary. Since git uses it, it does however mean each shell application would need to perform this check themselves. This makes it common work across a range of applications and an issue with compatibility with git, which fits right into what the library aims to provide. Thus we introduce the hard-coded empty blob and tree in the odb frontend. These hard-coded objects are checked for before going to the backends, but after the cache check, which means the second time they're used, they will be treated as normal cached objects instead of creating new ones.
Carlos Martín Nieto 530594c0 2014-05-23T05:53:41 odb: clear backend errors on successful read We go through the different backends in order, so it's not an error if at least one of the backends has the data we want.
Russell Belfer bc91347b 2014-04-30T11:16:31 Fix remaining init_options inconsistencies There were a couple of "init_opts()" functions and a few more cases of structure initialization that I somehow missed.
Jacques Germishuys 48e60ae7 2014-04-21T11:23:29 Don't redefine the same callback types, their signatures may change
Edward Thomson 3ab57816 2014-03-31T23:23:32 Merge pull request #2178 from libgit2/rb/fix-short-id Fix git_odb_short_id and git_odb_exists_prefix bugs
Linquize 31a14982 2014-03-21T17:36:34 Fix wrong assertion Fixes issue #2196
Russell Belfer 89499078 2014-03-10T10:53:39 Fix a number of git_odb_exists_prefix bugs The git_odb_exists_prefix API was not dealing correctly when a later backend returned GIT_ENOTFOUND even if an earlier backend had found the object. Additionally, the unit tests were not properly exercising the API and had a couple mistakes in checking the results. Lastly, since the backends are not expected to behave correctly unless all bytes of the short id are zero except for the prefix, this makes the ODB prefix APIs explicitly clear out the extra bytes so the user doesn't have to be as careful.
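Clearing the bytes after a short id's prefix might be done as in this sketch. It assumes a 20-byte raw id and a prefix length counted in hex digits; `ID_RAWSZ` and `copy_prefix` are illustrative names, not libgit2 API. An odd digit count keeps only the high nibble of the last partial byte.

```c
#include <assert.h>
#include <string.h>

#define ID_RAWSZ 20   /* assumed raw id size in bytes (40 hex digits) */

/* Copy the first `hexlen` hex digits of `in` into `out` and zero the rest.
 * `out` and `in` must not overlap. */
void copy_prefix(unsigned char *out, const unsigned char *in, size_t hexlen)
{
    size_t full;

    assert(out != NULL && in != NULL);

    if (hexlen > ID_RAWSZ * 2)
        hexlen = ID_RAWSZ * 2;      /* clamp to the full id length */

    full = hexlen / 2;              /* number of complete bytes in the prefix */

    memset(out, 0, ID_RAWSZ);
    memcpy(out, in, full);

    if (hexlen & 1)                 /* odd digit count: keep the high nibble only */
        out[full] = in[full] & 0xF0;
}
```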
Matthew Bowen b9f81997 2014-03-05T21:49:23 Added function-based initializers for every options struct. The basic structure of each function is courtesy of arrbee.
Vicent Marti a064dc2d 2014-03-06T00:47:05 Merge pull request #2159 from libgit2/rb/odb-exists-prefix Add ODB API to check for existence by prefix and object id shortener
Russell Belfer 26875825 2014-03-05T13:06:22 Check short OID len in odb, not in backends
Edward Thomson 7bd2f401 2014-03-05T11:35:47 ODB writing fails gracefully when unsupported If no ODB backends support writing, we should fail gracefully.
Russell Belfer f5753999 2014-03-04T15:34:23 Add exists_prefix to ODB backend and ODB API
Brodie Rao ae3b6d61 2014-01-12T23:31:13 odb: handle NULL pointers passed to git_odb_stream_free Signed-off-by: Brodie Rao <brodie@sf.io>
Edward Thomson dd64c71c 2013-11-04T14:50:25 Allow backend consumers to specify file mode
Vicent Martí 5c50f22a 2013-10-28T09:25:44 Merge pull request #1891 from libgit2/cmn/fix-thin-packs Add support for thin packs
Vicent Marti 98fec8a9 2013-10-22T16:05:47 Implement `git_odb_object_dup`
Carlos Martín Nieto 0b33fca0 2013-10-02T13:39:35 indexer: fix thin packs When given an ODB from which to read objects, the indexer will attempt to inject the missing bases at the end of the pack and update the header and trailer to reflect the new contents.
Vicent Martí 92d19d16 2013-09-21T09:34:03 Merge pull request #1840 from linquize/warning Fix warning
Linquize 66566516 2013-09-08T17:15:42 Fix warning
Russell Belfer a9f51e43 2013-09-11T22:00:36 Merge git_buf and git_buffer This makes the git_buf struct that was used internally into an externally available structure and eliminates the git_buffer. As part of that, some of the special cases that arose with the externally used git_buffer were blended into the git_buf, such as being careful about git_buf objects that may have a NULL ptr and allowing for bufs with a valid ptr and size but zero asize as a way of referring to externally owned data.
Russell Belfer 2a7d224f 2013-09-10T16:33:32 Extend public filter api with filter lists This moves the git_filter_list into the public API so that users can create, apply, and dispose of filter lists. This allows more granular application of filters to user data outside of libgit2 internals. This also converts all the internal usage of filters to the public APIs along with a few small tweaks to make it easier to use the public git_buffer stuff alongside the internal git_buf.
Russell Belfer 85d54812 2013-08-28T16:44:04 Create public filter object and use it This creates include/sys/filter.h with a basic definition of a git_filter and then converts the internal code to use it. There are related internal objects (git_filter_list) that we will want to publish at some point, but this is a first step.
nulltoken 8cf80525 2013-09-11T20:13:59 errors: Fix format of some error messages
nulltoken 031f3f80 2013-09-07T22:39:05 odb: Error when streaming in too [few|many] bytes
nulltoken 4047950f 2013-08-29T14:19:34 odb: Prevent stream_finalize_write() from overwriting Now that #1785 is merged, git_odb_stream_finalize_write() calculates the object id before invoking the odb backend. This commit gives a chance to the backend to check if it already knows this object.
nulltoken b1a6c316 2013-08-30T17:36:00 odb: Move the auto refresh logic to the pack backend Previously, `git_object_read()`, `git_object_read_prefix()` and `git_object_exists()` were implementing an auto refresh logic. When the expected object couldn't be found in any backend, a call to `git_odb_refresh()` was triggered and the lookup was once again performed against all backends. This commit removes this auto-refresh logic from the odb layer and pushes it down into the pack-backend (as it's the only one currently exposing a `refresh()` endpoint).
nulltoken a12e069a 2013-08-30T16:31:52 odb: Honor the non refreshing capability of a backend
Carlos Martín Nieto 090a07d2 2013-08-17T02:12:04 odb: avoid hashing twice in an edge case If none of the backends support direct writes and we must stream the whole file, we already know what the object's id should be; so use the stream's functions directly, bypassing the frontend's hashing and overwriting of our existing id.
Carlos Martín Nieto fe0c6d4e 2013-08-17T01:41:08 odb: make it clearer that the id is calculated in the frontend The frontend is in charge of calculating the id of the objects. Thus the backends should treat it as a read-only value. The positioning in the function signature made it seem as though it was an output parameter. Make the id const and move it from the front to behind the subject (backend or stream).
Carlos Martín Nieto 8380b39a 2013-08-15T14:29:39 odb: perform the stream hashing in the frontend Hash the data as it's coming into the stream and tell the backend what its name is when finalizing the write. This makes it consistent with the way a plain git_odb_write() performs the write.
Carlos Martín Nieto 376e6c9f 2013-08-15T13:48:35 odb: wrap the stream reading and writing functions This is in preparation for moving the hashing to the frontend, which requires us to handle the incoming data before passing it to the backend's stream.
Carlos Martín Nieto e54cfb9b 2013-08-12T11:50:27 odb: free object data when id is ambiguous By the time we recognise this as an ambiguous id, the object's data has been loaded into memory. Free it when returning EAMBIGUOUS.
Rémi Duraffort c6451624 2013-07-15T16:00:07 Fix some more memory leaks in error path
Vicent Marti 6de9b2ee 2013-06-12T21:10:33 util: It's called `memzero`
Russell Belfer 3e9e6cda 2013-06-07T09:54:33 Add safe memset and use it This adds a `git__memset` routine that will not be optimized away and updates the places where I memset() right before a free() call to use it.
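One common way to write a zeroing routine that the compiler will not drop as a dead store before `free()` is to write through a `volatile` pointer. The sketch below shows that technique under the hypothetical name `secure_zero`; it is not claimed to be the exact body of `git__memset`.

```c
#include <assert.h>
#include <stddef.h>

/* Zero `size` bytes at `data`. Writing through a volatile-qualified pointer
 * prevents the compiler from eliding the stores, even if the buffer is
 * freed immediately afterwards. */
void secure_zero(void *data, size_t size)
{
    volatile unsigned char *p = data;

    assert(data != NULL || size == 0);

    while (size--)
        *p++ = 0;
}
```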
Russell Belfer f658dc43 2013-05-31T14:09:58 Zero memory for major objects before freeing By zeroing out the memory when we free larger objects (i.e. those that serve as collections of other data, such as repos, odb, refdb), I'm hoping that it will be easier for libgit2 bindings to find errors in their object management code.
Vicent Martí 03c28d92 2013-05-06T06:45:53 Merge pull request #1526 from arrbee/cleanup-error-return-without-msg Make sure error messages are set for most error returns
Vicent Marti dfec726b 2013-05-03T23:30:54 odb: Do not error out if an alternate ODB is missing
Russell Belfer f063f578 2013-05-01T14:48:35 Catch some odd odb backend corner case errors There are some cases, particularly where no loaded ODB backends support a particular operation, where we would return an error code without having set an error. This catches those cases and reports that no ODB backends support the operation in question.
Vicent Martí cd2ed9f0 2013-04-30T04:02:52 Merge pull request #1518 from arrbee/export-oid-comparison Remove most inlines from the public API
Russell Belfer b7f167da 2013-04-29T13:52:12 Make git_oid_cmp public and add git_oid__cmp
Edward Thomson c8a4e8a5 2013-04-29T11:14:56 don't use uninitialized struct stat in win32
Russell Belfer 78606263 2013-04-15T00:05:44 Add callback to git_objects_table This adds create and free callback to the git_objects_table so that more of the creation and destruction of objects can be table driven instead of using switch statements. This also makes the semantics of certain object creation functions consistent so that we can make better use of function pointers. This also fixes a theoretical error case where an object allocation fails and we end up storing NULL into the cache.
Vicent Marti 5df18424 2013-04-01T19:38:23 lol this worked first try wtf
Vicent Marti 8842c75f 2013-04-03T22:30:07 What has science done.
Vicent Marti 0edad3cc 2013-04-22T16:41:56 Merge branch 'development' into vmg/dupe-odb-backends Conflicts: src/odb.c
Vicent Marti 4ef2c79c 2013-04-22T16:37:40 odb: Disable inode checks for Win32
Russell Belfer 83cc70d9 2013-04-19T12:48:33 Move odb_backend implementors stuff into git2/sys This moves some of the odb_backend stuff that is related to the internals of an odb_backend implementation into include/git2/sys. Some of the stuff related to streaming I left in include/git2 because it seemed like it would be reasonably needed by a normal user who wanted to stream objects into and out of the ODB. Also, I added APIs for traversing the list of backends so that some of the tests would not need to access ODB internals.
Vicent Marti a29c6b5f 2013-04-19T23:51:18 odb: Do not allow duplicate on-disk backends
Michael Schubert f5e28202 2013-03-25T13:38:43 opts: allow configuration of odb cache size Currently, the odb cache has a fixed size of 128 slots as defined by GIT_DEFAULT_CACHE_SIZE. Allow users to set the size of the cache via git_libgit2_opts(). Fixes #1035.
Arkadiy Shapkin 10c06114 2013-03-17T04:46:46 Several warnings detected by static code analyzer fixed Implicit type conversion argument of function to size_t type Suspicious sequence of types castings: size_t -> int -> size_t Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)' Unsigned type is never < 0
Vicent Marti 8fe6bc5c 2013-01-10T15:43:08 odb: Refresh on `exists` query too
Vicent Marti 4a863c06 2013-01-03T20:36:26 Sane refresh logic All the ODB backends have a specific refresh interface. When reading an object, first we attempt every single backend: if the read fails, then we refresh all the backends and retry the read one more time to see if the object has appeared.
Vicent Marti 891a4681 2013-01-04T17:42:41 dat errorcode
Edward Thomson 359fc2d2 2013-01-08T17:07:25 update copyrights
David Michael Barr 4d185dd9 2012-12-19T14:30:06 odb: check if object exists before writing Update the precondition of git_odb_backend::write. It may now be assumed that the object has already been hashed.
Vicent Martí 0249a503 2012-12-07T09:40:21 Merge pull request #1091 from carlosmn/stream-object Indexer speedup with large objects
Ben Straub c7231c45 2012-11-30T16:31:42 Deploy GITERR_CHECK_VERSION
Ben Straub 55f6f21b 2012-11-29T19:59:18 Deploy versioned git_odb_backend structure
Carlos Martín Nieto f56f8585 2012-11-19T22:23:16 indexer: use the packfile streaming API The new API allows us to read the object bit by bit from the packfile, instead of needing it all at once in the packfile. This also allows us to hash the object as it comes in from the network instead of having to try to read it all and failing repeatedly for larger objects. This is only the first step, but it already shows huge improvements when dealing with objects over a few megabytes in size. It reduces the memory needs in some cases, but delta objects still need to be completely in memory and the old inefficient method is still used for that.