src/pack.c


Log

Author Commit Date CI Message
lhchavez c3514b0b 2017-12-23T14:59:07 Fix unpack double free If an element has been cached, but then the call to packfile_unpack_compressed() fails, the very next thing that happens is that its data is freed and then the element is not removed from the cache, which frees the data again. This change sets obj->data to NULL to avoid the double-free. It also stops trying to resolve deltas after two continuous failed rounds of resolution, and adds a test for this.
lhchavez 53f2c6b1 2017-12-15T15:01:50 Simplified overflow condition
lhchavez e7fac2af 2017-12-09T05:26:27 Using unsigned instead
lhchavez 28662c13 2017-12-08T06:00:27 libFuzzer: Prevent a potential shift overflow The type of |base_offset| in get_delta_base() is `git_off_t`, which is a signed `long`. That means that we need to make sure that the 8 most significant bits are zero (instead of 7) to avoid an overflow when it is shifted by 7 bits. Found using libFuzzer.
Edward Thomson 1560b580 2017-08-15T10:35:47 Merge pull request #4288 from pks-t/pks/include-fixups Include fixups
Jeff King 9842b327 2017-08-09T16:47:14 sha1_lookup: drop sha1_entry_pos function This was pulled over from git.git, and is an experiment in making binary-searching lists of sha1s faster. It was never compiled by default (nor was it used upstream by default without a special environment variable). Unfortunately, it is actually slower in practice, and upstream is planning to drop it in git/git@f1068efefe6dd3beaa89484db5e2db730b094e0b (which has some timing results). It's worth doing the same here for simplicity.
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Patrick Steinhardt a693b873 2017-06-07T10:20:44 buffer: use `git_buf_init` with length The `git_buf_init` function has an optional length parameter, which will cause the buffer to be initialized and allocated in one step. This can be used instead of static initialization with `GIT_BUF_INIT` followed by a `git_buf_grow`. This patch does so for two functions where it is applicable.
Patrick Steinhardt 97eb5ef0 2017-06-07T10:05:54 buffer: rely on `GITERR_OOM` set by `git_buf_try_grow` The function `git_buf_try_grow` consistently calls `giterr_set_oom` whenever growing the buffer fails due to insufficient memory being available. So in fact, we do not have to do this ourselves when a call to any buffer-growing function has failed due to an OOM situation. But we still do so in two functions, which this patch cleans up.
Jason Haslam 685f2251 2017-02-22T09:29:00 pack: fix looping over cache entries Fixes a regression from #4092. This is a crash on 32-bit and I assume that it doesn't do the right thing on 64-bit either. MSVC emits a warning for this, but of course, it's easy to get lost among all of the similar 'possible loss of data' warnings.
Patrick Steinhardt 8f5fe903 2017-02-02T11:58:48 offmap: remove GIT__USE_OFFMAP macro
Patrick Steinhardt 0d716905 2017-01-27T15:23:15 oidmap: remove GIT__USE_OIDMAP macro
Patrick Steinhardt 85d2748c 2017-01-27T14:05:10 khash: avoid using `kh_key`/`kh_val` as lvalue
Patrick Steinhardt f31cb45a 2017-01-25T15:31:12 khash: avoid using `kh_put` directly
Patrick Steinhardt a8cd560b 2017-01-25T14:41:17 khash: avoid using `kh_del` directly
Patrick Steinhardt cb18386f 2017-01-25T14:26:58 khash: avoid using `kh_val`/`kh_value` directly
Patrick Steinhardt a853c527 2017-01-25T14:14:32 khash: avoid using `kh_get` directly
Patrick Steinhardt 64e46dc3 2017-01-25T14:14:12 khash: avoid using `kh_end` directly
Patrick Steinhardt 036daa59 2017-01-25T14:11:42 khash: use `git_map_exists` where applicable
Patrick Steinhardt 9694d9ba 2017-01-25T14:09:17 khash: avoid using `kh_foreach`/`kh_foreach_value` directly
Edward Thomson bf339ab0 2017-01-21T14:51:31 indexer: introduce `git_packfile_close` Encapsulation!
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Carlos Martín Nieto 903955f7 2016-12-19T17:26:09 Merge pull request #4027 from pks-t/pks/pack-deref-cache-on-error pack: dereference cached pack entry on error
Patrick Steinhardt ff5eea06 2016-12-12T09:36:15 pack: dereference cached pack entry on error When trying to uncompress deltas in a packfile's delta chain, we try to add object bases to the packfile cache, subsequently decrementing its reference count if it has been added successfully. This may lead to a mismatched reference count in the case where we exit the loop early due to an encountered error. Fix the issue by decrementing the reference count in error cleanup.
Patrick Steinhardt 34b32053 2016-11-25T15:02:34 Fix potential use of uninitialized values
Patrick Steinhardt 0cf15e39 2016-11-02T12:23:12 pack: fix race in pack_entry_find_offset In `pack_entry_find_offset`, we try to find the offset of a certain object in the pack file. To do so, we first assert if the packfile has already been opened and open it if not. Opening the packfile is guarded with a mutex, so concurrent access to this is in fact safe. What is not thread-safe though is our calculation of offsets inside the packfile. Assume two threads calling `pack_entry_find_offset` at the same time. We first calculate the offset and index location and only then determine if the pack has already been opened. If so, we re-calculate the offset and index address. Now the case for two threads: thread 1 first calculates the addresses and is subsequently suspended. The second thread will now call `pack_index_open` and initialize the pack file, calculating its addresses correctly. When the first thread is resumed now, he'll see that the pack file has already been initialized and will happily proceed with the addresses it has already calculated before the check. As the pack file was not initialized before, these addresses are bogus. Fix the issue by only calculating the addresses after having checked if the pack file is open.
Edward Thomson 6a2d2f8a 2015-06-17T06:42:20 delta: move delta application to delta.c Move the delta application functions into `delta.c`, next to the similar delta creation functions. Make the `git__delta_apply` functions adhere to other naming and parameter style within the library.
Carlos Martín Nieto a97b769a 2016-04-27T12:00:31 odb: avoid inflating the full delta to read the header When we read the header, we want to know the size and type of the object. We're currently inflating the full delta in order to read the first few bytes. This can mean hundreds of kB needlessly inflated for large objects. Instead use a packfile stream to read just enough so we can read the two varints in the header and avoid inflating most of the delta.
Carlos Martín Nieto d53cc13e 2016-03-31T04:12:46 Merge pull request #3575 from pmq20/master-13jan16 Remove duplicated calls to git_mwindow_close
Edward Thomson e10144ae 2016-03-04T01:18:30 odb: improved not found error messages When looking up an abbreviated oid, show the actual (abbreviated) oid the caller passed instead of a full (but ambiguously truncated) oid.
Carlos Martín Nieto 6d97beb9 2016-02-25T15:46:59 pack: don't allow a negative offset
Carlos Martín Nieto ea9e00cb 2016-02-23T18:15:43 pack: make sure we don't go out of bounds for extended entries A corrupt index might have data that tells us to go look past the end of the file for data. Catch these cases and return an appropriate error message.
Patrick Steinhardt a53d2e39 2016-02-09T09:58:56 pack: do not free passed in poiter on error The function `git_packfile_stream_open` tries to free the passed in stream when an error occurs. The only call site is `git_indexer_append`, though, which passes in the address of a stream struct which has not been allocated on the heap. Fix the issue by simply removing the call to free. In case of an error we did not allocate any memory yet and otherwise it should be the caller's responsibility to manage it's object's lifetime.
P.S.V.R d4e4f272 2016-01-13T11:07:14 Remove duplicated calls to git_mwindow_close
P.S.V.R b644e223 2016-01-13T11:02:38 Make packfile_unpack_compressed a private API
Stefan Widgren c369b379 2015-07-31T16:23:11 Remove extra semicolon outside of a function Without this change, compiling with gcc and pedantic generates warning: ISO C does not allow extra ‘;’ outside of a function.
Carlos Martín Nieto 878293f7 2015-06-10T10:44:14 pack: use git_buf when building the index name The way we currently do it depends on the subtlety of strlen vs sizeof and the fact that .pack is one longer than .idx. Let's use a git_buf so we can express the manipulation we want much more clearly.
Edward Thomson 38c10ecd 2015-05-16T19:00:50 indexer: don't look for the index we're creating When creating an index, know that we do not have an index for our own packfile, preventing some unnecessary file opens and error reporting.
Carlos Martín Nieto b63b76e0 2014-10-12T11:42:31 Reorder some khash declarations Keep the definitions in the headers, while putting the declarations in the C files. Putting the function definitions in headers causes them to be duplicated if you include two headers with them.
Carlos Martín Nieto 5091aff7 2015-02-20T08:40:40 Merge pull request #2907 from jasonhaslam/git_packfile_unpack_race Fix race in git_packfile_unpack.
Jason Haslam 8588cb0c 2015-02-14T23:43:26 Fix race in git_packfile_unpack. Increment refcount of newly added cache entries just like existing entries looked up from the cache. Otherwise the new entry can be evicted from the cache and destroyed while it's still in use.
Edward Thomson f1453c59 2015-02-12T12:19:37 Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
Edward Thomson 392702ee 2015-02-09T23:41:13 allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
Jacques Germishuys 6f73e026 2014-12-24T11:42:50 Plug some leaks
Ravindra Patel ec7e680c 2014-11-20T12:07:55 Fix for misleading "missing delta bases" error - Fix #2721.
Pierre-Olivier Latour ea66215d 2014-10-26T10:29:19 Removed some useless variable assignments
Jacques Germishuys e640a77c 2014-09-25T15:29:03 Silence uninitialized warning
Arkady Shapkin 5cd81bb3 2014-09-03T01:01:25 Several CppCat warnings fixed
Carlos Martín Nieto b3d3459f 2014-08-26T15:09:47 pack: return the correct final offset The callers of git_packfile_unpack() expect the obj_offset argument to be set to the beginning of the next object. We were mistakenly returning the the offset of the object's data, which causes the CRC function to try to use the wrong offset. Set obj_offset to curpos instead of elem->offset to point to the next element and bring back expected behaviour.
Carlos Martín Nieto 5e0f47c3 2014-06-25T21:20:39 pack: free the new pack struct if we fail to insert If we fail to insert the packfile in the map, make sure to free it. This makes the free function only attempt to remove its mwindows from the global list if we have opened the packfile to avoid accessing the list unlocked.
Carlos Martín Nieto b3b66c57 2014-06-18T17:13:12 Share packs across repository instances Opening the same repository multiple times will currently open the same file multiple times, as well as map the same region of the file multiple times. This is not necessary, as the packfile data is immutable. Instead of opening and closing packfiles directly, introduce an indirection and allocate packfiles globally. This does mean locking on each packfile open, but we already use this lock for the global mwindow list so it doesn't introduce a new contention point.
Carlos Martín Nieto 649214be 2014-05-15T19:59:05 pack: init the cache on packfile alloc When running multithreaded, it is not enough to check for the offmap allocation. Move the call to cache_init() to packfile allocation so we can be sure it is always allocated free of races. This fixes #2355.
Carlos Martín Nieto c968ce2c 2014-05-12T02:01:05 pack: don't forget to cache the base object The base object is a good cache candidate, so we shouldn't forget to add it to the cache.
Carlos Martín Nieto 15bcced2 2014-05-11T05:31:22 pack: use stack allocation for smaller delta chains This avoid allocating the array on the heap for relatively small chains. The expected performance increase is sadly not really noticeable.
Carlos Martín Nieto a3ffbf23 2014-05-11T03:50:34 pack: expose a cached delta base directly Instead of going through a special entry in the chain, let's pass it as an output parameter.
Carlos Martín Nieto 9dbd150f 2014-05-09T09:36:09 pack: simplify delta chain code The switch makes the loop somewhat unwieldy. Let's assume it's fine and perform the check when we're accessing the data. This makes our code look a lot more like git's.
Carlos Martín Nieto b2559f47 2014-05-08T17:14:59 pack: preallocate a 64-element chain Dependency chains are often large and require a few reallocations. Allocate a 64-element chain before doing anything else to avoid allocations during the loop. This value comes from the stack-allocated one git uses. We still allocate this on the heap, but it does help performance a little bit.
Carlos Martín Nieto e6d10c58 2014-05-08T16:24:54 pack: make sure not to leak the dep chain
Carlos Martín Nieto a332e91c 2014-05-06T23:37:28 pack: use a cache for delta bases when unpacking Bring back the use of the delta base cache for unpacking objects. When generating the delta chain, we stop when we find a delta base in the pack's cache and use that as the starting point.
Carlos Martín Nieto 2acdf4b8 2014-05-06T19:20:33 pack: unpack using a loop We currently make use of recursive function calls to unpack an object, resolving the deltas as we come back down the chain. This means that we have unbounded stack growth as we look up objects in a pack. This is now done in two steps: first we figure out what the dependency chain is by looking up the delta bases until we reach a non-delta object, pushing the information we need onto a stack and then we pop from that stack and apply the deltas until there are no more left. This version of the code does not make use of the delta base cache so it is slower than what's in the mainline. A later commit will reintroduce it.
Carlos Martín Nieto ae081739 2014-05-06T21:21:04 pack: do not repeat the same error message four times Repeating this error message makes it harder to find out where we actually are finding the error, and they don't really describe what we're trying to do.
Carlos Martín Nieto 86d5810b 2014-05-06T16:20:14 pack: remove misleading comment
Linquize 8610487c 2014-01-23T23:28:28 Drop parsing pack filename SHA1 part, no one cares the filename
Russell Belfer 26c1cb91 2013-12-09T09:44:03 One more rename/cleanup for callback err functions
Russell Belfer c7b3e1b3 2013-12-06T15:42:20 Some callback error check style cleanups I find this easier to read...
Russell Belfer 25e0b157 2013-12-06T15:07:57 Remove converting user error to GIT_EUSER This changes the behavior of callbacks so that the callback error code is not converted into GIT_EUSER and instead we propagate the return value through to the caller. Instead of using the giterr_capture and giterr_restore functions, we now rely on all functions to pass back the return value from a callback. To avoid having a return value with no error message, the user can call the public giterr_set_str or some such function to set an error message. There is a new helper 'giterr_set_callback' that functions can invoke after making a callback which ensures that some error message was set in case the callback did not set one. In places where the sign of the callback return value is meaningful (e.g. positive to skip, negative to abort), only the negative values are returned back to the caller, obviously, since the other values allow for continuing the loop. The hardest parts of this were in the checkout code where positive return values were overloaded as meaningful values for checkout. I fixed this by adding an output parameter to many of the internal checkout functions and removing the overload. This added some code, but it is probably a better implementation. There is some funkiness in the network code where user provided callbacks could be returning a positive or a negative value and we want to rely on that to cancel the loop. There are still a couple places where an user error might get turned into GIT_EUSER there, I think, though none exercised by the tests.
Russell Belfer dab89f9b 2013-12-04T21:22:57 Further EUSER and error propagation fixes This continues auditing all the places where GIT_EUSER is being returned and making sure to clear any existing error using the new giterr_user_cancel helper. As a result, places that relied on intercepting GIT_EUSER but having the old error preserved also needed to be cleaned up to correctly stash and then retrieve the actual error. Additionally, as I encountered places where error codes were not being propagated correctly, I tried to fix them up. A number of those fixes are included in the this commit as well.
Vicent Marti 51a3dfb5 2013-11-01T16:31:02 pack: `__object_header` always returns unsigned values
Linquize 3343b5ff 2013-10-31T22:59:42 Fix warning on win64
Carlos Martín Nieto 51e82492 2013-10-03T16:54:25 pack: move the object header function here
Vicent Marti 67591c8c 2013-08-14T10:28:01 sha1_lookup: do not use the "experimental" lookup mode
Sven Strickroth 3a2d48d5 2013-07-25T14:54:19 Close p->mwf.fd only if necessary This fixes a regression introduced in revision 9d2f841a5d39fc25ce722a3904f6ebc9aa112222. Signed-off-by: Sven Strickroth <email@cs-ware.de>
Rémi Duraffort 050af8bb 2013-07-15T16:00:00 pack: fix memory leak in error path
Russell Belfer 1a42dd17 2013-05-31T14:13:11 Mutex init can fail It is obviously quite a serious problem if this happens, but mutex initialization can fail and we should detect it. It's a bit like a memory allocation failure, in that you're probably pretty screwed if this occurs, but at least we'll catch it.
Russell Belfer f658dc43 2013-05-31T14:09:58 Zero memory for major objects before freeing By zeroing out the memory when we free larger objects (i.e. those that serve as collections of other data, such as repos, odb, refdb), I'm hoping that it will be easier for libgit2 bindings to find errors in their object management code.
Carlos Martín Nieto 0ddfcb40 2013-05-02T18:06:14 Switch to index_version as "git_pack_file is ready" flag We use p->index_map.data to check whether the struct has been set up and all the information about the index is stored there. This variable gets set up halfway through the setup process, however, and a thread can come along and use fields that haven't been written to yet. Crucially, pack_entry_find_offset() needs to read the index version (which is written after index_map) to know the offset and stride length to pass to sha1_entry_pos(). If these values are wrong, assertions in it will fail, as it will be reading bogus data. Make index_version the last field to be written and switch from using p->index_map.data to p->index_version as "git_pack_file is ready" flag as we can use it to know if every field has been written.
Carlos Martín Nieto 34bd5999 2013-05-02T17:14:05 Revert "Protect sha1_entry_pos call with mutex" This reverts commit 8c535f3f6879c6796d8107d7eb80dd8b2105621b.
Russell Belfer 8c535f3f 2013-05-02T03:34:56 Protect sha1_entry_pos call with mutex There is an occasional assertion failure in sha1_entry_pos from pack_entry_find_index when running threaded. Holding the mutex around the code that grabs the index_map data and processes it makes this assertion failure go away.
Russell Belfer 9d2f841a 2013-05-02T03:03:54 Add extra locking around packfile open We were still seeing a few issues in threaded access to packs. This adds extra locks around the opening of the mwindow to avoid a different race.
Russell Belfer b7f167da 2013-04-29T13:52:12 Make git_oid_cmp public and add git_oid__cmp
Russell Belfer 5d2d21e5 2013-04-16T15:00:43 Consolidate packfile allocation further Rename git_packfile_check to git_packfile_alloc since it is now being used more in that capacity. Fix the various places that use it. Consolidate some repeated code in odb_pack.c related to the allocation of a new pack_backend.
Russell Belfer 38eef611 2013-04-16T14:19:27 Make indexer use shared packfile open code The indexer was creating a packfile object separately from the code in pack.c which was a problem since I put a call to git_mutex_init into just pack.c. This commit updates the pack function for creating a new pack object (i.e. git_packfile_check()) so that it can be used in both places and then makes indexer.c use the shared initialization routine. There are also a few minor formatting and warning message fixes.
Russell Belfer 53607868 2013-04-15T00:09:03 Further threading fixes This builds on the earlier thread safety work to make it so that setting the odb, index, refdb, or config for a repository is done in a threadsafe manner with minimized locking time. This is done by adding a lock to the repository object and using it to guard the assignment of the above listed pointers. The lock is only held to assign the pointer value. This also contains some minor fixes to the other work with pack files to reduce the time that locks are being held to and fix an apparently memory leak.
Russell Belfer 24c70804 2013-04-12T12:59:38 Add mutex around mapping and unmapping pack files When I was writing threading tests for the new cache, the main error I kept running into was a pack file having it's content unmapped underneath the running thread. This adds a lock around the routines that map and unmap the pack data so that threads can effectively reload the data when they need it. This also required reworking the error handling paths in a couple places in the code which I tried to make consistent.
Carlos Martín Nieto 0e040c03 2013-03-03T14:50:47 indexer: use a hashtable for keeping track of offsets These offsets are needed for REF_DELTA objects, which encode which object they use as a base, but not where it lies in the packfile, so we need a list. These objects are mostly from older packfiles, before OFS_DELTA was widely spread. The time spent in indexing these packfiles is greatly reduced, though remains above what git is able to do.
Philip Kelley 11d9f6b3 2013-01-27T14:17:07 Vector improvements and their fallout
Philip Kelley aa3bf89d 2013-01-26T15:12:53 Fix a mutex leak in pack.c
Carlos Martín Nieto 9c62aaab 2013-01-14T17:42:12 pack: evict all of the pages at once Somewhat surprisingly, this can increase the speed considerably, as we don't bother trying to decide what to evict, and the most used entries are quickly back into the cache.
Carlos Martín Nieto ed6648ba 2013-01-14T16:39:22 pack: evict objects from the cache in groups of eight This drops the cache eviction below libcrypto and zlib in the perf output. The number has been chosen empirically.
Carlos Martín Nieto 09e29e47 2013-01-12T19:31:07 pack: fixes to the cache The offset should be git_off_t, and we should check the return value of the mutex lock function.
Carlos Martín Nieto 96c9b9f0 2013-01-12T18:38:19 indexer: properly free the packfile resources The indexer needs to call the packfile's free function so it takes care of freeing the caches. We still need to close the mwf descriptor manually so we can rename the packfile into its final name on Windows.
Carlos Martín Nieto 80d647ad 2013-01-11T20:15:06 Revert "pack: packfile_free -> git_packfile_free and use it in the indexers" This reverts commit f289f886cb81bb570bed747053d5ebf8aba6bef7, which makes the tests fail on Windows. Revert until we can figure out a solution.
nulltoken 090d5e1f 2013-01-11T14:40:09 Fix MSVC compilation warnings
Carlos Martín Nieto f289f886 2013-01-11T17:24:52 pack: packfile_free -> git_packfile_free and use it in the indexers It turns out the indexers have been ignoring the pack's free function and leaking data. Plug that.
Carlos Martín Nieto 0ed75620 2012-12-21T13:46:48 pack: limit the amount of memory the base delta cache can use Currently limited to 16MB (like git) and to objects up to 1MB in size.
Carlos Martín Nieto c8f79c2b 2012-12-21T10:59:10 pack: abstract out the cache into its own functions
Carlos Martín Nieto c0f4a011 2012-12-19T16:48:12 pack: introduce a delta base cache Many delta bases are re-used. Cache them to avoid inflating the same data repeatedly. This version doesn't limit the amount of entries to store, so it can end up using a considerable amound of memory.
Carlos Martín Nieto 525d961c 2012-12-20T07:55:51 pack: refcount entries and add a mutex around cache access
Edward Thomson 359fc2d2 2013-01-08T17:07:25 update copyrights
Vicent Martí 0249a503 2012-12-07T09:40:21 Merge pull request #1091 from carlosmn/stream-object Indexer speedup with large objects