src/pack-objects.c


Log

Author Commit Date CI Message
Edward Thomson f0e693b1 2021-09-07T17:53:49 str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
Edward Thomson 31ecaca2 2021-09-30T08:11:40 hash: hash functions operate on byte arrays not git_oids Separate the concerns of the hash functions from the git_oid functions. The git_oid structure will need to understand either SHA1 or SHA256; the hash functions should only deal with the appropriate one of these.
Edward Thomson 2a713da1 2021-09-29T21:31:17 hash: accept the algorithm in inputs
Edward Thomson 419ffdde 2021-07-19T15:51:53 packbuilder: don't try to malloc(0)
Peter Pettersson e4e173e8 2021-07-15T21:00:02 Allow compilation on systems without CLOCK_MONOTONIC Makes usage of CLOCK_MONOTONIC conditional and makes functions that uses git__timer handle clock resynchronization. Call gettimeofday with tzp set to NULL as required by https://pubs.opengroup.org/onlinepubs/9699919799/functions/gettimeofday.html
Edward Thomson 404dd024 2020-12-05T15:57:48 threads: rename thread files to thread.[ch]
Edward Thomson 9800728a 2020-12-05T16:08:34 util: move git_online_cpus into util The number of CPUs is useful information for creating a thread pool or a number of workers, but it's not really about threading directly. Evict it from the thread file
Edward Thomson fe07ef8a 2020-12-05T21:38:29 packbuilder: use git__noop on non-threaded systems Our git_packbuilder__cache_lock function returns a value; use git__noop.
Edward Thomson 4dba9303 2020-11-22T10:42:41 pack-objects: use GIT_ASSERT
Edward Thomson 7cd0bf65 2020-04-05T18:26:52 pack: use GIT_ASSERT
Edward Thomson cb4bfbc9 2020-04-05T11:07:54 buffer: git_buf_sanitize should return a value `git_buf_sanitize` is called with user-input, and wants to sanity-check that input. Allow it to return a value if the input was malformed in a way that we cannot cope.
Patrick Steinhardt a6c9e0b3 2020-06-08T12:40:47 tree-wide: mark local functions as static We've accumulated quite some functions which are never used outside of their respective code unit, but which are lacking the `static` keyword. Add it to reduce their linkage scope and allow the compiler to optimize better.
Edward Thomson 6de8aa7f 2020-06-02T12:21:22 Merge pull request #5532 from joshtriplett/pack-default-path git_packbuilder_write: Allow setting path to NULL to use the default path
Edward Thomson 0f35efeb 2020-05-23T10:15:51 git_pool_init: handle failure cases Propagate failures caused by pool initialization errors.
Josh Triplett 5278a006 2020-05-23T16:07:54 git_packbuilder_write: Allow setting path to NULL to use the default path If given a NULL path, write to the object path of the repository. Add tests for the new behavior.
Josh Triplett 0bc091dd 2020-05-23T15:35:38 git_packbuilder_write: Unify cleanup path Clean up and return via a single label, to avoid duplicate error handling before each return, and to make it easier to extend the set of cleanups needed.
Patrick Steinhardt b169cd52 2020-02-07T12:13:42 pack-objects: check return code of `git_zstream_set_input` While `git_zstream_set_input` cannot fail right now, it might change in the future if we ever decide to have it check its parameters more vigorously. Let's thus check whether its return code signals an error.
Patrick Steinhardt 658022c4 2019-07-18T13:53:41 configuration: cvar -> configmap `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.
brian m. carlson c4df926b 2019-07-16T21:54:10 pack-objects: allocate memory more efficiently The packbuilder code allocates memory in chunks. When it needs to allocate, it tries to add 1024 to the number of objects and multiply by 3/2. However, it actually multiplies by 1 instead, since it performs an integral division in the expression "3 / 2" and only then multiplies by the increased number of objects. The current behavior causes the code to waste massive amounts of time copying memory when it reallocates, causing inserting all non-blob objects in the Linux repository into a new pack to take some indeterminate time greater than 5 minutes instead of 52 seconds. Correct this error by first dividing by two, and only then multiplying by 3. We still check for overflow for the multiplication, which is the only part that can overflow. This appears to be the only place in the code base which has this problem.
Edward Thomson a1ef995d 2019-02-21T10:33:30 indexer: use git_indexer_progress throughout Update internal usage of `git_transfer_progress` to `git_indexer_progreses`.
Patrick Steinhardt 2e0a3048 2019-01-23T10:48:55 oidmap: introduce high-level setter for key/value pairs Currently, one would use either `git_oidmap_insert` to insert key/value pairs into a map or `git_oidmap_put` to insert a key only. These function have historically been macros, which is why their syntax is kind of weird: instead of returning an error code directly, they instead have to be passed a pointer to where the return value shall be stored. This does not match libgit2's common idiom of directly returning error codes.Furthermore, `git_oidmap_put` is tightly coupled with implementation details of the map as it exposes the index of inserted entries. Introduce a new function `git_oidmap_set`, which takes as parameters the map, key and value and directly returns an error code. Convert all trivial callers of `git_oidmap_insert` and `git_oidmap_put` to make use of it.
Patrick Steinhardt 9694ef20 2018-12-17T09:01:53 oidmap: introduce high-level getter for values The current way of looking up an entry from a map is tightly coupled with the map implementation, as one first has to look up the index of the key and then retrieve the associated value by using the index. As a caller, you usually do not care about any indices at all, though, so this is more complicated than really necessary. Furthermore, it invites for errors to happen if the correct error checking sequence is not being followed. Introduce a new high-level function `git_oidmap_get` that takes a map and a key and returns a pointer to the associated value if such a key exists. Otherwise, a `NULL` pointer is returned. Adjust all callers that can trivially be converted.
Patrick Steinhardt 351eeff3 2019-01-23T10:42:46 maps: use uniform lifecycle management functions Currently, the lifecycle functions for maps (allocation, deallocation, resize) are not named in a uniform way and do not have a uniform function signature. Rename the functions to fix that, and stick to libgit2's naming scheme of saying `git_foo_new`. This results in the following new interface for allocation: - `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an error code if we ran out of memory - `void git_<t>map_free(git_<t>map *map)` to free a map - `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map This commit also fixes all existing callers.
Edward Thomson f673e232 2018-12-27T13:47:34 git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
Edward Thomson 168fe39b 2018-11-28T14:26:57 object_type: use new enumeration names Use the new object_type enumeration names within the codebase.
Patrick Steinhardt 852bc9f4 2018-11-23T19:26:24 khash: remove intricate knowledge of khash types Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types, simply use `size_t` instead. This decouples code from the khash stuff and makes it possible to move the khash includes into the implementation files.
Anders Borum b36cc7a4 2018-09-27T11:18:00 fix check if blob is uninteresting when inserting tree to packbuilder Blobs that have been marked as uninteresting should not be inserted into packbuilder when inserting a tree. The check as to whether a blob was uninteresting looked at the status for the tree itself instead of the blob. This could cause significantly larger packfiles.
Patrick Steinhardt c16556aa 2017-11-12T10:31:48 indexer: introduce options struct to `git_indexer_new` We strive to keep an options structure to many functions to be able to extend options in the future without breaking the API. `git_indexer_new` doesn't have one right now, but we want to be able to add an option for enabling strict packfile verification. Add a new `git_indexer_options` structure and adjust callers to use that.
Patrick Steinhardt 4e8dc055 2017-10-11T10:29:11 pack-objects: make `git_walk_object` internal to pack-objects The `git_walk_objects` structure is currently only being used inside of the pack-objects.c file, but being declared in its header. This has actually been the case since its inception in 04a36feff (pack-objects: fill a packbuilder from a walk, 2014-10-11) and has never really changed. Move the struct declaration into pack-objects.c to improve code encapsulation.
Patrick Steinhardt ecf4f33a 2018-02-08T11:14:48 Convert usage of `git_buf_free` to new `git_buf_dispose`
Edward Thomson 8cdf439b 2017-12-30T13:07:03 Merge pull request #4028 from chescock/improve-local-fetch Transfer fewer objects on push and local fetch
Edward Thomson 1c04a96b 2017-02-28T12:29:29 Honor `core.fsyncObjectFiles`
Patrick Steinhardt 0d716905 2017-01-27T15:23:15 oidmap: remove GIT__USE_OIDMAP macro
Patrick Steinhardt 73028af8 2017-01-27T14:20:24 khash: avoid using macro magic to get return address
Patrick Steinhardt 85d2748c 2017-01-27T14:05:10 khash: avoid using `kh_key`/`kh_val` as lvalue
Patrick Steinhardt f31cb45a 2017-01-25T15:31:12 khash: avoid using `kh_put` directly
Patrick Steinhardt cb18386f 2017-01-25T14:26:58 khash: avoid using `kh_val`/`kh_value` directly
Patrick Steinhardt c37b069b 2017-01-25T14:16:35 khash: avoid using `kh_clear` directly
Patrick Steinhardt a853c527 2017-01-25T14:14:32 khash: avoid using `kh_get` directly
Patrick Steinhardt 64e46dc3 2017-01-25T14:14:12 khash: avoid using `kh_end` directly
Patrick Steinhardt 036daa59 2017-01-25T14:11:42 khash: use `git_map_exists` where applicable
Etienne Samson 49be45a1 2016-12-26T22:15:31 pack: report revwalk error
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Chris Hescock 709768a8 2016-12-09T15:59:52 Don't fetch objects we don't need in local transport. Hide all local refs in the revwalk. Packbuilder should not add hidden trees or blobs.
Edward Thomson 60e15ecd 2016-07-15T17:18:39 packbuilder: `size_t` all the things After 1cd65991, we were passing a pointer to an `unsigned long` to a function that now expected a pointer to a `size_t`. These types differ on 64-bit Windows, which means that we trash the stack. Use `size_t`s in the packbuilder to avoid this.
Edward Thomson 20302aa4 2016-06-25T23:33:05 Merge pull request #3223 from ethomson/apply Reading patch files
Patrick Steinhardt faebc1c6 2016-06-20T17:44:04 threads: split up OS-dependent thread code
Edward Thomson a03952f0 2015-09-25T12:44:14 patch: formatting cleanups
Edward Thomson b88f1713 2015-06-17T08:07:34 zstream: offer inflating, `git_zstream_inflatebuf` Introduce `git_zstream_inflatebuf` for simple uses.
Edward Thomson 1cd65991 2015-06-17T07:31:47 delta: refactor git_delta functions for consistency Refactor the git_delta functions to have consistent naming and parameters with the rest of the library.
Patrick Steinhardt 3fe5768b 2016-03-01T17:55:40 pack-objects: fix memory leak on overflow
Patrick Steinhardt bac52ab0 2016-02-22T13:48:45 pack-objects: return early when computing write order fails The function `compute_write_order` may return a `NULL`-pointer when an error occurs. In such cases we jump to the `done`-label where we try to clean up allocated memory. Unfortunately we try to deallocate the `write_order` array, though, which may be NULL here. Fix this error by returning early instead of jumping to the `done` label. There is no data to be cleaned up anyway.
Patrick Steinhardt d1c9a48d 2016-02-23T10:45:09 pack-objects: check realloc in try_delta with GITERR_CHECK_ALLOC
Patrick Steinhardt 39c9dd24 2016-02-09T10:53:30 pack-objects: fix memory leak in packbuilder_config
Patrick Steinhardt 0b2437bb 2016-02-09T10:43:28 pack-objects: fix memory leak in compute_write_order
Vicent Marti 1e5e02b4 2015-10-27T17:26:04 pool: Simplify implementation
Stefan Widgren c369b379 2015-07-31T16:23:11 Remove extra semicolon outside of a function Without this change, compiling with gcc and pedantic generates warning: ISO C does not allow extra ‘;’ outside of a function.
Carlos Martín Nieto 3c337a5d 2015-05-06T13:09:00 packbuilder: report progress during deltification This is useful to send to the client while we're performing the work. The reporting function has a force parameter which makes sure that we do send out the message of 100% completed, even if this comes before the next udpate window.
Carlos Martín Nieto a61fa4c0 2015-03-12T01:26:09 packbuilder: introduce git_packbuilder_insert_recur() This function recursively inserts the given object and any referenced ones. It can be thought of as a more general version of the functions to insert a commit or tree.
Carlos Martín Nieto 04a36fef 2014-10-11T15:48:29 pack-objects: fill a packbuilder from a walk Most use-cases for the object packer communicate in terms of commits which each side has. We already have an object to specify this relationship between commits, namely git_revwalk. By knowing which commits we want to pack and which the other side already has, we can perform similar optimisations to git, by marking each tree as interesting or uninteresting only once, and not sending those trees which we know the other side has.
Carlos Martín Nieto b63b76e0 2014-10-12T11:42:31 Reorder some khash declarations Keep the definitions in the headers, while putting the declarations in the C files. Putting the function definitions in headers causes them to be duplicated if you include two headers with them.
Edward Thomson 0f07d54b 2015-02-13T09:35:20 pack-objects: unlock the cache on integer overflow
Edward Thomson f1453c59 2015-02-12T12:19:37 Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
Edward Thomson 190b76a6 2015-02-11T14:52:08 Introduce git__add_sizet_overflow and friends Add some helper functions to check for overflow in a type-specific manner.
Edward Thomson ec3b4d35 2015-02-11T11:20:05 Use `size_t` to hold size of arrays Use `size_t` to hold the size of arrays to ease overflow checking, lest we check for overflow of a `size_t` then promptly truncate by packing the length into a smaller type.
Edward Thomson 2884cc42 2015-02-11T09:39:38 overflow checking: don't make callers set oom Have the ALLOC_OVERFLOW testing macros also simply set_oom in the case where a computation would overflow, so that callers don't need to.
Edward Thomson 3603cb09 2015-02-10T23:13:49 git__*allocarray: safer realloc and malloc Introduce git__reallocarray that checks the product of the number of elements and element size for overflow before allocation. Also introduce git__mallocarray that behaves like calloc, but without the `c`. (It does not zero memory, for those truly worried about every cycle.)
Edward Thomson 392702ee 2015-02-09T23:41:13 allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
Philip Kelley fb591767 2014-06-07T12:51:48 Win32: Fix object::cache::threadmania test on x64
Russell Belfer af567e88 2014-05-12T10:44:13 Merge pull request #2334 from libgit2/rb/fix-2333 Be more careful with user-supplied buffers
Russell Belfer d2c4d1c6 2014-05-12T10:04:52 Merge pull request #2188 from libgit2/cmn/config-snapshot Configuration snapshotting
Russell Belfer 1e4976cb 2014-05-08T10:17:14 Be more careful with user-supplied buffers This adds in missing calls to `git_buf_sanitize` and fixes a number of places where `git_buf` APIs could inadvertently write NUL terminator bytes into invalid buffers. This also changes the behavior of `git_buf_sanitize` to NUL terminate a buffer if it can and of `git_buf_shorten` to do nothing if it can. Adds tests of filtering code with zeroed (i.e. unsanitized) buffer which was previously triggering a segfault.
Carlos Martín Nieto ac99d86b 2014-05-07T11:34:32 repository: introduce a convenience config snapshot method Accessing the repository's config and immediately taking a snapshot of it is a common operation, so let's provide a convenience function for it.
Carlos Martín Nieto 38d338b2 2014-04-26T18:15:39 pack-objects: always write out the status in write_one() Make sure we set the output parameter to a value.
Jacques Germishuys 48e60ae7 2014-04-21T11:23:29 Don't redefine the same callback types, their signatures may change
Carlos Martín Nieto 29c4cb09 2014-03-15T03:53:36 Use config snapshotting This way we can assume we have a consistent view of the config situation when we're looking up remote, branch, pack-objects, etc.
Carlos Martín Nieto a14aa1e7 2014-03-04T20:09:17 pack-objects: free memory safely A few fixes have accumulated in this area which have made the freeing of data a bit muddy. Make sure to free the data only when needed and once. When we are going to write a delta to the packfile, we need to free the data, otherwise leave it. The current version of the code mixes up the checks for po->data and po->delta_data.
Russell Belfer d9b04d78 2014-01-29T15:02:35 Reorganize zstream API and fix wrap problems There were some confusing issues mixing up the number of bytes written to the zstream output buffer with the number of bytes consumed from the zstream input. This reorganizes the zstream API and makes it easier to deflate an arbitrarily large input while still using a fixed size output.
Russell Belfer e9d5e5f3 2014-01-28T16:25:42 Some fixes for Windows x64 warnings
XTao 1cb5a811 2014-01-26T17:07:39 Fix write_object.
Edward Thomson 52a8a130 2014-01-06T16:41:12 Packbuilder contains its own zstream
Edward Thomson 0ade2f7a 2013-12-14T10:37:57 Packbuilder stream deflate instead of one-shot
Edward Thomson c6f26b48 2013-12-13T18:26:46 Refactor zlib for easier deflate streaming
Russell Belfer 7e3ed419 2013-12-11T16:56:17 Fix up some valgrind leaks and warnings
Russell Belfer 26c1cb91 2013-12-09T09:44:03 One more rename/cleanup for callback err functions
Russell Belfer c7b3e1b3 2013-12-06T15:42:20 Some callback error check style cleanups I find this easier to read...
Russell Belfer 25e0b157 2013-12-06T15:07:57 Remove converting user error to GIT_EUSER This changes the behavior of callbacks so that the callback error code is not converted into GIT_EUSER and instead we propagate the return value through to the caller. Instead of using the giterr_capture and giterr_restore functions, we now rely on all functions to pass back the return value from a callback. To avoid having a return value with no error message, the user can call the public giterr_set_str or some such function to set an error message. There is a new helper 'giterr_set_callback' that functions can invoke after making a callback which ensures that some error message was set in case the callback did not set one. In places where the sign of the callback return value is meaningful (e.g. positive to skip, negative to abort), only the negative values are returned back to the caller, obviously, since the other values allow for continuing the loop. The hardest parts of this were in the checkout code where positive return values were overloaded as meaningful values for checkout. I fixed this by adding an output parameter to many of the internal checkout functions and removing the overload. This added some code, but it is probably a better implementation. There is some funkiness in the network code where user provided callbacks could be returning a positive or a negative value and we want to rely on that to cancel the loop. There are still a couple places where an user error might get turned into GIT_EUSER there, I think, though none exercised by the tests.
Russell Belfer dab89f9b 2013-12-04T21:22:57 Further EUSER and error propagation fixes This continues auditing all the places where GIT_EUSER is being returned and making sure to clear any existing error using the new giterr_user_cancel helper. As a result, places that relied on intercepting GIT_EUSER but having the old error preserved also needed to be cleaned up to correctly stash and then retrieve the actual error. Additionally, as I encountered places where error codes were not being propagated correctly, I tried to fix them up. A number of those fixes are included in the this commit as well.
Linquize fb190bbb 2013-11-12T19:44:13 Fix warnings
Edward Thomson 1e60e5f4 2013-11-07T12:03:44 Allow callers to set mode on packfile creation
Edward Thomson cc2447da 2013-11-06T18:41:08 Add git_packbuilder_hash to query pack filename
Russell Belfer 948f00b4 2013-11-01T09:38:03 Merge pull request #1933 from libgit2/vmg/gcc-warnings Warnings for Windows x64 (MSVC) and GCC on Linux
Linquize 3343b5ff 2013-10-31T22:59:42 Fix warning on win64
Vicent Martí ac5e507c 2013-11-01T09:31:52 Merge pull request #1918 from libgit2/cmn/indexer-naming indexer: remove the stream infix
Carlos Martín Nieto a6154f21 2013-10-30T15:00:05 indexer: remove the stream infix It was there to keep it apart from the one which read in from a file on disk. This other indexer does not exist anymore, so there is no need for anything other than git_indexer to refer to it. While here, rename _add() function to _append() and _finalize() to _commit(). The former change is cosmetic, while the latter avoids talking about "finalizing", which OO languages use to mean something completely different.
Vicent Marti 04e0c2b2 2013-10-30T14:00:44 pack-objects: Depth can be negative
Vicent Martí 5c50f22a 2013-10-28T09:25:44 Merge pull request #1891 from libgit2/cmn/fix-thin-packs Add support for thin packs
Carlos Martín Nieto 0b33fca0 2013-10-02T13:39:35 indexer: fix thin packs When given an ODB from which to read objects, the indexer will attempt to inject the missing bases at the end of the pack and update the header and trailer to reflect the new contents.
Carlos Martín Nieto 51e82492 2013-10-03T16:54:25 pack: move the object header function here
Jameson Miller 5b188225 2013-10-02T13:45:32 Support cancellation in push operation This commit adds cancellation for the push operation. This work consists of: 1) Support cancellation during push operation - During object counting phase - During network transfer phase - Propagate GIT_EUSER error code out to caller 2) Improve cancellation support during fetch - Handle cancellation request during network transfer phase - Clear error string when cancelled during indexing 3) Fix error handling in git_smart__download_pack Cancellation during push is still only handled in the pack building and network transfer stages of push (and not during packbuilding).