kmx git

Commit	Date	Message
95117d47	2021-10-31T09:45:46	path: separate git-specific path functions from util Introduce `git_fs_path`, which operates on generic filesystem paths. `git_path` will be kept for only git-specific path functionality (for example, checking for `.git` in a path).
f0e693b1	2021-09-07T17:53:49	str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
50b3c2d5	2021-08-30T08:31:50	pack: don't assert in the lock
9d117e38	2020-02-17T21:28:13	midx: Add a way to write multi-pack-index files This change adds the git_midx_writer_* functions to allow to write and create `multi-pack-index` files from `.idx`/`.pack` files. Part of: #5399
366115e0	2021-08-27T04:06:31	Review feedback
fff209c4	2020-02-17T21:28:13	midx: Add a way to write multi-pack-index files This change adds the git_midx_writer_* functions to allow to write and create `multi-pack-index` files from `.idx`/`.pack` files. Part of: #5399
ff6f6754	2021-01-07T05:44:16	Use `p_pwrite`/`p_pread` consistently throughout the codebase This change stops using the seek+read/write combo to perform I/O with an offset, since this is faster by one system call (and also more atomic and therefore safer).
93f61c5a	2020-12-15T23:03:03	pack: continue zlib while we can make progress Continue the zlib stream as long as we can make progress; stop when we stop getting output _or_ when zlib stops taking input from us.
37763d38	2020-12-05T15:26:59	threads: rename git_atomic to git_atomic32 Clarify the `git_atomic` type and functions now that we have a 64 bit version as well (`git_atomic64`).
322c15ee	2020-08-01T18:24:41	Make the pack and mwindow implementations data-race-free This change fixes a packfile heap corruption that can happen when interacting with multiple packfiles concurrently across multiple threads. This is exacerbated by setting a lower mwindow open file limit. This change: * Renames most of the internal methods in pack.c to clearly indicate that they expect to be called with a certain lock held, making reasoning about the state of locks a bit easier. * Splits the `git_pack_file` lock in two: the one in `git_pack_file` only protects the `index_map`. The protection to `git_mwindow_file` is now in that struct. * Explicitly checks for freshness of the `git_pack_file` in `git_packfile_unpack_header`: this allows the mwindow implementation to close files whenever there is enough cache pressure, and `git_packfile_unpack_header` will reopen the packfile if needed. * After a call to `p_munmap()`, the `data` and `len` fields are poisoned with `NULL` to make use-after-frees more evident and crash rather than being open to the possibility of heap corruption. * Adds a test case to prevent this from regressing in the future. Fixes: #5591
4ae41f9c	2020-08-02T16:26:25	Make the odb race-free This change adds all the necessary locking to the odb to avoid races in the backends. Part of: #5592
f847fa7b	2020-02-16T02:00:56	midx: Support multi-pack-index files in odb_pack.c This change adds support for reading multi-pack-index files from the packfile odb backend. This also makes git_pack_file objects open their backing failes lazily in more scenarios, since the multi-pack-index can avoid having to open them in some cases (yay!). This change also refreshes the documentation found in src/odb_pack.c to match the updated code. Part of: #5399
7cd0bf65	2020-04-05T18:26:52	pack: use GIT_ASSERT
005e7715	2020-02-23T22:28:52	multipack: Introduce a parser for multi-pack-index files This change is the first in a series to add support for git's multi-pack-index. This should speed up large repositories significantly. Part of: #5399
4d4befac	2020-08-05T10:07:23	pack: check pack_window_open return
9bb61bad	2020-08-05T09:42:52	zstream: handle Z_BUF_ERROR appropriately in get_output_chunk Our processing loop in git_zstream_get_output_chunk does not handle `Z_BUF_ERROR` appropriately at the end of a compressed window. From the zlib manual, inflate will return: > Z_BUF_ERROR if no progress was possible or if there was not enough > room in the output buffer when Z_FINISH is used. Note that Z_BUF_ERROR > is not fatal, and inflate() can be called again with more input and > more output space to continue decompressing. In our loop, we were waiting until we got the expected size, then ensuring that we were at `Z_STREAM_END`. We are not guaranteed to be, since zlib may be in the `Z_BUF_ERROR` state where it has consumed a full window's worth of data, but it doesn't know that it's really at the end of the stream. There _could_ be more compressed data, but it doesn't _know_ that there's not until we make a subsequent call. We can change the loop to look for the end of stream instead of our expected size. This allows us to call inflate one last time when we are at the end of a window (and in the `Z_BUF_ERROR` state), allowing it to recognize the end of the stream, and move from the `Z_BUF_ERROR` state to the `Z_STREAM_END` state. If we do this, we need another exit condition: when `bytes == 0`, then no progress could be made and we should stop trying to inflate. This will be an error case, caught by the size and/or end-of-stream test.
4d4c8e0a	2020-04-02T07:34:55	Re-adding the "delta offset is zero" error case
ba59a4a2	2020-04-01T12:34:16	Making get_delta_base() conform to the general error-handling pattern This makes get_delta_base() return the error code as the return value and the delta base as an out-parameter.
f3273725	2020-02-25T20:58:09	pack: Improve error handling for get_delta_base() This change moves the responsibility of setting the error upon failures of get_delta_base() to get_delta_base() instead of its callers. That way, the caller chan always check if the return value is negative and mark the whole operation as an error instead of using garbage values, which can lead to crashes if the .pack files are malformed.
f0f1cd1d	2020-02-07T10:51:17	sha1_lookup: inline its only function into "pack.c" The file "sha1_lookup.c" contains a single function `sha1_position` only which is used only in the packfile implementation. As the function is comparatively small, to enable the compiler to optimize better and to remove symbol visibility, move it into "pack.c".
0edc26c8	2019-12-13T18:54:13	pack: refactor streams to use `git_zstream` While we do have a `git_zstream` abstraction that encapsulates all the calls to zlib as well as its error handling, we do not use it in our pack file code. Refactor it to make the code a lot easier to understand.
d8f6fee3	2019-12-13T14:57:53	pack: refactor unpacking of raw objects to use `git_zstream` While we do have a zstream abstraction that encapsulates all the calls to zlib as well as its error handling, we do not use it in our pack file code. Refactor it to make the code a lot easier to understand.
6460e8ab	2019-06-23T18:13:29	internal: use off64_t instead of git_off_t Prefer `off64_t` internally.
e54343a4	2019-06-29T09:17:32	fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.
b9d0b664	2018-12-17T09:10:53	offmap: introduce high-level setter for key/value pairs Currently, there is only one caller that adds entries into an offset map, and this caller first uses `git_offmap_put` to add a key and then set the value at the returned index by using `git_offmap_set_value_at`. This is just too tighlty coupled with implementation details of the map as it exposes the index of inserted entries, which we really do not care about at all. Introduce a new function `git_offmap_set`, which takes as parameters the map, key and value and directly returns an error code. Convert the caller to make use of it instead.
aa245623	2018-11-30T18:28:05	offmap: introduce high-level getter for values The current way of looking up an entry from a map is tightly coupled with the map implementation, as one first has to look up the index of the key and then retrieve the associated value by using the index. As a caller, you usually do not care about any indices at all, though, so this is more complicated than really necessary. Furthermore, it invites for errors to happen if the correct error checking sequence is not being followed. Introduce a new high-level function `git_offmap_get` that takes a map and a key and returns a pointer to the associated value if such a key exists. Otherwise, a `NULL` pointer is returned. Adjust all callers that can trivially be converted.
9694ef20	2018-12-17T09:01:53	oidmap: introduce high-level getter for values The current way of looking up an entry from a map is tightly coupled with the map implementation, as one first has to look up the index of the key and then retrieve the associated value by using the index. As a caller, you usually do not care about any indices at all, though, so this is more complicated than really necessary. Furthermore, it invites for errors to happen if the correct error checking sequence is not being followed. Introduce a new high-level function `git_oidmap_get` that takes a map and a key and returns a pointer to the associated value if such a key exists. Otherwise, a `NULL` pointer is returned. Adjust all callers that can trivially be converted.
351eeff3	2019-01-23T10:42:46	maps: use uniform lifecycle management functions Currently, the lifecycle functions for maps (allocation, deallocation, resize) are not named in a uniform way and do not have a uniform function signature. Rename the functions to fix that, and stick to libgit2's naming scheme of saying `git_foo_new`. This results in the following new interface for allocation: - `int git_<t>map_new(git_<t>map *out)` to allocate a new map, returning an error code if we ran out of memory - `void git_<t>map_free(git_<t>map map)` to free a map - `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map This commit also fixes all existing callers.
004a3398	2019-01-28T18:31:21	Allow bypassing check '.keep' files using libgit2 option 'GIT_OPT_IGNORE_PACK_KEEP_FILE_CHECK'
f673e232	2018-12-27T13:47:34	git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
cd350852	2019-01-17T10:40:13	object_type: GIT_OBJECT_BAD is now GIT_OBJECT_INVALID We use the term "invalid" to refer to bad or malformed data, eg `GIT_REF_INVALID` and `GIT_EINVALIDSPEC`. Since we're changing the names of the `git_object_t`s in this release, update it to be `GIT_OBJECT_INVALID` instead of `BAD`.
168fe39b	2018-11-28T14:26:57	object_type: use new enumeration names Use the new object_type enumeration names within the codebase.
852bc9f4	2018-11-23T19:26:24	khash: remove intricate knowledge of khash types Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types, simply use `size_t` instead. This decouples code from the khash stuff and makes it possible to move the khash includes into the implementation files.
ecf4f33a	2018-02-08T11:14:48	Convert usage of `git_buf_free` to new `git_buf_dispose`
c8ee5270	2017-12-08T09:05:58	pack: rename `git_packfile_stream_free` The function `git_packfile_stream_free` frees all state of the packfile stream without freeing the structure itself. This naming makes it hard to spot whether it will try to free the pointer itself or not, causing potential future errors. Due to this reason, we have decided to name a function freeing state without freeing the actual struture a "dispose" function. Rename `git_packfile_stream_free` to `git_packfile_stream_dispose` as a first example following this rule.
c3514b0b	2017-12-23T14:59:07	Fix unpack double free If an element has been cached, but then the call to packfile_unpack_compressed() fails, the very next thing that happens is that its data is freed and then the element is not removed from the cache, which frees the data again. This change sets obj->data to NULL to avoid the double-free. It also stops trying to resolve deltas after two continuous failed rounds of resolution, and adds a test for this.
53f2c6b1	2017-12-15T15:01:50	Simplified overflow condition
e7fac2af	2017-12-09T05:26:27	Using unsigned instead
28662c13	2017-12-08T06:00:27	libFuzzer: Prevent a potential shift overflow The type of \|base_offset\| in get_delta_base() is `git_off_t`, which is a signed `long`. That means that we need to make sure that the 8 most significant bits are zero (instead of 7) to avoid an overflow when it is shifted by 7 bits. Found using libFuzzer.
1560b580	2017-08-15T10:35:47	Merge pull request #4288 from pks-t/pks/include-fixups Include fixups
9842b327	2017-08-09T16:47:14	sha1_lookup: drop sha1_entry_pos function This was pulled over from git.git, and is an experiment in making binary-searching lists of sha1s faster. It was never compiled by default (nor was it used upstream by default without a special environment variable). Unfortunately, it is actually slower in practice, and upstream is planning to drop it in git/git@f1068efefe6dd3beaa89484db5e2db730b094e0b (which has some timing results). It's worth doing the same here for simplicity.
0c7f49dd	2017-06-30T13:39:01	Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
a693b873	2017-06-07T10:20:44	buffer: use `git_buf_init` with length The `git_buf_init` function has an optional length parameter, which will cause the buffer to be initialized and allocated in one step. This can be used instead of static initialization with `GIT_BUF_INIT` followed by a `git_buf_grow`. This patch does so for two functions where it is applicable.
97eb5ef0	2017-06-07T10:05:54	buffer: rely on `GITERR_OOM` set by `git_buf_try_grow` The function `git_buf_try_grow` consistently calls `giterr_set_oom` whenever growing the buffer fails due to insufficient memory being available. So in fact, we do not have to do this ourselves when a call to any buffer-growing function has failed due to an OOM situation. But we still do so in two functions, which this patch cleans up.
685f2251	2017-02-22T09:29:00	pack: fix looping over cache entries Fixes a regression from #4092. This is a crash on 32-bit and I assume that it doesn't do the right thing on 64-bit either. MSVC emits a warning for this, but of course, it's easy to get lost among all of the similar 'possible loss of data' warnings.
8f5fe903	2017-02-02T11:58:48	offmap: remove GIT__USE_OFFMAP macro
0d716905	2017-01-27T15:23:15	oidmap: remove GIT__USE_OIDMAP macro
85d2748c	2017-01-27T14:05:10	khash: avoid using `kh_key`/`kh_val` as lvalue
f31cb45a	2017-01-25T15:31:12	khash: avoid using `kh_put` directly
a8cd560b	2017-01-25T14:41:17	khash: avoid using `kh_del` directly
cb18386f	2017-01-25T14:26:58	khash: avoid using `kh_val`/`kh_value` directly
a853c527	2017-01-25T14:14:32	khash: avoid using `kh_get` directly
64e46dc3	2017-01-25T14:14:12	khash: avoid using `kh_end` directly
036daa59	2017-01-25T14:11:42	khash: use `git_map_exists` where applicable
9694d9ba	2017-01-25T14:09:17	khash: avoid using `kh_foreach`/`kh_foreach_value` directly
bf339ab0	2017-01-21T14:51:31	indexer: introduce `git_packfile_close` Encapsulation!
909d5494	2016-12-29T12:25:15	giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
903955f7	2016-12-19T17:26:09	Merge pull request #4027 from pks-t/pks/pack-deref-cache-on-error pack: dereference cached pack entry on error
ff5eea06	2016-12-12T09:36:15	pack: dereference cached pack entry on error When trying to uncompress deltas in a packfile's delta chain, we try to add object bases to the packfile cache, subsequently decrementing its reference count if it has been added successfully. This may lead to a mismatched reference count in the case where we exit the loop early due to an encountered error. Fix the issue by decrementing the reference count in error cleanup.
34b32053	2016-11-25T15:02:34	Fix potential use of uninitialized values
0cf15e39	2016-11-02T12:23:12	pack: fix race in pack_entry_find_offset In `pack_entry_find_offset`, we try to find the offset of a certain object in the pack file. To do so, we first assert if the packfile has already been opened and open it if not. Opening the packfile is guarded with a mutex, so concurrent access to this is in fact safe. What is not thread-safe though is our calculation of offsets inside the packfile. Assume two threads calling `pack_entry_find_offset` at the same time. We first calculate the offset and index location and only then determine if the pack has already been opened. If so, we re-calculate the offset and index address. Now the case for two threads: thread 1 first calculates the addresses and is subsequently suspended. The second thread will now call `pack_index_open` and initialize the pack file, calculating its addresses correctly. When the first thread is resumed now, he'll see that the pack file has already been initialized and will happily proceed with the addresses it has already calculated before the check. As the pack file was not initialized before, these addresses are bogus. Fix the issue by only calculating the addresses after having checked if the pack file is open.
6a2d2f8a	2015-06-17T06:42:20	delta: move delta application to delta.c Move the delta application functions into `delta.c`, next to the similar delta creation functions. Make the `git__delta_apply` functions adhere to other naming and parameter style within the library.
a97b769a	2016-04-27T12:00:31	odb: avoid inflating the full delta to read the header When we read the header, we want to know the size and type of the object. We're currently inflating the full delta in order to read the first few bytes. This can mean hundreds of kB needlessly inflated for large objects. Instead use a packfile stream to read just enough so we can read the two varints in the header and avoid inflating most of the delta.
d53cc13e	2016-03-31T04:12:46	Merge pull request #3575 from pmq20/master-13jan16 Remove duplicated calls to git_mwindow_close
e10144ae	2016-03-04T01:18:30	odb: improved not found error messages When looking up an abbreviated oid, show the actual (abbreviated) oid the caller passed instead of a full (but ambiguously truncated) oid.
6d97beb9	2016-02-25T15:46:59	pack: don't allow a negative offset
ea9e00cb	2016-02-23T18:15:43	pack: make sure we don't go out of bounds for extended entries A corrupt index might have data that tells us to go look past the end of the file for data. Catch these cases and return an appropriate error message.
a53d2e39	2016-02-09T09:58:56	pack: do not free passed in poiter on error The function `git_packfile_stream_open` tries to free the passed in stream when an error occurs. The only call site is `git_indexer_append`, though, which passes in the address of a stream struct which has not been allocated on the heap. Fix the issue by simply removing the call to free. In case of an error we did not allocate any memory yet and otherwise it should be the caller's responsibility to manage it's object's lifetime.
d4e4f272	2016-01-13T11:07:14	Remove duplicated calls to git_mwindow_close
b644e223	2016-01-13T11:02:38	Make packfile_unpack_compressed a private API
c369b379	2015-07-31T16:23:11	Remove extra semicolon outside of a function Without this change, compiling with gcc and pedantic generates warning: ISO C does not allow extra ‘;’ outside of a function.
878293f7	2015-06-10T10:44:14	pack: use git_buf when building the index name The way we currently do it depends on the subtlety of strlen vs sizeof and the fact that .pack is one longer than .idx. Let's use a git_buf so we can express the manipulation we want much more clearly.
38c10ecd	2015-05-16T19:00:50	indexer: don't look for the index we're creating When creating an index, know that we do not have an index for our own packfile, preventing some unnecessary file opens and error reporting.
b63b76e0	2014-10-12T11:42:31	Reorder some khash declarations Keep the definitions in the headers, while putting the declarations in the C files. Putting the function definitions in headers causes them to be duplicated if you include two headers with them.
5091aff7	2015-02-20T08:40:40	Merge pull request #2907 from jasonhaslam/git_packfile_unpack_race Fix race in git_packfile_unpack.
8588cb0c	2015-02-14T23:43:26	Fix race in git_packfile_unpack. Increment refcount of newly added cache entries just like existing entries looked up from the cache. Otherwise the new entry can be evicted from the cache and destroyed while it's still in use.
f1453c59	2015-02-12T12:19:37	Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
392702ee	2015-02-09T23:41:13	allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
6f73e026	2014-12-24T11:42:50	Plug some leaks
ec7e680c	2014-11-20T12:07:55	Fix for misleading "missing delta bases" error - Fix #2721.
ea66215d	2014-10-26T10:29:19	Removed some useless variable assignments
e640a77c	2014-09-25T15:29:03	Silence uninitialized warning
5cd81bb3	2014-09-03T01:01:25	Several CppCat warnings fixed
b3d3459f	2014-08-26T15:09:47	pack: return the correct final offset The callers of git_packfile_unpack() expect the obj_offset argument to be set to the beginning of the next object. We were mistakenly returning the the offset of the object's data, which causes the CRC function to try to use the wrong offset. Set obj_offset to curpos instead of elem->offset to point to the next element and bring back expected behaviour.
5e0f47c3	2014-06-25T21:20:39	pack: free the new pack struct if we fail to insert If we fail to insert the packfile in the map, make sure to free it. This makes the free function only attempt to remove its mwindows from the global list if we have opened the packfile to avoid accessing the list unlocked.
b3b66c57	2014-06-18T17:13:12	Share packs across repository instances Opening the same repository multiple times will currently open the same file multiple times, as well as map the same region of the file multiple times. This is not necessary, as the packfile data is immutable. Instead of opening and closing packfiles directly, introduce an indirection and allocate packfiles globally. This does mean locking on each packfile open, but we already use this lock for the global mwindow list so it doesn't introduce a new contention point.
649214be	2014-05-15T19:59:05	pack: init the cache on packfile alloc When running multithreaded, it is not enough to check for the offmap allocation. Move the call to cache_init() to packfile allocation so we can be sure it is always allocated free of races. This fixes #2355.
c968ce2c	2014-05-12T02:01:05	pack: don't forget to cache the base object The base object is a good cache candidate, so we shouldn't forget to add it to the cache.
15bcced2	2014-05-11T05:31:22	pack: use stack allocation for smaller delta chains This avoid allocating the array on the heap for relatively small chains. The expected performance increase is sadly not really noticeable.
a3ffbf23	2014-05-11T03:50:34	pack: expose a cached delta base directly Instead of going through a special entry in the chain, let's pass it as an output parameter.
9dbd150f	2014-05-09T09:36:09	pack: simplify delta chain code The switch makes the loop somewhat unwieldy. Let's assume it's fine and perform the check when we're accessing the data. This makes our code look a lot more like git's.
86d5810b	2014-05-06T16:20:14	pack: remove misleading comment
b2559f47	2014-05-08T17:14:59	pack: preallocate a 64-element chain Dependency chains are often large and require a few reallocations. Allocate a 64-element chain before doing anything else to avoid allocations during the loop. This value comes from the stack-allocated one git uses. We still allocate this on the heap, but it does help performance a little bit.
e6d10c58	2014-05-08T16:24:54	pack: make sure not to leak the dep chain
a332e91c	2014-05-06T23:37:28	pack: use a cache for delta bases when unpacking Bring back the use of the delta base cache for unpacking objects. When generating the delta chain, we stop when we find a delta base in the pack's cache and use that as the starting point.
2acdf4b8	2014-05-06T19:20:33	pack: unpack using a loop We currently make use of recursive function calls to unpack an object, resolving the deltas as we come back down the chain. This means that we have unbounded stack growth as we look up objects in a pack. This is now done in two steps: first we figure out what the dependency chain is by looking up the delta bases until we reach a non-delta object, pushing the information we need onto a stack and then we pop from that stack and apply the deltas until there are no more left. This version of the code does not make use of the delta base cache so it is slower than what's in the mainline. A later commit will reintroduce it.
ae081739	2014-05-06T21:21:04	pack: do not repeat the same error message four times Repeating this error message makes it harder to find out where we actually are finding the error, and they don't really describe what we're trying to do.
8610487c	2014-01-23T23:28:28	Drop parsing pack filename SHA1 part, no one cares the filename
26c1cb91	2013-12-09T09:44:03	One more rename/cleanup for callback err functions
c7b3e1b3	2013-12-06T15:42:20	Some callback error check style cleanups I find this easier to read...

95117d47

2021-10-31T09:45:46

path: separate git-specific path functions from util Introduce `git_fs_path`, which operates on generic filesystem paths. `git_path` will be kept for only git-specific path functionality (for example, checking for `.git` in a path).

f0e693b1

2021-09-07T17:53:49

str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.

50b3c2d5

2021-08-30T08:31:50

pack: don't assert in the lock

9d117e38

2020-02-17T21:28:13

midx: Add a way to write multi-pack-index files This change adds the git_midx_writer_* functions to allow to write and create `multi-pack-index` files from `.idx`/`.pack` files. Part of: #5399

366115e0

2021-08-27T04:06:31

Review feedback

fff209c4

2020-02-17T21:28:13

midx: Add a way to write multi-pack-index files This change adds the git_midx_writer_* functions to allow to write and create `multi-pack-index` files from `.idx`/`.pack` files. Part of: #5399

ff6f6754

2021-01-07T05:44:16

Use `p_pwrite`/`p_pread` consistently throughout the codebase This change stops using the seek+read/write combo to perform I/O with an offset, since this is faster by one system call (and also more atomic and therefore safer).

93f61c5a

2020-12-15T23:03:03

pack: continue zlib while we can make progress Continue the zlib stream as long as we can make progress; stop when we stop getting output _or_ when zlib stops taking input from us.

37763d38

2020-12-05T15:26:59

threads: rename git_atomic to git_atomic32 Clarify the `git_atomic` type and functions now that we have a 64 bit version as well (`git_atomic64`).

322c15ee

2020-08-01T18:24:41

Make the pack and mwindow implementations data-race-free This change fixes a packfile heap corruption that can happen when interacting with multiple packfiles concurrently across multiple threads. This is exacerbated by setting a lower mwindow open file limit. This change: * Renames most of the internal methods in pack.c to clearly indicate that they expect to be called with a certain lock held, making reasoning about the state of locks a bit easier. * Splits the `git_pack_file` lock in two: the one in `git_pack_file` only protects the `index_map`. The protection to `git_mwindow_file` is now in that struct. * Explicitly checks for freshness of the `git_pack_file` in `git_packfile_unpack_header`: this allows the mwindow implementation to close files whenever there is enough cache pressure, and `git_packfile_unpack_header` will reopen the packfile if needed. * After a call to `p_munmap()`, the `data` and `len` fields are poisoned with `NULL` to make use-after-frees more evident and crash rather than being open to the possibility of heap corruption. * Adds a test case to prevent this from regressing in the future. Fixes: #5591

4ae41f9c

2020-08-02T16:26:25

Make the odb race-free This change adds all the necessary locking to the odb to avoid races in the backends. Part of: #5592

f847fa7b

2020-02-16T02:00:56

midx: Support multi-pack-index files in odb_pack.c This change adds support for reading multi-pack-index files from the packfile odb backend. This also makes git_pack_file objects open their backing failes lazily in more scenarios, since the multi-pack-index can avoid having to open them in some cases (yay!). This change also refreshes the documentation found in src/odb_pack.c to match the updated code. Part of: #5399

7cd0bf65

2020-04-05T18:26:52

pack: use GIT_ASSERT

005e7715

2020-02-23T22:28:52

multipack: Introduce a parser for multi-pack-index files This change is the first in a series to add support for git's multi-pack-index. This should speed up large repositories significantly. Part of: #5399

4d4befac

2020-08-05T10:07:23

pack: check pack_window_open return

9bb61bad

2020-08-05T09:42:52

zstream: handle Z_BUF_ERROR appropriately in get_output_chunk Our processing loop in git_zstream_get_output_chunk does not handle `Z_BUF_ERROR` appropriately at the end of a compressed window. From the zlib manual, inflate will return: > Z_BUF_ERROR if no progress was possible or if there was not enough > room in the output buffer when Z_FINISH is used. Note that Z_BUF_ERROR > is not fatal, and inflate() can be called again with more input and > more output space to continue decompressing. In our loop, we were waiting until we got the expected size, then ensuring that we were at `Z_STREAM_END`. We are not guaranteed to be, since zlib may be in the `Z_BUF_ERROR` state where it has consumed a full window's worth of data, but it doesn't know that it's really at the end of the stream. There _could_ be more compressed data, but it doesn't _know_ that there's not until we make a subsequent call. We can change the loop to look for the end of stream instead of our expected size. This allows us to call inflate one last time when we are at the end of a window (and in the `Z_BUF_ERROR` state), allowing it to recognize the end of the stream, and move from the `Z_BUF_ERROR` state to the `Z_STREAM_END` state. If we do this, we need another exit condition: when `bytes == 0`, then no progress could be made and we should stop trying to inflate. This will be an error case, caught by the size and/or end-of-stream test.

4d4c8e0a

2020-04-02T07:34:55

Re-adding the "delta offset is zero" error case

ba59a4a2

2020-04-01T12:34:16

Making get_delta_base() conform to the general error-handling pattern This makes get_delta_base() return the error code as the return value and the delta base as an out-parameter.

f3273725

2020-02-25T20:58:09

pack: Improve error handling for get_delta_base() This change moves the responsibility of setting the error upon failures of get_delta_base() to get_delta_base() instead of its callers. That way, the caller chan always check if the return value is negative and mark the whole operation as an error instead of using garbage values, which can lead to crashes if the .pack files are malformed.

f0f1cd1d

2020-02-07T10:51:17

sha1_lookup: inline its only function into "pack.c" The file "sha1_lookup.c" contains a single function `sha1_position` only which is used only in the packfile implementation. As the function is comparatively small, to enable the compiler to optimize better and to remove symbol visibility, move it into "pack.c".

0edc26c8

2019-12-13T18:54:13

pack: refactor streams to use `git_zstream` While we do have a `git_zstream` abstraction that encapsulates all the calls to zlib as well as its error handling, we do not use it in our pack file code. Refactor it to make the code a lot easier to understand.

d8f6fee3

2019-12-13T14:57:53

pack: refactor unpacking of raw objects to use `git_zstream` While we do have a zstream abstraction that encapsulates all the calls to zlib as well as its error handling, we do not use it in our pack file code. Refactor it to make the code a lot easier to understand.

6460e8ab

2019-06-23T18:13:29

internal: use off64_t instead of git_off_t Prefer `off64_t` internally.

e54343a4

2019-06-29T09:17:32

fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.

b9d0b664

2018-12-17T09:10:53

offmap: introduce high-level setter for key/value pairs Currently, there is only one caller that adds entries into an offset map, and this caller first uses `git_offmap_put` to add a key and then set the value at the returned index by using `git_offmap_set_value_at`. This is just too tighlty coupled with implementation details of the map as it exposes the index of inserted entries, which we really do not care about at all. Introduce a new function `git_offmap_set`, which takes as parameters the map, key and value and directly returns an error code. Convert the caller to make use of it instead.

aa245623

2018-11-30T18:28:05

offmap: introduce high-level getter for values The current way of looking up an entry from a map is tightly coupled with the map implementation, as one first has to look up the index of the key and then retrieve the associated value by using the index. As a caller, you usually do not care about any indices at all, though, so this is more complicated than really necessary. Furthermore, it invites for errors to happen if the correct error checking sequence is not being followed. Introduce a new high-level function `git_offmap_get` that takes a map and a key and returns a pointer to the associated value if such a key exists. Otherwise, a `NULL` pointer is returned. Adjust all callers that can trivially be converted.

9694ef20

2018-12-17T09:01:53

oidmap: introduce high-level getter for values The current way of looking up an entry from a map is tightly coupled with the map implementation, as one first has to look up the index of the key and then retrieve the associated value by using the index. As a caller, you usually do not care about any indices at all, though, so this is more complicated than really necessary. Furthermore, it invites for errors to happen if the correct error checking sequence is not being followed. Introduce a new high-level function `git_oidmap_get` that takes a map and a key and returns a pointer to the associated value if such a key exists. Otherwise, a `NULL` pointer is returned. Adjust all callers that can trivially be converted.

351eeff3

2019-01-23T10:42:46

maps: use uniform lifecycle management functions Currently, the lifecycle functions for maps (allocation, deallocation, resize) are not named in a uniform way and do not have a uniform function signature. Rename the functions to fix that, and stick to libgit2's naming scheme of saying `git_foo_new`. This results in the following new interface for allocation: - `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an error code if we ran out of memory - `void git_<t>map_free(git_<t>map *map)` to free a map - `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map This commit also fixes all existing callers.

004a3398

2019-01-28T18:31:21

Allow bypassing check '.keep' files using libgit2 option 'GIT_OPT_IGNORE_PACK_KEEP_FILE_CHECK'

f673e232

2018-12-27T13:47:34

git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.

cd350852

2019-01-17T10:40:13

object_type: GIT_OBJECT_BAD is now GIT_OBJECT_INVALID We use the term "invalid" to refer to bad or malformed data, eg `GIT_REF_INVALID` and `GIT_EINVALIDSPEC`. Since we're changing the names of the `git_object_t`s in this release, update it to be `GIT_OBJECT_INVALID` instead of `BAD`.

168fe39b

2018-11-28T14:26:57

object_type: use new enumeration names Use the new object_type enumeration names within the codebase.

852bc9f4

2018-11-23T19:26:24

khash: remove intricate knowledge of khash types Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types, simply use `size_t` instead. This decouples code from the khash stuff and makes it possible to move the khash includes into the implementation files.

ecf4f33a

2018-02-08T11:14:48

Convert usage of `git_buf_free` to new `git_buf_dispose`

c8ee5270

2017-12-08T09:05:58

pack: rename `git_packfile_stream_free` The function `git_packfile_stream_free` frees all state of the packfile stream without freeing the structure itself. This naming makes it hard to spot whether it will try to free the pointer itself or not, causing potential future errors. Due to this reason, we have decided to name a function freeing state without freeing the actual struture a "dispose" function. Rename `git_packfile_stream_free` to `git_packfile_stream_dispose` as a first example following this rule.

c3514b0b

2017-12-23T14:59:07

Fix unpack double free If an element has been cached, but then the call to packfile_unpack_compressed() fails, the very next thing that happens is that its data is freed and then the element is not removed from the cache, which frees the data again. This change sets obj->data to NULL to avoid the double-free. It also stops trying to resolve deltas after two continuous failed rounds of resolution, and adds a test for this.

53f2c6b1

2017-12-15T15:01:50

Simplified overflow condition

e7fac2af

2017-12-09T05:26:27

Using unsigned instead

28662c13

2017-12-08T06:00:27

libFuzzer: Prevent a potential shift overflow The type of |base_offset| in get_delta_base() is `git_off_t`, which is a signed `long`. That means that we need to make sure that the 8 most significant bits are zero (instead of 7) to avoid an overflow when it is shifted by 7 bits. Found using libFuzzer.

1560b580

2017-08-15T10:35:47

Merge pull request #4288 from pks-t/pks/include-fixups Include fixups

9842b327

2017-08-09T16:47:14

sha1_lookup: drop sha1_entry_pos function This was pulled over from git.git, and is an experiment in making binary-searching lists of sha1s faster. It was never compiled by default (nor was it used upstream by default without a special environment variable). Unfortunately, it is actually slower in practice, and upstream is planning to drop it in git/git@f1068efefe6dd3beaa89484db5e2db730b094e0b (which has some timing results). It's worth doing the same here for simplicity.

0c7f49dd

2017-06-30T13:39:01

Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.

a693b873

2017-06-07T10:20:44

buffer: use `git_buf_init` with length The `git_buf_init` function has an optional length parameter, which will cause the buffer to be initialized and allocated in one step. This can be used instead of static initialization with `GIT_BUF_INIT` followed by a `git_buf_grow`. This patch does so for two functions where it is applicable.

97eb5ef0

2017-06-07T10:05:54

buffer: rely on `GITERR_OOM` set by `git_buf_try_grow` The function `git_buf_try_grow` consistently calls `giterr_set_oom` whenever growing the buffer fails due to insufficient memory being available. So in fact, we do not have to do this ourselves when a call to any buffer-growing function has failed due to an OOM situation. But we still do so in two functions, which this patch cleans up.

685f2251

2017-02-22T09:29:00

pack: fix looping over cache entries Fixes a regression from #4092. This is a crash on 32-bit and I assume that it doesn't do the right thing on 64-bit either. MSVC emits a warning for this, but of course, it's easy to get lost among all of the similar 'possible loss of data' warnings.

8f5fe903

2017-02-02T11:58:48

offmap: remove GIT__USE_OFFMAP macro

0d716905

2017-01-27T15:23:15

oidmap: remove GIT__USE_OIDMAP macro

85d2748c

2017-01-27T14:05:10

khash: avoid using `kh_key`/`kh_val` as lvalue

f31cb45a

2017-01-25T15:31:12

khash: avoid using `kh_put` directly

a8cd560b

2017-01-25T14:41:17

khash: avoid using `kh_del` directly

cb18386f

2017-01-25T14:26:58

khash: avoid using `kh_val`/`kh_value` directly

a853c527

2017-01-25T14:14:32

khash: avoid using `kh_get` directly

64e46dc3

2017-01-25T14:14:12

khash: avoid using `kh_end` directly

036daa59

2017-01-25T14:11:42

khash: use `git_map_exists` where applicable

9694d9ba

2017-01-25T14:09:17

khash: avoid using `kh_foreach`/`kh_foreach_value` directly

bf339ab0

2017-01-21T14:51:31

indexer: introduce `git_packfile_close` Encapsulation!

909d5494

2016-12-29T12:25:15

giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one

903955f7

2016-12-19T17:26:09

Merge pull request #4027 from pks-t/pks/pack-deref-cache-on-error pack: dereference cached pack entry on error

ff5eea06

2016-12-12T09:36:15

pack: dereference cached pack entry on error When trying to uncompress deltas in a packfile's delta chain, we try to add object bases to the packfile cache, subsequently decrementing its reference count if it has been added successfully. This may lead to a mismatched reference count in the case where we exit the loop early due to an encountered error. Fix the issue by decrementing the reference count in error cleanup.

34b32053

2016-11-25T15:02:34

Fix potential use of uninitialized values

0cf15e39

2016-11-02T12:23:12

pack: fix race in pack_entry_find_offset In `pack_entry_find_offset`, we try to find the offset of a certain object in the pack file. To do so, we first assert if the packfile has already been opened and open it if not. Opening the packfile is guarded with a mutex, so concurrent access to this is in fact safe. What is not thread-safe though is our calculation of offsets inside the packfile. Assume two threads calling `pack_entry_find_offset` at the same time. We first calculate the offset and index location and only then determine if the pack has already been opened. If so, we re-calculate the offset and index address. Now the case for two threads: thread 1 first calculates the addresses and is subsequently suspended. The second thread will now call `pack_index_open` and initialize the pack file, calculating its addresses correctly. When the first thread is resumed now, he'll see that the pack file has already been initialized and will happily proceed with the addresses it has already calculated before the check. As the pack file was not initialized before, these addresses are bogus. Fix the issue by only calculating the addresses after having checked if the pack file is open.

6a2d2f8a

2015-06-17T06:42:20

delta: move delta application to delta.c Move the delta application functions into `delta.c`, next to the similar delta creation functions. Make the `git__delta_apply` functions adhere to other naming and parameter style within the library.

a97b769a

2016-04-27T12:00:31

odb: avoid inflating the full delta to read the header When we read the header, we want to know the size and type of the object. We're currently inflating the full delta in order to read the first few bytes. This can mean hundreds of kB needlessly inflated for large objects. Instead use a packfile stream to read just enough so we can read the two varints in the header and avoid inflating most of the delta.

d53cc13e

2016-03-31T04:12:46

Merge pull request #3575 from pmq20/master-13jan16 Remove duplicated calls to git_mwindow_close

e10144ae

2016-03-04T01:18:30

odb: improved not found error messages When looking up an abbreviated oid, show the actual (abbreviated) oid the caller passed instead of a full (but ambiguously truncated) oid.

6d97beb9

2016-02-25T15:46:59

pack: don't allow a negative offset

ea9e00cb

2016-02-23T18:15:43

pack: make sure we don't go out of bounds for extended entries A corrupt index might have data that tells us to go look past the end of the file for data. Catch these cases and return an appropriate error message.

a53d2e39

2016-02-09T09:58:56

pack: do not free passed in poiter on error The function `git_packfile_stream_open` tries to free the passed in stream when an error occurs. The only call site is `git_indexer_append`, though, which passes in the address of a stream struct which has not been allocated on the heap. Fix the issue by simply removing the call to free. In case of an error we did not allocate any memory yet and otherwise it should be the caller's responsibility to manage it's object's lifetime.

d4e4f272

2016-01-13T11:07:14

Remove duplicated calls to git_mwindow_close

b644e223

2016-01-13T11:02:38

Make packfile_unpack_compressed a private API

c369b379

2015-07-31T16:23:11

Remove extra semicolon outside of a function Without this change, compiling with gcc and pedantic generates warning: ISO C does not allow extra ‘;’ outside of a function.

878293f7

2015-06-10T10:44:14

pack: use git_buf when building the index name The way we currently do it depends on the subtlety of strlen vs sizeof and the fact that .pack is one longer than .idx. Let's use a git_buf so we can express the manipulation we want much more clearly.

38c10ecd

2015-05-16T19:00:50

indexer: don't look for the index we're creating When creating an index, know that we do not have an index for our own packfile, preventing some unnecessary file opens and error reporting.

b63b76e0

2014-10-12T11:42:31

Reorder some khash declarations Keep the definitions in the headers, while putting the declarations in the C files. Putting the function definitions in headers causes them to be duplicated if you include two headers with them.

5091aff7

2015-02-20T08:40:40

Merge pull request #2907 from jasonhaslam/git_packfile_unpack_race Fix race in git_packfile_unpack.

8588cb0c

2015-02-14T23:43:26

Fix race in git_packfile_unpack. Increment refcount of newly added cache entries just like existing entries looked up from the cache. Otherwise the new entry can be evicted from the cache and destroyed while it's still in use.

f1453c59

2015-02-12T12:19:37

Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.

392702ee

2015-02-09T23:41:13

allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.

6f73e026

2014-12-24T11:42:50

Plug some leaks

ec7e680c

2014-11-20T12:07:55

Fix for misleading "missing delta bases" error - Fix #2721.

ea66215d

2014-10-26T10:29:19

Removed some useless variable assignments

e640a77c

2014-09-25T15:29:03

Silence uninitialized warning

5cd81bb3

2014-09-03T01:01:25

Several CppCat warnings fixed

b3d3459f

2014-08-26T15:09:47

pack: return the correct final offset The callers of git_packfile_unpack() expect the obj_offset argument to be set to the beginning of the next object. We were mistakenly returning the the offset of the object's data, which causes the CRC function to try to use the wrong offset. Set obj_offset to curpos instead of elem->offset to point to the next element and bring back expected behaviour.

5e0f47c3

2014-06-25T21:20:39

pack: free the new pack struct if we fail to insert If we fail to insert the packfile in the map, make sure to free it. This makes the free function only attempt to remove its mwindows from the global list if we have opened the packfile to avoid accessing the list unlocked.

b3b66c57

2014-06-18T17:13:12

Share packs across repository instances Opening the same repository multiple times will currently open the same file multiple times, as well as map the same region of the file multiple times. This is not necessary, as the packfile data is immutable. Instead of opening and closing packfiles directly, introduce an indirection and allocate packfiles globally. This does mean locking on each packfile open, but we already use this lock for the global mwindow list so it doesn't introduce a new contention point.

649214be

2014-05-15T19:59:05

pack: init the cache on packfile alloc When running multithreaded, it is not enough to check for the offmap allocation. Move the call to cache_init() to packfile allocation so we can be sure it is always allocated free of races. This fixes #2355.

c968ce2c

2014-05-12T02:01:05

pack: don't forget to cache the base object The base object is a good cache candidate, so we shouldn't forget to add it to the cache.

15bcced2

2014-05-11T05:31:22

pack: use stack allocation for smaller delta chains This avoid allocating the array on the heap for relatively small chains. The expected performance increase is sadly not really noticeable.

a3ffbf23

2014-05-11T03:50:34

pack: expose a cached delta base directly Instead of going through a special entry in the chain, let's pass it as an output parameter.

9dbd150f

2014-05-09T09:36:09

pack: simplify delta chain code The switch makes the loop somewhat unwieldy. Let's assume it's fine and perform the check when we're accessing the data. This makes our code look a lot more like git's.

86d5810b

2014-05-06T16:20:14

pack: remove misleading comment

b2559f47

2014-05-08T17:14:59

pack: preallocate a 64-element chain Dependency chains are often large and require a few reallocations. Allocate a 64-element chain before doing anything else to avoid allocations during the loop. This value comes from the stack-allocated one git uses. We still allocate this on the heap, but it does help performance a little bit.

e6d10c58

2014-05-08T16:24:54

pack: make sure not to leak the dep chain

a332e91c

2014-05-06T23:37:28

pack: use a cache for delta bases when unpacking Bring back the use of the delta base cache for unpacking objects. When generating the delta chain, we stop when we find a delta base in the pack's cache and use that as the starting point.

2acdf4b8

2014-05-06T19:20:33

pack: unpack using a loop We currently make use of recursive function calls to unpack an object, resolving the deltas as we come back down the chain. This means that we have unbounded stack growth as we look up objects in a pack. This is now done in two steps: first we figure out what the dependency chain is by looking up the delta bases until we reach a non-delta object, pushing the information we need onto a stack and then we pop from that stack and apply the deltas until there are no more left. This version of the code does not make use of the delta base cache so it is slower than what's in the mainline. A later commit will reintroduce it.

ae081739

2014-05-06T21:21:04

pack: do not repeat the same error message four times Repeating this error message makes it harder to find out where we actually are finding the error, and they don't really describe what we're trying to do.

8610487c

2014-01-23T23:28:28

Drop parsing pack filename SHA1 part, no one cares the filename

26c1cb91

2013-12-09T09:44:03

One more rename/cleanup for callback err functions

c7b3e1b3

2013-12-06T15:42:20

Some callback error check style cleanups I find this easier to read...

thodg/libgit2/src/pack.c

src/pack.c

Log