kmx git

Commit	Date	Message
c5d41d46	2020-08-03T09:55:22	Merge pull request #5563 from pks-t/pks/worktree-heads Access HEAD via the refdb backends
52ccbc5d	2020-08-03T09:52:30	Merge pull request #5582 from libgit2/pks-config-map-optimization config_entries: Avoid excessive map operations
f2400a9c	2020-07-13T20:56:08	config_entries: Avoid excessive map operations When appending config entries, we currently always first get the currently existing map entry and then afterwards update the map to contain the current config value. In the common scenario where keys aren't being overridden, this is the best we can do. But in case a key gets set multiple times, then we'll also perform these two map operations. In extreme cases, hashing the map keys will thus start to dominate performance. Let's optimize the pattern by using a separately allocated map entry. Currently, we always put the current list entry into the map and update it to get any overridden multivar. As these list entries are also used to iterate config entries, we cannot update them in-place in the map and are thus forced to always set the map to contain the new entry. But with a separately allocated map entry, we can now create one once per config key and insert it into the map. Whenever appending a new config value with the same key, we can now just update the map entry in-place instead of having to replace the map entry completely. This reduces calls to the hashing function by half and trades the improved runtime for one more allocation per unique config key. Given that the refactoring arguably improves code readability by splitting concerns of the `config_entry_list` type and not having to track it in two different structures, this alone would already be reason enough to take the trade. Given a pathological case of a gitconfig with 100.000 repeated keys and a section of length 10.000 characters, this reduces runtime by half from approximately 14 seconds to 7 seconds as expected.
a83fd510	2020-07-12T21:26:59	Merge pull request #5396 from lhchavez/mwindow-file-limit mwindow: set limit on number of open files
92d42eb3	2020-07-12T09:53:10	Minor nits and style formatting
7216b048	2020-06-17T14:23:15	refs: update HEAD references via refdb When renaming a reference, we need to iterate over every HEAD and potentially update it in case it is a symbolic reference pointing to the previous name of the renamed reference. Most importantly, this doesn't only include HEADs from the repo we're renaming the reference in, but we also need to iterate over HEADs from linked worktrees. In order to update the HEADs, we directly read them from the worktree's gitdir and thus assume that both repository and worktrees use the filesystem-based reference backend. But this breaks as soon as one got a repository with a different refdb and breaks our own abstractions. So let's instead update HEAD references via the refdb by first opening each worktree as a repository and then using the usual functions to read and update HEADs. This is a lot less efficient than the current code, but it's not like we can really help this: going via the refdb is mandatory.
2fcb4f28	2020-06-17T14:09:04	repository: introduce new function to iterate over all worktrees Given a Git repository, it's non-trivial to iterate over all worktrees that are associated with it, including the "main" repository. This commit adds a new internal function `git_repository_foreach_worktree` that does this for us.
5434f9a3	2020-06-17T14:57:13	refs: remove function to read HEAD directly With the last user of `git_reference__read_head` gone, let's remove it as it's been reading references without consulting the refdb backends.
65895410	2020-06-17T14:56:36	repository: retrieve worktree HEAD via refdb The function `git_repository_head_for_worktree` currently uses `git_reference__read_head` to directly read a given worktree's HEAD from the filesystem. This is broken in case the repository uses a different refdb implementation than the filesystem-based one, so let's instead open the worktree as a real repository and use `git_reference_lookup`. This also fixes the case where the worktree's HEAD is not a symref, but a detached HEAD, which would have resulted in an error previously.
d1f210fc	2020-06-17T15:09:49	repository: remove function to iterate over HEADs The function `git_repository_foreach_head` is broken, as it directly interacts with the on-disk representation of the reference database, thus assuming that no other refdb is used for the given repository. As this is an internal function only and all users have been replaced, let's remove this function.
ac5fbe31	2020-06-17T14:43:27	branch: determine whether a branch is checked out via refdb We currently determine whether a branch is checked out via `git_repository_foreach_head`. As this function reads references directly from the disk, it breaks our refdb abstraction in case the repository uses a different reference backend implementation than the filesystem-based one. So let's use `git_repository_foreach_worktree` instead -- while it's less efficient, it is at least correct in all corner cases.
26b9e489	2020-07-12T17:04:29	Merge pull request #5570 from libgit2/pks/refdb-refactorings refdb: a set of preliminary refactorings for the reftable backend
34987447	2020-06-30T10:13:26	refdb: avoid unlimited spinning in case of symref cycles To determine whether another reflog entry needs to be written for HEAD on a reference update, we need to see whether HEAD directly or indirectly points to the reference we're updating. The resolve logic is currently completely unbounded unless an error occurs, which effectively means that we'd be spinning forever in case we have a symref loop in the repository refdb. Let's fix the issue by using `git_refdb_resolve` instead, which is always bounded.
b895547c	2020-06-30T09:35:21	refs: replace reimplementation of reference resolver The refs code currently has a second implementation that resolves references in order to find any final symbolic reference pointing to a nonexistent target branch. As we've just extended `git_refdb_resolve` to also return such references, let's use that one instead in order to reduce code duplication.
cf7dd05b	2020-06-30T13:26:05	refdb: return resolved symbolic refs pointing to nonexistent refs In some cases, resolving references requires us to also know about the final symbolic reference that's pointing to a nonexistent branch, e.g. in an empty repository where the main branch is yet unborn but HEAD already points to it. Right now, the resolving logic is thus split up into two, where one is the new refdb implementation and the second one is an ad-hoc implementation inside "refs.c". Let's extend `git_refdb_resolve` to also return such final dangling references pointing to nonexistent branches so we can deduplicate the resolving logic.
c54f40e4	2020-06-30T09:28:12	refs: move resolving of references into the refdb Resolving of symbolic references is currently implemented inside the "refs" layer. As a result, it's hard to call this function from low-level parts that only have a refdb available, but no repository, as the "refs" layer always operates on the repository-level. So let's move the function into the generic "refdb" implementation to lift this restriction.
1f39593b	2020-06-30T08:53:59	refdb: extract function to check whether to append HEAD to the reflog The logic to determine whether a reflog entry should be for the HEAD reference is non-trivial. Currently, the only user of this is the filesystem-based refdb, but with the advent of the reftable refdb we're going to add a second user that's interested in having the same behaviour. Let's pull out a new function that checks whether a given reference should cause an entry to be written to the HEAD reflog as a preparatory step.
e02478b1	2020-06-05T08:17:03	refdb: extract function to check whether a reflog should be written The logic to determine whether a reflog should be written is non-trivial. Currently, the only user of this is the filesystem-based refdb, but with the advent of the reftable refdb we're going to add a second user that's interested in having the same behaviour. Let's pull out a new function that checks whether a given reference should cause a reflog to be written as a preparatory step.
4218403e	2020-06-05T10:49:09	cmake: use target-specific compile definitions We set up some compile definitions as part of our src/CMakeLists.txt. While the definitions are global, we really only need them as part of the git2internal target which compiles all the objects. Let's thus use `target_compile_definitions` instead of `add_definitions`.
53911edd	2020-06-05T10:24:30	cmake: use git2internal target to populate sources Modern CMake is usually target-driven in that a target is first defined and then the likes of `target_sources`, `target_include_directories` etc. are used to further populate the target. We still use old-style CMake, where we first set up a set of variables and then populate the target in a single call. Let's migrate to modern CMake usage by starting to populate the sources of our git2internal target piece-by-piece. While this is a small step, it allows us to convert to target-based build instructions piece-by-piece.
19eb1e4b	2020-06-05T10:07:33	cmake: specify project version We currently do not set up a project version within CMake, meaning that it can't be used by other projects including libgit2 as a sub-project and also not by other tools like IDEs. This commit changes this to always set up a project version, but instead of extracting it from the "version.h" header we now set it up directly. This is mostly to avoid mis-use of the previous `LIBGIT2_VERSION` variables, as we should now always use the `libgit2_VERSION` ones that are set up by CMake if one provides the "VERSION" keyword to the `project()` call. While this is one more moving target we need to adjust on releases, this commit also adjusts our release script to verify that the project version was incremented as expected.
325375e3	2020-07-09T23:12:58	Merge pull request #5568 from lhchavez/ubsan Make the tests run cleanly under UndefinedBehaviorSanitizer
2ffa426e	2020-07-09T23:02:05	Merge pull request #5567 from lhchavez/msan Make the tests pass cleanly with MemorySanitizer
dc1deb3b	2020-07-01T15:41:38	Use __GNUC__ macro in the resource script Fix the default LIBGIT2_FILENAME for GNU windres
71000441	2020-06-16T18:58:07	Review: Rename the stringize macro
5c40456b	2020-06-16T13:19:02	Enable building git2.rc resource script with GCC
3a197ea7	2020-06-27T12:33:32	Make the tests pass cleanly with MemorySanitizer This change: * Initializes a few variables that were being read before being initialized. * Includes https://github.com/madler/zlib/pull/393. As such, it only works reliably with `-DUSE_BUNDLED_ZLIB=ON`.
d0656ac8	2020-06-27T12:15:26	Make the tests run cleanly under UndefinedBehaviorSanitizer This change makes the tests run cleanly under `-fsanitize=undefined,nullability` and comprises: * Avoids some arithmetic with NULL pointers (which UBSan does not like). * Avoids an overflow in a shift, due to a uint8_t being implicitly converted to a signed 32-bit integer after being shifted by a 32-bit signed integer. * Avoids an unaligned read in libgit2. * Ignores unaligned reads in the SHA1 library, since it only happens on Intel processors, where it is _still_ undefined behavior, but the semantics are moderately well-understood. Of notable omission is `-fsanitize=integer`, since there are lots of warnings in zlib and the SHA1 library which probably don't make sense to fix and I could not figure out how to silence easily. libgit2 itself also has ~100s of warnings which are mostly innocuous (e.g. use of enum constants that only fit on a `uint32_t`, but there is no way to do that in a simple fashion because the data type chosen for enumerated types is implementation-defined), and investigating whether there are worrying warnings would need reducing the noise significantly.
eab2b044	2020-06-26T16:10:30	Review feedback * Change the default of the file limit to 0 (unlimited). * Changed the heuristic to close files to be the file that contains the least-recently-used window such that the window is the most-recently-used in the file, and the file does not have in-use windows. * Parameterized the filelimit test to check for a limit of 1 and 100 open windows.
9679df57	2020-02-08T20:47:24	mwindow: set limit on number of open files There are some cases in which repositories accrue a large number of packfiles. The existing mwindow limit applies only to the total size of mmap'd files, not on their number. This leads to a situation in which having lots of small packfiles could exhaust the allowed number of open files, particularly on macOS, where the default ulimit is very low (256). This change adds a new configuration parameter (GIT_OPT_SET_MWINDOW_FILE_LIMIT) that sets the maximum number of open packfiles, with a default of 128. This is low enough so that even macOS users should not hit it during normal use. Based on PR #5386, originally written by @josharian. Fixes: #2758
6256d023	2020-06-15T14:34:29	diff_print: adjust code to match current coding style
490d0c9c	2020-06-15T14:26:13	diff_print: return out-of-memory situation when printing binary We currently don't check for out-of-memory situations on exiting `format_binary` and, as a result, may return a partially filled buffer. Fix this by checking the buffer via `git_buf_oom`.
bea5fd9f	2020-06-15T13:26:18	diff_print: do not call abort(3P) Calling abort(3P) in a library is rather rude and shouldn't happen, as we effectively prohibit any corrective actions made by the application linking to it. We thus shouldn't call it at all, but instead use our new `GIT_ASSERT` macros. Remove the call to abort(3P) in case a diff delta has an unexpected type to fix this.
0cf1f444	2020-06-15T13:19:44	diff_print: handle errors when printing to file When printing the diff to a `FILE *` handle, we neither check the return value of fputc(3P) nor the one of fwrite(3P). As a result, we'll silently return successful even if we didn't print anything at all. Furthermore, the arguments to fwrite(3P) are reversed: we have one item of length `content_len`, and not `content_len` items of one byte. Fix both issues by checking return values as well as reversing the arguments to fwrite(3P).
74520b91	2020-06-13T19:38:11	Merge pull request #5552 from libgit2/pks/small-fixes Random code cleanups and fixes
03c4f86c	2020-06-08T12:42:59	cmake: enable warnings for missing function declarations Over time, we have accumulated quite a lot of functions with missing prototypes, missing `static` keywords or which were completely unused. It's easy to miss these mistakes, but luckily GCC and Clang both have the `-Wmissing-declarations` warning. Enabling this will cause them to emit warnings for every not-static function that doesn't have a previous declaration. This is a very sane thing to enable, and with the preceding commits all these new warnings have been fixed. So let's always enable this warning so we won't introduce new instances of them.
fd1f0940	2020-06-08T12:42:26	refs: add missing function declaration The function `git_reference__is_note` is not declared anywhere. Let's add the declaration to avoid having non-static functions without declaration.
c6184f0c	2020-06-08T21:07:36	tree-wide: do not compile deprecated functions with hard deprecation When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h" header to be empty. As a result, no function declarations are made available to callers, but the implementations are still available to link against. This has the problem that function declarations also aren't visible to the implementations, meaning that the symbol's visibility will not be set up correctly. As a result, the resulting library may not expose those deprecated symbols at all on some platforms and thus cause linking errors. Fix the issue by conditionally compiling deprecated functions, only. While it becomes impossible to link against such a library in case one uses deprecated functions, distributors of libgit2 aren't expected to pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real" hard deprecation still makes sense in the context of CI to test we don't use deprecated symbols ourselves and in case a dependant uses libgit2 in a vendored way and knows it won't ever use any of the deprecated symbols anyway.
6e1efcd6	2020-06-08T12:46:04	tree-wide: add missing header includes We're missing some header includes leading to missing function prototypes. While we currently don't warn about these, we should have their respective headers included in order to detect the case where a function signature change results in an incompatibility.
a6c9e0b3	2020-06-08T12:40:47	tree-wide: mark local functions as static We've accumulated quite some functions which are never used outside of their respective code unit, but which are lacking the `static` keyword. Add it to reduce their linkage scope and allow the compiler to optimize better.
7c499b54	2020-06-08T12:39:09	tree-wide: remove unused functions We have some functions which aren't used anywhere. Let's remove them to get rid of unneeded baggage.
46637b5e	2020-06-08T14:47:01	checkout: remove unused code for deferred removals With commit 05f690122 (checkout: remove blocking dir when FORCEd, 2015-03-31), the last case was removed that actually queued a deferred removal. This is now more than five years in the past and nobody complained, so we can rest quite assured that the deferred removal is not really needed at all. Let's remove all related code to simplify the already complicated checkout logic.
45901d3e	2020-06-08T12:57:16	revparse: remove superfluous tab character
c146374c	2020-06-08T12:54:26	revparse: detect out-of-memory cases when parsing curly brace contents When extracting curly braces (e.g. the "upstream" part in "HEAD@{upstream}"), we put the curly braces' contents into a `git_buf` structure, but don't check the return value of `git_buf_putc`. So when we run out-of-memory, we'll use a partially filled buffer without noticing. Let's fix this issue by checking `git_buf_putc`'s return value.
53a8f463	2020-06-03T07:40:59	Merge pull request #5536 from libgit2/ethomson/http httpclient: support googlesource
6de8aa7f	2020-06-02T12:21:22	Merge pull request #5532 from joshtriplett/pack-default-path git_packbuilder_write: Allow setting path to NULL to use the default path
22f9a0fc	2020-06-02T12:12:41	Merge pull request #5531 from joshtriplett/mempack-threads mempack: Use threads when building the pack
04c7bdb4	2020-06-01T22:44:14	httpclient: clear the read_buf on new requests The httpclient implementation keeps a `read_buf` that holds the data in the body of the response after the headers have been written. We store that data for subsequent calls to `git_http_client_read_body`. If we want to stop reading body data and send another request, we need to clear that cached data. Clear the cached body data on new requests, just like we read any outstanding data from the socket.
aa8b2c0f	2020-06-01T23:53:55	httpclient: don't read more than the client wants When `git_http_client_read_body` is invoked, it provides the size of the buffer that can be read into. This will be set as the parser context's `output_size` member. Use this as an upper limit on our reads, and ensure that we do not read more than the client requests.
51eff5a5	2020-05-29T13:13:19	strarray: we should `dispose` instead of `free` We _dispose_ the contents of objects; we _free_ objects (and their contents). Update `git_strarray_free` to be `git_strarray_dispose`. `git_strarray_free` remains as a deprecated proxy function.
a9746b30	2020-05-29T11:21:55	strarray: move to its own file
570f0340	2020-06-01T19:10:38	httpclient: read_body should return 0 at EOF When users call `git_http_client_read_body`, it should return 0 at the end of a message. When the `on_message_complete` callback is called, this will set `client->state` to `DONE`. In our read loop, we look for this condition and exit. Without this, when there is no data left except the end of message chunk (`0\r\n`) in the http stream, we would block by reading the three bytes off the stream but not making progress in any `on_body` callbacks. Listening to the `on_message_complete` callback allows us to stop trying to read from the socket when we've read the end of message chunk.
17641f1f	2020-06-01T15:05:51	Merge pull request #5526 from libgit2/ethomson/poolinit git_pool_init: allow the function to fail
0f35efeb	2020-05-23T10:15:51	git_pool_init: handle failure cases Propagate failures caused by pool initialization errors.
1bbdf15d	2020-06-01T13:57:12	Merge pull request #5527 from libgit2/ethomson/config_unreadable Handle unreadable configuration files
d1409f48	2020-05-06T19:57:07	config: ignore unreadable configuration files Modified `config_file_open()` so it returns 0 if the config file is not readable, which happens on global config files under macOS sandboxing (note that for some reason `access(F_OK)` DOES work with sandboxing, but it is lying). Without this read check sandboxed applications on macOS can not open any repository, because `config_file_read()` will return GIT_ERROR when it cannot read the global /Users/username/.gitconfig file, and the upper layers will just completely abort on GIT_ERROR when attempting to load the global config file, so no repositories can be opened.
8c96d56d	2020-05-26T04:53:09	index: write v4: bugfix: prefix path with strip_len, not same_len According to index-format.txt of git, the path of an entry is prefixed with N, where N indicates the length of bytes to be stripped.
5278a006	2020-05-23T16:07:54	git_packbuilder_write: Allow setting path to NULL to use the default path If given a NULL path, write to the object path of the repository. Add tests for the new behavior.
0bc091dd	2020-05-23T15:35:38	git_packbuilder_write: Unify cleanup path Clean up and return via a single label, to avoid duplicate error handling before each return, and to make it easier to extend the set of cleanups needed.
30285a3c	2020-05-23T15:04:19	mempack: Use threads when building the pack The mempack ODB backend creates a packbuilder internally to write out a pack; call git_packbuilder_set_threads on that packbuilder, to use threads for packing if available.
27cb4e0e	2020-05-23T11:02:07	Merge pull request #5522 from pks-t/pks/openssl-cert-memleak OpenSSL certificate memory leak
abfdb8a6	2020-05-23T10:15:37	git_pool_init: return an int Let `git_pool_init` return an int so that it could fail.
e4bdba56	2020-05-23T09:57:22	Merge pull request #5515 from pks-t/pks/flaky-checkout-test tests: checkout: fix flaky test due to mtime race
3b7b4d27	2020-05-23T09:40:55	Merge pull request #5523 from libgit2/pks/cmake-sort-reproducible-builds cmake: Sort source files for reproducible builds
3f201f75	2020-05-16T13:48:04	checkout: fix file being treated as unmodified due to racy index When trying to determine whether a file changed, we try to avoid heavy operations by first taking a look at the index, seeing whether the index entry is modified already. This doesn't seem to cut it, though, as we currently have the racy checkout::index::can_disable_pathspec_match test case: sometimes the files get restored to their original contents, sometimes they don't. The issue is caused by a racy index [1]: in case we modify a file, add it to the index and then modify it again in-place without changing its size, then we may end up with a modified file that has the same stat(3P) info as we've currently got it in its corresponding index entry. The mitigation for this is to treat files with the same mtime as the index as racily modified. We already have this logic in place for the index, but not when doing a checkout. Fix the issue by only consulting the index entry in case it has an older mtime than the index. Previously, the following script reliably had at least 20 failures, while now there is no failure to be observed anymore: ```bash j=0; for i in $(seq 100); do if ! ./libgit2_clar -scheckout::index::can_disable_pathspec_match >/dev/null; then j=$(($j + 1)); fi; done; echo "Failures: $j" ``` [1]: https://git-scm.com/docs/racy-git
b85eefb4	2020-05-15T19:52:40	cmake: Sort source files for reproducible builds We currently use `FILE(GLOB ...)` in most places to find source and header files. This is problematic in that the order of files returned depends on the operating system's directory iteration order and may thus not be deterministic. As a result, we link object files in unspecified order, which may cause the linker to emit different code across runs. Fix this issue by sorting all code used as input to the libgit2 library to improve the reliability of reproducible builds.
b43a9e66	2020-05-15T17:46:24	streams: openssl: fix memleak due to us not free'ing certs When creating a `git_cert` from the OpenSSL X509 certificate of a given stream, we do not call `X509_free()` on the certificate, leading to a memory leak as soon as the certificate is requested e.g. by the certificate check callback. Fix the issue by properly calling `X509_free()`.
a2eca682	2020-05-12T21:35:07	futils: fix order of declared parameters for `git_futils_fake_symlink` While the function `git_futils_fake_symlink` is declared with arguments `new, old`, the implementation uses the reverse order `old, new`. Let's fix the ordering issues to be `new, old` for both, which matches what symlink(3P) has. While at it, we also rename these parameters: `old` and `new` don't really make a lot of sense in the context of symlinks, which is why this commit renames them to be called `target` and `path`.
cbae1c21	2020-04-01T22:12:07	assert: allow non-int returning functions to assert Include GIT_ASSERT_WITH_RETVAL and GIT_ASSERT_ARG_WITH_RETVAL so that functions that do not return int (or more precisely, where `-1` would not be an error code) can assert. This allows functions that return, e.g., NULL on an error code to do that by passing the return value (in this example, `NULL`) as a second parameter to the GIT_ASSERT_WITH_RETVAL functions.
a95096ba	2020-01-12T10:31:07	assert: optionally fall-back to assert(3) Fall back to the system assert(3) in debug builds, which may aid in debugging. "Safe" assertions can be enabled in debug builds by setting GIT_ASSERT_HARD=0. Similarly, hard assertions can be enabled in release builds by setting GIT_ASSERT_HARD to nonzero.
abe2efe1	2019-12-09T12:37:34	Introduce GIT_ASSERT macros Provide macros to replace usages of `assert`. A true `assert` is punishing as a library. Instead we should do our best to not crash. GIT_ASSERT_ARG(x) will now assert that the given argument complies to some format and sets an error message and returns `-1` if it does not. GIT_ASSERT(x) is for internal usage, and available as an internal consistency check. It will set an error message and return `-1` in the event of failure.
56c95cf6	2020-05-10T21:43:38	Fix uninitialized stack memory and NULL ptr dereference in stash_to_index Caught by static analysis.
d62e44cb	2019-06-03T18:35:08	checkout: Fix removing untracked files by path in subdirectories The checkout code didn't iterate into a subdir if it didn't match the pathspec, but since the pathspec might match files in the subdir we should recurse into it (In contrast to gitignore handling). Fixes #5089
63de2128	2020-02-02T20:20:19	checkout: filter pathspecs for _all_ checkout types We were previously applying the pathspec filter for the baseline iterator during checkout, as well as the target tree. This was an oversight; in fact, we should apply the pathspec filter to _all_ checkout targets, not just trees. Add a helper function to set the iterator pathspecs from the given checkout pathspecs, and call it everywhere.
898caead	2020-05-10T19:03:10	Merge pull request #5431 from libgit2/ethomson/hexdump git__hexdump: better mimic `hexdump -C`
9830ab3d	2020-01-29T02:00:04	blame: add option to ignore whitespace changes
e9b0cfc0	2020-04-05T13:24:13	Merge pull request #5485 from libgit2/ethomson/sysdir_unused sysdir: remove unused git_sysdir_get_str
b6f18db9	2020-04-05T11:16:29	sysdir: remove unused git_sysdir_get_str
ce2ab78f	2020-04-04T16:35:33	Fix typo causing removal of symbol 'git_worktree_prune_init_options' Commit 0b5ba0d replaced this function with an "option_init" equivalent, but misspelled the replacement function. As a result, this symbol has been missing from libgit2.so ever since.
ad341eb7	2020-04-04T13:40:14	Merge pull request #5425 from lhchavez/fix-get-delta-base pack: Improve error handling for get_delta_base()
966db47d	2020-04-04T13:21:02	Merge pull request #5477 from pks-t/pks/rename-detection-negative-caches merge: cache negative cache results for similarity metrics
4d4c8e0a	2020-04-02T07:34:55	Re-adding the "delta offset is zero" error case
dfd7fcc4	2020-04-02T13:26:13	Merge pull request #5388 from bk2204/repo-format-v1 Handle repository format v1
b8eec0b2	2020-04-01T22:22:38	Merge pull request #5461 from pks-t/pks/refdb-fs-unused-header refdb_fs: remove unused header file
5d37128d	2020-03-01T10:34:15	git__hexdump: better mimic `hexdump -C`
ba59a4a2	2020-04-01T12:34:16	Making get_delta_base() conform to the general error-handling pattern This makes get_delta_base() return the error code as the return value and the delta base as an out-parameter.
f3273725	2020-02-25T20:58:09	pack: Improve error handling for get_delta_base() This change moves the responsibility of setting the error upon failures of get_delta_base() to get_delta_base() instead of its callers. That way, the caller can always check if the return value is negative and mark the whole operation as an error instead of using garbage values, which can lead to crashes if the .pack files are malformed.
4dfcc50f	2020-04-01T15:16:18	merge: cache negative cache results for similarity metrics When computing renames, we cache the hash signatures for each of the potentially conflicting entries so that we do not need to repeatedly read the file and can at least halfway efficiently determine whether two files are similar enough to be deemed a rename. In order to make the hash signatures meaningful, we require at least four lines of data to be present, resulting in at least four different hashes that can be compared. Files that are deemed too small are not cached at all and will thus be repeatedly re-hashed, which is usually not a huge issue. The issue with the above heuristic is in case a file does _not_ have at least four lines, where a line is anything separated by a consecutive run of "\n" or "\0" characters. For example "a\nb" is two lines, but "a\0\0b" is also just two lines. Taken to the extreme, a file that has megabytes of consecutive space- or NUL-only may also be deemed as too small and thus not get cached. As a result, we will repeatedly load its blob, calculate its hash signature just to finally throw it away as we notice it's not of any value. When you've got a comparatively big file that you compare against a big set of potentially renamed files, then the cost simply explodes. The issue can be trivially fixed by introducing negative cache entries. Whenever we determine that a given blob does not have a meaningful representation via a hash signature, we store this negative cache marker and will from then on not hash it again, but also ignore it as a potential rename target. This should help the "normal" case already where you have a lot of small files as rename candidates, but in the above scenario its savings are extraordinarily high. To verify we do not hit the issue anymore with the described solution, this commit adds a test that uses the exact same setup described above with one 50 megabyte blob of '\0' characters and 1000 other files that get renamed. Without the negative cache: $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null real 11m48.377s user 11m11.576s sys 0m35.187s And with the negative cache: $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null real 0m1.972s user 0m1.851s sys 0m0.118s So this represents a ~350-fold performance improvement, but it obviously depends on how many files you have and how big the blob is. The test numbers were chosen in a way that one will immediately notice as soon as the bug resurfaces.
5f47cb48	2020-03-26T14:16:41	patch: correctly handle mode changes for renames When generating a patch for a renamed file whose mode bits have changed in addition to the rename, then we currently fail to parse the generated patch. Furthermore, when generating a diff we output mode bits after the similarity metric, which is different to how upstream git handles it. Fix both issues by adding another state transition that allows similarity indices after mode changes and by printing mode changes before the similarity index.
bba9599a	2020-03-26T11:56:10	Merge pull request #5445 from lhchavez/fix-5443 Fix segfault when calling git_blame_buffer()
e7a1fd88	2020-03-26T11:42:47	Fix spelling error Signed-off-by: Utkarsh Gupta <utkarsh@debian.org>
74e0489a	2020-03-24T19:42:10	refdb_fs: remove unused header file The "refdb_fs.h" header contains a single struct `git_refcache` that is not used anywhere. As a result, we can just delete the header altogether as it doesn't have any purpose and may confuse readers.
62d59467	2020-03-08T02:13:11	Fix segfault when calling git_blame_buffer() This change makes sure that the hunk is not null before trying to dereference it. This avoids segfaults, especially when blaming against a modified buffer (i.e. the index). Fixes: #5443
a2d3316a	2020-03-13T23:01:11	refdb_fs: initialize backend version While the `git_refdb_backend()` struct has a version, we do not initialize it correctly when calling `git_refdb_backend_fs()`. Fix this by adding the call to `git_refdb_init_backend()`.
9a102446	2020-03-21T16:49:44	Merge pull request #5455 from pks-t/pks/cmake-install-dirs cmake: use install directories provided via GNUInstallDirs
87fc539f	2020-03-13T22:08:19	cmake: use install directories provided via GNUInstallDirs We currently hand-code logic to configure where to install our artifacts via the `LIB_INSTALL_DIR`, `INCLUDE_INSTALL_DIR` and `BIN_INSTALL_DIR` variables. This is reinventing the wheel, as CMake already provides a way to do that via `CMAKE_INSTALL_<DIR>` paths, e.g. `CMAKE_INSTALL_LIBDIR`. This requires users of libgit2 to know about the discrepancy and will require special hacks for any build systems that handle these variables in an automated way. One such example is Gentoo Linux, which sets up these paths in both the cmake and cmake-utils eclass. So let's stop doing that: the GNUInstallDirs module handles it in a better way for us, especially so as the actual values are dependent on CMAKE_INSTALL_PREFIX. This commit removes our own set of variables and instead refers users to the standard ones. As a second benefit, this commit also fixes our pkgconfig generation to use the GNUInstallDirs module. We had a bug there where we ignored the CMAKE_INSTALL_PREFIX when configuring the libdir and includedir keys, so if libdir was set to "lib64", then libdir would be an invalid path. With GNUInstallDirs, we can now use `CMAKE_INSTALL_FULL_LIBDIR`, which handles the prefix for us.
b1f6481f	2020-03-10T22:07:35	cmake: ignore deprecation notes for Secure Transport The Secure Transport interface we're currently using has been deprecated with macOS 10.15. As we're currently in code freeze, we cannot migrate to newer interfaces. As such, let's disable deprecation warnings for our "schannel.c" stream.
43d7a42b	2020-03-08T18:14:09	win32: don't canonicalize symlink targets Don't canonicalize symlink targets; our win32 path canonicalization routines expect an absolute path. In particular, using the path canonicalization routines for symlink targets (introduced in commit 7d55bee6d, "win32: fix relative symlinks pointing into dirs", 2020-01-10). Now, use the utf8 -> utf16 relative path handling functions, so that paths like "../foo" will be translated to "..\foo".
f2b114ba	2020-03-08T18:11:45	win32: introduce relative path handling function Add a function that takes a (possibly) relative UTF-8 path and emits a UTF-16 path with forward slashes translated to backslashes. If the given path is, in fact, absolute, it will be translated to absolute path handling rules.
fb7da154	2020-03-08T16:34:23	win32: clarify usage of path canonicalization funcs The path canonicalization functions on win32 are intended to canonicalize absolute paths; those with prefixes. In other words, things that start with drive letters (`C:\`), share names (`\\server\share`), or other prefixes (`\\?\`). This function removes leading `..` components that occur after the prefix but before the directory/file portion (e.g., turning `C:\..\..\..\foo` into `C:\foo`). This translation is not appropriate for local paths.
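
The negative cache described in commit 4dfcc50f above can be sketched roughly as follows. This is only an illustration under stated assumptions: `candidate`, `signature`, `get_signature`, `SIG_TOO_SMALL` and `MIN_LINES` are hypothetical names, not libgit2 identifiers, and the "signature" here is just a line count standing in for the real hash signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; none of these are libgit2 identifiers. */
#define MIN_LINES 4

typedef struct { size_t nlines; } signature;

/* Sentinel meaning "already examined, too small to be useful". */
static signature too_small_marker;
#define SIG_TOO_SMALL (&too_small_marker)

typedef struct {
	const char *data;
	size_t len;
	signature *cached;  /* NULL: not computed yet */
	unsigned hash_runs; /* counts expensive passes, for demonstration */
} candidate;

/* Count lines, treating any run of '\n' or '\0' as a single separator. */
static size_t count_lines(const char *data, size_t len)
{
	size_t i, lines = 0;
	int in_line = 0;

	for (i = 0; i < len; i++) {
		int sep = (data[i] == '\n' || data[i] == '\0');
		if (!sep && !in_line)
			lines++;
		in_line = !sep;
	}
	return lines;
}

/* Returns the signature, or NULL if the content is too small to compare. */
static signature *get_signature(candidate *c)
{
	size_t n;

	assert(c != NULL);

	if (c->cached == SIG_TOO_SMALL)
		return NULL;      /* negative cache hit: skip the work */
	if (c->cached != NULL)
		return c->cached; /* positive cache hit */

	c->hash_runs++;       /* the expensive pass happens only here */
	n = count_lines(c->data, c->len);
	if (n < MIN_LINES) {
		c->cached = SIG_TOO_SMALL;
		return NULL;
	}

	c->cached = malloc(sizeof(*c->cached));
	if (c->cached != NULL)
		c->cached->nlines = n;
	return c->cached;
}
```

With the sentinel in place, a second call for a too-small blob returns immediately instead of re-reading and re-hashing the content.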

c5d41d46

2020-08-03T09:55:22

Merge pull request #5563 from pks-t/pks/worktree-heads Access HEAD via the refdb backends

52ccbc5d

2020-08-03T09:52:30

Merge pull request #5582 from libgit2/pks-config-map-optimization config_entries: Avoid excessive map operations

f2400a9c

2020-07-13T20:56:08

config_entries: Avoid excessive map operations When appending config entries, we currently always first get the currently existing map entry and then afterwards update the map to contain the current config value. In the common scenario where keys aren't being overridden, this is the best we can do. But in case a key gets set multiple times, then we'll also perform these two map operations. In extreme cases, hashing the map keys will thus start to dominate performance. Let's optimize the pattern by using a separately allocated map entry. Currently, we always put the current list entry into the map and update it to get any overridden multivar. As these list entries are also used to iterate config entries, we cannot update them in-place in the map and are thus forced to always set the map to contain the new entry. But with a separately allocated map entry, we can now create one once per config key and insert it into the map. Whenever appending a new config value with the same key, we can now just update the map entry in-place instead of having to replace the map entry completely. This reduces calls to the hashing function by half and trades the improved runtime for one more allocation per unique config key. Given that the refactoring arguably improves code readability by splitting concerns of the `config_entry_list` type and not having to track it in two different structures, this alone would already be reason enough to take the trade. Given a pathological case of a gitconfig with 100.000 repeated keys and a section of length 10.000 characters, this reduces runtime by half from approximately 14 seconds to 7 seconds as expected.

a83fd510

2020-07-12T21:26:59

Merge pull request #5396 from lhchavez/mwindow-file-limit mwindow: set limit on number of open files

92d42eb3

2020-07-12T09:53:10

Minor nits and style formatting

7216b048

2020-06-17T14:23:15

refs: update HEAD references via refdb When renaming a reference, we need to iterate over every HEAD and potentially update it in case it is a symbolic reference pointing to the previous name of the renamed reference. Most importantly, this doesn't only include HEADs from the repo we're renaming the reference in, but we also need to iterate over HEADs from linked worktrees. In order to update the HEADs, we directly read them from the worktree's gitdir and thus assume that both repository and worktrees use the filesystem-based reference backend. But this breaks as soon as one got a repository with a different refdb and breaks our own abstractions. So let's instead update HEAD references via the refdb by first opening each worktree as a repository and then using the usual functions to read and update HEADs. This is a lot less efficient than the current code, but it's not like we can really help this: going via the refdb is mandatory.

2fcb4f28

2020-06-17T14:09:04

repository: introduce new function to iterate over all worktrees Given a Git repository, it's non-trivial to iterate over all worktrees that are associated with it, including the "main" repository. This commit adds a new internal function `git_repository_foreach_worktree` that does this for us.

5434f9a3

2020-06-17T14:57:13

refs: remove function to read HEAD directly With the last user of `git_reference__read_head` gone, let's remove it as it's been reading references without consulting the refdb backends.

65895410

2020-06-17T14:56:36

repository: retrieve worktree HEAD via refdb The function `git_repository_head_for_worktree` currently uses `git_reference__read_head` to directly read a given worktree's HEAD from the filesystem. This is broken in case the repository uses a different refdb implementation than the filesystem-based one, so let's instead open the worktree as a real repository and use `git_reference_lookup`. This also fixes the case where the worktree's HEAD is not a symref, but a detached HEAD, which would have resulted in an error previously.

d1f210fc

2020-06-17T15:09:49

repository: remove function to iterate over HEADs The function `git_repository_foreach_head` is broken, as it directly interacts with the on-disk representation of the reference database, thus assuming that no other refdb is used for the given repository. As this is an internal function only and all users have been replaced, let's remove this function.

ac5fbe31

2020-06-17T14:43:27

branch: determine whether a branch is checked out via refdb We currently determine whether a branch is checked out via `git_repository_foreach_head`. As this function reads references directly from the disk, it breaks our refdb abstraction in case the repository uses a different reference backend implementation than the filesystem-based one. So let's use `git_repository_foreach_worktree` instead -- while it's less efficient, it is at least correct in all corner cases.

26b9e489

2020-07-12T17:04:29

Merge pull request #5570 from libgit2/pks/refdb-refactorings refdb: a set of preliminary refactorings for the reftable backend

34987447

2020-06-30T10:13:26

refdb: avoid unlimited spinning in case of symref cycles To determine whether another reflog entry needs to be written for HEAD on a reference update, we need to see whether HEAD directly or indirectly points to the reference we're updating. The resolve logic is currently completely unbounded unless an error occurs, which effectively means that we'd be spinning forever in case we have a symref loop in the repository refdb. Let's fix the issue by using `git_refdb_resolve` instead, which is always bounded.
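
A bounded resolver of the kind this commit relies on might look like the sketch below. It does not reproduce `git_refdb_resolve`: the in-memory `ref_entry` table, `lookup`, `resolve_bounded` and the `MAX_NESTING` value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory ref table; not the libgit2 refdb API. */
typedef struct {
	const char *name;
	const char *symref_target; /* NULL for a direct (non-symbolic) ref */
} ref_entry;

#define MAX_NESTING 5 /* illustrative bound, chosen arbitrarily */

static const ref_entry *lookup(const ref_entry *refs, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(refs[i].name, name) == 0)
			return &refs[i];
	return NULL;
}

/*
 * Follow symbolic refs, examining at most MAX_NESTING references.
 * Returns 0 and sets *out on success; returns -1 if the chain is
 * dangling, too deep, or cyclic.
 */
static int resolve_bounded(const ref_entry **out, const ref_entry *refs,
	size_t n, const char *name)
{
	const ref_entry *cur;
	int depth;

	assert(out && refs && name);

	cur = lookup(refs, n, name);
	for (depth = 0; cur != NULL && depth < MAX_NESTING; depth++) {
		if (cur->symref_target == NULL) {
			*out = cur;
			return 0;
		}
		cur = lookup(refs, n, cur->symref_target);
	}
	return -1;
}
```

A cycle such as `a -> b -> a` exhausts the depth budget and yields an error instead of spinning.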

b895547c

2020-06-30T09:35:21

refs: replace reimplementation of reference resolver The refs code currently has a second implementation that resolves references in order to find any final symbolic reference pointing to a nonexistent target branch. As we've just extended `git_refdb_resolve` to also return such references, let's use that one instead in order to reduce code duplication.

cf7dd05b

2020-06-30T13:26:05

refdb: return resolved symbolic refs pointing to nonexistent refs In some cases, resolving references requires us to also know about the final symbolic reference that's pointing to a nonexistent branch, e.g. in an empty repository where the main branch is yet unborn but HEAD already points to it. Right now, the resolving logic is thus split up into two, where one is the new refdb implementation and the second one is an ad-hoc implementation inside "refs.c". Let's extend `git_refdb_resolve` to also return such final dangling references pointing to nonexistent branches so we can deduplicate the resolving logic.

c54f40e4

2020-06-30T09:28:12

refs: move resolving of references into the refdb Resolving of symbolic references is currently implemented inside the "refs" layer. As a result, it's hard to call this function from low-level parts that only have a refdb available, but no repository, as the "refs" layer always operates on the repository-level. So let's move the function into the generic "refdb" implementation to lift this restriction.

1f39593b

2020-06-30T08:53:59

refdb: extract function to check whether to append HEAD to the reflog The logic to determine whether a reflog entry should be for the HEAD reference is non-trivial. Currently, the only user of this is the filesystem-based refdb, but with the advent of the reftable refdb we're going to add a second user that's interested in having the same behaviour. Let's pull out a new function that checks whether a given reference should cause an entry to be written to the HEAD reflog as a preparatory step.

e02478b1

2020-06-05T08:17:03

refdb: extract function to check whether a reflog should be written The logic to determine whether a reflog should be written is non-trivial. Currently, the only user of this is the filesystem-based refdb, but with the advent of the reftable refdb we're going to add a second user that's interested in having the same behaviour. Let's pull out a new function that checks whether a given reference should cause a reflog to be written as a preparatory step.

4218403e

2020-06-05T10:49:09

cmake: use target-specific compile definitions We set up some compile definitions as part of our src/CMakeLists.txt. While the definitions are global, we really only need them as part of the git2internal target which compiles all the objects. Let's thus use `target_compile_definitions` instead of `add_definitions`.

53911edd

2020-06-05T10:24:30

cmake: use git2internal target to populate sources Modern CMake is usually target-driven in that a target is first defined and then the likes of `target_sources`, `target_include_directories` etc. are used to further populate the target. We still use old-style CMake, where we first set up a set of variables and then populate the target in a single call. Let's migrate to modern CMake usage by starting to populate the sources of our git2internal target piece-by-piece. While this is a small step, it allows us to convert to target-based build instructions piece-by-piece.

19eb1e4b

2020-06-05T10:07:33

cmake: specify project version We currently do not set up a project version within CMake, meaning that it can't be used by other projects including libgit2 as a sub-project and also not by other tools like IDEs. This commit changes this to always set up a project version, but instead of extracting it from the "version.h" header we now set it up directly. This is mostly to avoid mis-use of the previous `LIBGIT2_VERSION` variables, as we should now always use the `libgit2_VERSION` ones that are set up by CMake if one provides the "VERSION" keyword to the `project()` call. While this is one more moving target we need to adjust on releases, this commit also adjusts our release script to verify that the project version was incremented as expected.

325375e3

2020-07-09T23:12:58

Merge pull request #5568 from lhchavez/ubsan Make the tests run cleanly under UndefinedBehaviorSanitizer

2ffa426e

2020-07-09T23:02:05

Merge pull request #5567 from lhchavez/msan Make the tests pass cleanly with MemorySanitizer

dc1deb3b

2020-07-01T15:41:38

Use __GNUC__ macro in the resource script Fix the default LIBGIT2_FILENAME for GNU windres

71000441

2020-06-16T18:58:07

Review: Rename the stringize macro

5c40456b

2020-06-16T13:19:02

Enable building git2.rc resource script with GCC

3a197ea7

2020-06-27T12:33:32

Make the tests pass cleanly with MemorySanitizer This change: * Initializes a few variables that were being read before being initialized. * Includes https://github.com/madler/zlib/pull/393. As such, it only works reliably with `-DUSE_BUNDLED_ZLIB=ON`.

d0656ac8

2020-06-27T12:15:26

Make the tests run cleanly under UndefinedBehaviorSanitizer This change makes the tests run cleanly under `-fsanitize=undefined,nullability` and consists of: * Avoids some arithmetic with NULL pointers (which UBSan does not like). * Avoids an overflow in a shift, due to a uint8_t being implicitly converted to a signed 32-bit integer when it is shifted by a 32-bit signed integer. * Avoids an unaligned read in libgit2. * Ignores unaligned reads in the SHA1 library, since it only happens on Intel processors, where it is _still_ undefined behavior, but the semantics are moderately well-understood. Of notable omission is `-fsanitize=integer`, since there are lots of warnings in zlib and the SHA1 library which probably don't make sense to fix and I could not figure out how to silence easily. libgit2 itself also has ~100s of warnings which are mostly innocuous (e.g. use of enum constants that only fit on a `uint32_t`, but there is no way to do that in a simple fashion because the data type chosen for enumerated types is implementation-defined), and investigating whether there are worrying warnings would need reducing the noise significantly.

eab2b044

2020-06-26T16:10:30

Review feedback * Change the default of the file limit to 0 (unlimited). * Changed the heuristic to close files to be the file that contains the least-recently-used window such that the window is the most-recently-used in the file, and the file does not have in-use windows. * Parameterized the filelimit test to check for a limit of 1 and 100 open windows.

9679df57

2020-02-08T20:47:24

mwindow: set limit on number of open files There are some cases in which repositories accrue a large number of packfiles. The existing mwindow limit applies only to the total size of mmap'd files, not on their number. This leads to a situation in which having lots of small packfiles could exhaust the allowed number of open files, particularly on macOS, where the default ulimit is very low (256). This change adds a new configuration parameter (GIT_OPT_SET_MWINDOW_FILE_LIMIT) that sets the maximum number of open packfiles, with a default of 128. This is low enough so that even macOS users should not hit it during normal use. Based on PR #5386, originally written by @josharian. Fixes: #2758

6256d023

2020-06-15T14:34:29

diff_print: adjust code to match current coding style

490d0c9c

2020-06-15T14:26:13

diff_print: return out-of-memory situation when printing binary We currently don't check for out-of-memory situations on exiting `format_binary` and, as a result, may return a partially filled buffer. Fix this by checking the buffer via `git_buf_oom`.

bea5fd9f

2020-06-15T13:26:18

diff_print: do not call abort(3P) Calling abort(3P) in a library is rather rude and shouldn't happen, as we effectively prohibit any corrective actions made by the application linking to it. We thus shouldn't call it at all, but instead use our new `GIT_ASSERT` macros. Remove the call to abort(3P) in case a diff delta has an unexpected type to fix this.

0cf1f444

2020-06-15T13:19:44

diff_print: handle errors when printing to file When printing the diff to a `FILE *` handle, we neither check the return value of fputc(3P) nor the one of fwrite(3P). As a result, we'll silently report success even if we didn't print anything at all. Furthermore, the arguments to fwrite(3P) are reversed: we have one item of length `content_len`, and not `content_len` items of one byte. Fix both issues by checking return values as well as reversing the arguments to fwrite(3P).
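
The checks described here could look like the following sketch. `print_line` and its parameters are hypothetical and only mirror the shape of the fix; the standard behaviour relied upon is that fputc(3P) returns `EOF` on failure and fwrite(3P) returns the number of complete items written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an origin marker followed by the content.
 * Returns 0 on success, -1 on any write error.
 */
static int print_line(FILE *fp, char origin, const char *content,
	size_t content_len)
{
	assert(fp != NULL && content != NULL);

	if (fputc(origin, fp) == EOF)
		return -1;

	/*
	 * One item of content_len bytes: fwrite returns 1 only if the whole
	 * item was written. A zero-length item always yields 0, so skip it.
	 */
	if (content_len > 0 && fwrite(content, content_len, 1, fp) != 1)
		return -1;

	return 0;
}
```

Passing the length as the item size means a short write is reported as zero items written, so a single comparison against 1 is enough.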

74520b91

2020-06-13T19:38:11

Merge pull request #5552 from libgit2/pks/small-fixes Random code cleanups and fixes

03c4f86c

2020-06-08T12:42:59

cmake: enable warnings for missing function declarations Over time, we have accumulated quite a lot of functions with missing prototypes, missing `static` keywords or which were completely unused. It's easy to miss these mistakes, but luckily GCC and Clang both have the `-Wmissing-declarations` warning. Enabling this will cause them to emit warnings for every not-static function that doesn't have a previous declaration. This is a very sane thing to enable, and with the preceding commits all these new warnings have been fixed. So let's always enable this warning so we won't introduce new instances of them.

fd1f0940

2020-06-08T12:42:26

refs: add missing function declaration The function `git_reference__is_note` is not declared anywhere. Let's add the declaration to avoid having non-static functions without declaration.

c6184f0c

2020-06-08T21:07:36

tree-wide: do not compile deprecated functions with hard deprecation When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h" header to be empty. As a result, no function declarations are made available to callers, but the implementations are still available to link against. This has the problem that function declarations also aren't visible to the implementations, meaning that the symbol's visibility will not be set up correctly. As a result, the resulting library may not expose those deprecated symbols at all on some platforms and thus cause linking errors. Fix the issue by conditionally compiling deprecated functions, only. While it becomes impossible to link against such a library in case one uses deprecated functions, distributors of libgit2 aren't expected to pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real" hard deprecation still makes sense in the context of CI to test we don't use deprecated symbols ourselves and in case a dependant uses libgit2 in a vendored way and knows it won't ever use any of the deprecated symbols anyway.

6e1efcd6

2020-06-08T12:46:04

tree-wide: add missing header includes We're missing some header includes leading to missing function prototypes. While we currently don't warn about these, we should have their respective headers included in order to detect the case where a function signature change results in an incompatibility.

a6c9e0b3

2020-06-08T12:40:47

tree-wide: mark local functions as static We've accumulated quite some functions which are never used outside of their respective code unit, but which are lacking the `static` keyword. Add it to reduce their linkage scope and allow the compiler to optimize better.

7c499b54

2020-06-08T12:39:09

tree-wide: remove unused functions We have some functions which aren't used anywhere. Let's remove them to get rid of unneeded baggage.

46637b5e

2020-06-08T14:47:01

checkout: remove unused code for deferred removals With commit 05f690122 (checkout: remove blocking dir when FORCEd, 2015-03-31), the last case was removed that actually queued a deferred removal. This is now more than five years in the past and nobody complained, so we can rest quite assured that the deferred removal is not really needed at all. Let's remove all related code to simplify the already complicated checkout logic.

45901d3e

2020-06-08T12:57:16

revparse: remove superfluous tab character

c146374c

2020-06-08T12:54:26

revparse: detect out-of-memory cases when parsing curly brace contents When extracting curly braces (e.g. the "upstream" part in "HEAD@{upstream}"), we put the curly braces' contents into a `git_buf` structure, but don't check the return value of `git_buf_putc`. So when we run out-of-memory, we'll use a partially filled buffer without noticing. Let's fix this issue by checking `git_buf_putc`'s return value.

53a8f463

2020-06-03T07:40:59

Merge pull request #5536 from libgit2/ethomson/http httpclient: support googlesource

6de8aa7f

2020-06-02T12:21:22

Merge pull request #5532 from joshtriplett/pack-default-path git_packbuilder_write: Allow setting path to NULL to use the default path

22f9a0fc

2020-06-02T12:12:41

Merge pull request #5531 from joshtriplett/mempack-threads mempack: Use threads when building the pack

04c7bdb4

2020-06-01T22:44:14

httpclient: clear the read_buf on new requests The httpclient implementation keeps a `read_buf` that holds the data in the body of the response after the headers have been written. We store that data for subsequent calls to `git_http_client_read_body`. If we want to stop reading body data and send another request, we need to clear that cached data. Clear the cached body data on new requests, just like we read any outstanding data from the socket.

aa8b2c0f

2020-06-01T23:53:55

httpclient: don't read more than the client wants When `git_http_client_read_body` is invoked, it provides the size of the buffer that can be read into. This will be set as the parser context's `output_size` member. Use this as an upper limit on our reads, and ensure that we do not read more than the client requests.

51eff5a5

2020-05-29T13:13:19

strarray: we should `dispose` instead of `free` We _dispose_ the contents of objects; we _free_ objects (and their contents). Update `git_strarray_free` to be `git_strarray_dispose`. `git_strarray_free` remains as a deprecated proxy function.

a9746b30

2020-05-29T11:21:55

strarray: move to its own file

570f0340

2020-06-01T19:10:38

httpclient: read_body should return 0 at EOF When users call `git_http_client_read_body`, it should return 0 at the end of a message. When the `on_message_complete` callback is called, this will set `client->state` to `DONE`. In our read loop, we look for this condition and exit. Without this, when there is no data left except the end of message chunk (`0\r\n`) in the http stream, we would block by reading the three bytes off the stream but not making progress in any `on_body` callbacks. Listening to the `on_message_complete` callback allows us to stop trying to read from the socket when we've read the end of message chunk.

17641f1f

2020-06-01T15:05:51

Merge pull request #5526 from libgit2/ethomson/poolinit git_pool_init: allow the function to fail

0f35efeb

2020-05-23T10:15:51

git_pool_init: handle failure cases Propagate failures caused by pool initialization errors.

1bbdf15d

2020-06-01T13:57:12

Merge pull request #5527 from libgit2/ethomson/config_unreadable Handle unreadable configuration files

d1409f48

2020-05-06T19:57:07

config: ignore unreadable configuration files Modified `config_file_open()` so it returns 0 if the config file is not readable, which happens on global config files under macOS sandboxing (note that for some reason `access(F_OK)` DOES work with sandboxing, but it is lying). Without this read check sandboxed applications on macOS can not open any repository, because `config_file_read()` will return GIT_ERROR when it cannot read the global /Users/username/.gitconfig file, and the upper layers will just completely abort on GIT_ERROR when attempting to load the global config file, so no repositories can be opened.

8c96d56d

2020-05-26T04:53:09

index: write v4: bugfix: prefix path with strip_len, not same_len According to index-format.txt of git, the path of an entry is prefixed with N, where N indicates the length of bytes to be stripped.
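
To make the distinction concrete: in the version 4 index format each path is stored relative to the previous one, prefixed by the number of bytes N to remove from the end of the previous path. The sketch below computes both quantities; `common_prefix_len` and `strip_len` are hypothetical helpers, not the libgit2 functions, and the variable-length encoding of N is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the common leading part of two paths. */
static size_t common_prefix_len(const char *prev, const char *cur)
{
	size_t n = 0;

	while (prev[n] != '\0' && prev[n] == cur[n])
		n++;
	return n;
}

/*
 * Number of bytes to strip from the end of `prev` so that appending
 * `cur + *same_len` reproduces `cur`. This value, not *same_len, is the
 * N that prefixes the entry.
 */
static size_t strip_len(const char *prev, const char *cur, size_t *same_len)
{
	assert(prev && cur && same_len);

	*same_len = common_prefix_len(prev, cur);
	return strlen(prev) - *same_len;
}
```

For `src/config.c` followed by `src/config.h`, the shared prefix is 11 bytes long but only 1 byte has to be stripped, so writing the shared length instead would produce a wrong entry.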

5278a006

2020-05-23T16:07:54

git_packbuilder_write: Allow setting path to NULL to use the default path If given a NULL path, write to the object path of the repository. Add tests for the new behavior.

0bc091dd

2020-05-23T15:35:38

git_packbuilder_write: Unify cleanup path Clean up and return via a single label, to avoid duplicate error handling before each return, and to make it easier to extend the set of cleanups needed.

30285a3c

2020-05-23T15:04:19

mempack: Use threads when building the pack The mempack ODB backend creates a packbuilder internally to write out a pack; call git_packbuilder_set_threads on that packbuilder, to use threads for packing if available.

27cb4e0e

2020-05-23T11:02:07

Merge pull request #5522 from pks-t/pks/openssl-cert-memleak OpenSSL certificate memory leak

abfdb8a6

2020-05-23T10:15:37

git_pool_init: return an int Let `git_pool_init` return an int so that it could fail.

e4bdba56

2020-05-23T09:57:22

Merge pull request #5515 from pks-t/pks/flaky-checkout-test tests: checkout: fix flaky test due to mtime race

3b7b4d27

2020-05-23T09:40:55

Merge pull request #5523 from libgit2/pks/cmake-sort-reproducible-builds cmake: Sort source files for reproducible builds

3f201f75

2020-05-16T13:48:04

checkout: fix file being treated as unmodified due to racy index When trying to determine whether a file changed, we try to avoid heavy operations by first taking a look at the index, seeing whether the index entry is modified already. This doesn't seem to cut it, though, as we currently have the racy checkout::index::can_disable_pathspec_match test case: sometimes the files get restored to their original contents, sometimes they aren't. The issue is caused by a racy index [1]: in case we modify a file, add it to the index and then modify it again in-place without changing its file, then we may end up with a modified file that has the same stat(3P) info as we've currently got it in its corresponding index entry. The mitigation for this is to treat files with the same mtime as the index as racily modified. We already have this logic in place for the index, but not when doing a checkout. Fix the issue by only consulting the index entry in case it has an older mtime than the index. Previously, the following script reliably had at least 20 failures, while now there is no failure to be observed anymore:
```bash
j=0
for i in $(seq 100)
do
	if ! ./libgit2_clar -scheckout::index::can_disable_pathspec_match >/dev/null
	then
		j=$(($j + 1))
	fi
done
echo "Failures: $j"
```
[1]: https://git-scm.com/docs/racy-git

b85eefb4

2020-05-15T19:52:40

cmake: Sort source files for reproducible builds We currently use `FILE(GLOB ...)` in most places to find source and header files. This is problematic in that the order of files returned depends on the operating system's directory iteration order and may thus not be deterministic. As a result, we link object files in unspecified order, which may cause the linker to emit different code across runs. Fix this issue by sorting all code used as input to the libgit2 library to improve the reliability of reproducible builds.

b43a9e66

2020-05-15T17:46:24

streams: openssl: fix memleak due to us not free'ing certs When creating a `git_cert` from the OpenSSL X509 certificate of a given stream, we do not call `X509_free()` on the certificate, leading to a memory leak as soon as the certificate is requested e.g. by the certificate check callback. Fix the issue by properly calling `X509_free()`.

a2eca682

2020-05-12T21:35:07

futils: fix order of declared parameters for `git_futils_fake_symlink` While the function `git_futils_fake_symlink` is declared with arguments `new, old`, the implementation uses the reverse order `old, new`. Let's fix the ordering issues to be `new, old` for both, which matches what symlink(3P) has. While at it, we also rename these parameters: `old` and `new` doesn't really make a lot of sense in the context of symlinks, which is why this commit renames them to be called `target` and `path`.

cbae1c21

2020-04-01T22:12:07

assert: allow non-int returning functions to assert Include GIT_ASSERT_WITH_RETVAL and GIT_ASSERT_ARG_WITH_RETVAL so that functions that do not return int (or more precisely, where `-1` would not be an error code) can assert. This allows functions that return, eg, NULL on an error code to do that by passing the return value (in this example, `NULL`) as a second parameter to the GIT_ASSERT_WITH_RETVAL functions.

a95096ba

2020-01-12T10:31:07

assert: optionally fall-back to assert(3) Fall back to the system assert(3) in debug builds, which may aid in debugging. "Safe" assertions can be enabled in debug builds by setting GIT_ASSERT_HARD=0. Similarly, hard assertions can be enabled in release builds by setting GIT_ASSERT_HARD to nonzero.

abe2efe1

2019-12-09T12:37:34

Introduce GIT_ASSERT macros Provide macros to replace usages of `assert`. A true `assert` is punishing in a library. Instead we should do our best to not crash. GIT_ASSERT_ARG(x) asserts that the given argument complies with some format; if it does not, it sets an error message and returns `-1`. GIT_ASSERT(x) is for internal usage, and available as an internal consistency check. It will set an error message and return `-1` in the event of failure.
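
The general shape of such "soft" assertions, including the return-value variant and the optional fall-back to assert(3) described in the neighbouring commits, might be sketched like this. All `DEMO_*` and `demo_*` names are hypothetical stand-ins; the actual libgit2 macros and error reporting differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical switch: nonzero selects the hard assert(3) behaviour. */
#ifndef DEMO_ASSERT_HARD
# define DEMO_ASSERT_HARD 0
#endif

/* Hypothetical error slot standing in for a real error-reporting API. */
static const char *demo_last_error;

#if DEMO_ASSERT_HARD
# define DEMO_ASSERT_WITH_RETVAL(expr, retval) assert(expr)
#else
# define DEMO_ASSERT_WITH_RETVAL(expr, retval) \
	do { \
		if (!(expr)) { \
			demo_last_error = "unrecoverable internal error: " #expr; \
			return (retval); \
		} \
	} while (0)
#endif

/* For functions where -1 is the error code. */
#define DEMO_ASSERT(expr) DEMO_ASSERT_WITH_RETVAL(expr, -1)

/* Returns 0 and stores the length, or -1 on invalid arguments. */
static int demo_strlen(size_t *out, const char *s)
{
	DEMO_ASSERT(out);
	DEMO_ASSERT(s);

	*out = strlen(s);
	return 0;
}

/* Returns a pointer, so the failure value is NULL rather than -1. */
static const char *demo_first(const char *s)
{
	DEMO_ASSERT_WITH_RETVAL(s, NULL);
	return s;
}
```

In the default soft mode a failed check sets an error and returns to the caller, which leaves the application free to recover; the hard mode trades that for an immediate abort that is easier to catch in a debugger.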

56c95cf6

2020-05-10T21:43:38

Fix uninitialized stack memory and NULL ptr dereference in stash_to_index Caught by static analysis.

d62e44cb

2019-06-03T18:35:08

checkout: Fix removing untracked files by path in subdirectories The checkout code didn't iterate into a subdir if it didn't match the pathspec, but since the pathspec might match files in the subdir we should recurse into it (In contrast to gitignore handling). Fixes #5089

63de2128

2020-02-02T20:20:19

checkout: filter pathspecs for _all_ checkout types We were previously applying the pathspec filter for the baseline iterator during checkout, as well as the target tree. This was an oversight; in fact, we should apply the pathspec filter to _all_ checkout targets, not just trees. Add a helper function to set the iterator pathspecs from the given checkout pathspecs, and call it everywhere.

898caead

2020-05-10T19:03:10

Merge pull request #5431 from libgit2/ethomson/hexdump git__hexdump: better mimic `hexdump -C`

9830ab3d

2020-01-29T02:00:04

blame: add option to ignore whitespace changes

e9b0cfc0

2020-04-05T13:24:13

Merge pull request #5485 from libgit2/ethomson/sysdir_unused sysdir: remove unused git_sysdir_get_str

b6f18db9

2020-04-05T11:16:29

sysdir: remove unused git_sysdir_get_str

ce2ab78f

2020-04-04T16:35:33

Fix typo causing removal of symbol 'git_worktree_prune_init_options' Commit 0b5ba0d replaced this function with an "option_init" equivalent, but misspelled the replacement function. As a result, this symbol has been missing from libgit2.so ever since.

ad341eb7

2020-04-04T13:40:14

Merge pull request #5425 from lhchavez/fix-get-delta-base pack: Improve error handling for get_delta_base()

966db47d

2020-04-04T13:21:02

Merge pull request #5477 from pks-t/pks/rename-detection-negative-caches merge: cache negative cache results for similarity metrics

4d4c8e0a

2020-04-02T07:34:55

Re-adding the "delta offset is zero" error case

dfd7fcc4

2020-04-02T13:26:13

Merge pull request #5388 from bk2204/repo-format-v1 Handle repository format v1

b8eec0b2

2020-04-01T22:22:38

Merge pull request #5461 from pks-t/pks/refdb-fs-unused-header refdb_fs: remove unused header file

5d37128d

2020-03-01T10:34:15

git__hexdump: better mimic `hexdump -C`

ba59a4a2

2020-04-01T12:34:16

Making get_delta_base() conform to the general error-handling pattern This makes get_delta_base() return the error code as the return value and the delta base as an out-parameter.
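
The convention described here (error code as the return value, result as an out-parameter) might look like the sketch below. The names `find_delta_base`, `resolve`, `pack_offset`, and `last_error`, as well as the range check itself, are assumptions made for illustration and do not reproduce the actual pack code.

```c
#include <stddef.h>

/* Hypothetical type and error slot; only the calling convention matters here. */
typedef long pack_offset;

static const char *last_error = NULL;

/*
 * Returns 0 on success and writes the base offset to *out.
 * Returns -1 on failure, sets the error message itself, and leaves *out untouched.
 */
static int find_delta_base(pack_offset *out, pack_offset delta_offset, pack_offset distance)
{
	if (distance <= 0 || distance > delta_offset) {
		last_error = "delta base offset out of range";
		return -1;
	}
	*out = delta_offset - distance;
	return 0;
}

/* The caller only checks for a negative return value and propagates it. */
static int resolve(pack_offset *base, pack_offset delta_offset, pack_offset distance)
{
	int error;

	if ((error = find_delta_base(base, delta_offset, distance)) < 0)
		return error;
	return 0;
}
```

Keeping the result out of the return value means a failure can never be mistaken for a valid offset, and the callee stays solely responsible for the error message.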

f3273725

2020-02-25T20:58:09

pack: Improve error handling for get_delta_base() This change moves the responsibility of setting the error upon failures of get_delta_base() to get_delta_base() instead of its callers. That way, the caller can always check if the return value is negative and mark the whole operation as an error instead of using garbage values, which can lead to crashes if the .pack files are malformed.

4dfcc50f

2020-04-01T15:16:18

merge: cache negative cache results for similarity metrics When computing renames, we cache the hash signatures for each of the potentially conflicting entries so that we do not need to repeatedly read the file and can at least halfway efficiently determine whether two files are similar enough to be deemed a rename. In order to make the hash signatures meaningful, we require at least four lines of data to be present, resulting in at least four different hashes that can be compared. Files that are deemed too small are not cached at all and will thus be repeatedly re-hashed, which is usually not a huge issue. The issue with the above heuristic is in case a file does _not_ have at least four lines, where a line is anything separated by a consecutive run of "\n" or "\0" characters. For example "a\nb" is two lines, but "a\0\0b" is also just two lines. Taken to the extreme, a file that has megabytes of consecutive space- or NUL-only may also be deemed as too small and thus not get cached. As a result, we will repeatedly load its blob, calculate its hash signature just to finally throw it away as we notice it's not of any value. When you've got a comparatively big file that you compare against a big set of potentially renamed files, then the cost simply explodes. The issue can be trivially fixed by introducing negative cache entries. Whenever we determine that a given blob does not have a meaningful representation via a hash signature, we store this negative cache marker and will from then on not hash it again, but also ignore it as a potential rename target. This should help the "normal" case already where you have a lot of small files as rename candidates, but in the above scenario its savings are extraordinarily high. To verify we do not hit the issue anymore with the described solution, this commit adds a test that uses the exact same setup described above with one 50 megabyte blob of '\0' characters and 1000 other files that get renamed.
Without the negative cache: $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null real 11m48.377s user 11m11.576s sys 0m35.187s And with the negative cache: $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null real 0m1.972s user 0m1.851s sys 0m0.118s So this represents a ~350-fold performance improvement, but it obviously depends on how many files you have and how big the blob is. The test numbers were chosen in a way that one will immediately notice as soon as the bug resurfaces.
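
A negative cache entry of the kind described above can be sketched with a sentinel pointer. Everything in this block is an assumption made for illustration: the `signature` type, the `SIG_NEGATIVE` marker, the `lookup` helper, and the placeholder "hash" are hypothetical and are not the library's rename-detection code. Only the four-line threshold and the line definition come from the commit message.

```c
#include <stddef.h>

/* Hypothetical signature type; the real one holds proper hashes. */
typedef struct { unsigned hashes[4]; } signature;

/* The address of this object marks "computed once, known to be unusable". */
static signature negative_marker;
#define SIG_NEGATIVE (&negative_marker)

static int compute_calls = 0;

/* A line is a maximal run of bytes that are neither '\n' nor '\0'. */
static size_t count_lines(const char *data, size_t len)
{
	size_t i, lines = 0;
	int in_line = 0;

	for (i = 0; i < len; i++) {
		int sep = (data[i] == '\n' || data[i] == '\0');
		if (!sep && !in_line)
			lines++;
		in_line = !sep;
	}
	return lines;
}

/* Stand-in for the expensive step: fails with -1 below four lines. */
static int compute_signature(signature *out, const char *data, size_t len)
{
	compute_calls++;
	if (count_lines(data, len) < 4)
		return -1;
	out->hashes[0] = (unsigned)len; /* placeholder, not a real hash */
	return 0;
}

/*
 * Cache slot states: NULL = not computed yet, SIG_NEGATIVE = known unusable,
 * anything else = valid signature. Returns NULL when no signature exists.
 */
static signature *lookup(signature **slot, signature *storage, const char *data, size_t len)
{
	if (*slot == NULL)
		*slot = compute_signature(storage, data, len) < 0 ? SIG_NEGATIVE : storage;
	return *slot == SIG_NEGATIVE ? NULL : *slot;
}
```

With the marker stored in the slot, a second lookup of a too-small blob returns immediately instead of re-reading and re-hashing it.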

5f47cb48

2020-03-26T14:16:41

patch: correctly handle mode changes for renames When generating a patch for a renamed file whose mode bits have changed in addition to the rename, then we currently fail to parse the generated patch. Furthermore, when generating a diff we output mode bits after the similarity metric, which is different to how upstream git handles it. Fix both issues by adding another state transition that allows similarity indices after mode changes and by printing mode changes before the similarity index.

bba9599a

2020-03-26T11:56:10

Merge pull request #5445 from lhchavez/fix-5443 Fix segfault when calling git_blame_buffer()

e7a1fd88

2020-03-26T11:42:47

Fix spelling error Signed-off-by: Utkarsh Gupta <utkarsh@debian.org>

74e0489a

2020-03-24T19:42:10

refdb_fs: remove unused header file The "refdb_fs.h" header contains a single struct `git_refcache` that is not used anywhere. As a result, we can just delete the header altogether as it doesn't have any purpose and may confuse readers.

62d59467

2020-03-08T02:13:11

Fix segfault when calling git_blame_buffer() This change makes sure that the hunk is not null before trying to dereference it. This avoids segfaults, especially when blaming against a modified buffer (i.e. the index). Fixes: #5443

a2d3316a

2020-03-13T23:01:11

refdb_fs: initialize backend version While the `git_refdb_backend` struct has a version, we do not initialize it correctly when calling `git_refdb_backend_fs()`. Fix this by adding the call to `git_refdb_init_backend()`.

9a102446

2020-03-21T16:49:44

Merge pull request #5455 from pks-t/pks/cmake-install-dirs cmake: use install directories provided via GNUInstallDirs

87fc539f

2020-03-13T22:08:19

cmake: use install directories provided via GNUInstallDirs We currently hand-code logic to configure where to install our artifacts via the `LIB_INSTALL_DIR`, `INCLUDE_INSTALL_DIR` and `BIN_INSTALL_DIR` variables. This is reinventing the wheel, as CMake already provides a way to do that via `CMAKE_INSTALL_<DIR>` paths, e.g. `CMAKE_INSTALL_LIBDIR`. This requires users of libgit2 to know about the discrepancy and will require special hacks for any build systems that handle these variables in an automated way. One such example is Gentoo Linux, which sets up these paths in both the cmake and cmake-utils eclass. So let's stop doing that: the GNUInstallDirs module handles it in a better way for us, especially so as the actual values are dependent on CMAKE_INSTALL_PREFIX. This commit removes our own set of variables and instead refers users to use the standard ones. As a second benefit, this commit also fixes our pkgconfig generation to use the GNUInstallDirs module. We had a bug there where we ignored the CMAKE_INSTALL_PREFIX when configuring the libdir and includedir keys, so if libdir was set to "lib64", then libdir would be an invalid path. With GNUInstallDirs, we can now use `CMAKE_INSTALL_FULL_LIBDIR`, which handles the prefix for us.

b1f6481f

2020-03-10T22:07:35

cmake: ignore deprecation notes for Secure Transport The Secure Transport interface we're currently using has been deprecated with macOS 10.15. As we're currently in code freeze, we cannot migrate to newer interfaces. As such, let's disable deprecation warnings for our "stransport.c" stream.

43d7a42b

2020-03-08T18:14:09

win32: don't canonicalize symlink targets Don't canonicalize symlink targets; our win32 path canonicalization routines expect an absolute path. In particular, stop using the path canonicalization routines for symlink targets (introduced in commit 7d55bee6d, "win32: fix relative symlinks pointing into dirs", 2020-01-10). Instead, use the utf8 -> utf16 relative path handling functions, so that paths like "../foo" will be translated to "..\foo".

f2b114ba

2020-03-08T18:11:45

win32: introduce relative path handling function Add a function that takes a (possibly) relative UTF-8 path and emits a UTF-16 path with forward slashes translated to backslashes. If the given path is, in fact, absolute, it will be translated using absolute path handling rules.
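
The slash translation for relative paths can be illustrated with a simplified sketch. It works on plain `char` strings and leaves out the UTF-8 to UTF-16 conversion and the absolute-path case entirely; the function name `to_backslashes` and its signature are hypothetical.

```c
#include <stddef.h>

/*
 * Copies `in` to `out`, turning every '/' into '\\'.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if `out` is too small to hold the result.
 */
static int to_backslashes(char *out, size_t out_size, const char *in)
{
	size_t i;

	for (i = 0; in[i] != '\0'; i++) {
		if (i + 1 >= out_size)
			return -1; /* no room left for this character plus the terminator */
		out[i] = (in[i] == '/') ? '\\' : in[i];
	}
	if (out_size == 0)
		return -1;
	out[i] = '\0';
	return (int)i;
}
```

Relative components such as `..` are copied through untouched, so "../foo" becomes "..\foo" rather than being resolved.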

fb7da154

2020-03-08T16:34:23

win32: clarify usage of path canonicalization funcs The path canonicalization functions on win32 are intended to canonicalize absolute paths; those with prefixes. In other words, things that start with drive letters (`C:\`), share names (`\\server\share`), or other prefixes (`\\?\`). This function removes leading `..` that occur after the prefix but before the directory/file portion (eg, turning `C:\..\..\..\foo` into `C:\foo`). This translation is not appropriate for local paths.
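
The `C:\..\..\..\foo` to `C:\foo` example can be reproduced with a small sketch. It assumes a drive-letter prefix only (share names and `\\?\` prefixes are not handled), works on plain `char` strings, and uses a hypothetical function name; it is not the library's canonicalization routine.

```c
#include <string.h>

/*
 * Strips leading ".." components that directly follow a drive prefix such
 * as "C:\", modifying `path` in place. Paths without such a prefix are
 * left unchanged, since the translation only makes sense for absolute paths.
 */
static void strip_leading_dotdot(char *path)
{
	char *rest, *src;

	if (!path[0] || path[1] != ':' || path[2] != '\\')
		return; /* not an absolute drive path: leave it alone */

	rest = path + 3;
	src = rest;

	/* Skip each "..\" component, or a final ".." at the end of the string. */
	while (src[0] == '.' && src[1] == '.' && (src[2] == '\\' || src[2] == '\0'))
		src += (src[2] == '\\') ? 3 : 2;

	memmove(rest, src, strlen(src) + 1);
}
```

A relative path such as `..\foo` has no prefix and passes through unchanged, which is why symlink targets need separate handling.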
