kmx git

Commit	Date	Message
8db9fd3b	2019-02-02T19:00:41	refdb: documentation
5fc27aac	2019-08-27T13:38:08	Merge pull request #5208 from mkostyuk/apply-removed-new-file apply: git_apply_to_tree fails to apply patches that add new files
6de48085	2019-08-27T11:29:24	Merge pull request #5189 from libgit2/ethomson/attrs_from_head Optionally read `.gitattributes` from HEAD
aaa48d06	2019-08-27T11:26:50	Merge pull request #5196 from pks-t/pks/config-include-onbranch config: implement "onbranch" conditional
4e20c7b1	2019-08-25T22:11:39	Merge pull request #5213 from boardwalk/dskorupski/fix_include_case Fix include casing for case-sensitive filesystems.
44d5e47d	2019-08-24T10:39:56	Fix include casing for case-sensitive filesystems.
60319788	2019-08-23T09:58:15	Merge pull request #5054 from tniessen/util-use-64-bit-timer util: use 64 bit timer on Windows
8cbef12d	2019-08-08T11:52:54	util: do not perform allocations in insertsort Our hand-rolled fallback sorting function `git__insertsort_r` does an in-place sort of the given array. As elements may not necessarily be pointers, it needs a way of swapping two values of arbitrary size, which is currently implemented by allocating a temporary buffer of the element's size. This is problematic, though, as the emulated `qsort` interface doesn't provide any return values and thus cannot signal an error if allocation of that temporary buffer has failed. Convert the function to swap via a temporary buffer allocated on the stack. Like this, it can `memcpy` contents of both elements in small batches without requiring a heap allocation. The buffer size has been chosen such that in most cases, a single iteration of copying will suffice. Most importantly, it can fully contain `git_oid` structures and pointers. Add a bunch of tests for the `git__qsort_r` interface to verify nothing breaks. Furthermore, this removes the declaration of `git__insertsort_r` and makes it static as it is not used anywhere else.
d4fe402b	2019-08-08T10:36:33	merge: check return value of `git_commit_list_insert` The function `git_commit_list_insert` dynamically allocates memory and may thus fail to insert a given commit, but we didn't check for that in several places in "merge.c". Convert surrounding functions to return error codes and check whether `git_commit_list_insert` was successful, returning an error if not.
c0486188	2019-08-08T10:28:09	blame_git: detect memory allocation errors The code in "blame_git.c" was mostly imported from git.git with only minor changes. One of these changes was to use our own allocators instead of git's `xmalloc`, but there's a subtle difference: `xmalloc` would abort the program if unable to allocate any memory, bit `git__malloc` doesn't. As we didn't check for memory allocation errors in some places, we might inadvertently dereference a `NULL` pointer in out-of-memory situations. Convert multiple functions to return proper error codes and add calls to `GIT_ERROR_CHECK_ALLOC` to fix this.
f3b3e543	2019-08-08T11:34:01	xdiff: catch memory allocation errors The xdiff code contains multiple call sites where the results of `xdl_malloc` are not being checked for memory allocation errors. Add checks to fix possible segfaults due to `NULL` pointer accesses.
c2dd895a	2019-08-08T10:47:29	transports: http: check for memory allocation failures When allocating a chunk that is used to write to HTTP streams, we do not check for memory allocation errors. This may lead us to write to a `NULL` pointer and thus cause a segfault. Fix this by adding a call to `GIT_ERROR_CHECK_ALLOC`.
08699541	2019-08-08T10:46:42	trailer: check for memory allocation errors The "trailer.c" code has been copied mostly verbatim from git.git with minor adjustments, only. As git.git's `xmalloc` function, which aborts on memory allocation errors, has been swapped out for `git_malloc`, which doesn't abort, we may inadvertently access `NULL` pointers. Add checks to fix this.
8c7d9761	2019-08-08T10:45:12	posix: fix direct use of `malloc` In "posix.c" there are multiple callsites which execute `malloc` instead of `git__malloc`. Thus, users of library are not able to track these allocations with a custom allocator. Convert these call sites to use `git__malloc` instead.
a477bff1	2019-08-08T10:44:57	indexer: catch OOM when adding expected OIDs When adding OIDs to the indexer's map of yet-to-be-seen OIDs to verify that packfiles are complete, we do so by first allocating a new OID and then calling `git_oidmap_set` on it. There was no check for memory allocation errors in place, though, leading to possible segfaults due to trying to copy data to a `NULL` pointer. Verify the result of `git__malloc` with `GIT_ERROR_CHECK_ALLOC` to fix the issue.
de4bc2bd	2019-08-20T03:29:45	apply: git_apply_to_tree fails to apply patches that add new files git_apply_to_tree() cannot be used apply patches with new files. An attempt to apply such a patch fails because git_apply_to_tree() tries to remove a non-existing file from an old index. The solution is to modify git_apply_to_tree() to git_index_remove() when the patch states that the modified files is removed.
071750a3	2019-08-15T14:18:26	cmake: move _WIN32_WINNT definitions to root
0f40e68e	2019-08-14T09:05:07	Merge pull request #5187 from ianhattendorf/fix/clone-whitespace clone: don't decode URL percent encodings
57a9ccd5	2019-06-21T15:53:54	commit_list: fix possible buffer overflow in `commit_quick_parse` The function `commit_quick_parse` provides a way to quickly parse parts of a commit without storing or verifying most of its metadata. The first thing it does is calculating the number of parents by skipping "parent " lines until it finds the first non-parent line. Afterwards, this parent count is passed to `alloc_parents`, which will allocate an array to store all the parent. To calculate the amount of storage required for the parents array, `alloc_parents` simply multiplicates the number of parents with the respective elements's size. This already screams "buffer overflow", and in fact this problem is getting worse by the result being cast to an `uint32_t`. In fact, triggering this is possible: git-hash-object(1) will happily write a commit with multiple millions of parents for you. I've stopped at 67,108,864 parents as git-hash-object(1) unfortunately soaks up the complete object without streaming anything to disk and thus will cause an OOM situation at a later point. The point here is: this commit was about 4.1GB of size but compressed down to 24MB and thus easy to distribute. The above doesn't yet trigger the buffer overflow, thus. As the array's elements are all pointers which are 8 bytes on 64 bit, we need a total of 536,870,912 parents to trigger the overflow to `0`. The effect is that we're now underallocating the array and do an out-of-bound writes. As the buffer is kindly provided by the adversary, this may easily result in code execution. Extrapolating from the test file with 67m commits to the one with 536m commits results in a factor of 8. Thus the uncompressed contents would be about 32GB in size and the compressed ones 192MB. While still easily distributable via the network, only servers will have that amount of RAM and not cause an out-of-memory condition previous to triggering the overflow. This at least makes this attack not an easy vector for client-side use of libgit2.
cb1439c9	2019-06-19T12:59:27	config: validate ownership of C:\ProgramData\Git\config before using it When the VirtualStore feature is in effect, it is safe to let random users write into C:\ProgramData because other users won't see those files. This seemed to be the case when we introduced support for C:\ProgramData\Git\config. However, when that feature is not in effect (which seems to be the case in newer Windows 10 versions), we'd rather not use those files unless they come from a trusted source, such as an administrator. This change imitates the strategy chosen by PowerShell's native OpenSSH port to Windows regarding host key files: if a system file is owned neither by an administrator, a system account, or the current user, it is ignored.
5774b2b1	2019-08-11T23:42:45	Merge pull request #5113 from pks-t/pks/stash-perf stash: avoid recomputing tree when committing worktree
42bacbc6	2019-08-11T21:06:19	Merge pull request #5121 from pks-t/pks/variadic-errors Variadic macros
fba3bf79	2019-07-21T14:15:12	blob: optionally read attributes from repository When `GIT_BLOB_FILTER_ATTTRIBUTES_FROM_HEAD` is passed to `git_blob_filter`, read attributes from `gitattributes` files that are checked in to the repository at the HEAD revision. This passes the flag `GIT_FILTER_ATTRIBUTES_FROM_HEAD` to the filter functions.
f0f27c1c	2019-07-21T14:13:25	filter: optionally read attributes from repository When `GIT_FILTER_ATTRIBUTES_FROM_HEAD` is specified, configure the filter to read filter attributes from `gitattributes` files that are checked in to the repository at the HEAD revision. This passes the flag `GIT_ATTR_CHECK_INCLUDE_HEAD` to the attribute reading functions.
4fd5748c	2019-07-21T14:11:03	attr: optionally read attributes from repository When `GIT_ATTR_CHECK_INCLUDE_HEAD` is specified, read `gitattribute` files that are checked into the repository at the HEAD revision.
a5392eae	2019-07-21T12:13:07	blob: allow blob filtering to ignore system gitattributes Introduce `GIT_BLOB_FILTER_NO_SYSTEM_ATTRIBUTES`, which tells `git_blob_filter` to ignore the system-wide attributes file, usually `/etc/gitattributes`. This simply passes the appropriate flag to the attribute loading code.
22eb12af	2019-07-21T12:12:05	filter: add GIT_FILTER_NO_SYSTEM_ATTRIBUTES option Allow system-wide attributes (the ones specified in `/etc/gitattributes`) to be ignored if the flag `GIT_FILTER_NO_SYSTEM_ATTRIBUTES` is specified.
fa1a4c77	2019-07-21T11:03:01	blob: deprecate `git_blob_filtered_content` Users should now use `git_blob_filter`.
a32ab076	2019-07-21T10:56:42	blob: introduce git_blob_filter Provide a function to filter blobs that allows for more functionality than the existing `git_blob_filtered_content` function.
b0692d6b	2019-08-09T09:01:56	Merge pull request #4913 from implausible/feature/signing-rebase-commits Add sign capability to git_rebase_commit
998f9c15	2019-08-07T07:21:27	fixup: strange indentation
f627ba6c	2019-08-02T13:18:07	Merge pull request #5197 from pks-t/pks/remote-ifdeffed-block remote: remove unused block of code
e23c0b18	2019-08-02T07:52:58	remote: remove unused block of code In "remote.c", we have a chunk of code that is #ifdef'fed out via `#if 0` with a comment that we could export it as a helper function. The code was implemented in 2013 and ifdef'fed in 2014, which shows that there's clearly no interest in having such a helper at all. As this block has recently created some confusion about `p_getenv` due to it containing the only reference to that function in our codebase, let's remove this block altogether.
d588de7c	2019-08-02T07:51:02	Merge pull request #5191 from eaigner/master config: check if we are running in a sandboxed environment
952fbbfb	2019-08-01T20:04:11	config: check if we are running in a sandboxed environment On macOS the $HOME environment variable returns the path to the sandbox container instead of the actual user $HOME for sandboxed apps. To get the correct path, we have to get it from the password file entry.
722ba93f	2019-08-01T15:14:06	config: implement "onbranch" conditional With Git v2.23.0, the conditional include mechanism gained another new conditional "onbranch". As the name says, it will cause a file to be included if the "onbranch" pattern matches the currently checked out branch. Implement this new condition and add a bunch of tests.
1721ab04	2019-06-16T11:25:47	unix: posix: avoid use of variadic macro `p_snprintf` The macro `p_snprintf` is implemented as a variadic macro that calls `snprintf` directly with `__VA_ARGS__`. In C89, variadic macros are not allowed, but as the arguments of `p_snprintf` and `snprintf` are matching 1:1, we can fix this by simply removing the parameter list from `p_snprintf`.
63d8cd18	2019-06-16T11:17:17	apply: remove use of variadic error macro The macro `apply_err` is implemented as a variadic macro, which are not defined by C89. Convert it to a variadic function, instead.
27b8b31e	2019-08-01T11:57:03	parse: remove use of variadic macros which are not C89 compliant The macro `git_parse_error` is implemented in a variadic way so that it's possible to pass printf-style parameters. Unfortunately, variadic macros are not defined by C89 and thus we cannot use that functionality. But as we have implemented `git_error_vset` in the previous commit, we can now just use that instead. Convert `git_parse_error` to a variadic function and use `git_error_vset` to fix the compliance violation. While at it, move the function to "patch_parse.c".
c8e63812	2019-06-16T11:03:08	errors: introduce `git_error_vset` function Right now, we only provide a `git_error_set` that has a variadic function signature. It's impossible to drive this function in a C89-compliant way from other functions that have a variadic signature, though, like for example `git_parse_error`. Implement a new `git_error_vset` function that gets a `va_list` as parameter, fixing the above problem.
e8f63411	2019-08-01T11:29:58	Merge pull request #5186 from pks-t/pks/config-snapshot-separation config: separate file and snapshot backends
fb0730f1	2019-04-16T23:49:16	util: use 64 bit timer on Windows git__timer was originally implemented using a 32 bit timer since Windows XP did not support GetTickCount64. Windows XP was discontinued five years ago, so it should be safe to use the new API. As a benefit, we do not need to care about overflows for the next 585 million years.
c8e249b0	2019-07-29T10:51:22	object: deprecate git_object__size for removal In #5118 we remove the double-underscore to make it a normally-named public function. However, this is not an interesting function outside of the library and it takes up a name for something that could be more useful. Remove the single-underscore version as we have not done any releases with it.
37ebe9ad	2019-07-24T18:49:08	config_backend: rename internal structures The internal backend structures are kind-of legacy and do not really speak for themselves. Rename them accordingly to make them easier to understand.
2bff84ba	2019-07-26T21:02:56	config_file: separate out read-only backend To further distinguish the file writeable and readonly backends, separate the readonly backend into its own "config_snapshot.c" implementation. The snapshot backend can be generically used to snapshot any type of backend.
f0b10066	2019-07-24T18:37:14	config_file: fix cast of readonly backend In `backend_readonly_free`, the passed in config backend is being cast to a `diskfile_backend` instead of to a `diskfile_readonly_backend`. While this works out just fine because we only access its header values, which were shared between both backends, it is undefined behaviour. Use the correct type to fix this.
a3159df8	2019-07-24T18:31:43	config_file: remove shared `diskfile_header` struct The `diskfile_header` structure is shared between both `diskfile_backend` and `diskfile_readonly_backend`. The separation and resulting casting is confusing at times and a source for programming errors. Remove the shared structure and inline them directly.
271e5fba	2019-07-24T18:18:18	config_file: duplicate accessors for readonly backend While most functions of the readonly configuration backend are implemented separately from the writeable configuration backend, the two functions `config_iterator_new` and `config_get` are shared between both. This sharing makes it necessary to have some shared data structures, which is the `diskfile_header` structure. Unfortunately, this makes the backends harder to grasp than necessary due to all the casting between structs and also quite error prone. Reimplement those functions for the readonly backends. As readonly backends cannot be refreshed anyway, we can remove the calls to `config_refresh` in there.
4e7ce1fb	2019-07-24T18:13:52	config_file: reimplement `config_readonly_open` generically The `config_readonly_open` function currently receives as input a diskfile backend and will copy its entries to a new snapshot. This is rather intimate, as we need to assume that the source config backend is in fact a diskfile entry. We can do better than this though by using generic methods to copy contents of the provided backend, e.g. by using a config iterator. This also allows us to decouple the read-only backend from the read-write backend.
76182e84	2019-07-24T18:04:38	config_entries: fix possible segfault when duplicating entries When duplicating a configuration entry, we allocate a new entry but do not verify that we get a valid pointer back. As we're dereferencing the pointer afterwards, we might thus run into a segfault in out-of-memory situations. Extract a new function `git_config_entries_dup_entry` that handles the complete entry duplication. Fix the error by using `GIT_ERROR_CHECK_ALLOC`.
ba2885da	2019-07-24T18:05:28	git_net_url_parse: don't git_buf_decode_percent for path
2766b92d	2019-07-21T15:10:34	config_file: refresh when creating an iterator When creating a new iterator for a config file backend, then we should always make sure that we're up to date by calling `config_refresh`. Otherwise, we might not notice when another process has modified the configuration file and thus will represent outdated values. Add two tests to config::stress that verify that we get up-to-date values when reading configuration entries via `git_config_iterator`.
9fac8b78	2019-07-21T15:08:22	config_file: do not refresh read-only backends If calling `config_refresh` on a read-only configuration file backend, then we will segfault when comparing the timestamp of the file due to `path` being uninitialized. As a read-only snapshot should not be refreshed anyway and stay consistent, we can simply return early when calling `config_refresh` on a read-only snapshot.
28d11b59	2019-07-21T14:41:21	config_file: consistently use `GIT_CONTAINER_OF`
6be5ac23	2019-07-11T15:30:51	checkout: postpone creation of symlinks to the end On most platforms it's fine to create symlinks to nonexisting files. Not so on Windows, where the type of a symlink (file or directory) needs to be set at creation time. So depending on whether the target file exists or not, we may end up with different symlink types. This creates a problem when performing checkouts, where we simply iterate over all blobs that need to be updated without treating symlinks any special. If the target file of the symlink is going to be checked out after the symlink itself, then the symlink will be created as directory symlink and not as file symlink. Fix the issue by iterating over blobs twice: once to perform postponed deletions and updates to non-symlink blobs, and once to perform updates to symlink blobs.
50194dcd	2019-07-11T15:14:42	win32: fix symlinks to relative file targets When creating a symlink in Windows, one needs to tell Windows whether the symlink should be a file or directory symlink. To determine which flag to pass, we call `GetFileAttributesW` on the target file to see whether it is a directory and then pass the flag accordingly. The problem though is if create a symlink with a relative target path, then we will check that relative path while not necessarily being inside of the working directory where the symlink is to be created. Thus, getting its attributes will either fail or return attributes of the wrong target. Fix this by resolving the target path relative to the directory in which the symlink is to be created.
a00842c4	2019-06-29T09:59:14	win32: correctly unlink symlinks to directories When deleting a symlink on Windows, then the way to delete it depends on whether it is a directory symlink or a file symlink. In the first case, we need to use `DeleteFile`, in the second `RemoveDirectory`. Right now, `p_unlink` will only ever try to use `DeleteFile`, though, and thus fail to remove directory symlinks. This mismatches how unlink(3P) is expected to behave, though, as it shall remove any symlink disregarding whether it is a file or directory symlink. In order to correctly unlink a symlink, we thus need to check what kind of file this is. If we were to first query file attributes of every file upon calling `p_unlink`, then this would penalize the common case though. Instead, we can try to first delete the file with `DeleteFile` and only if the error returned is `ERROR_ACCESS_DENIED` will we query file attributes and determine whether it is a directory symlink to use `RemoveDirectory` instead.
ded77bb1	2019-06-29T09:58:34	path: extract function to check whether a path supports symlinks When initializing a repository, we need to check whether its working directory supports symlinks to correctly set the initial value of the "core.symlinks" config variable. The code to check the filesystem is reusable in other parts of our codebase, like for example in our tests to determine whether certain tests can be expected to succeed or not. Extract the code into a new function `git_path_supports_symlinks` to avoid duplicate implementations. Remove a duplicate implementation in the repo test helper code.
e54343a4	2019-06-29T09:17:32	fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.
a7d32d60	2019-07-20T18:46:32	stash: avoid recomputing tree when committing worktree When creating a new stash, we need to create there separate commits storing differences stored in the index, untracked changes as well as differences in the working directory. The first two will only be done conditionally if the equivalent options "git stash --keep-index --include-untracked" are being passed to `git_stash_save`, but even when only creating a stash of worktree changes we're much slower than git.git. Using our new stash example: $ time git stash Saved working directory and index state WIP on (no branch): 2f7d9d47575e Linux 5.1.7 real 0m0.528s user 0m0.309s sys 0m0.381s $ time lg2 stash real 0m27.165s user 0m13.645s sys 0m6.403s As can be seen, libgit2 is more than 50x slower than git.git! When creating the stash commit that includes all worktree changes, we create a completely new index to prepare for the new commit and populate it with the entries contained in the index' tree. Here comes the catch: by populating the index with a tree's contents, we do not have any stat caches in the index. This means that we have to re-validate every single file from the worktree and see whether it has changed. The issue can be fixed by populating the new index with the repo's existing index instead of with the tree. This retains all stat cache information, and thus we really only need to check files that have changed stat information. This is semantically equivalent to what we previously did: previously, we used the tree of the commit computed from the index. Now we're just using the index directly. And, in fact, the cache is doing wonders: time lg2 stash real 0m1.836s user 0m1.166s sys 0m0.663s We're now performing 15x faster than before and are only 3x slower than git.git now.
a613832e	2019-07-20T18:49:48	patch_parse: fix segfault due to line containing static contents With commit dedf70ad2 (patch_parse: do not depend on parsed buffer's lifetime, 2019-07-05), all lines of the patch are allocated with `strdup` to make lifetime of the parsed patch independent of the buffer that is currently being parsed. In patch b08932824 (patch_parse: ensure valid patch output with EOFNL, 2019-07-11), we introduced another code location where we add lines to the parsed patch. But as that one was implemented via a separate pull request, it wasn't converted to use `strdup`, as well. As a consequence, we generate a segfault when trying to deallocate the potentially static buffer that's now in some of the lines. Use `git__strdup` to fix the issue.
e07dbc92	2019-07-20T11:26:00	Merge pull request #5173 from pks-t/pks/gitignore-wildmatch-error ignore: fix determining whether a shorter pattern negates another
fd7a384b	2019-07-20T11:24:37	Merge pull request #5159 from pks-t/pks/patch-parse-old-missing-nl patch_parse: handle missing newline indicator in old file
f33ca472	2019-07-20T11:06:23	Merge pull request #5158 from pks-t/pks/patch-parsed-lifetime patch_parse: do not depend on parsed buffer's lifetime
d78a1b18	2019-07-20T11:04:53	Merge pull request #5174 from pks-t/pks/winhttp-hash sha1: fix compilation of WinHTTP backend
964c1c60	2019-07-20T11:02:30	Merge pull request #5176 from pks-t/pks/repo-template-head repository: do not initialize HEAD if it's provided by templates
9d46f167	2019-07-19T10:50:51	repository: do not initialize HEAD if it's provided by templates When using templates to initialize a git repository, then git-init(1) will copy over all contents of the template directory. These will be preferred over the default ones created by git-init(1). While we mostly do the same, there is the exception of "HEAD". While we do copy over the template's HEAD file, afterwards we'll immediately re-initialize its contents with either the default "ref: refs/origin/master" or the init option's `initial_head` field. Let's fix the inconsistency with upstream git-init(1) by not overwriting the template HEAD, but only if the user hasn't set `opts.initial_head`. If the `initial_head` field has been supplied, we should use that indifferent from whether the template contained a HEAD file or not. Add tests to verify we correctly use the template directory's HEAD file and that `initial_head` overrides the template.
f3134a84	2019-07-19T10:41:10	repository: update error handling in `init_ext` Update `git_repository_init_ext` to use our typical style of error handling. The function had multiple statements which didn't `goto out` immediately but instead deferred it to later calls combined with `if` statements.
869ae5a3	2019-07-19T10:15:43	repository: avoid swallowing error codes in `create_head` The error handling in `git_repository_create_head` completely swallows all error codes. While probably not too much of a problem, this also violates our usual coding style. Refactor the code to use a local `error` variable with the typical `goto out` statements.
3424c210	2019-07-19T08:00:13	Merge pull request #5138 from libgit2/ethomson/cvar configuration: cvar -> configmap
a33c0de2	2019-07-18T19:17:40	Merge pull request #5172 from bk2204/cache-efficient-eviction Evict cache items more efficiently
658022c4	2019-07-18T13:53:41	configuration: cvar -> configmap `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.
343fb83a	2019-07-18T13:50:47	Merge pull request #5156 from pks-t/pks/attr-macros-in-subdir gitattributes: ignore macros defined in subdirectories
7574564e	2019-07-18T13:40:34	sha1: win32: fix compilation due to unknown type In commit bbf034ab9 (hash: move `git_hash_prov` into Win32 backend, 2019-02-22), the `git_hash_prov`'s structure name has been removed in favour of its typedef'ed name. But as we have no CI that compiles with the WinHTTPS hashing backend right now, it wasn't noticed that the implementation that uses this struct wasn't changed correctly. Fix the struct type to make it compile again.
6f6340af	2019-07-18T11:57:55	ignore: fix determining whether a shorter pattern negates another When computing whether we need to store a negative pattern, we iterate through all previously known patterns and check whether the negative pattern undoes any of the previous ones. In doing so we call `wildmatch` and check it's return for any negative error values. If there was a negative return, we will abort and bubble up that error to the caller. In fact, this check for negative values stems from the time where we still used `fnmatch` instead of `wildmatch`. For `fnmatch`, negative values indicate a "real" error, while for `wildmatch` a negative value may be returned if the matching was prematurely aborted. A premature abort may for example also happen if the pattern matches a prefix of the haystack if the pattern is shorter. Returning an error in that case is the wrong thing to do. Fix the code to compare for equality with `WM_MATCH`, only. Negative values returned by `wildmatch` are perfectly fine and thus should be ignored. Add a test that verifies we do not see the error.
770b91b1	2019-07-17T15:59:54	cache: evict items more efficiently When our object cache is full, we pick eight items (or the whole cache, if there are fewer) and evict them. For small cache sizes, this is fine, but when we're dealing with a large number of objects, we can repeatedly exhaust the cache and spend a large amount of time in git_oidmap_iterate trying to find items to evict. Instead, let's assume that if the cache gets full, we have a large number of objects that we're handling, and be more aggressive about evicting items. Let's remove one item for every 2048 items, but not less than 8. This causes us to scale our evictions in proportion to the size of the cache and significantly reduces the time we spend in git_oidmap_iterate. Before this change, a full pack of all the non-blob objects in the Linux repository took in excess of 30 minutes and spent 62.3% of total runtime in odb_read_1 and its children, and 44.3% of the time in git_oidmap_iterate. With this change, the same operation now takes 14 minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total time, whereas git_oidmap_iterate consists of 6.2%. Note that we do spend a little more time inflating objects and a decent amount more time in memcmp. However, overall, the time taken is significantly improved, and time in pack building is now dominated by git_delta_create_from_index (33.7%), which is what we would expect.
c4df926b	2019-07-16T21:54:10	pack-objects: allocate memory more efficiently The packbuilder code allocates memory in chunks. When it needs to allocate, it tries to add 1024 to the number of objects and multiply by 3/2. However, it actually multiplies by 1 instead, since it performs an integral division in the expression "3 / 2" and only then multiplies by the increased number of objects. The current behavior causes the code to waste massive amounts of time copying memory when it reallocates, causing inserting all non-blob objects in the Linux repository into a new pack to take some indeterminate time greater than 5 minutes instead of 52 seconds. Correct this error by first dividing by two, and only then multiplying by 3. We still check for overflow for the multiplication, which is the only part that can overflow. This appears to be the only place in the code base which has this problem.
f8346905	2019-07-12T09:03:33	attr_file: ignore macros defined in subdirectories Right now, we are unconditionally applying all macros found in a gitatttributes file. But quoting gitattributes(5): Custom macro attributes can be defined only in top-level gitattributes files ($GIT_DIR/info/attributes, the .gitattributes file at the top level of the working tree, or the global or system-wide gitattributes files), not in .gitattributes files in working tree subdirectories. The built-in macro attribute "binary" is equivalent to: So gitattribute files in subdirectories of the working tree may explicitly _not_ contain macro definitions, but we do not currently enforce this limitation. This patch introduces a new parameter to the gitattributes parser that tells whether macros are allowed in the current file or not. If set to `false`, we will still parse macros, but silently ignore them instead of adding them to the list of defined macros. Update all callers to correctly determine whether the to-be-parsed file may contain macros or not. Most importantly, when walking up the directory hierarchy, we will only set it to `true` once it reaches the root directory of the repo itself. Add a test that verifies that we are indeed not applying macros from subdirectories. Previous to these changes, the test would've failed.
97968529	2019-07-05T08:05:16	attr_file: refactor `parse_buffer` function The gitattributes code is one of our oldest and most-untouched codebases in libgit2, and as such its code style doesn't quite match our current best practices. Refactor the function `git_attr_file__parse_buffer` to better match them.
dbc7e4b1	2019-07-05T07:53:02	attr_file: refactor `load_standalone` function The gitattributes code is one of our oldest and most-untouched codebases in libgit2, and as such its code style doesn't quite match our current best practices. Refactor the function `git_attr_file__lookup_standalone` to better match them.
be8f9bb1	2019-07-05T13:33:10	attrcache: fix memory leak if inserting invalid macro to cache A macro without any assignments is considered an invalid macro by the attributes cache and is thus not getting added to the macro map at all. But as `git_attr_cache__insert_macro` returns success with neither free'ing nor adopting the macro into its map, this will cause a memory leak. Fix this by freeing the macro in the function if it's not going to be added. This is perfectly fine to do, as callers assume that the attrcache will have the macro adopted on success anyway.
7277bf83	2019-07-05T13:33:05	attrcache: fix multiple memory leaks when inserting macros The function `git_attr_cache__insert_macro` is responsible for adopting macros in the per-repo macro cache. When adding a macro that replaces an already existing macro (e.g. because of re-parsing gitattributes files), then we do not free the previous macro and thus cause a memory leak. Fix this leak by first checking if the cache already has a macro defined with the same name. If so, free it before replacing the cache entry with the new instance.
5ae22a63	2019-06-21T08:13:31	fileops: fix creation of directory in filesystem root In commit 45f24e787 (git_repository_init: stop traversing at windows root, 2019-04-12), we have fixed `git_futils_mkdir` to correctly handle the case where we create a directory in Windows-style filesystem roots like "C:\repo". The problem here is an off-by-one: previously, to that commit, we've been checking wether the parent directory's length is equal to the root directory's length incremented by one. When we call the function with "/example", then the parent directory's length ("/") is 1, but the root directory offset is 0 as the path is directly rooted without a drive prefix. This resulted in `1 == 0 + 1`, which was true. With the change, we've stopped incrementing the root directory length, and thus now compare `1 <= 0`, which is false. The previous way of doing it was kind of finicky any non-obvious, which is also why the error was introduced. So instead of just re-adding the increment, let's explicitly add a condition that aborts finding the parent if the current parent path is "/". Making this change causes Azure Pipelines to fail the testcase repo::init::nonexistent_paths on Unix-based systems. This is because we have just fixed creating directories in the filesystem root, which previously didn't work. As Docker-based tests are running as root user, we are thus able to create the non-existing path and will now succeed to create the repository that was expected to actually fail. Let's split this up into three different tests: - A test to verify that we do not create repos in a non-existing parent directoy if the flag `GIT_REPOSITORY_INIT_MKPATH` is not set. - A test to verify that we fail if the root directory does not exist. As there is a common root directory on Unix-based systems that always exist, we can only test for this on Windows-based systems. - A test to verify that we fail if trying to create a repository in an unwriteable parent directory. We can only test this if not running tests as root user, as CAP_DAC_OVERRIDE will cause us to ignore permissions when creating files.
b0893282	2019-07-11T12:12:04	patch_parse: ensure valid patch output with EOFNL
3f855fe8	2019-07-05T11:06:33	patch_parse: handle missing newline indicator in old file When either the old or new file contents have no newline at the end of the file, then git-diff(1) will print out a "\ No newline at end of file" indicator. While we do correctly handle this in the case where the new file has this indcator, we fail to parse patches where the old file is missing a newline at EOF. Fix this bug by handling and missing newline indicators in the old file. Add tests to verify that we can parse such files.
b30dab8f	2019-07-11T12:10:48	apply: refactor to use a switch statement
001d76e1	2019-07-11T11:34:40	diff: ignore EOFNL for computing patch IDs The patch ID is supposed to be mostly context-insignificant and thus only includes added or deleted lines. As such, we shouldn't honor end-of-file-without-newline markers in diffs. Ignore such lines to fix how we compute the patch ID for such diffs.
dbeadf8a	2019-07-11T10:56:05	config_parse: provide parser init and dispose functions Right now, all configuration file backends are expected to directly mess with the configuration parser's internals in order to set it up. Let's avoid doing that by implementing both a `git_config_parser_init` and `git_config_parser_dispose` function to clearly define the interface between configuration backends and the parser. Ideally, we would make the `git_config_parser` structure definition private to its implementation. But as that would require an additional memory allocation that was not required before we just live with it being visible to others.
32157526	2019-07-11T11:10:02	config_file: refactor error handling in `config_write` Error handling in `config_write` is rather convoluted and does not match our current code style. Refactor it to make it easier to understand.
820fa1a3	2019-07-11T11:04:33	config_file: internalize `git_config_file` struct With the previous commits, we have finally separated the config parsing logic from the specific configuration file backend. Due to that, we can now move the `git_config_file` structure into the config file backend's implementation so that no other code may accidentally start using it again. Furthermore, we rename the structure to `diskfile` to make it obvious that it is internal, only, and to unify it with naming scheme of the other diskfile structures.
6e6da75f	2019-07-11T11:00:05	config_parse: remove use of `git_config_file` The config parser code needs to keep track of the current parsed file's name so that we are able to provide proper error messages to the user. Right now, we do that by storing a `git_config_file` in the parser structure, but as that is a specific backend and the parser aims to be generic, it is a layering violation. Switch over to use a simple string to fix that.
54d350e0	2019-06-21T12:53:43	config_file: embed file in diskfile parse data The config file code needs to keep track of the actual `git_config_file` structure, as it not only contains the path of the current configuration file, but it also keeps tracks of all includes of that file. Right now, we keep track of that structure via the `git_config_parser`, but as that's supposed to be a backend generic implementation of configuration parsing it's a layering violation to have it in there. Switch over the config file backend to use its own config file structure that's embedded in the backend parse data. This allows us to switch over the generic config parser to avoid using the `git_config_file` structure.
76749dfb	2019-06-21T12:33:31	config_parse: rename `data` parameter to `payload` for clarity By convention, parameters that get passed to callbacks are usually named `payload` in our codebase. Rename the `data` parameters in the configuration parser callbacks to `payload` to avoid confusion.
2ba7020f	2019-06-27T09:23:59	config_file: avoid re-reading files on write When we rewrite the configuration file due to any of its values being modified, we call `config_refresh` to update the in-memory representation of our config file backend. This is needlessly wasteful though, as `config_refresh` will always open the on-disk representation to reads the file contents while we already know the complete file contents at this point in time as we have just written it to disk. Implement a new function `config_refresh_from_buffer` that will refresh the backend's config entries from a buffer instead of from the config file itself. Note that this will thus _not_ update the backend's timestamp, which will cause us to re-read the buffer when performing a read operation on it. But this is still an improvement as we now lazily re-read the contents, and most importantly we will avoid constantly re-reading the contents if we perform multiple write operations. The following strace demonstrates this if we're re-writing a key multiple times. It uses our config example with `config_set` changed to update the file 10 times with different keys: $ strace lg2 config x.x z \|& grep '^open.config' open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/home/pks/.config/git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 And now with the optimization of `config_refresh_from_buffer`: $ strace lg2 config x.x z \|& grep '^open.config' open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/home/pks/.config/git/config", O_RDONLY\|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 4 As can be seen, this is quite a lot of `open` calls less.
a0dc3027	2019-06-27T08:54:51	config_file: split out function that sets config entries Updating a config file backend's config entries is a bit more involved, as it requires clearing of the old config entries as well as handling locking correctly. As we will need this functionality in a future patch to refresh config entries from a buffer, let's extract this into its own function `config_set_entries`.
985f5cdf	2019-06-27T08:41:16	config_file: split out function that reads entries from a buffer The `config_read` function currently performs both reading the on-disk config file as well as parsing the retrieved buffer contents. To optimize how we refresh our config entries from an in-memory buffer, we need to be able to directly parse buffers, though, without involving any on-disk files at all. Extract a new function `config_read_buffer` that sets up the parsing logic and then parses config entries from a buffer, only. Have `config_read` use it to avoid duplicated logic.
3e1c137a	2019-06-27T08:24:21	config_file: move refresh into `write` function We are quite lazy in how we refresh our config file backend when updating any of its keys: instead of just updating our in-memory representation of the keys, we just discard the old set of keys and then re-read the config file contents from disk. This refresh currently happens separately at every callsite of `config_write`, but it is clear that we _always_ want to refresh if we have written the config file to disk. If we didn't, then we'd run around with an outdated config file backend that does not represent what we have on disk. By moving the refresh into `config_write`, we are also able to optimize the case where the config file is currently locked. Before, we would've tried to re-read the file even if we have only updated its cached contents without touching the on-disk file. Thus we'd have unnecessarily stat'd the file, even though we know that it shouldn't have been modified in the meantime due to its lock.
d7f58eab	2019-06-21T11:55:21	config_file: implement stat cache to avoid repeated rehashing To decide whether a config file has changed, we always hash its complete contents. This is unnecessarily expensive, as well-behaved filesystems will always update stat information for files which have changed. So before computing the hash, we should first check whether the stat info has actually changed for either the configuration file or any of its includes. This avoids having to re-read the configuration file and its includes every time when we check whether it's been modified. Tracing the for-each-ref example previous to this commit, one can see that we repeatedly re-open both the repo configuration as well as the global configuration: $ strace lg2 for-each-ref \|& grep config access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) access("/home/pks/.config/git/config", F_OK) = 0 access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 access("/tmp/repo/.git/config", F_OK) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05290) = -1 ENOENT (No such file or directory) access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 access("/home/pks/.config/git/config", F_OK) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c051f0) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY\|O_CLOEXEC) = 3 With the change, we only do stats for those files and open them a single time, only: $ strace lg2 for-each-ref \|& grep config access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) access("/home/pks/.config/git/config", F_OK) = 0 access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 access("/tmp/repo/.git/config", F_OK) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffe70540d20) = -1 ENOENT (No such file or directory) access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 access("/home/pks/.config/git/config", F_OK) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY\|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540ca0) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540c80) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG\|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG\|0644, st_size=1154, ...}) = 0 The following benchmark has been performed with and without the stat cache in a best-of-ten run: ``` int lg2_repro(git_repository repo, int argc, char argv) { git_config cfg; int32_t dummy; int i; UNUSED(argc); UNUSED(argv); check_lg2(git_repository_config(&cfg, repo), "Could not obtain config", NULL); for (i = 1; i < 100000; ++i) git_config_get_int32(&dummy, cfg, "foo.bar"); git_config_free(cfg); return 0; } ``` Without stat cache: $ time lg2 repro real 0m1.528s user 0m0.568s sys 0m0.944s With stat cache: $ time lg2 repro real 0m0.526s user 0m0.268s sys 0m0.258s This benchmark shows a nearly three-fold performance improvement. This change requires that we check our configuration stress tests as we're now in fact becoming more racy. If somebody is writing a configuration file at nearly the same time (there is a window of 100ns on Windows-based systems), then it might be that we realize that this file has actually changed and thus may not re-read it. This will only happen if either an external process is rewriting the configuration file or if the same process has multiple `git_config` structures pointing to the same time, where one of both is being used to write and the other one is used to read values.
d0868646	2019-06-21T11:43:09	config: use `git_config_file` in favor of `struct config_file`
398412cc	2019-07-05T11:56:16	Merge pull request #5143 from libgit2/ethomson/warnings ci: build with ENABLE_WERROR on Windows

8db9fd3b

2019-02-02T19:00:41

refdb: documentation

5fc27aac

2019-08-27T13:38:08

Merge pull request #5208 from mkostyuk/apply-removed-new-file apply: git_apply_to_tree fails to apply patches that add new files

6de48085

2019-08-27T11:29:24

Merge pull request #5189 from libgit2/ethomson/attrs_from_head Optionally read `.gitattributes` from HEAD

aaa48d06

2019-08-27T11:26:50

Merge pull request #5196 from pks-t/pks/config-include-onbranch config: implement "onbranch" conditional

4e20c7b1

2019-08-25T22:11:39

Merge pull request #5213 from boardwalk/dskorupski/fix_include_case Fix include casing for case-sensitive filesystems.

44d5e47d

2019-08-24T10:39:56

Fix include casing for case-sensitive filesystems.

60319788

2019-08-23T09:58:15

Merge pull request #5054 from tniessen/util-use-64-bit-timer util: use 64 bit timer on Windows

8cbef12d

2019-08-08T11:52:54

util: do not perform allocations in insertsort Our hand-rolled fallback sorting function `git__insertsort_r` does an in-place sort of the given array. As elements may not necessarily be pointers, it needs a way of swapping two values of arbitrary size, which is currently implemented by allocating a temporary buffer of the element's size. This is problematic, though, as the emulated `qsort` interface doesn't provide any return values and thus cannot signal an error if allocation of that temporary buffer has failed. Convert the function to swap via a temporary buffer allocated on the stack. Like this, it can `memcpy` contents of both elements in small batches without requiring a heap allocation. The buffer size has been chosen such that in most cases, a single iteration of copying will suffice. Most importantly, it can fully contain `git_oid` structures and pointers. Add a bunch of tests for the `git__qsort_r` interface to verify nothing breaks. Furthermore, this removes the declaration of `git__insertsort_r` and makes it static as it is not used anywhere else.

d4fe402b

2019-08-08T10:36:33

merge: check return value of `git_commit_list_insert` The function `git_commit_list_insert` dynamically allocates memory and may thus fail to insert a given commit, but we didn't check for that in several places in "merge.c". Convert surrounding functions to return error codes and check whether `git_commit_list_insert` was successful, returning an error if not.

c0486188

2019-08-08T10:28:09

blame_git: detect memory allocation errors The code in "blame_git.c" was mostly imported from git.git with only minor changes. One of these changes was to use our own allocators instead of git's `xmalloc`, but there's a subtle difference: `xmalloc` would abort the program if unable to allocate any memory, bit `git__malloc` doesn't. As we didn't check for memory allocation errors in some places, we might inadvertently dereference a `NULL` pointer in out-of-memory situations. Convert multiple functions to return proper error codes and add calls to `GIT_ERROR_CHECK_ALLOC` to fix this.

f3b3e543

2019-08-08T11:34:01

xdiff: catch memory allocation errors The xdiff code contains multiple call sites where the results of `xdl_malloc` are not being checked for memory allocation errors. Add checks to fix possible segfaults due to `NULL` pointer accesses.

c2dd895a

2019-08-08T10:47:29

transports: http: check for memory allocation failures When allocating a chunk that is used to write to HTTP streams, we do not check for memory allocation errors. This may lead us to write to a `NULL` pointer and thus cause a segfault. Fix this by adding a call to `GIT_ERROR_CHECK_ALLOC`.

08699541

2019-08-08T10:46:42

trailer: check for memory allocation errors The "trailer.c" code has been copied mostly verbatim from git.git with minor adjustments, only. As git.git's `xmalloc` function, which aborts on memory allocation errors, has been swapped out for `git_malloc`, which doesn't abort, we may inadvertently access `NULL` pointers. Add checks to fix this.

8c7d9761

2019-08-08T10:45:12

posix: fix direct use of `malloc` In "posix.c" there are multiple callsites which execute `malloc` instead of `git__malloc`. Thus, users of library are not able to track these allocations with a custom allocator. Convert these call sites to use `git__malloc` instead.

a477bff1

2019-08-08T10:44:57

indexer: catch OOM when adding expected OIDs When adding OIDs to the indexer's map of yet-to-be-seen OIDs to verify that packfiles are complete, we do so by first allocating a new OID and then calling `git_oidmap_set` on it. There was no check for memory allocation errors in place, though, leading to possible segfaults due to trying to copy data to a `NULL` pointer. Verify the result of `git__malloc` with `GIT_ERROR_CHECK_ALLOC` to fix the issue.

de4bc2bd

2019-08-20T03:29:45

apply: git_apply_to_tree fails to apply patches that add new files git_apply_to_tree() cannot be used apply patches with new files. An attempt to apply such a patch fails because git_apply_to_tree() tries to remove a non-existing file from an old index. The solution is to modify git_apply_to_tree() to git_index_remove() when the patch states that the modified files is removed.

071750a3

2019-08-15T14:18:26

cmake: move _WIN32_WINNT definitions to root

0f40e68e

2019-08-14T09:05:07

Merge pull request #5187 from ianhattendorf/fix/clone-whitespace clone: don't decode URL percent encodings

57a9ccd5

2019-06-21T15:53:54

commit_list: fix possible buffer overflow in `commit_quick_parse` The function `commit_quick_parse` provides a way to quickly parse parts of a commit without storing or verifying most of its metadata. The first thing it does is calculating the number of parents by skipping "parent " lines until it finds the first non-parent line. Afterwards, this parent count is passed to `alloc_parents`, which will allocate an array to store all the parent. To calculate the amount of storage required for the parents array, `alloc_parents` simply multiplicates the number of parents with the respective elements's size. This already screams "buffer overflow", and in fact this problem is getting worse by the result being cast to an `uint32_t`. In fact, triggering this is possible: git-hash-object(1) will happily write a commit with multiple millions of parents for you. I've stopped at 67,108,864 parents as git-hash-object(1) unfortunately soaks up the complete object without streaming anything to disk and thus will cause an OOM situation at a later point. The point here is: this commit was about 4.1GB of size but compressed down to 24MB and thus easy to distribute. The above doesn't yet trigger the buffer overflow, thus. As the array's elements are all pointers which are 8 bytes on 64 bit, we need a total of 536,870,912 parents to trigger the overflow to `0`. The effect is that we're now underallocating the array and do an out-of-bound writes. As the buffer is kindly provided by the adversary, this may easily result in code execution. Extrapolating from the test file with 67m commits to the one with 536m commits results in a factor of 8. Thus the uncompressed contents would be about 32GB in size and the compressed ones 192MB. While still easily distributable via the network, only servers will have that amount of RAM and not cause an out-of-memory condition previous to triggering the overflow. This at least makes this attack not an easy vector for client-side use of libgit2.

cb1439c9

2019-06-19T12:59:27

config: validate ownership of C:\ProgramData\Git\config before using it When the VirtualStore feature is in effect, it is safe to let random users write into C:\ProgramData because other users won't see those files. This seemed to be the case when we introduced support for C:\ProgramData\Git\config. However, when that feature is not in effect (which seems to be the case in newer Windows 10 versions), we'd rather not use those files unless they come from a trusted source, such as an administrator. This change imitates the strategy chosen by PowerShell's native OpenSSH port to Windows regarding host key files: if a system file is owned neither by an administrator, a system account, or the current user, it is ignored.

5774b2b1

2019-08-11T23:42:45

Merge pull request #5113 from pks-t/pks/stash-perf stash: avoid recomputing tree when committing worktree

42bacbc6

2019-08-11T21:06:19

Merge pull request #5121 from pks-t/pks/variadic-errors Variadic macros

fba3bf79

2019-07-21T14:15:12

blob: optionally read attributes from repository When `GIT_BLOB_FILTER_ATTTRIBUTES_FROM_HEAD` is passed to `git_blob_filter`, read attributes from `gitattributes` files that are checked in to the repository at the HEAD revision. This passes the flag `GIT_FILTER_ATTRIBUTES_FROM_HEAD` to the filter functions.

f0f27c1c

2019-07-21T14:13:25

filter: optionally read attributes from repository When `GIT_FILTER_ATTRIBUTES_FROM_HEAD` is specified, configure the filter to read filter attributes from `gitattributes` files that are checked in to the repository at the HEAD revision. This passes the flag `GIT_ATTR_CHECK_INCLUDE_HEAD` to the attribute reading functions.

4fd5748c

2019-07-21T14:11:03

attr: optionally read attributes from repository When `GIT_ATTR_CHECK_INCLUDE_HEAD` is specified, read `gitattribute` files that are checked into the repository at the HEAD revision.

a5392eae

2019-07-21T12:13:07

blob: allow blob filtering to ignore system gitattributes Introduce `GIT_BLOB_FILTER_NO_SYSTEM_ATTRIBUTES`, which tells `git_blob_filter` to ignore the system-wide attributes file, usually `/etc/gitattributes`. This simply passes the appropriate flag to the attribute loading code.

22eb12af

2019-07-21T12:12:05

filter: add GIT_FILTER_NO_SYSTEM_ATTRIBUTES option Allow system-wide attributes (the ones specified in `/etc/gitattributes`) to be ignored if the flag `GIT_FILTER_NO_SYSTEM_ATTRIBUTES` is specified.

fa1a4c77

2019-07-21T11:03:01

blob: deprecate `git_blob_filtered_content` Users should now use `git_blob_filter`.

a32ab076

2019-07-21T10:56:42

blob: introduce git_blob_filter Provide a function to filter blobs that allows for more functionality than the existing `git_blob_filtered_content` function.

b0692d6b

2019-08-09T09:01:56

Merge pull request #4913 from implausible/feature/signing-rebase-commits Add sign capability to git_rebase_commit

998f9c15

2019-08-07T07:21:27

fixup: strange indentation

f627ba6c

2019-08-02T13:18:07

Merge pull request #5197 from pks-t/pks/remote-ifdeffed-block remote: remove unused block of code

e23c0b18

2019-08-02T07:52:58

remote: remove unused block of code In "remote.c", we have a chunk of code that is #ifdef'fed out via `#if 0` with a comment that we could export it as a helper function. The code was implemented in 2013 and ifdef'fed in 2014, which shows that there's clearly no interest in having such a helper at all. As this block has recently created some confusion about `p_getenv` due to it containing the only reference to that function in our codebase, let's remove this block altogether.

d588de7c

2019-08-02T07:51:02

Merge pull request #5191 from eaigner/master config: check if we are running in a sandboxed environment

952fbbfb

2019-08-01T20:04:11

config: check if we are running in a sandboxed environment On macOS the $HOME environment variable returns the path to the sandbox container instead of the actual user $HOME for sandboxed apps. To get the correct path, we have to get it from the password file entry.

722ba93f

2019-08-01T15:14:06

config: implement "onbranch" conditional With Git v2.23.0, the conditional include mechanism gained another new conditional "onbranch". As the name says, it will cause a file to be included if the "onbranch" pattern matches the currently checked out branch. Implement this new condition and add a bunch of tests.

1721ab04

2019-06-16T11:25:47

unix: posix: avoid use of variadic macro `p_snprintf` The macro `p_snprintf` is implemented as a variadic macro that calls `snprintf` directly with `__VA_ARGS__`. In C89, variadic macros are not allowed, but as the arguments of `p_snprintf` and `snprintf` are matching 1:1, we can fix this by simply removing the parameter list from `p_snprintf`.

63d8cd18

2019-06-16T11:17:17

apply: remove use of variadic error macro The macro `apply_err` is implemented as a variadic macro, which are not defined by C89. Convert it to a variadic function, instead.

27b8b31e

2019-08-01T11:57:03

parse: remove use of variadic macros which are not C89 compliant The macro `git_parse_error` is implemented in a variadic way so that it's possible to pass printf-style parameters. Unfortunately, variadic macros are not defined by C89 and thus we cannot use that functionality. But as we have implemented `git_error_vset` in the previous commit, we can now just use that instead. Convert `git_parse_error` to a variadic function and use `git_error_vset` to fix the compliance violation. While at it, move the function to "patch_parse.c".

c8e63812

2019-06-16T11:03:08

errors: introduce `git_error_vset` function Right now, we only provide a `git_error_set` that has a variadic function signature. It's impossible to drive this function in a C89-compliant way from other functions that have a variadic signature, though, like for example `git_parse_error`. Implement a new `git_error_vset` function that gets a `va_list` as parameter, fixing the above problem.

e8f63411

2019-08-01T11:29:58

Merge pull request #5186 from pks-t/pks/config-snapshot-separation config: separate file and snapshot backends

fb0730f1

2019-04-16T23:49:16

util: use 64 bit timer on Windows git__timer was originally implemented using a 32 bit timer since Windows XP did not support GetTickCount64. Windows XP was discontinued five years ago, so it should be safe to use the new API. As a benefit, we do not need to care about overflows for the next 585 million years.

c8e249b0

2019-07-29T10:51:22

object: deprecate git_object__size for removal In #5118 we remove the double-underscore to make it a normally-named public function. However, this is not an interesting function outside of the library and it takes up a name for something that could be more useful. Remove the single-underscore version as we have not done any releases with it.

37ebe9ad

2019-07-24T18:49:08

config_backend: rename internal structures The internal backend structures are kind-of legacy and do not really speak for themselves. Rename them accordingly to make them easier to understand.

2bff84ba

2019-07-26T21:02:56

config_file: separate out read-only backend To further distinguish the file writeable and readonly backends, separate the readonly backend into its own "config_snapshot.c" implementation. The snapshot backend can be generically used to snapshot any type of backend.

f0b10066

2019-07-24T18:37:14

config_file: fix cast of readonly backend In `backend_readonly_free`, the passed in config backend is being cast to a `diskfile_backend` instead of to a `diskfile_readonly_backend`. While this works out just fine because we only access its header values, which were shared between both backends, it is undefined behaviour. Use the correct type to fix this.

a3159df8

2019-07-24T18:31:43

config_file: remove shared `diskfile_header` struct The `diskfile_header` structure is shared between both `diskfile_backend` and `diskfile_readonly_backend`. The separation and resulting casting is confusing at times and a source for programming errors. Remove the shared structure and inline them directly.

271e5fba

2019-07-24T18:18:18

config_file: duplicate accessors for readonly backend While most functions of the readonly configuration backend are implemented separately from the writeable configuration backend, the two functions `config_iterator_new` and `config_get` are shared between both. This sharing makes it necessary to have some shared data structures, which is the `diskfile_header` structure. Unfortunately, this makes the backends harder to grasp than necessary due to all the casting between structs and also quite error prone. Reimplement those functions for the readonly backends. As readonly backends cannot be refreshed anyway, we can remove the calls to `config_refresh` in there.

4e7ce1fb

2019-07-24T18:13:52

config_file: reimplement `config_readonly_open` generically The `config_readonly_open` function currently receives as input a diskfile backend and will copy its entries to a new snapshot. This is rather intimate, as we need to assume that the source config backend is in fact a diskfile entry. We can do better than this though by using generic methods to copy contents of the provided backend, e.g. by using a config iterator. This also allows us to decouple the read-only backend from the read-write backend.

76182e84

2019-07-24T18:04:38

config_entries: fix possible segfault when duplicating entries When duplicating a configuration entry, we allocate a new entry but do not verify that we get a valid pointer back. As we're dereferencing the pointer afterwards, we might thus run into a segfault in out-of-memory situations. Extract a new function `git_config_entries_dup_entry` that handles the complete entry duplication. Fix the error by using `GIT_ERROR_CHECK_ALLOC`.

ba2885da

2019-07-24T18:05:28

git_net_url_parse: don't git_buf_decode_percent for path

2766b92d

2019-07-21T15:10:34

config_file: refresh when creating an iterator When creating a new iterator for a config file backend, then we should always make sure that we're up to date by calling `config_refresh`. Otherwise, we might not notice when another process has modified the configuration file and thus will represent outdated values. Add two tests to config::stress that verify that we get up-to-date values when reading configuration entries via `git_config_iterator`.

9fac8b78

2019-07-21T15:08:22

config_file: do not refresh read-only backends If calling `config_refresh` on a read-only configuration file backend, then we will segfault when comparing the timestamp of the file due to `path` being uninitialized. As a read-only snapshot should not be refreshed anyway and stay consistent, we can simply return early when calling `config_refresh` on a read-only snapshot.

28d11b59

2019-07-21T14:41:21

config_file: consistently use `GIT_CONTAINER_OF`

6be5ac23

2019-07-11T15:30:51

checkout: postpone creation of symlinks to the end On most platforms it's fine to create symlinks to nonexisting files. Not so on Windows, where the type of a symlink (file or directory) needs to be set at creation time. So depending on whether the target file exists or not, we may end up with different symlink types. This creates a problem when performing checkouts, where we simply iterate over all blobs that need to be updated without treating symlinks any special. If the target file of the symlink is going to be checked out after the symlink itself, then the symlink will be created as directory symlink and not as file symlink. Fix the issue by iterating over blobs twice: once to perform postponed deletions and updates to non-symlink blobs, and once to perform updates to symlink blobs.

50194dcd

2019-07-11T15:14:42

win32: fix symlinks to relative file targets When creating a symlink in Windows, one needs to tell Windows whether the symlink should be a file or directory symlink. To determine which flag to pass, we call `GetFileAttributesW` on the target file to see whether it is a directory and then pass the flag accordingly. The problem though is if create a symlink with a relative target path, then we will check that relative path while not necessarily being inside of the working directory where the symlink is to be created. Thus, getting its attributes will either fail or return attributes of the wrong target. Fix this by resolving the target path relative to the directory in which the symlink is to be created.

a00842c4

2019-06-29T09:59:14

win32: correctly unlink symlinks to directories When deleting a symlink on Windows, then the way to delete it depends on whether it is a directory symlink or a file symlink. In the first case, we need to use `DeleteFile`, in the second `RemoveDirectory`. Right now, `p_unlink` will only ever try to use `DeleteFile`, though, and thus fail to remove directory symlinks. This mismatches how unlink(3P) is expected to behave, though, as it shall remove any symlink disregarding whether it is a file or directory symlink. In order to correctly unlink a symlink, we thus need to check what kind of file this is. If we were to first query file attributes of every file upon calling `p_unlink`, then this would penalize the common case though. Instead, we can try to first delete the file with `DeleteFile` and only if the error returned is `ERROR_ACCESS_DENIED` will we query file attributes and determine whether it is a directory symlink to use `RemoveDirectory` instead.

ded77bb1

2019-06-29T09:58:34

path: extract function to check whether a path supports symlinks When initializing a repository, we need to check whether its working directory supports symlinks to correctly set the initial value of the "core.symlinks" config variable. The code to check the filesystem is reusable in other parts of our codebase, like for example in our tests to determine whether certain tests can be expected to succeed or not. Extract the code into a new function `git_path_supports_symlinks` to avoid duplicate implementations. Remove a duplicate implementation in the repo test helper code.

e54343a4

2019-06-29T09:17:32

fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.

a7d32d60

2019-07-20T18:46:32

stash: avoid recomputing tree when committing worktree When creating a new stash, we need to create there separate commits storing differences stored in the index, untracked changes as well as differences in the working directory. The first two will only be done conditionally if the equivalent options "git stash --keep-index --include-untracked" are being passed to `git_stash_save`, but even when only creating a stash of worktree changes we're much slower than git.git. Using our new stash example: $ time git stash Saved working directory and index state WIP on (no branch): 2f7d9d47575e Linux 5.1.7 real 0m0.528s user 0m0.309s sys 0m0.381s $ time lg2 stash real 0m27.165s user 0m13.645s sys 0m6.403s As can be seen, libgit2 is more than 50x slower than git.git! When creating the stash commit that includes all worktree changes, we create a completely new index to prepare for the new commit and populate it with the entries contained in the index' tree. Here comes the catch: by populating the index with a tree's contents, we do not have any stat caches in the index. This means that we have to re-validate every single file from the worktree and see whether it has changed. The issue can be fixed by populating the new index with the repo's existing index instead of with the tree. This retains all stat cache information, and thus we really only need to check files that have changed stat information. This is semantically equivalent to what we previously did: previously, we used the tree of the commit computed from the index. Now we're just using the index directly. And, in fact, the cache is doing wonders: time lg2 stash real 0m1.836s user 0m1.166s sys 0m0.663s We're now performing 15x faster than before and are only 3x slower than git.git now.

a613832e

2019-07-20T18:49:48

patch_parse: fix segfault due to line containing static contents With commit dedf70ad2 (patch_parse: do not depend on parsed buffer's lifetime, 2019-07-05), all lines of the patch are allocated with `strdup` to make lifetime of the parsed patch independent of the buffer that is currently being parsed. In patch b08932824 (patch_parse: ensure valid patch output with EOFNL, 2019-07-11), we introduced another code location where we add lines to the parsed patch. But as that one was implemented via a separate pull request, it wasn't converted to use `strdup`, as well. As a consequence, we generate a segfault when trying to deallocate the potentially static buffer that's now in some of the lines. Use `git__strdup` to fix the issue.

e07dbc92

2019-07-20T11:26:00

Merge pull request #5173 from pks-t/pks/gitignore-wildmatch-error ignore: fix determining whether a shorter pattern negates another

fd7a384b

2019-07-20T11:24:37

Merge pull request #5159 from pks-t/pks/patch-parse-old-missing-nl patch_parse: handle missing newline indicator in old file

f33ca472

2019-07-20T11:06:23

Merge pull request #5158 from pks-t/pks/patch-parsed-lifetime patch_parse: do not depend on parsed buffer's lifetime

d78a1b18

2019-07-20T11:04:53

Merge pull request #5174 from pks-t/pks/winhttp-hash sha1: fix compilation of WinHTTP backend

964c1c60

2019-07-20T11:02:30

Merge pull request #5176 from pks-t/pks/repo-template-head repository: do not initialize HEAD if it's provided by templates

9d46f167

2019-07-19T10:50:51

repository: do not initialize HEAD if it's provided by templates When using templates to initialize a git repository, then git-init(1) will copy over all contents of the template directory. These will be preferred over the default ones created by git-init(1). While we mostly do the same, there is the exception of "HEAD". While we do copy over the template's HEAD file, afterwards we'll immediately re-initialize its contents with either the default "ref: refs/origin/master" or the init option's `initial_head` field. Let's fix the inconsistency with upstream git-init(1) by not overwriting the template HEAD, but only if the user hasn't set `opts.initial_head`. If the `initial_head` field has been supplied, we should use that indifferent from whether the template contained a HEAD file or not. Add tests to verify we correctly use the template directory's HEAD file and that `initial_head` overrides the template.

f3134a84

2019-07-19T10:41:10

repository: update error handling in `init_ext` Update `git_repository_init_ext` to use our typical style of error handling. The function had multiple statements which didn't `goto out` immediately but instead deferred it to later calls combined with `if` statements.

869ae5a3

2019-07-19T10:15:43

repository: avoid swallowing error codes in `create_head` The error handling in `git_repository_create_head` completely swallows all error codes. While probably not too much of a problem, this also violates our usual coding style. Refactor the code to use a local `error` variable with the typical `goto out` statements.

3424c210

2019-07-19T08:00:13

Merge pull request #5138 from libgit2/ethomson/cvar configuration: cvar -> configmap

a33c0de2

2019-07-18T19:17:40

Merge pull request #5172 from bk2204/cache-efficient-eviction Evict cache items more efficiently

658022c4

2019-07-18T13:53:41

configuration: cvar -> configmap `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.

343fb83a

2019-07-18T13:50:47

Merge pull request #5156 from pks-t/pks/attr-macros-in-subdir gitattributes: ignore macros defined in subdirectories

7574564e

2019-07-18T13:40:34

sha1: win32: fix compilation due to unknown type In commit bbf034ab9 (hash: move `git_hash_prov` into Win32 backend, 2019-02-22), the `git_hash_prov`'s structure name has been removed in favour of its typedef'ed name. But as we have no CI that compiles with the WinHTTPS hashing backend right now, it wasn't noticed that the implementation that uses this struct wasn't changed correctly. Fix the struct type to make it compile again.

6f6340af

2019-07-18T11:57:55

ignore: fix determining whether a shorter pattern negates another When computing whether we need to store a negative pattern, we iterate through all previously known patterns and check whether the negative pattern undoes any of the previous ones. In doing so we call `wildmatch` and check it's return for any negative error values. If there was a negative return, we will abort and bubble up that error to the caller. In fact, this check for negative values stems from the time where we still used `fnmatch` instead of `wildmatch`. For `fnmatch`, negative values indicate a "real" error, while for `wildmatch` a negative value may be returned if the matching was prematurely aborted. A premature abort may for example also happen if the pattern matches a prefix of the haystack if the pattern is shorter. Returning an error in that case is the wrong thing to do. Fix the code to compare for equality with `WM_MATCH`, only. Negative values returned by `wildmatch` are perfectly fine and thus should be ignored. Add a test that verifies we do not see the error.

770b91b1

2019-07-17T15:59:54

cache: evict items more efficiently When our object cache is full, we pick eight items (or the whole cache, if there are fewer) and evict them. For small cache sizes, this is fine, but when we're dealing with a large number of objects, we can repeatedly exhaust the cache and spend a large amount of time in git_oidmap_iterate trying to find items to evict. Instead, let's assume that if the cache gets full, we have a large number of objects that we're handling, and be more aggressive about evicting items. Let's remove one item for every 2048 items, but not less than 8. This causes us to scale our evictions in proportion to the size of the cache and significantly reduces the time we spend in git_oidmap_iterate. Before this change, a full pack of all the non-blob objects in the Linux repository took in excess of 30 minutes and spent 62.3% of total runtime in odb_read_1 and its children, and 44.3% of the time in git_oidmap_iterate. With this change, the same operation now takes 14 minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total time, whereas git_oidmap_iterate consists of 6.2%. Note that we do spend a little more time inflating objects and a decent amount more time in memcmp. However, overall, the time taken is significantly improved, and time in pack building is now dominated by git_delta_create_from_index (33.7%), which is what we would expect.

c4df926b

2019-07-16T21:54:10

pack-objects: allocate memory more efficiently The packbuilder code allocates memory in chunks. When it needs to allocate, it tries to add 1024 to the number of objects and multiply by 3/2. However, it actually multiplies by 1 instead, since it performs an integral division in the expression "3 / 2" and only then multiplies by the increased number of objects. The current behavior causes the code to waste massive amounts of time copying memory when it reallocates, causing inserting all non-blob objects in the Linux repository into a new pack to take some indeterminate time greater than 5 minutes instead of 52 seconds. Correct this error by first dividing by two, and only then multiplying by 3. We still check for overflow for the multiplication, which is the only part that can overflow. This appears to be the only place in the code base which has this problem.

f8346905

2019-07-12T09:03:33

attr_file: ignore macros defined in subdirectories Right now, we are unconditionally applying all macros found in a gitatttributes file. But quoting gitattributes(5): Custom macro attributes can be defined only in top-level gitattributes files ($GIT_DIR/info/attributes, the .gitattributes file at the top level of the working tree, or the global or system-wide gitattributes files), not in .gitattributes files in working tree subdirectories. The built-in macro attribute "binary" is equivalent to: So gitattribute files in subdirectories of the working tree may explicitly _not_ contain macro definitions, but we do not currently enforce this limitation. This patch introduces a new parameter to the gitattributes parser that tells whether macros are allowed in the current file or not. If set to `false`, we will still parse macros, but silently ignore them instead of adding them to the list of defined macros. Update all callers to correctly determine whether the to-be-parsed file may contain macros or not. Most importantly, when walking up the directory hierarchy, we will only set it to `true` once it reaches the root directory of the repo itself. Add a test that verifies that we are indeed not applying macros from subdirectories. Previous to these changes, the test would've failed.

97968529

2019-07-05T08:05:16

attr_file: refactor `parse_buffer` function The gitattributes code is one of our oldest and most-untouched codebases in libgit2, and as such its code style doesn't quite match our current best practices. Refactor the function `git_attr_file__parse_buffer` to better match them.

dbc7e4b1

2019-07-05T07:53:02

attr_file: refactor `load_standalone` function The gitattributes code is one of our oldest and most-untouched codebases in libgit2, and as such its code style doesn't quite match our current best practices. Refactor the function `git_attr_file__lookup_standalone` to better match them.

be8f9bb1

2019-07-05T13:33:10

attrcache: fix memory leak if inserting invalid macro to cache A macro without any assignments is considered an invalid macro by the attributes cache and is thus not getting added to the macro map at all. But as `git_attr_cache__insert_macro` returns success with neither free'ing nor adopting the macro into its map, this will cause a memory leak. Fix this by freeing the macro in the function if it's not going to be added. This is perfectly fine to do, as callers assume that the attrcache will have the macro adopted on success anyway.

7277bf83

2019-07-05T13:33:05

attrcache: fix multiple memory leaks when inserting macros The function `git_attr_cache__insert_macro` is responsible for adopting macros in the per-repo macro cache. When adding a macro that replaces an already existing macro (e.g. because of re-parsing gitattributes files), then we do not free the previous macro and thus cause a memory leak. Fix this leak by first checking if the cache already has a macro defined with the same name. If so, free it before replacing the cache entry with the new instance.

5ae22a63

2019-06-21T08:13:31

fileops: fix creation of directory in filesystem root In commit 45f24e787 (git_repository_init: stop traversing at windows root, 2019-04-12), we have fixed `git_futils_mkdir` to correctly handle the case where we create a directory in Windows-style filesystem roots like "C:\repo". The problem here is an off-by-one: previously, to that commit, we've been checking wether the parent directory's length is equal to the root directory's length incremented by one. When we call the function with "/example", then the parent directory's length ("/") is 1, but the root directory offset is 0 as the path is directly rooted without a drive prefix. This resulted in `1 == 0 + 1`, which was true. With the change, we've stopped incrementing the root directory length, and thus now compare `1 <= 0`, which is false. The previous way of doing it was kind of finicky any non-obvious, which is also why the error was introduced. So instead of just re-adding the increment, let's explicitly add a condition that aborts finding the parent if the current parent path is "/". Making this change causes Azure Pipelines to fail the testcase repo::init::nonexistent_paths on Unix-based systems. This is because we have just fixed creating directories in the filesystem root, which previously didn't work. As Docker-based tests are running as root user, we are thus able to create the non-existing path and will now succeed to create the repository that was expected to actually fail. Let's split this up into three different tests: - A test to verify that we do not create repos in a non-existing parent directoy if the flag `GIT_REPOSITORY_INIT_MKPATH` is not set. - A test to verify that we fail if the root directory does not exist. As there is a common root directory on Unix-based systems that always exist, we can only test for this on Windows-based systems. - A test to verify that we fail if trying to create a repository in an unwriteable parent directory. We can only test this if not running tests as root user, as CAP_DAC_OVERRIDE will cause us to ignore permissions when creating files.

b0893282

2019-07-11T12:12:04

patch_parse: ensure valid patch output with EOFNL

3f855fe8

2019-07-05T11:06:33

patch_parse: handle missing newline indicator in old file When either the old or new file contents have no newline at the end of the file, then git-diff(1) will print out a "\ No newline at end of file" indicator. While we do correctly handle this in the case where the new file has this indcator, we fail to parse patches where the old file is missing a newline at EOF. Fix this bug by handling and missing newline indicators in the old file. Add tests to verify that we can parse such files.

b30dab8f

2019-07-11T12:10:48

apply: refactor to use a switch statement

001d76e1

2019-07-11T11:34:40

diff: ignore EOFNL for computing patch IDs The patch ID is supposed to be mostly context-insignificant and thus only includes added or deleted lines. As such, we shouldn't honor end-of-file-without-newline markers in diffs. Ignore such lines to fix how we compute the patch ID for such diffs.

dbeadf8a

2019-07-11T10:56:05

config_parse: provide parser init and dispose functions Right now, all configuration file backends are expected to directly mess with the configuration parser's internals in order to set it up. Let's avoid doing that by implementing both a `git_config_parser_init` and `git_config_parser_dispose` function to clearly define the interface between configuration backends and the parser. Ideally, we would make the `git_config_parser` structure definition private to its implementation. But as that would require an additional memory allocation that was not required before we just live with it being visible to others.

32157526

2019-07-11T11:10:02

config_file: refactor error handling in `config_write` Error handling in `config_write` is rather convoluted and does not match our current code style. Refactor it to make it easier to understand.

820fa1a3

2019-07-11T11:04:33

config_file: internalize `git_config_file` struct With the previous commits, we have finally separated the config parsing logic from the specific configuration file backend. Due to that, we can now move the `git_config_file` structure into the config file backend's implementation so that no other code may accidentally start using it again. Furthermore, we rename the structure to `diskfile` to make it obvious that it is internal, only, and to unify it with naming scheme of the other diskfile structures.

6e6da75f

2019-07-11T11:00:05

config_parse: remove use of `git_config_file` The config parser code needs to keep track of the current parsed file's name so that we are able to provide proper error messages to the user. Right now, we do that by storing a `git_config_file` in the parser structure, but as that is a specific backend and the parser aims to be generic, it is a layering violation. Switch over to use a simple string to fix that.

54d350e0

2019-06-21T12:53:43

config_file: embed file in diskfile parse data The config file code needs to keep track of the actual `git_config_file` structure, as it not only contains the path of the current configuration file, but it also keeps tracks of all includes of that file. Right now, we keep track of that structure via the `git_config_parser`, but as that's supposed to be a backend generic implementation of configuration parsing it's a layering violation to have it in there. Switch over the config file backend to use its own config file structure that's embedded in the backend parse data. This allows us to switch over the generic config parser to avoid using the `git_config_file` structure.

76749dfb

2019-06-21T12:33:31

config_parse: rename `data` parameter to `payload` for clarity By convention, parameters that get passed to callbacks are usually named `payload` in our codebase. Rename the `data` parameters in the configuration parser callbacks to `payload` to avoid confusion.

2ba7020f

2019-06-27T09:23:59

config_file: avoid re-reading files on write When we rewrite the configuration file due to any of its values being modified, we call `config_refresh` to update the in-memory representation of our config file backend. This is needlessly wasteful though, as `config_refresh` will always open the on-disk representation to reads the file contents while we already know the complete file contents at this point in time as we have just written it to disk. Implement a new function `config_refresh_from_buffer` that will refresh the backend's config entries from a buffer instead of from the config file itself. Note that this will thus _not_ update the backend's timestamp, which will cause us to re-read the buffer when performing a read operation on it. But this is still an improvement as we now lazily re-read the contents, and most importantly we will avoid constantly re-reading the contents if we perform multiple write operations. The following strace demonstrates this if we're re-writing a key multiple times. It uses our config example with `config_set` changed to update the file 10 times with different keys: $ strace lg2 config x.x z |& grep '^open.*config' open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 And now with the optimization of `config_refresh_from_buffer`: $ strace lg2 config x.x z |& grep '^open.*config' open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 As can be seen, this is quite a lot of `open` calls less.

a0dc3027

2019-06-27T08:54:51

config_file: split out function that sets config entries Updating a config file backend's config entries is a bit more involved, as it requires clearing of the old config entries as well as handling locking correctly. As we will need this functionality in a future patch to refresh config entries from a buffer, let's extract this into its own function `config_set_entries`.

985f5cdf

2019-06-27T08:41:16

config_file: split out function that reads entries from a buffer The `config_read` function currently performs both reading the on-disk config file as well as parsing the retrieved buffer contents. To optimize how we refresh our config entries from an in-memory buffer, we need to be able to directly parse buffers, though, without involving any on-disk files at all. Extract a new function `config_read_buffer` that sets up the parsing logic and then parses config entries from a buffer, only. Have `config_read` use it to avoid duplicated logic.

3e1c137a

2019-06-27T08:24:21

config_file: move refresh into `write` function We are quite lazy in how we refresh our config file backend when updating any of its keys: instead of just updating our in-memory representation of the keys, we just discard the old set of keys and then re-read the config file contents from disk. This refresh currently happens separately at every callsite of `config_write`, but it is clear that we _always_ want to refresh if we have written the config file to disk. If we didn't, then we'd run around with an outdated config file backend that does not represent what we have on disk. By moving the refresh into `config_write`, we are also able to optimize the case where the config file is currently locked. Before, we would've tried to re-read the file even if we have only updated its cached contents without touching the on-disk file. Thus we'd have unnecessarily stat'd the file, even though we know that it shouldn't have been modified in the meantime due to its lock.

d7f58eab

2019-06-21T11:55:21

config_file: implement stat cache to avoid repeated rehashing To decide whether a config file has changed, we always hash its complete contents. This is unnecessarily expensive, as well-behaved filesystems will always update stat information for files which have changed. So before computing the hash, we should first check whether the stat info has actually changed for either the configuration file or any of its includes. This avoids having to re-read the configuration file and its includes every time when we check whether it's been modified. Tracing the for-each-ref example previous to this commit, one can see that we repeatedly re-open both the repo configuration as well as the global configuration: $ strace lg2 for-each-ref |& grep config access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) access("/home/pks/.config/git/config", F_OK) = 0 access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 access("/tmp/repo/.git/config", F_OK) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05290) = -1 ENOENT (No such file or directory) access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 access("/home/pks/.config/git/config", F_OK) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c051f0) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 With the change, we only do stats for those files and open them a single time, only: $ strace lg2 for-each-ref |& grep config access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) access("/home/pks/.config/git/config", F_OK) = 0 access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 access("/tmp/repo/.git/config", F_OK) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffe70540d20) = -1 ENOENT (No such file or directory) access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 access("/home/pks/.config/git/config", F_OK) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540ca0) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540c80) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 The following benchmark has been performed with and without the stat cache in a best-of-ten run: ``` int lg2_repro(git_repository *repo, int argc, char **argv) { git_config *cfg; int32_t dummy; int i; UNUSED(argc); UNUSED(argv); check_lg2(git_repository_config(&cfg, repo), "Could not obtain config", NULL); for (i = 1; i < 100000; ++i) git_config_get_int32(&dummy, cfg, "foo.bar"); git_config_free(cfg); return 0; } ``` Without stat cache: $ time lg2 repro real 0m1.528s user 0m0.568s sys 0m0.944s With stat cache: $ time lg2 repro real 0m0.526s user 0m0.268s sys 0m0.258s This benchmark shows a nearly three-fold performance improvement. This change requires that we check our configuration stress tests as we're now in fact becoming more racy. If somebody is writing a configuration file at nearly the same time (there is a window of 100ns on Windows-based systems), then it might be that we realize that this file has actually changed and thus may not re-read it. This will only happen if either an external process is rewriting the configuration file or if the same process has multiple `git_config` structures pointing to the same time, where one of both is being used to write and the other one is used to read values.

d0868646

2019-06-21T11:43:09

config: use `git_config_file` in favor of `struct config_file`

398412cc

2019-07-05T11:56:16

Merge pull request #5143 from libgit2/ethomson/warnings ci: build with ENABLE_WERROR on Windows

thodg/libgit2/src

src

Log