kmx git

Commit	Date	Message
af33210b	2018-07-10T16:10:03	apply: introduce a delta callback Introduce a callback to the application options that allow callers to add a per-delta callback. The callback can return an error code to stop patch application, or can return a value to skip the application of a particular delta.
37b25ac5	2018-07-08T16:12:58	apply: move location to an argument, not the opts Move the location option to an argument, out of the options structure. This allows the options structure to be re-used for functions that don't need to know the location, since it's implicit in their functionality. For example, `git_apply_tree` should not take a location, but is expected to take all the other options.
2d27ddc0	2018-07-01T21:35:51	apply: use an indexwriter Place the entire `git_apply` operation inside an indexwriter, so that we lock the index before we begin performing patch application. This ensures that there are no other processes modifying things in the working directory.
813f0802	2018-07-01T15:14:36	apply: validate workdir contents match index for BOTH When applying to both the index and the working directory, ensure that the index contents match the working directory. This mirrors the requirement in `git apply --index`. This also means that - along with the prior commit that uses the working directory contents as the checkout baseline - we no longer expect conflicts during checkout. So remove the special-case error handling for checkout conflicts. (Any checkout conflict now would be because the file was actually modified between the start of patch application and the checkout.)
0f4b2f02	2018-07-01T15:13:50	reader: optionally validate index matches workdir When using a workdir reader, optionally validate that the index contents match the working directory contents.
9be89bbd	2018-07-01T11:08:26	reader: apply working directory filters When reading a file from the working directory, ensure that we apply any necessary filters to the item. This ensures that we get the repository-normalized data as the preimage, and further ensures that we can accurately compare the working directory contents to the index contents for accurate safety validation in the `BOTH` case.
5b8d5a22	2018-07-01T13:42:53	apply: use preimage as the checkout baseline Use the preimage as the checkout's baseline. This allows us to support applying patches to files that are modified in the working directory (those that differ from the HEAD and index). Without this, files will be reported as (checkout) conflicts. With this, we expect the on-disk data when we began the patch application (the "preimage") to be on-disk during checkout. We could have also simply used the `FORCE` flag to checkout to accomplish a similar mechanism. However, `FORCE` ignores all differences, while providing a preimage ensures that we will only overwrite the file contents that we actually read. Modify the reader interface to provide the OID to support this.
5b66b667	2018-06-29T12:39:41	apply: when preimage file is missing, return EAPPLYFAIL The preimage file being missing entirely is simply a case of an application failure; return the correct error value for the caller.
dddfff77	2018-06-30T17:12:16	apply: convert checkout conflicts to apply failures When there's a checkout conflict during apply, that means that the working directory was modified in a conflicting manner and the postimage cannot be written. During application, convert this to an application failure for consistency across workdir/index/both applications.
e0224121	2018-06-29T12:09:02	apply: simplify checkout vs index application Separate the concerns of applying via checkout and updating the repository's index. This results in simpler functionality and allows us to not build the temporary collection of paths in the index case.
3b5378c5	2018-06-25T16:27:06	apply: handle file deletions If the file was deleted in the postimage, do not attempt to update the target. Instead, ignore it and simply allow it to stay removed in our computed postimage. Also, test that we can handle file deletions.
f83bbe0a	2018-03-19T19:50:45	apply: introduce `git_apply` Introduce `git_apply`, which will take a `git_diff` and apply it to the working directory (akin to `git apply`), the index (akin to `git apply --cached`), or both (akin to `git apply --index`).
664cda6f	2018-03-19T20:10:38	apply: reimplement `git_apply_tree` with readers The generic `git_reader` interface simplifies `git_apply_tree` somewhat. Reimplement `git_apply_tree` with them.
d73043a2	2018-03-19T20:10:31	reader: a generic way to read files from repos Similar to the `git_iterator` interface, the `git_reader` interface will allow us to read file contents from an arbitrary repository-backed data source (trees, index, or working directory).
20f8a6db	2018-06-28T17:26:21	apply: remove deleted paths from index We update the index with the new_file side of the delta, but we need to explicitly remove the old_file path in the case where an item was deleted or renamed.
c3077ea0	2018-06-25T21:24:49	apply: return a specific exit code on failure Return `GIT_EAPPLYFAIL` on patch application failure so that users can determine that patch application failed due to a malformed/conflicting patch by looking at the error code.
d54aa9ae	2018-06-26T15:25:30	iterator: introduce `git_iterator_foreach` Introduce a `git_iterator_foreach` helper function which invokes a callback on all files for a given iterator.
9c34c996	2018-06-25T17:03:14	apply: handle file additions Don't attempt to read the postimage file during a file addition, simply use an empty buffer as the postimage. Also, test that we can handle file additions.
02b1083a	2018-01-28T23:25:07	apply: introduce `git_apply_tree` Introduce `git_apply_tree`, which will apply a `git_diff` to a given `git_tree`, allowing an in-memory patch application for a repository.
2b12dcf6	2018-03-19T19:45:11	iterator: optionally hash filesystem iterators Optionally hash the contents of files encountered in the filesystem or working directory iterators. This is not expected to be used in production code paths, but may allow us to simplify some test contexts. For working directory iterators, apply filters as appropriate, since we have the context able to do it.
623647af	2018-10-26T12:33:59	Merge pull request #4864 from pks-t/pks/object-parse-fixes Object parse fixes
7655b2d8	2018-10-19T10:29:19	commit: fix reading out of bounds when parsing encoding The commit message encoding is currently being parsed by the `git__prefixcmp` function. As this function does not accept a buffer length, it will happily skip over a buffer's end if it is not `NUL` terminated. Fix the issue by using `git__prefixncmp` instead. Add a test that verifies that we are unable to parse the encoding field if it's cut off by the supplied buffer length.
ee11d47e	2018-10-19T09:47:50	tag: fix out of bounds read when searching for tag message When parsing tags, we skip all unknown fields that appear before the tag message. This skipping is done by using a plain `strstr(buffer, "\n\n")` to search for the two newlines that separate tag fields from tag message. As it is not possible to supply a buffer length to `strstr`, this call may skip over the buffer's end and thus result in an out of bounds read. As `strstr` may return a pointer that is out of bounds, the following computation of `buffer_end - buffer` will overflow and result in an allocation of an invalid length. Fix the issue by using `git__memmem` instead. Add a test that verifies parsing the tag fails not due to the allocation failure but due to the tag having no message.
83e8a6b3	2018-10-18T16:08:46	util: provide `git__memmem` function Unfortunately, neither the `memmem` nor the `strnstr` functions are part of any C standard but are merely extensions of C that are implemented by e.g. glibc. Thus, there is no standardized way to search for a string in a block of memory with a limited size, and using `strstr` is to be considered unsafe in case where the buffer has not been sanitized. In fact, there are some uses of `strstr` in exactly that unsafe way in our codebase. Provide a new function `git__memmem` that implements the `memmem` semantics. That is in a given haystack of `n` bytes, search for the occurrence of a byte sequence of `m` bytes and return a pointer to the first occurrence. The implementation chosen is the "Not So Naive" algorithm from [1]. It was chosen as the implementation is comparably simple while still being reasonably efficient in most cases. Preprocessing happens in constant time and space, searching has a time complexity of O(n*m) with a slightly sub-linear average case. [1]: http://www-igm.univ-mlv.fr/~lecroq/string/
bea65980	2018-10-25T11:21:14	Merge pull request #4851 from pks-t/pks/strtol-removal strtol removal
305e801a	2018-10-21T09:52:32	util: allow callers to reset custom allocators Provide a utility to reset custom allocators back to their default. This is particularly useful for testing.
7c791f3d	2018-10-20T20:25:51	Merge pull request #4852 from libgit2/ethomson/unc_paths Win32 path canonicalization refactoring
6cc14ae3	2018-10-20T20:22:04	Merge pull request #4840 from libgit2/cmn/validity-tree-from-unowned-index Check object existence when creating a tree from an index
a2f9f94b	2018-10-20T20:18:04	Merge branch 'issue-4203'
32b81661	2018-10-20T20:16:32	merge: don't leak the index during reloads
0a4284b1	2018-10-19T14:54:13	Merge pull request #4819 from libgit2/cmn/config-nonewline Configuration variables can appear on the same line as the section header
ea19efc1	2018-10-18T15:08:56	util: fix out of bounds read in error message When an integer that is parsed with `git__strntol32` is too big to fit into an int32, we will generate an error message that includes the actual string that failed to parse. This does not acknowledge the fact that the string may either not be NUL terminated or alternative include additional characters after the number that is to be parsed. We may thus end up printing characters into the buffer that aren't the number or, worse, read out of bounds. Fix the issue by utilizing the `endptr` that was set by `git__strntol64`. This pointer is guaranteed to be set to the first character following the number, and we can thus use it to compute the width of the number that shall be printed. Create a test to verify that we correctly truncate the number.
a34f5b0d	2018-10-18T08:57:27	win32: refactor `git_win32_path_remove_namespace` Update `git_win32_path_remove_namespace` to disambiguate the prefix being removed versus the prefix being added. Now we remove the "namespace", and (may) add a "prefix" in its place. Eg, we remove the `\\?\` namespace. We remove the `\\?\UNC\` namespace, and replace it with the `\\` prefix. This aids readability somewhat. Additionally, use pointer arithmetic instead of offsets, which seems to also help readability.
b2e85f98	2018-10-17T08:48:43	win32: rename `git_win32__canonicalize_path` The internal API `git_win32__canonicalize_path` is far, far too easily confused with the internal API `git_win32_path_canonicalize`. The former removes the namespace prefix from a path (eg, given `\\?\C:\Temp\foo`, it returns `C:\Temp\foo`, and given `\\?\UNC\server\share`, it returns `\\server\share`). As such, rename it to `git_win32_path_remove_namespace`. `git_win32_path_canonicalize` remains unchanged.
b09c1c7b	2018-10-18T14:37:55	util: avoid signed integer overflows in `git__strntol64` While `git__strntol64` tries to detect integer overflows when doing the necessary arithmetics to come up with the final result, it does the detection only after the fact. This check thus relies on undefined behavior of signed integer overflows. Fix this by instead checking up-front whether the multiplications or additions will overflow. Note that a detected overflow will not cause us to abort parsing the current sequence of digits. In the case of an overflow, previous behavior was to still set up the end pointer correctly to point to the first character immediately after the currently parsed number. We do not want to change this now as code may rely on the end pointer being set up correctly even if the parsed number is too big to be represented as 64 bit integer.
8d7fa88a	2018-10-18T12:04:07	util: remove `git__strtol32` The function `git__strtol32` can easily be misused when untrusted data is passed to it that may not have been sanitized with trailing `NUL` bytes. As all usages of this function have now been removed, we can remove this function altogether to avoid future misuse of it.
2613fbb2	2018-10-18T11:58:14	global: replace remaining use of `git__strtol32` Replace remaining uses of the `git__strtol32` function. While these uses are all safe as the strings were either sanitized or from a trusted source, we want to remove `git__strtol32` altogether to avoid future misuse.
21652ee9	2018-10-18T11:43:30	tree-cache: avoid out-of-bound reads when parsing trees We use the `git__strtol32` function to parse the child and entry count of treecaches from the index, which do not accept a buffer length. As the buffer that is being passed in is untrusted data and may thus be malformed and may not contain a terminating `NUL` byte, we can overrun the buffer and thus perform an out-of-bounds read. Fix the issue by uzing `git__strntol32` instead.
68deb2cc	2018-10-18T11:37:10	util: remove unsafe `git__strtol64` function The function `git__strtol64` does not take a maximum buffer length as parameter. This has led to some unsafe usages of this function, and as such we may consider it as being unsafe to use. As we have now eradicated all usages of this function, let's remove it completely to avoid future misuse.
1a2efd10	2018-10-18T11:35:08	config: remove last instance of `git__strntol64` When parsing integers from configuration values, we use `git__strtol64`. This is fine to do, as we always sanitize values and can thus be sure that they'll have a terminating `NUL` byte. But as this is the last call-site of `git__strtol64`, let's just pass in the length explicitly by calling `strlen` on the value to be able to remove `git__strtol64` altogether.
3db9aa6f	2018-10-18T11:32:48	signature: avoid out-of-bounds reads when parsing signature dates We use `git__strtol64` and `git__strtol32` to parse the trailing commit or author date and timezone of signatures. As signatures are usually part of a commit or tag object and thus essentially untrusted data, the buffer may be misformatted and may not be `NUL` terminated. This may lead to an out-of-bounds read. Fix the issue by using `git__strntol64` and `git__strntol32` instead.
600ceadd	2018-10-18T11:29:06	index: avoid out-of-bounds read when reading reuc entry stage We use `git__strtol64` to parse file modes of the index entries, which does not limit the parsed buffer length. As the index can be essentially treated as "untrusted" in that the data stems from the file system, it may be misformatted and may not contain terminating `NUL` bytes. This may lead to out-of-bounds reads when trying to parse index entries with such malformatted modes. Fix the issue by using `git__strntol64` instead.
1a3fa1f5	2018-10-18T11:25:59	commit_list: avoid use of strtol64 without length limit When quick-parsing a commit, we use `git__strtol64` to parse the commit's time. The buffer that's passed to `commit_quick_parse` is the raw data of an ODB object, though, whose data may not be properly formatted and also does not have to be `NUL` terminated. This may lead to out-of-bound reads. Use `git__strntol64` to avoid this problem.
f010b66b	2018-10-17T14:33:47	Merge pull request #4849 from libgit2/cmn/expose-gitfile-check path: export the dotgit-checking functions
7615794c	2018-10-15T18:08:13	Merge pull request #4845 from pks-t/pks/object-fuzzer Object parsing fuzzer
9b6e4081	2018-10-15T17:08:38	Merge commit 'afd10f0' (Follow 308 redirects)
05e54e00	2018-10-15T13:54:17	path: export the dotgit-checking functions These checks are preformed by libgit2 on checkout, but they're also useful for performing checks in applications which do not involve checkout. Expose them under `sys/` as it's still fairly in the weeds even for this library.
1d718fa5	2018-09-28T11:55:34	config: variables might appear on the same line as a section header While rare and a machine would typically not generate such a configuration file, it is nevertheless valid to write [foo "bar"] baz = true and we need to deal with that instead of assuming everything is on its own line.
afd10f0b	2018-10-13T09:31:20	Follow 308 redirects (as used by GitLab)
814e7acb	2018-10-12T12:38:06	Merge pull request #4842 from nelhage/fuzz-config-memory config: Port config_file_fuzzer to the new in-memory backend.
463c21e2	2018-10-11T13:27:06	Apply code review feedback
6562cdda	2018-10-11T12:43:08	object: properly propagate errors on parsing failures When failing to parse a raw object fromits data, we free the partially parsed object but then fail to propagate the error to the caller. This may lead callers to operate on objects with invalid memory, which will sooner or later cause the program to segfault. Fix the issue by passing up the error code returned by `parse_raw`.
2d449a11	2018-10-09T02:42:14	config: Refactor `git_config_backend_from_string` to take a length
fd490d3e	2018-10-08T13:15:31	tree: unify the entry validity checks We have two similar functions, `git_treebuilder_insert` and `append_entry` which are used in different codepaths as part of creating a new tree. The former learnt to check for object existence under strict object creation, but the latter did not. This allowed the creation of a tree from an unowned index to bypass some of the checks and create a tree pointing to a nonexistent object. Extract a single function which performs these checks and call it from both codepaths. In `append_entry` we still do not validate when asked not to, as this is data which is already in the tree and we want to allow users to deal with repositories which already have some invalid data.
0cd976c8	2018-10-07T12:00:06	Merge pull request #4830 from pks-t/pks/diff-stats-rename-common diff_stats: use git's formatting of renames with common directories
475db39b	2018-10-06T12:58:06	ignore unsupported http authentication schemes auth_context_match returns 0 instead of -1 for unknown schemes to not fail in situations where some authentication schemes are supported and others are not. apply_credentials is adjusted to handle auth_context_match returning 0 without producing authentication context.
a8d447f6	2018-10-05T20:13:34	Merge pull request #4837 from pks-t/cmn/reject-option-submodule-url-path submodule: ignore path and url attributes if they look like options
ce8803a2	2018-10-05T20:03:38	Merge pull request #4836 from pks-t/pks/smart-packets Smart packet security fixes
c8ca3cae	2018-10-05T11:47:39	submodule: ignore path and url attributes if they look like options These can be used to inject options in an implementation which performs a recursive clone by executing an external command via crafted url and path attributes such that it triggers a local executable to be run. The library is not vulnerable as we do not rely on external executables but a user of the library might be relying on that so we add this protection. This matches this aspect of git's fix for CVE-2018-17456.
d06d4220	2018-10-05T10:56:02	config_file: properly ignore includes without "path" value In case a configuration includes a key "include.path=" without any value, the generated configuration entry will have its value set to `NULL`. This is unexpected by the logic handling includes, and as soon as we try to calculate the included path we will unconditionally dereference that `NULL` pointer and thus segfault. Fix the issue by returning early in both `parse_include` and `parse_conditional_include` in case where the `file` argument is `NULL`. Add a test to avoid future regression. The issue has been found by the oss-fuzz project, issue 10810.
e5090ee3	2018-10-04T11:19:28	diff_stats: use git's formatting of renames with common directories In cases where a file gets renamed such that the directories containing it previous and after the rename have a common prefix, then git will avoid printing this prefix twice and instead format the rename as "prefix/{old => new}". We currently didn't do anything like that, but simply printed "prefix/old -> prefix/new". Adjust our behaviour to instead match upstream. Adjust the test for this behaviour to expect the new format.
1bc5b05c	2018-10-03T16:17:21	smart_pkt: do not accept callers passing in no line length Right now, we simply ignore the `linelen` parameter of `git_pkt_parse_line` in case the caller passed in zero. But in fact, we never want to assume anything about the provided buffer length and always want the caller to pass in the available number of bytes. And in fact, checking all the callers, one can see that the funciton is never being called in case where the buffer length is zero, and thus we are safe to remove this check.
c05790a8	2018-08-09T11:16:15	smart_pkt: return parsed length via out-parameter The `parse_len` function currently directly returns the parsed length of a packet line or an error code in case there was an error. Instead, convert this to our usual style of using the return value as error code only and returning the actual value via an out-parameter. Thus, we can now convert the output parameter to an unsigned type, as the size of a packet cannot ever be negative. While at it, we also move the check whether the input buffer is long enough into `parse_len` itself. We don't really want to pass around potentially non-NUL-terminated buffers to functions without also passing along the length, as this is dangerous in the unlikely case where other callers for that function get added. Note that we need to make sure though to not mess with `GIT_EBUFS` error codes, as these indicate not an error to the caller but that he needs to fetch more data.
0b3dfbf4	2018-08-09T11:13:59	smart_pkt: reorder and rename parameters of `git_pkt_parse_line` The parameters of the `git_pkt_parse_line` function are quite confusing. First, there is no real indicator what the `out` parameter is actually all about, and it's not really clear what the `bufflen` parameter refers to. Reorder and rename the parameters to make this more obvious.
5fabaca8	2018-08-09T11:04:42	smart_pkt: fix buffer overflow when parsing "unpack" packets When checking whether an "unpack" packet returned the "ok" status or not, we use a call to `git__prefixcmp`. In case where the passed line isn't properly NUL terminated, though, this may overrun the line buffer. Fix this by using `git__prefixncmp` instead.
b5ba7af2	2018-08-09T11:03:37	smart_pkt: fix "ng" parser accepting non-space character When parsing "ng" packets, we blindly assume that the character immediately following the "ng" prefix is a space and skip it. As the calling function doesn't make sure that this is the case, we can thus end up blindly accepting an invalid packet line. Fix the issue by using `git__prefixncmp`, checking whether the line starts with "ng ".
a9f1ca09	2018-08-09T11:01:00	smart_pkt: fix buffer overflow when parsing "ok" packets There are two different buffer overflows present when parsing "ok" packets. First, we never verify whether the line already ends after "ok", but directly go ahead and also try to skip the expected space after "ok". Second, we then go ahead and use `strchr` to scan for the terminating newline character. But in case where the line isn't terminated correctly, this can overflow the line buffer. Fix the issues by using `git__prefixncmp` to check for the "ok " prefix and only checking for a trailing '\n' instead of using `memchr`. This also fixes the issue of us always requiring a trailing '\n'. Reported by oss-fuzz, issue 9749: Crash Type: Heap-buffer-overflow READ {*} Crash Address: 0x6310000389c0 Crash State: ok_pkt git_pkt_parse_line git_smart__store_refs Sanitizer: address (ASAN)
bc349045	2018-08-09T10:38:10	smart_pkt: fix buffer overflow when parsing "ACK" packets We are being quite lenient when parsing "ACK" packets. First, we didn't correctly verify that we're not overrunning the provided buffer length, which we fix here by using `git__prefixncmp` instead of `git__prefixcmp`. Second, we do not verify that the actual contents make any sense at all, as we simply ignore errors when parsing the ACKs OID and any unknown status strings. This may result in a parsed packet structure with invalid contents, which is being silently passed to the caller. This is being fixed by performing proper input validation and checking of return codes.
5edcf5d1	2018-08-09T10:57:06	smart_pkt: adjust style of "ref" packet parsing function While the function parsing ref packets doesn't have any immediately obvious buffer overflows, it's style is different to all the other parsing functions. Instead of checking buffer length while we go, it does a check up-front. This causes the code to seem a lot more magical than it really is due to some magic constants. Refactor the function to instead make use of the style of other packet parser and verify buffer lengths as we go.
786426ea	2018-08-09T10:46:58	smart_pkt: check whether error packets are prefixed with "ERR " In the `git_pkt_parse_line` function, we determine what kind of packet a given packet line contains by simply checking for the prefix of that line. Except for "ERR" packets, we always only check for the immediate identifier without the trailing space (e.g. we check for an "ACK" prefix, not for "ACK "). But for "ERR" packets, we do in fact include the trailing space in our check. This is not really much of a problem at all, but it is inconsistent with all the other packet types and thus causes confusion when the `err_pkt` function just immediately skips the space without checking whether it overflows the line buffer. Adjust the check in `git_pkt_parse_line` to not include the trailing space and instead move it into `err_pkt` for consistency.
40fd84cc	2018-08-09T10:46:26	smart_pkt: explicitly avoid integer overflows when parsing packets When parsing data, progress or error packets, we need to copy the contents of the rest of the current packet line into the flex-array of the parsed packet. To keep track of this array's length, we then assign the remaining length of the packet line to the structure. We do have a mismatch of types here, as the structure's `len` field is a signed integer, while the length that we are assigning has type `size_t`. On nearly all platforms, this shouldn't pose any problems at all. The line length can at most be 16^4, as the line's length is being encoded by exactly four hex digits. But on a platforms with 16 bit integers, this assignment could cause an overflow. While such platforms will probably only exist in the embedded ecosystem, we still want to avoid this potential overflow. Thus, we now simply change the structure's `len` member to be of type `size_t` to avoid any integer promotion.
4a5804c9	2018-08-09T10:36:44	smart_pkt: honor line length when determining packet type When we parse the packet type of an incoming packet line, we do not verify that we don't overflow the provided line buffer. Fix this by using `git__prefixncmp` instead and passing in `len`. As we have previously already verified that `len <= linelen`, we thus won't ever overflow the provided buffer length.
b36cc7a4	2018-09-27T11:18:00	fix check if blob is uninteresting when inserting tree to packbuilder Blobs that have been marked as uninteresting should not be inserted into packbuilder when inserting a tree. The check as to whether a blob was uninteresting looked at the status for the tree itself instead of the blob. This could cause significantly larger packfiles.
8ab11dd5	2018-09-30T16:40:22	Fix issue with path canonicalization for Win32 paths
0530d7d9	2018-09-28T18:04:23	Merge pull request #4767 from pks-t/pks/config-mem In-memory configuration
2be39cef	2018-08-10T19:38:57	config: introduce new read-only in-memory backend Now that we have abstracted away how to store and retrieve config entries, it became trivial to implement a new in-memory backend by making use of this. And thus we do so. This commit implements a new read-only in-memory backend that can parse a chunk of memory into a `git_config_backend` structure.
b78f4ab0	2018-08-16T12:22:03	config_entries: refactor entries iterator memory ownership Right now, the config file code requires us to pass in its backend to the config entry iterator. This is required with the current code, as the config file backend will first create a read-only snapshot which is then passed to the iterator just for that purpose. So after the iterator is getting free'd, the code needs to make sure that the snapshot gets free'd, as well. By now, though, we can easily refactor the code to be more efficient and remove the reverse dependency from iterator to backend. Instead of creating a read-only snapshot (which also requires us to re-parse the complete configuration file), we can simply duplicate the config entries and pass those to the iterator. Like that, the iterator only needs to make sure to free the duplicated config entries, which is trivial to do and clears up memory ownership by a lot.
d49b1365	2018-08-10T19:01:37	config_entries: internalize structure declarations Access to the config entries is now completely done via the modules function interface and no caller messes with the struct's internals. We can thus completely move the structure declarations into the implementation file so that nobody even has a chance to mess with the members.
123e5963	2018-08-10T18:59:59	config_entries: abstract away reference counting Instead of directly calling `git_atomic_inc` in users of the config entries store, provide a `git_config_entries_incref` function to further decouple the interfaces. Convert the refcount to a `git_refcount` structure while at it.
5a7e0b3c	2018-08-10T18:49:38	config_entries: abstract away iteration over entries The nice thing about our `git_config_iterator` interfaces is that nobody needs to know anything about the implementation details. All that is required is to obtain the iterator via any backend and then use it by executing generic functions. We can thus completely internalize all the implementation details of how to iterate over entries into the config entries store and simply create such an iterator in our config file backend when we want to iterate its entries. This further decouples the config file backend from the config entries store.
60ebc137	2018-08-10T14:53:09	config_entries: abstract away retrieval of config entries The code accessing config entries in the `git_config_entries` structure is still much too intimate with implementation details, directly accessing the maps and handling indices. Provide two new functions to get config entries from the internal map structure to decouple the interfaces and use them in the config file code. The function `git_config_entries_get` will simply look up the entry by name and, in the case of a multi-value, return the last occurrence of that entry. The second function, `git_config_entries_get_unique`, will only return an entry if it is unique and not included via another configuration file. This one is required to properly implement write operations for single entries, as we refuse to write to or delete a single entry if it is not clear which one was meant.
fb8a87da	2018-08-10T14:50:15	config_entries: rename functions and structure The previous commit simply moved all code that is required to handle config entries to a new module without yet adjusting any of the function and structure names to help readability. We now rename things accordingly to have a common "git_config_entries" entries instead of the old "diskfile_entries" one.
04f57d51	2018-08-10T13:33:02	config_entries: pull out implementation of entry store The configuration entry store that is used for configuration files needs to keep track of all entries in two different structures: - a singly linked list is being used to be able to iterate through configuration files in the order they have been found - a string map is being used to efficiently look up configuration entries by their key This store is thus something that may be used by other, future backends as well to abstract away implementation details and iteration over the entries. Pull out the necessary functions from "config_file.c" and moves them into their own "config_entries.c" module. For now, this is simply moving over code without any renames and/or refactorings to help reviewing.
d75bbea1	2018-08-10T14:35:23	config_file: remove unnecessary snapshot indirection The implementation for config file snapshots has an unnecessary redirection from `config_snapshot` to `git_config_file__snapshot`. Inline the call to `git_config_file__snapshot` and remove it.
b944e137	2018-08-10T13:03:33	config: rename "config_file.h" to "config_backend.h" The header "config_file.h" has a list of inline-functions to access the contents of a config backend without directly messing with the struct's function pointers. While all these functions are called "git_config_file_", they are in fact completely backend-agnostic and don't care whether it is a file or not. Rename all the function to instead be backend-agnostic versions called "git_config_backend_" and rename the header to match.
1aeff5d7	2018-08-10T12:52:18	config: move function normalizing section names into "config.c" The function `git_config_file_normalize_section` is never being used in any file different than "config.c", but it is implemented in "config_file.c". Move it over and make the symbol static.
a5562692	2018-08-10T12:49:50	config: make names backend-agnostic As a last step to make variables and structures more backend agnostic for our `git_config` structure, rename local variables to not be called `file` anymore.
ba1cd495	2018-09-28T11:10:49	Merge pull request #4784 from tiennou/fix/warnings Some warnings
367f6243	2018-09-28T11:04:06	Merge pull request #4803 from tiennou/fix/4802 index: release the snapshot instead of freeing the index
e0afd1c2	2018-09-26T21:17:39	vector: do not realloc 0-size vectors
fa48d2ea	2018-09-26T19:15:35	vector: do not malloc 0-length vectors on dup
be4717d2	2018-09-18T12:12:06	path: fix "comparison always true" warning
21496c30	2018-03-30T00:20:23	stransport: fix a warning on iOS "warning: values of type 'OSStatus' should not be used as format arguments; add an explicit cast to 'int' instead [-Wformat]"
a54043b7	2018-09-21T15:32:18	Merge pull request #4794 from marcin-krystianc/mkrystianc/prune_perf git_remote_prune to be O(n * logn)
83733aeb	2018-08-10T12:43:21	config: rename `file_internal` and its `file` member Same as with the previous commit, the `file_internal` struct is used to keep track of all the backends that are added to a `git_config` struct. Rename it to `backend_internal` and rename its `file` member to `backend` to make the implementation more backend-agnostic.
633cf40c	2018-08-10T12:39:17	config: rename `files` vector to `backends` Originally, the `git_config` struct is a collection of all the parsed configuration files from different scopes (system-wide config, user-specific config as well as the repo-specific config files). Historically, we didn't and don't yet have any other configuration backends than the one for files, which is why the field holding the config backends is called `files`. But in fact, nothing dictates that the vector of backends actually holds file backends only, as they are generic and custom backends can be implemented by users. Rename the member to be called `backends` to clarify that there is nothing specific to files here.
b9affa32	2018-08-10T19:23:00	config_parse: avoid unused static declared values The variables `git_config_escaped` and `git_config_escapes` are both defined as static const character pointers in "config_parse.h". In case where "config_parse.h" is included but those two variables are not being used, the compiler will thus complain about defined but unused variables. Fix this by declaring them as external and moving the actual initialization to the C file. Note that it is not possible to simply make this a #define, as we are indexing into those arrays.
0b9c68b1	2018-08-16T14:10:58	submodule: fix submodule names depending on config-owned memory When populating the list of submodule names, we use the submodule configuration entry's name as the key in the map of submodule names. This creates a hidden dependency on the liveliness of the configuration that was used to parse the submodule, which is fragile and unexpected. Fix the issue by duplicating the string before writing it into the submodule name map.
e181a649	2018-09-18T03:03:03	Merge pull request #4809 from libgit2/cmn/revwalk-sign-regression Fix revwalk limiting regression
12a1790d	2018-09-17T14:49:46	revwalk: only check the first commit in the list for an earlier timestamp This is not a big deal, but it does make us match git more closely by checking only the first. The lists are sorted already, so there should be no functional difference other than removing a possible check from every iteration in the loop.

af33210b

2018-07-10T16:10:03

apply: introduce a delta callback Introduce a callback to the application options that allow callers to add a per-delta callback. The callback can return an error code to stop patch application, or can return a value to skip the application of a particular delta.

37b25ac5

2018-07-08T16:12:58

apply: move location to an argument, not the opts Move the location option to an argument, out of the options structure. This allows the options structure to be re-used for functions that don't need to know the location, since it's implicit in their functionality. For example, `git_apply_tree` should not take a location, but is expected to take all the other options.

2d27ddc0

2018-07-01T21:35:51

apply: use an indexwriter Place the entire `git_apply` operation inside an indexwriter, so that we lock the index before we begin performing patch application. This ensures that there are no other processes modifying things in the working directory.

813f0802

2018-07-01T15:14:36

apply: validate workdir contents match index for BOTH When applying to both the index and the working directory, ensure that the index contents match the working directory. This mirrors the requirement in `git apply --index`. This also means that - along with the prior commit that uses the working directory contents as the checkout baseline - we no longer expect conflicts during checkout. So remove the special-case error handling for checkout conflicts. (Any checkout conflict now would be because the file was actually modified between the start of patch application and the checkout.)

0f4b2f02

2018-07-01T15:13:50

reader: optionally validate index matches workdir When using a workdir reader, optionally validate that the index contents match the working directory contents.

9be89bbd

2018-07-01T11:08:26

reader: apply working directory filters When reading a file from the working directory, ensure that we apply any necessary filters to the item. This ensures that we get the repository-normalized data as the preimage, and further ensures that we can accurately compare the working directory contents to the index contents for accurate safety validation in the `BOTH` case.

5b8d5a22

2018-07-01T13:42:53

apply: use preimage as the checkout baseline Use the preimage as the checkout's baseline. This allows us to support applying patches to files that are modified in the working directory (those that differ from the HEAD and index). Without this, files will be reported as (checkout) conflicts. With this, we expect the on-disk data when we began the patch application (the "preimage") to be on-disk during checkout. We could have also simply used the `FORCE` flag to checkout to accomplish a similar mechanism. However, `FORCE` ignores all differences, while providing a preimage ensures that we will only overwrite the file contents that we actually read. Modify the reader interface to provide the OID to support this.

5b66b667

2018-06-29T12:39:41

apply: when preimage file is missing, return EAPPLYFAIL The preimage file being missing entirely is simply a case of an application failure; return the correct error value for the caller.

dddfff77

2018-06-30T17:12:16

apply: convert checkout conflicts to apply failures When there's a checkout conflict during apply, that means that the working directory was modified in a conflicting manner and the postimage cannot be written. During application, convert this to an application failure for consistency across workdir/index/both applications.

e0224121

2018-06-29T12:09:02

apply: simplify checkout vs index application Separate the concerns of applying via checkout and updating the repository's index. This results in simpler functionality and allows us to not build the temporary collection of paths in the index case.

3b5378c5

2018-06-25T16:27:06

apply: handle file deletions If the file was deleted in the postimage, do not attempt to update the target. Instead, ignore it and simply allow it to stay removed in our computed postimage. Also, test that we can handle file deletions.

f83bbe0a

2018-03-19T19:50:45

apply: introduce `git_apply` Introduce `git_apply`, which will take a `git_diff` and apply it to the working directory (akin to `git apply`), the index (akin to `git apply --cached`), or both (akin to `git apply --index`).

664cda6f

2018-03-19T20:10:38

apply: reimplement `git_apply_tree` with readers The generic `git_reader` interface simplifies `git_apply_tree` somewhat. Reimplement `git_apply_tree` with them.

d73043a2

2018-03-19T20:10:31

reader: a generic way to read files from repos Similar to the `git_iterator` interface, the `git_reader` interface will allow us to read file contents from an arbitrary repository-backed data source (trees, index, or working directory).

20f8a6db

2018-06-28T17:26:21

apply: remove deleted paths from index We update the index with the new_file side of the delta, but we need to explicitly remove the old_file path in the case where an item was deleted or renamed.

c3077ea0

2018-06-25T21:24:49

apply: return a specific exit code on failure Return `GIT_EAPPLYFAIL` on patch application failure so that users can determine that patch application failed due to a malformed/conflicting patch by looking at the error code.

d54aa9ae

2018-06-26T15:25:30

iterator: introduce `git_iterator_foreach` Introduce a `git_iterator_foreach` helper function which invokes a callback on all files for a given iterator.

9c34c996

2018-06-25T17:03:14

apply: handle file additions Don't attempt to read the postimage file during a file addition, simply use an empty buffer as the postimage. Also, test that we can handle file additions.

02b1083a

2018-01-28T23:25:07

apply: introduce `git_apply_tree` Introduce `git_apply_tree`, which will apply a `git_diff` to a given `git_tree`, allowing an in-memory patch application for a repository.

2b12dcf6

2018-03-19T19:45:11

iterator: optionally hash filesystem iterators Optionally hash the contents of files encountered in the filesystem or working directory iterators. This is not expected to be used in production code paths, but may allow us to simplify some test contexts. For working directory iterators, apply filters as appropriate, since we have the context able to do it.

623647af

2018-10-26T12:33:59

Merge pull request #4864 from pks-t/pks/object-parse-fixes Object parse fixes

7655b2d8

2018-10-19T10:29:19

commit: fix reading out of bounds when parsing encoding The commit message encoding is currently being parsed by the `git__prefixcmp` function. As this function does not accept a buffer length, it will happily skip over a buffer's end if it is not `NUL` terminated. Fix the issue by using `git__prefixncmp` instead. Add a test that verifies that we are unable to parse the encoding field if it's cut off by the supplied buffer length.

ee11d47e

2018-10-19T09:47:50

tag: fix out of bounds read when searching for tag message When parsing tags, we skip all unknown fields that appear before the tag message. This skipping is done by using a plain `strstr(buffer, "\n\n")` to search for the two newlines that separate tag fields from tag message. As it is not possible to supply a buffer length to `strstr`, this call may skip over the buffer's end and thus result in an out of bounds read. As `strstr` may return a pointer that is out of bounds, the following computation of `buffer_end - buffer` will overflow and result in an allocation of an invalid length. Fix the issue by using `git__memmem` instead. Add a test that verifies parsing the tag fails not due to the allocation failure but due to the tag having no message.

83e8a6b3

2018-10-18T16:08:46

util: provide `git__memmem` function Unfortunately, neither the `memmem` nor the `strnstr` functions are part of any C standard but are merely extensions of C that are implemented by e.g. glibc. Thus, there is no standardized way to search for a string in a block of memory with a limited size, and using `strstr` is to be considered unsafe in case where the buffer has not been sanitized. In fact, there are some uses of `strstr` in exactly that unsafe way in our codebase. Provide a new function `git__memmem` that implements the `memmem` semantics. That is in a given haystack of `n` bytes, search for the occurrence of a byte sequence of `m` bytes and return a pointer to the first occurrence. The implementation chosen is the "Not So Naive" algorithm from [1]. It was chosen as the implementation is comparably simple while still being reasonably efficient in most cases. Preprocessing happens in constant time and space, searching has a time complexity of O(n*m) with a slightly sub-linear average case. [1]: http://www-igm.univ-mlv.fr/~lecroq/string/

bea65980

2018-10-25T11:21:14

Merge pull request #4851 from pks-t/pks/strtol-removal strtol removal

305e801a

2018-10-21T09:52:32

util: allow callers to reset custom allocators Provide a utility to reset custom allocators back to their default. This is particularly useful for testing.

7c791f3d

2018-10-20T20:25:51

Merge pull request #4852 from libgit2/ethomson/unc_paths Win32 path canonicalization refactoring

6cc14ae3

2018-10-20T20:22:04

Merge pull request #4840 from libgit2/cmn/validity-tree-from-unowned-index Check object existence when creating a tree from an index

a2f9f94b

2018-10-20T20:18:04

Merge branch 'issue-4203'

32b81661

2018-10-20T20:16:32

merge: don't leak the index during reloads

0a4284b1

2018-10-19T14:54:13

Merge pull request #4819 from libgit2/cmn/config-nonewline Configuration variables can appear on the same line as the section header

ea19efc1

2018-10-18T15:08:56

util: fix out of bounds read in error message When an integer that is parsed with `git__strntol32` is too big to fit into an int32, we will generate an error message that includes the actual string that failed to parse. This does not acknowledge the fact that the string may either not be NUL terminated or alternative include additional characters after the number that is to be parsed. We may thus end up printing characters into the buffer that aren't the number or, worse, read out of bounds. Fix the issue by utilizing the `endptr` that was set by `git__strntol64`. This pointer is guaranteed to be set to the first character following the number, and we can thus use it to compute the width of the number that shall be printed. Create a test to verify that we correctly truncate the number.

a34f5b0d

2018-10-18T08:57:27

win32: refactor `git_win32_path_remove_namespace` Update `git_win32_path_remove_namespace` to disambiguate the prefix being removed versus the prefix being added. Now we remove the "namespace", and (may) add a "prefix" in its place. Eg, we remove the `\\?\` namespace. We remove the `\\?\UNC\` namespace, and replace it with the `\\` prefix. This aids readability somewhat. Additionally, use pointer arithmetic instead of offsets, which seems to also help readability.

b2e85f98

2018-10-17T08:48:43

win32: rename `git_win32__canonicalize_path` The internal API `git_win32__canonicalize_path` is far, far too easily confused with the internal API `git_win32_path_canonicalize`. The former removes the namespace prefix from a path (eg, given `\\?\C:\Temp\foo`, it returns `C:\Temp\foo`, and given `\\?\UNC\server\share`, it returns `\\server\share`). As such, rename it to `git_win32_path_remove_namespace`. `git_win32_path_canonicalize` remains unchanged.

b09c1c7b

2018-10-18T14:37:55

util: avoid signed integer overflows in `git__strntol64` While `git__strntol64` tries to detect integer overflows when doing the necessary arithmetics to come up with the final result, it does the detection only after the fact. This check thus relies on undefined behavior of signed integer overflows. Fix this by instead checking up-front whether the multiplications or additions will overflow. Note that a detected overflow will not cause us to abort parsing the current sequence of digits. In the case of an overflow, previous behavior was to still set up the end pointer correctly to point to the first character immediately after the currently parsed number. We do not want to change this now as code may rely on the end pointer being set up correctly even if the parsed number is too big to be represented as 64 bit integer.

8d7fa88a

2018-10-18T12:04:07

util: remove `git__strtol32` The function `git__strtol32` can easily be misused when untrusted data is passed to it that may not have been sanitized with trailing `NUL` bytes. As all usages of this function have now been removed, we can remove this function altogether to avoid future misuse of it.

2613fbb2

2018-10-18T11:58:14

global: replace remaining use of `git__strtol32` Replace remaining uses of the `git__strtol32` function. While these uses are all safe as the strings were either sanitized or from a trusted source, we want to remove `git__strtol32` altogether to avoid future misuse.

21652ee9

2018-10-18T11:43:30

tree-cache: avoid out-of-bound reads when parsing trees We use the `git__strtol32` function to parse the child and entry count of treecaches from the index, which do not accept a buffer length. As the buffer that is being passed in is untrusted data and may thus be malformed and may not contain a terminating `NUL` byte, we can overrun the buffer and thus perform an out-of-bounds read. Fix the issue by uzing `git__strntol32` instead.

68deb2cc

2018-10-18T11:37:10

util: remove unsafe `git__strtol64` function The function `git__strtol64` does not take a maximum buffer length as parameter. This has led to some unsafe usages of this function, and as such we may consider it as being unsafe to use. As we have now eradicated all usages of this function, let's remove it completely to avoid future misuse.

1a2efd10

2018-10-18T11:35:08

config: remove last instance of `git__strntol64` When parsing integers from configuration values, we use `git__strtol64`. This is fine to do, as we always sanitize values and can thus be sure that they'll have a terminating `NUL` byte. But as this is the last call-site of `git__strtol64`, let's just pass in the length explicitly by calling `strlen` on the value to be able to remove `git__strtol64` altogether.

3db9aa6f

2018-10-18T11:32:48

signature: avoid out-of-bounds reads when parsing signature dates We use `git__strtol64` and `git__strtol32` to parse the trailing commit or author date and timezone of signatures. As signatures are usually part of a commit or tag object and thus essentially untrusted data, the buffer may be misformatted and may not be `NUL` terminated. This may lead to an out-of-bounds read. Fix the issue by using `git__strntol64` and `git__strntol32` instead.

600ceadd

2018-10-18T11:29:06

index: avoid out-of-bounds read when reading reuc entry stage We use `git__strtol64` to parse file modes of the index entries, which does not limit the parsed buffer length. As the index can be essentially treated as "untrusted" in that the data stems from the file system, it may be misformatted and may not contain terminating `NUL` bytes. This may lead to out-of-bounds reads when trying to parse index entries with such malformatted modes. Fix the issue by using `git__strntol64` instead.

1a3fa1f5

2018-10-18T11:25:59

commit_list: avoid use of strtol64 without length limit When quick-parsing a commit, we use `git__strtol64` to parse the commit's time. The buffer that's passed to `commit_quick_parse` is the raw data of an ODB object, though, whose data may not be properly formatted and also does not have to be `NUL` terminated. This may lead to out-of-bound reads. Use `git__strntol64` to avoid this problem.

f010b66b

2018-10-17T14:33:47

Merge pull request #4849 from libgit2/cmn/expose-gitfile-check path: export the dotgit-checking functions

7615794c

2018-10-15T18:08:13

Merge pull request #4845 from pks-t/pks/object-fuzzer Object parsing fuzzer

9b6e4081

2018-10-15T17:08:38

Merge commit 'afd10f0' (Follow 308 redirects)

05e54e00

2018-10-15T13:54:17

path: export the dotgit-checking functions These checks are preformed by libgit2 on checkout, but they're also useful for performing checks in applications which do not involve checkout. Expose them under `sys/` as it's still fairly in the weeds even for this library.

1d718fa5

2018-09-28T11:55:34

config: variables might appear on the same line as a section header While rare and a machine would typically not generate such a configuration file, it is nevertheless valid to write [foo "bar"] baz = true and we need to deal with that instead of assuming everything is on its own line.

afd10f0b

2018-10-13T09:31:20

Follow 308 redirects (as used by GitLab)

814e7acb

2018-10-12T12:38:06

Merge pull request #4842 from nelhage/fuzz-config-memory config: Port config_file_fuzzer to the new in-memory backend.

463c21e2

2018-10-11T13:27:06

Apply code review feedback

6562cdda

2018-10-11T12:43:08

object: properly propagate errors on parsing failures When failing to parse a raw object fromits data, we free the partially parsed object but then fail to propagate the error to the caller. This may lead callers to operate on objects with invalid memory, which will sooner or later cause the program to segfault. Fix the issue by passing up the error code returned by `parse_raw`.

2d449a11

2018-10-09T02:42:14

config: Refactor `git_config_backend_from_string` to take a length

fd490d3e

2018-10-08T13:15:31

tree: unify the entry validity checks We have two similar functions, `git_treebuilder_insert` and `append_entry` which are used in different codepaths as part of creating a new tree. The former learnt to check for object existence under strict object creation, but the latter did not. This allowed the creation of a tree from an unowned index to bypass some of the checks and create a tree pointing to a nonexistent object. Extract a single function which performs these checks and call it from both codepaths. In `append_entry` we still do not validate when asked not to, as this is data which is already in the tree and we want to allow users to deal with repositories which already have some invalid data.

0cd976c8

2018-10-07T12:00:06

Merge pull request #4830 from pks-t/pks/diff-stats-rename-common diff_stats: use git's formatting of renames with common directories

475db39b

2018-10-06T12:58:06

ignore unsupported http authentication schemes auth_context_match returns 0 instead of -1 for unknown schemes to not fail in situations where some authentication schemes are supported and others are not. apply_credentials is adjusted to handle auth_context_match returning 0 without producing authentication context.

a8d447f6

2018-10-05T20:13:34

Merge pull request #4837 from pks-t/cmn/reject-option-submodule-url-path submodule: ignore path and url attributes if they look like options

ce8803a2

2018-10-05T20:03:38

Merge pull request #4836 from pks-t/pks/smart-packets Smart packet security fixes

c8ca3cae

2018-10-05T11:47:39

submodule: ignore path and url attributes if they look like options These can be used to inject options in an implementation which performs a recursive clone by executing an external command via crafted url and path attributes such that it triggers a local executable to be run. The library is not vulnerable as we do not rely on external executables but a user of the library might be relying on that so we add this protection. This matches this aspect of git's fix for CVE-2018-17456.

d06d4220

2018-10-05T10:56:02

config_file: properly ignore includes without "path" value In case a configuration includes a key "include.path=" without any value, the generated configuration entry will have its value set to `NULL`. This is unexpected by the logic handling includes, and as soon as we try to calculate the included path we will unconditionally dereference that `NULL` pointer and thus segfault. Fix the issue by returning early in both `parse_include` and `parse_conditional_include` in case where the `file` argument is `NULL`. Add a test to avoid future regression. The issue has been found by the oss-fuzz project, issue 10810.

e5090ee3

2018-10-04T11:19:28

diff_stats: use git's formatting of renames with common directories In cases where a file gets renamed such that the directories containing it previous and after the rename have a common prefix, then git will avoid printing this prefix twice and instead format the rename as "prefix/{old => new}". We currently didn't do anything like that, but simply printed "prefix/old -> prefix/new". Adjust our behaviour to instead match upstream. Adjust the test for this behaviour to expect the new format.

1bc5b05c

2018-10-03T16:17:21

smart_pkt: do not accept callers passing in no line length Right now, we simply ignore the `linelen` parameter of `git_pkt_parse_line` in case the caller passed in zero. But in fact, we never want to assume anything about the provided buffer length and always want the caller to pass in the available number of bytes. And in fact, checking all the callers, one can see that the funciton is never being called in case where the buffer length is zero, and thus we are safe to remove this check.

c05790a8

2018-08-09T11:16:15

smart_pkt: return parsed length via out-parameter The `parse_len` function currently directly returns the parsed length of a packet line or an error code in case there was an error. Instead, convert this to our usual style of using the return value as error code only and returning the actual value via an out-parameter. Thus, we can now convert the output parameter to an unsigned type, as the size of a packet cannot ever be negative. While at it, we also move the check whether the input buffer is long enough into `parse_len` itself. We don't really want to pass around potentially non-NUL-terminated buffers to functions without also passing along the length, as this is dangerous in the unlikely case where other callers for that function get added. Note that we need to make sure though to not mess with `GIT_EBUFS` error codes, as these indicate not an error to the caller but that he needs to fetch more data.

0b3dfbf4

2018-08-09T11:13:59

smart_pkt: reorder and rename parameters of `git_pkt_parse_line` The parameters of the `git_pkt_parse_line` function are quite confusing. First, there is no real indicator what the `out` parameter is actually all about, and it's not really clear what the `bufflen` parameter refers to. Reorder and rename the parameters to make this more obvious.

5fabaca8

2018-08-09T11:04:42

smart_pkt: fix buffer overflow when parsing "unpack" packets When checking whether an "unpack" packet returned the "ok" status or not, we use a call to `git__prefixcmp`. In case where the passed line isn't properly NUL terminated, though, this may overrun the line buffer. Fix this by using `git__prefixncmp` instead.

b5ba7af2

2018-08-09T11:03:37

smart_pkt: fix "ng" parser accepting non-space character When parsing "ng" packets, we blindly assume that the character immediately following the "ng" prefix is a space and skip it. As the calling function doesn't make sure that this is the case, we can thus end up blindly accepting an invalid packet line. Fix the issue by using `git__prefixncmp`, checking whether the line starts with "ng ".

a9f1ca09

2018-08-09T11:01:00

smart_pkt: fix buffer overflow when parsing "ok" packets There are two different buffer overflows present when parsing "ok" packets. First, we never verify whether the line already ends after "ok", but directly go ahead and also try to skip the expected space after "ok". Second, we then go ahead and use `strchr` to scan for the terminating newline character. But in case where the line isn't terminated correctly, this can overflow the line buffer. Fix the issues by using `git__prefixncmp` to check for the "ok " prefix and only checking for a trailing '\n' instead of using `memchr`. This also fixes the issue of us always requiring a trailing '\n'. Reported by oss-fuzz, issue 9749: Crash Type: Heap-buffer-overflow READ {*} Crash Address: 0x6310000389c0 Crash State: ok_pkt git_pkt_parse_line git_smart__store_refs Sanitizer: address (ASAN)

bc349045

2018-08-09T10:38:10

smart_pkt: fix buffer overflow when parsing "ACK" packets We are being quite lenient when parsing "ACK" packets. First, we didn't correctly verify that we're not overrunning the provided buffer length, which we fix here by using `git__prefixncmp` instead of `git__prefixcmp`. Second, we do not verify that the actual contents make any sense at all, as we simply ignore errors when parsing the ACKs OID and any unknown status strings. This may result in a parsed packet structure with invalid contents, which is being silently passed to the caller. This is being fixed by performing proper input validation and checking of return codes.

5edcf5d1

2018-08-09T10:57:06

smart_pkt: adjust style of "ref" packet parsing function While the function parsing ref packets doesn't have any immediately obvious buffer overflows, it's style is different to all the other parsing functions. Instead of checking buffer length while we go, it does a check up-front. This causes the code to seem a lot more magical than it really is due to some magic constants. Refactor the function to instead make use of the style of other packet parser and verify buffer lengths as we go.

786426ea

2018-08-09T10:46:58

smart_pkt: check whether error packets are prefixed with "ERR " In the `git_pkt_parse_line` function, we determine what kind of packet a given packet line contains by simply checking for the prefix of that line. Except for "ERR" packets, we always only check for the immediate identifier without the trailing space (e.g. we check for an "ACK" prefix, not for "ACK "). But for "ERR" packets, we do in fact include the trailing space in our check. This is not really much of a problem at all, but it is inconsistent with all the other packet types and thus causes confusion when the `err_pkt` function just immediately skips the space without checking whether it overflows the line buffer. Adjust the check in `git_pkt_parse_line` to not include the trailing space and instead move it into `err_pkt` for consistency.

40fd84cc

2018-08-09T10:46:26

smart_pkt: explicitly avoid integer overflows when parsing packets When parsing data, progress or error packets, we need to copy the contents of the rest of the current packet line into the flex-array of the parsed packet. To keep track of this array's length, we then assign the remaining length of the packet line to the structure. We do have a mismatch of types here, as the structure's `len` field is a signed integer, while the length that we are assigning has type `size_t`. On nearly all platforms, this shouldn't pose any problems at all. The line length can at most be 16^4, as the line's length is being encoded by exactly four hex digits. But on a platforms with 16 bit integers, this assignment could cause an overflow. While such platforms will probably only exist in the embedded ecosystem, we still want to avoid this potential overflow. Thus, we now simply change the structure's `len` member to be of type `size_t` to avoid any integer promotion.

4a5804c9

2018-08-09T10:36:44

smart_pkt: honor line length when determining packet type When we parse the packet type of an incoming packet line, we do not verify that we don't overflow the provided line buffer. Fix this by using `git__prefixncmp` instead and passing in `len`. As we have previously already verified that `len <= linelen`, we thus won't ever overflow the provided buffer length.

b36cc7a4

2018-09-27T11:18:00

fix check if blob is uninteresting when inserting tree to packbuilder Blobs that have been marked as uninteresting should not be inserted into packbuilder when inserting a tree. The check as to whether a blob was uninteresting looked at the status for the tree itself instead of the blob. This could cause significantly larger packfiles.

8ab11dd5

2018-09-30T16:40:22

Fix issue with path canonicalization for Win32 paths

0530d7d9

2018-09-28T18:04:23

Merge pull request #4767 from pks-t/pks/config-mem In-memory configuration

2be39cef

2018-08-10T19:38:57

config: introduce new read-only in-memory backend Now that we have abstracted away how to store and retrieve config entries, it became trivial to implement a new in-memory backend by making use of this. And thus we do so. This commit implements a new read-only in-memory backend that can parse a chunk of memory into a `git_config_backend` structure.

b78f4ab0

2018-08-16T12:22:03

config_entries: refactor entries iterator memory ownership Right now, the config file code requires us to pass in its backend to the config entry iterator. This is required with the current code, as the config file backend will first create a read-only snapshot which is then passed to the iterator just for that purpose. So after the iterator is getting free'd, the code needs to make sure that the snapshot gets free'd, as well. By now, though, we can easily refactor the code to be more efficient and remove the reverse dependency from iterator to backend. Instead of creating a read-only snapshot (which also requires us to re-parse the complete configuration file), we can simply duplicate the config entries and pass those to the iterator. Like that, the iterator only needs to make sure to free the duplicated config entries, which is trivial to do and clears up memory ownership by a lot.

d49b1365

2018-08-10T19:01:37

config_entries: internalize structure declarations Access to the config entries is now completely done via the modules function interface and no caller messes with the struct's internals. We can thus completely move the structure declarations into the implementation file so that nobody even has a chance to mess with the members.

123e5963

2018-08-10T18:59:59

config_entries: abstract away reference counting Instead of directly calling `git_atomic_inc` in users of the config entries store, provide a `git_config_entries_incref` function to further decouple the interfaces. Convert the refcount to a `git_refcount` structure while at it.

5a7e0b3c

2018-08-10T18:49:38

config_entries: abstract away iteration over entries The nice thing about our `git_config_iterator` interfaces is that nobody needs to know anything about the implementation details. All that is required is to obtain the iterator via any backend and then use it by executing generic functions. We can thus completely internalize all the implementation details of how to iterate over entries into the config entries store and simply create such an iterator in our config file backend when we want to iterate its entries. This further decouples the config file backend from the config entries store.

60ebc137

2018-08-10T14:53:09

config_entries: abstract away retrieval of config entries The code accessing config entries in the `git_config_entries` structure is still much too intimate with implementation details, directly accessing the maps and handling indices. Provide two new functions to get config entries from the internal map structure to decouple the interfaces and use them in the config file code. The function `git_config_entries_get` will simply look up the entry by name and, in the case of a multi-value, return the last occurrence of that entry. The second function, `git_config_entries_get_unique`, will only return an entry if it is unique and not included via another configuration file. This one is required to properly implement write operations for single entries, as we refuse to write to or delete a single entry if it is not clear which one was meant.

fb8a87da

2018-08-10T14:50:15

config_entries: rename functions and structure The previous commit simply moved all code that is required to handle config entries to a new module without yet adjusting any of the function and structure names to help readability. We now rename things accordingly to have a common "git_config_entries" entries instead of the old "diskfile_entries" one.

04f57d51

2018-08-10T13:33:02

config_entries: pull out implementation of entry store The configuration entry store that is used for configuration files needs to keep track of all entries in two different structures: - a singly linked list is being used to be able to iterate through configuration files in the order they have been found - a string map is being used to efficiently look up configuration entries by their key This store is thus something that may be used by other, future backends as well to abstract away implementation details and iteration over the entries. Pull out the necessary functions from "config_file.c" and moves them into their own "config_entries.c" module. For now, this is simply moving over code without any renames and/or refactorings to help reviewing.

d75bbea1

2018-08-10T14:35:23

config_file: remove unnecessary snapshot indirection The implementation for config file snapshots has an unnecessary redirection from `config_snapshot` to `git_config_file__snapshot`. Inline the call to `git_config_file__snapshot` and remove it.

b944e137

2018-08-10T13:03:33

config: rename "config_file.h" to "config_backend.h" The header "config_file.h" has a list of inline-functions to access the contents of a config backend without directly messing with the struct's function pointers. While all these functions are called "git_config_file_*", they are in fact completely backend-agnostic and don't care whether it is a file or not. Rename all the function to instead be backend-agnostic versions called "git_config_backend_*" and rename the header to match.

1aeff5d7

2018-08-10T12:52:18

config: move function normalizing section names into "config.c" The function `git_config_file_normalize_section` is never being used in any file different than "config.c", but it is implemented in "config_file.c". Move it over and make the symbol static.

a5562692

2018-08-10T12:49:50

config: make names backend-agnostic As a last step to make variables and structures more backend agnostic for our `git_config` structure, rename local variables to not be called `file` anymore.

ba1cd495

2018-09-28T11:10:49

Merge pull request #4784 from tiennou/fix/warnings Some warnings

367f6243

2018-09-28T11:04:06

Merge pull request #4803 from tiennou/fix/4802 index: release the snapshot instead of freeing the index

e0afd1c2

2018-09-26T21:17:39

vector: do not realloc 0-size vectors

fa48d2ea

2018-09-26T19:15:35

vector: do not malloc 0-length vectors on dup

be4717d2

2018-09-18T12:12:06

path: fix "comparison always true" warning

21496c30

2018-03-30T00:20:23

stransport: fix a warning on iOS "warning: values of type 'OSStatus' should not be used as format arguments; add an explicit cast to 'int' instead [-Wformat]"

a54043b7

2018-09-21T15:32:18

Merge pull request #4794 from marcin-krystianc/mkrystianc/prune_perf git_remote_prune to be O(n * logn)

83733aeb

2018-08-10T12:43:21

config: rename `file_internal` and its `file` member Same as with the previous commit, the `file_internal` struct is used to keep track of all the backends that are added to a `git_config` struct. Rename it to `backend_internal` and rename its `file` member to `backend` to make the implementation more backend-agnostic.

633cf40c

2018-08-10T12:39:17

config: rename `files` vector to `backends` Originally, the `git_config` struct is a collection of all the parsed configuration files from different scopes (system-wide config, user-specific config as well as the repo-specific config files). Historically, we didn't and don't yet have any other configuration backends than the one for files, which is why the field holding the config backends is called `files`. But in fact, nothing dictates that the vector of backends actually holds file backends only, as they are generic and custom backends can be implemented by users. Rename the member to be called `backends` to clarify that there is nothing specific to files here.

b9affa32

2018-08-10T19:23:00

config_parse: avoid unused static declared values The variables `git_config_escaped` and `git_config_escapes` are both defined as static const character pointers in "config_parse.h". In case where "config_parse.h" is included but those two variables are not being used, the compiler will thus complain about defined but unused variables. Fix this by declaring them as external and moving the actual initialization to the C file. Note that it is not possible to simply make this a #define, as we are indexing into those arrays.

0b9c68b1

2018-08-16T14:10:58

submodule: fix submodule names depending on config-owned memory When populating the list of submodule names, we use the submodule configuration entry's name as the key in the map of submodule names. This creates a hidden dependency on the liveliness of the configuration that was used to parse the submodule, which is fragile and unexpected. Fix the issue by duplicating the string before writing it into the submodule name map.

e181a649

2018-09-18T03:03:03

Merge pull request #4809 from libgit2/cmn/revwalk-sign-regression Fix revwalk limiting regression

12a1790d

2018-09-17T14:49:46

revwalk: only check the first commit in the list for an earlier timestamp This is not a big deal, but it does make us match git more closely by checking only the first. The lists are sorted already, so there should be no functional difference other than removing a possible check from every iteration in the loop.

thodg/libgit2/src

src

Log