kmx git

Commit	Date	Message
c6f9ad73	2019-12-13T13:18:54	patch_parse: fix undefined behaviour due to arithmetic on NULL pointers Doing arithmetic with NULL pointers is undefined behaviour in the C standard. We do so regardless when parsing patches, as we happily add a potential prefix length to prefixed paths. While this works out just fine as the prefix length is always equal to zero in these cases, thus resulting in another NULL pointer, it still is undefined behaviour and was pointed out to us by OSSfuzz. Fix the issue by checking whether paths are NULL, avoiding the arithmetic if they are.
33e6c402	2019-11-28T15:26:36	patch_parse: fix out-of-bounds reads caused by integer underflow The patch format for binary files is a simple Base85 encoding with a length byte as prefix that encodes the current line's length. For each line, we thus check whether the line's actual length matches its expected length in order to not faultily apply a truncated patch. This also acts as a check to verify that we're not reading outside of the line's string: if (encoded_len > ctx->parse_ctx.line_len - 1) { error = git_parse_err(...); goto done; } There is the possibility for an integer underflow, though. Given a line with a single prefix byte, only, `line_len` will be zero when reaching this check. As a result, subtracting one from that will result in an integer underflow, causing us to assume that there's a wealth of bytes available later on. Naturally, this may result in an out-of-bounds read. Fix the issue by checking both `encoded_len` and `line_len` for a non-zero value. The binary format doesn't make use of zero-length lines anyway, so we need to know that there are both encoded bytes and remaining characters available at all. This patch also adds a test that works based on the last error message. Checking error messages is usually too tightly coupled, but in fact parsing the patch failed even before the change. Thus the only possibility is to use e.g. Valgrind, but that'd result in us not catching issues when run without Valgrind. As a result, using the error message is considered a viable tradeoff as we know that we didn't start decoding Base85 in the first place.
fb439c97	2019-11-28T14:41:58	Merge pull request #5306 from herrerog/patchid diff: complete support for git patchid
fefefd1d	2019-06-23T16:42:14	odb: use `git_object_size_t` for object size Instead of using a signed type (`off_t`) use a new `git_object_size_t` for the sizes of objects.
048e94ad	2019-11-07T14:13:14	patch_parse: correct parsing of patch containing not shown binary data. When not shown binary data is added or removed in a patch, patch parser is currently returning 'error -1 - corrupt git binary header at line 4'. Fix it by correctly handling case where binary data is added/removed. Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
de7659cc	2019-11-10T18:44:56	patch_parse: use paths from "---"/"+++" lines for binary patches For some patches, it is not possible to derive the old and new file paths from the patch header's first line, most importantly when they contain spaces. In such a case, we derive both paths from the "---" and "+++" lines, which allow for non-ambiguous parsing. We fail to use these paths when parsing binary patches without data, though, as we always expect the header paths to be filled in. Fix this by using the "---"/"+++" paths by default and only fall back to header paths if they aren't set. If neither of those paths are set, we just return an error. Add two tests to verify this behaviour, one of which would have previously caused a segfault.
de543e29	2019-11-05T22:44:27	patch_parse: fix segfault when header path contains whitespace only When parsing header paths from a patch, we reject any patches with empty paths as malformed patches. We perform the check whether a path is empty before sanitizing it, though, which may lead to a path becoming empty after the check, e.g. if we have trimmed whitespace. This may lead to a segfault later when any part of our patching logic actually references such a path, which may then be a `NULL` pointer. Fix the issue by performing the check after sanitizing. Add tests to catch the issue as they would have produced a segfault previosuly.
37141ff7	2019-10-21T18:56:59	patch_parse: detect overflow when calculating old/new line position When the patch contains lines close to INT_MAX, then it may happen that we end up with an integer overflow when calculating the line of the current diff hunk. Reject such patches as unreasonable to avoid the integer overflow. As the calculation is performed on integers, we introduce two new helpers `git__add_int_overflow` and `git__sub_int_overflow` that perform the integer overflow check in a generic way.
468e3ddc	2019-10-19T16:48:11	patch_parse: fix out-of-bounds read with No-NL lines We've got two locations where we copy lines into the patch. The first one is when copying normal " ", "-" or "+" lines, while the second location gets executed when we copy "\ No newline at end of file" lines. While the first one correctly uses `git__strndup` to copy only until the newline, the other one doesn't. Thus, if the line occurs at the end of the patch and if there is no terminating NUL character, then it may result in an out-of-bounds read. Fix the issue by using `git__strndup`, as was already done in the other location. Furthermore, add allocation checks to both locations to detect out-of-memory situations.
6c6c15e9	2019-10-19T15:52:35	patch_parse: reject empty path names When parsing patch headers, we currently accept empty path names just fine, e.g. a line "--- \n" would be parsed as the empty filename. This is not a valid patch format and may cause `NULL` pointer accesses at a later place as `git_buf_detach` will return `NULL` in that case. Reject such patches as malformed with a nice error message.
223e7e43	2019-10-19T15:42:54	patch_parse: reject patches with multiple old/new paths It's currently possible to have patches with multiple old path name headers. As we didn't check for this case, this resulted in a memory leak when overwriting the old old path with the new old path because we simply discarded the old pointer. Instead of fixing this by free'ing the old pointer, we should reject such patches altogether. It doesn't make any sense for the "---" or "+++" markers to occur multiple times within a patch n the first place. This also implicitly fixes the memory leak.
11de594f	2019-10-16T22:11:33	patch_parse: handle patches without extended headers Extended header lines (especially the "index <hash>..<hash> <mode>") are not required by "git apply" so it import patches. So we allow the from-file/to-file lines (--- a/file\n+++ b/file) to directly follow the git diff header. This fixes #5267.
b61810bf	2019-09-28T15:52:25	patch_parse: handle patches with new empty files Patches containing additions of empty files will not contain diff data but will end with the index header line followed by the terminating sequence "-- ". We follow the same logic as in cc4c44a and allow "-- " to immediately follow the index header.
27b8b31e	2019-08-01T11:57:03	parse: remove use of variadic macros which are not C89 compliant The macro `git_parse_error` is implemented in a variadic way so that it's possible to pass printf-style parameters. Unfortunately, variadic macros are not defined by C89 and thus we cannot use that functionality. But as we have implemented `git_error_vset` in the previous commit, we can now just use that instead. Convert `git_parse_error` to a variadic function and use `git_error_vset` to fix the compliance violation. While at it, move the function to "patch_parse.c".
a613832e	2019-07-20T18:49:48	patch_parse: fix segfault due to line containing static contents With commit dedf70ad2 (patch_parse: do not depend on parsed buffer's lifetime, 2019-07-05), all lines of the patch are allocated with `strdup` to make lifetime of the parsed patch independent of the buffer that is currently being parsed. In patch b08932824 (patch_parse: ensure valid patch output with EOFNL, 2019-07-11), we introduced another code location where we add lines to the parsed patch. But as that one was implemented via a separate pull request, it wasn't converted to use `strdup`, as well. As a consequence, we generate a segfault when trying to deallocate the potentially static buffer that's now in some of the lines. Use `git__strdup` to fix the issue.
fd7a384b	2019-07-20T11:24:37	Merge pull request #5159 from pks-t/pks/patch-parse-old-missing-nl patch_parse: handle missing newline indicator in old file
b0893282	2019-07-11T12:12:04	patch_parse: ensure valid patch output with EOFNL
3f855fe8	2019-07-05T11:06:33	patch_parse: handle missing newline indicator in old file When either the old or new file contents have no newline at the end of the file, then git-diff(1) will print out a "\ No newline at end of file" indicator. While we do correctly handle this in the case where the new file has this indcator, we fail to parse patches where the old file is missing a newline at EOF. Fix this bug by handling and missing newline indicators in the old file. Add tests to verify that we can parse such files.
dedf70ad	2019-07-05T09:35:43	patch_parse: do not depend on parsed buffer's lifetime When parsing a patch from a buffer, we let the patch lines point into the original buffer. While this is efficient use of resources, this also ties the lifetime of the parsed patch to the parsed buffer. As this behaviour is not documented anywhere in our API it is very surprising to its users. Untie the lifetime by duplicating the lines into the parsed patch. Add a test that verifies that lifetimes are indeed independent of each other.
f1d73189	2019-05-20T07:02:50	patch: use size_t for size when parsing
30c06b60	2019-03-22T23:56:10	patch_parse.c: Handle CRLF in parse_header_start
b3497344	2019-03-29T12:15:20	patch_parse: fix parsing addition/deletion of file with space The diff header format is a strange beast in that it is inherently unparseable in an unambiguous way. While parsing a/file.txt b/file.txt is obvious and trivially doable, parsing a diff header of a/file b/file ab.txt b/file b/file ab.txt is not (but in fact valid and created by git.git). Due to that, we have relaxed our diff header parser in commit 80226b5f6 (patch_parse: allow parsing ambiguous patch headers, 2017-09-22), so that we started to bail out when seeing diff headers with spaces in their file names. Instead, we try to use the "---" and "+++" lines, which are unambiguous. In some cases, though, we neither have a useable file name from the header nor from the "---" or "+++" lines. This is the case when we have a deletion or addition of a file with spaces: the header is unparseable and the other lines will simply show "/dev/null". This trips our parsing logic when we try to extract the prefix (the "a/" part) that is being used in the path line, where we unconditionally try to dereference a NULL pointer in such a scenario. We can fix this by simply not trying to parse the prefix in cases where we have no useable path name. That'd leave the parsed patch without either `old_prefix` or `new_prefix` populated. But in fact such cases are already handled by users of the patch object, which simply opt to use the default prefixes in that case.
131cd9b1	2019-03-29T11:58:50	patch_parse: improve formatting
80c3867b	2019-01-20T19:20:12	patch: explicitly cast down in parse_header_percent Quiet down a warning from MSVC about how we're potentially losing data. This is safe since we've explicitly tested that it's within the range of 0-100.
f673e232	2018-12-27T13:47:34	git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
4b84db6a	2018-11-14T12:33:38	patch_parse: remove unused function `parse_number` The function `parse_number` was replaced by `git_parse_advance_digit` which is provided by the parser interface in commit 252f2eeee (parse: implement and use `git_parse_advance_digit`, 2017-07-14). As there are no remaining callers, remove it.
f9e28026	2018-06-18T20:37:18	patch_parse: populate line numbers while parsing diffs
ecf4f33a	2018-02-08T11:14:48	Convert usage of `git_buf_free` to new `git_buf_dispose`
06b8a40f	2018-02-16T11:29:46	Explicitly mark fallthrough cases with comments A lot of compilers nowadays generate warnings when there are cases in a switch statement which implicitly fall through to the next case. To avoid this warning, the last line in the case that is falling through can have a comment matching a regular expression, where one possible comment body would be `/* fall through */`. An alternative to the comment would be an explicit attribute like e.g. `[[clang::fallthrough]` or `__attribute__ ((fallthrough))`. But GCC only introduced support for such an attribute recently with GCC 7. Thus, and also because the fallthrough comment is supported by most compilers, we settle for using comments instead. One shortcoming of that method is that compilers are very strict about that. Most interestingly, that comment _really_ has to be the last line. In case a closing brace follows the comment, the heuristic will fail.
4110fc84	2017-12-23T23:30:29	Merge pull request #4285 from pks-t/pks/patches-with-whitespace patch_parse: fix parsing unquoted filenames with spaces
585b5dac	2017-11-18T15:43:11	refcount: make refcounting conform to aliasing rules Strict aliasing rules dictate that for most data types, you are not allowed to cast them to another data type and then access the casted pointers. While this works just fine for most compilers, technically we end up in undefined behaviour when we hurt that rule. Our current refcounting code makes heavy use of casting and thus violates that rule. While we didn't have any problems with that code, Travis started spitting out a lot of warnings due to a change in their toolchain. In the refcounting case, the code is also easy to fix: as all refcounting-statements are actually macros, we can just access the `rc` field directly instead of casting. There are two outliers in our code where that doesn't work. Both the `git_diff` and `git_patch` structures have specializations for generated and parsed diffs/patches, which directly inherit from them. Because of that, the refcounting code is only part of the base structure and not of the children themselves. We can help that by instead passing their base into `GIT_REFCOUNT_INC`, though.
80226b5f	2017-09-22T13:39:05	patch_parse: allow parsing ambiguous patch headers The git patch format allows for having unquoted paths with whitespaces inside. This format becomes ambiguous to parse, e.g. in the following example: diff --git a/file b/with spaces.txt b/file b/with spaces.txt While we cannot parse this in a correct way, we can instead use the "---" and "+++" lines to retrieve the file names, as the path is not followed by anything here but spans the complete remaining line. Because of this, we can simply bail outwhen parsing the "diff --git" header here without an actual error and then proceed to just take the paths from the other headers.
3892f70d	2017-09-22T13:26:47	patch_parse: treat complete line after "---"/"+++" as path When parsing the "---" and "+++" line, we stop after the first whitespace inside of the filename. But as files containing whitespaces do not need to be quoted, we should instead use the complete line here. This fixes parsing patches with unquoted paths with whitespaces.
7bdfc0a6	2017-07-14T15:33:32	parse: always initialize line pointer Upon initializing the parser context, we do not currently initialize the current line, line length and line number. Do so in order to make the interface easier to use and more obvious for future consumers of the parsing API.
e72cb769	2017-07-14T14:37:07	parse: implement `git_parse_peek` Some code parts need to inspect the next few bytes without actually consuming it yet, for example to examine what content it has to expect next. Create a new function `git_parse_peek` which returns the next byte without modifying the parsing context and use it at multiple call sites.
252f2eee	2017-07-14T13:45:05	parse: implement and use `git_parse_advance_digit` The patch parsing code has multiple recurring patterns where we want to parse an actual number. Create a new function `git_parse_advance_digit` and use it to avoid code duplication.
65dcb645	2017-07-14T13:29:29	patch_parse: use git_parse_contains_s Instead of manually checking the parsing context's remaining length and comparing the leading bytes with a specific string, we can simply re-use the function `git_parse_ctx_contains_s`. Do so to avoid code duplication and to further decouple patch parsing from the parsing context's struct members.
ef1395f3	2017-11-11T15:30:43	parse: extract parse module The `git_patch_parse_ctx` encapsulates both parser state as well as options specific to patch parsing. To advance this state and keep it consistent, we provide a few functions which handle advancing the current position and accessing bytes of the patch contents. In fact, these functions are quite generic and not related to patch-parsing by themselves. Seeing that we have similar logic inside of other modules, it becomes quite enticing to extract this functionality into its own parser module. To do so, we create a new module `parse` with a central struct called `git_parse_ctx`. It encapsulates both the content that is to be parsed as well as its lengths and the current position. `git_patch_parse_ctx` now only contains this `parse_ctx` only, which is then accessed whenever we need to touch the current parser. This is the first step towards re-using this functionality across other modules which require parsing functionality and remove code-duplication.
cc4c44a9	2017-09-01T09:37:05	patch_parse: fix parsing patches only containing exact renames Patches which contain exact renames only will not contain an actual diff body, but only a list of files that were renamed. Thus, the patch header is immediately followed by the terminating sequence "-- ". We currently do not recognize this character sequence as a possible terminating sequence. Add it and create a test to catch the failure.
57bc9dab	2017-07-14T10:57:49	patch_parse: implement state machine for parsing patch headers Our code parsing Git patch headers is rather lax in parsing headers of a Git-style patch. Most notably, we do not care for the exact order in which header lines appear and as such, we may parse patch files which are not really valid after all. Furthermore, the state transitions inside of the parser are not as obvious as they could be, making it harder than required to follow its logic. To improve upon this situation, this patch introduces a real state machine to parse the patches. Instead of simply parsing each line without caring for previous state and the exact ordering, we define a set of states with their allowed transitions. This makes the patch parser more strict in only allowing valid successions of header lines. As the transition table is defined inside of a single structure with the expected line, required state as well as the state that we end up in, all state transitions are immediately obvious from just having a look at this structure. This improves both maintainability and eases reasoning about the patch parser.
0c7f49dd	2017-06-30T13:39:01	Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
723bdf48	2017-03-20T09:35:23	patch_parse: check if advancing over header newline succeeds While parsing patch header lines, we iterate over each line and check if the line has trailing garbage. What we do not check though is that the line is actually a line ending with a trailing newline. Fix this by checking the return code of `parse_advance_expected_str`.
ad5a909c	2017-03-14T09:39:37	patch_parse: fix parsing minimal trailing diff line In a diff, the shortest possible hunk with a modification (that is, no deletion) results from a file with only one line with a single character which is removed. Thus the following hunk @@ -1 +1 @@ -a + is the shortest valid hunk modifying a line. The function parsing the hunk body though assumes that there must always be at least 4 bytes present to make up a valid hunk, which is obviously wrong in this case. The absolute minimum number of bytes required for a modification is actually 2 bytes, that is the "+" and the following newline. Note: if there is no trailing newline, the assumption will not be offended as the diff will have a line "\ No trailing newline" at its end. This patch fixes the issue by lowering the amount of bytes required.
613381fc	2016-11-15T13:33:05	patch_parse: fix memory leak
c77a55a9	2016-11-14T10:05:31	common: use PRIuZ for size_t in `giterr_set` calls
adedac5a	2016-09-02T02:03:45	diff: treat binary patches with no data special When creating and printing diffs, deal with binary deltas that have binary data specially, versus diffs that have a binary file but lack the actual binary data.
b859faa6	2016-08-23T23:38:39	Teach `git_patch_from_diff` about parsed diffs Ensure that `git_patch_from_diff` can return the patch for parsed diffs, not just generate a patch for a generated diff.
002c8e29	2016-08-03T17:09:41	git_diff_file: move `id_abbrev` Move `id_abbrev` to a more reasonable place where it packs more nicely (before anybody starts using it).
c065f6a1	2016-07-14T23:04:47	apply: check allocation properly
1a79cd95	2016-04-26T01:18:01	patch: show copy information for identical copies When showing copy information because we are duplicating contents, for example, when performing a `diff --find-copies-harder -M100 -B100`, then show copy from/to lines in a patch, and do not show context. Ensure that we can also parse such patches.
38a347ea	2016-04-25T17:52:39	patch::parse: handle patches with no hunks Patches may have no hunks when there's no modifications (for example, in a rename). Handle them.
853e585f	2016-04-25T16:32:30	patch: zero id and abbrev length for empty files
33ae8762	2016-04-25T13:07:18	patch: identify non-binary patches as `NOT_BINARY`
7166bb16	2016-04-25T00:35:48	introduce `git_diff_from_buffer` to parse diffs Parse diff files into a `git_diff` structure.
94e488a0	2016-04-24T16:14:25	patch: differentiate not found and invalid patches
17572f67	2016-04-21T00:04:14	git_patch_parse_ctx: refcount the context
aa4bfb32	2016-02-07T15:08:16	parse: introduce parse_ctx_contains_s
440e3bae	2015-11-21T12:27:03	patch: `git_patch_from_patchfile` -> `git_patch_from_buffer`
00e63b36	2015-11-21T12:37:01	patch: provide static string `advance_expected`
4117a235	2015-09-24T10:32:15	patch parse: dup the patch from the callers
6278fbc5	2015-09-24T09:40:42	patch parsing: squash some memory leaks
f941f035	2015-09-24T09:25:10	patch: drop some warnings
82175084	2015-09-23T13:40:12	Introduce git_patch_options, handle prefixes Handle prefixes (in terms of number of path components) for patch parsing.
19e46645	2015-09-23T11:07:04	patch printing: include rename information
d536ceac	2015-09-23T10:47:34	patch_parse: don't set new mode when deleted
28f70443	2015-09-23T10:38:51	patch_parse: use names from `diff --git` header When a text file is added or deleted, use the file names from the `diff --git` header instead of the `---` or `+++` lines. This is for compatibility with git.
1462c95a	2015-09-23T09:54:25	patch_parse: set binary flag We may have parsed binary data, set the `SHOW_BINARY` flag which indicates that we have actually computed a binary diff.
bc6a31c9	2015-09-22T18:29:14	patch: when parsing, set nfiles correctly in delta
d68cb736	2015-09-22T18:25:03	diff: include oid length in deltas Now that `git_diff_delta` data can be produced by reading patch file data, which may have an abbreviated oid, allow consumers to know that the id is abbreviated.
e7ec327d	2015-09-22T17:56:42	patch parse: unset path prefix
b85bd8ce	2015-09-16T11:37:03	patch: use delta's old_file/new_file members No need to replicate the old_file/new_file members, or plumb them strangely up.
804d5fe9	2015-09-11T08:37:12	patch: abstract patches into diff'ed and parsed Patches can now come from a variety of sources - either internally generated (from diffing two commits) or as the results of parsing some external data.

c6f9ad73

2019-12-13T13:18:54

patch_parse: fix undefined behaviour due to arithmetic on NULL pointers Doing arithmetic with NULL pointers is undefined behaviour in the C standard. We do so regardless when parsing patches, as we happily add a potential prefix length to prefixed paths. While this works out just fine as the prefix length is always equal to zero in these cases, thus resulting in another NULL pointer, it still is undefined behaviour and was pointed out to us by OSSfuzz. Fix the issue by checking whether paths are NULL, avoiding the arithmetic if they are.

33e6c402

2019-11-28T15:26:36

patch_parse: fix out-of-bounds reads caused by integer underflow The patch format for binary files is a simple Base85 encoding with a length byte as prefix that encodes the current line's length. For each line, we thus check whether the line's actual length matches its expected length in order to not faultily apply a truncated patch. This also acts as a check to verify that we're not reading outside of the line's string: if (encoded_len > ctx->parse_ctx.line_len - 1) { error = git_parse_err(...); goto done; } There is the possibility for an integer underflow, though. Given a line with a single prefix byte, only, `line_len` will be zero when reaching this check. As a result, subtracting one from that will result in an integer underflow, causing us to assume that there's a wealth of bytes available later on. Naturally, this may result in an out-of-bounds read. Fix the issue by checking both `encoded_len` and `line_len` for a non-zero value. The binary format doesn't make use of zero-length lines anyway, so we need to know that there are both encoded bytes and remaining characters available at all. This patch also adds a test that works based on the last error message. Checking error messages is usually too tightly coupled, but in fact parsing the patch failed even before the change. Thus the only possibility is to use e.g. Valgrind, but that'd result in us not catching issues when run without Valgrind. As a result, using the error message is considered a viable tradeoff as we know that we didn't start decoding Base85 in the first place.

fb439c97

2019-11-28T14:41:58

Merge pull request #5306 from herrerog/patchid diff: complete support for git patchid

fefefd1d

2019-06-23T16:42:14

odb: use `git_object_size_t` for object size Instead of using a signed type (`off_t`) use a new `git_object_size_t` for the sizes of objects.

048e94ad

2019-11-07T14:13:14

patch_parse: correct parsing of patch containing not shown binary data. When not shown binary data is added or removed in a patch, patch parser is currently returning 'error -1 - corrupt git binary header at line 4'. Fix it by correctly handling case where binary data is added/removed. Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>

de7659cc

2019-11-10T18:44:56

patch_parse: use paths from "---"/"+++" lines for binary patches For some patches, it is not possible to derive the old and new file paths from the patch header's first line, most importantly when they contain spaces. In such a case, we derive both paths from the "---" and "+++" lines, which allow for non-ambiguous parsing. We fail to use these paths when parsing binary patches without data, though, as we always expect the header paths to be filled in. Fix this by using the "---"/"+++" paths by default and only fall back to header paths if they aren't set. If neither of those paths are set, we just return an error. Add two tests to verify this behaviour, one of which would have previously caused a segfault.

de543e29

2019-11-05T22:44:27

patch_parse: fix segfault when header path contains whitespace only When parsing header paths from a patch, we reject any patches with empty paths as malformed patches. We perform the check whether a path is empty before sanitizing it, though, which may lead to a path becoming empty after the check, e.g. if we have trimmed whitespace. This may lead to a segfault later when any part of our patching logic actually references such a path, which may then be a `NULL` pointer. Fix the issue by performing the check after sanitizing. Add tests to catch the issue as they would have produced a segfault previosuly.

37141ff7

2019-10-21T18:56:59

patch_parse: detect overflow when calculating old/new line position When the patch contains lines close to INT_MAX, then it may happen that we end up with an integer overflow when calculating the line of the current diff hunk. Reject such patches as unreasonable to avoid the integer overflow. As the calculation is performed on integers, we introduce two new helpers `git__add_int_overflow` and `git__sub_int_overflow` that perform the integer overflow check in a generic way.

468e3ddc

2019-10-19T16:48:11

patch_parse: fix out-of-bounds read with No-NL lines We've got two locations where we copy lines into the patch. The first one is when copying normal " ", "-" or "+" lines, while the second location gets executed when we copy "\ No newline at end of file" lines. While the first one correctly uses `git__strndup` to copy only until the newline, the other one doesn't. Thus, if the line occurs at the end of the patch and if there is no terminating NUL character, then it may result in an out-of-bounds read. Fix the issue by using `git__strndup`, as was already done in the other location. Furthermore, add allocation checks to both locations to detect out-of-memory situations.

6c6c15e9

2019-10-19T15:52:35

patch_parse: reject empty path names When parsing patch headers, we currently accept empty path names just fine, e.g. a line "--- \n" would be parsed as the empty filename. This is not a valid patch format and may cause `NULL` pointer accesses at a later place as `git_buf_detach` will return `NULL` in that case. Reject such patches as malformed with a nice error message.

223e7e43

2019-10-19T15:42:54

patch_parse: reject patches with multiple old/new paths It's currently possible to have patches with multiple old path name headers. As we didn't check for this case, this resulted in a memory leak when overwriting the old old path with the new old path because we simply discarded the old pointer. Instead of fixing this by free'ing the old pointer, we should reject such patches altogether. It doesn't make any sense for the "---" or "+++" markers to occur multiple times within a patch n the first place. This also implicitly fixes the memory leak.

11de594f

2019-10-16T22:11:33

patch_parse: handle patches without extended headers Extended header lines (especially the "index <hash>..<hash> <mode>") are not required by "git apply" so it import patches. So we allow the from-file/to-file lines (--- a/file\n+++ b/file) to directly follow the git diff header. This fixes #5267.

b61810bf

2019-09-28T15:52:25

patch_parse: handle patches with new empty files Patches containing additions of empty files will not contain diff data but will end with the index header line followed by the terminating sequence "-- ". We follow the same logic as in cc4c44a and allow "-- " to immediately follow the index header.

27b8b31e

2019-08-01T11:57:03

parse: remove use of variadic macros which are not C89 compliant The macro `git_parse_error` is implemented in a variadic way so that it's possible to pass printf-style parameters. Unfortunately, variadic macros are not defined by C89 and thus we cannot use that functionality. But as we have implemented `git_error_vset` in the previous commit, we can now just use that instead. Convert `git_parse_error` to a variadic function and use `git_error_vset` to fix the compliance violation. While at it, move the function to "patch_parse.c".

a613832e

2019-07-20T18:49:48

patch_parse: fix segfault due to line containing static contents With commit dedf70ad2 (patch_parse: do not depend on parsed buffer's lifetime, 2019-07-05), all lines of the patch are allocated with `strdup` to make lifetime of the parsed patch independent of the buffer that is currently being parsed. In patch b08932824 (patch_parse: ensure valid patch output with EOFNL, 2019-07-11), we introduced another code location where we add lines to the parsed patch. But as that one was implemented via a separate pull request, it wasn't converted to use `strdup`, as well. As a consequence, we generate a segfault when trying to deallocate the potentially static buffer that's now in some of the lines. Use `git__strdup` to fix the issue.

fd7a384b

2019-07-20T11:24:37

Merge pull request #5159 from pks-t/pks/patch-parse-old-missing-nl patch_parse: handle missing newline indicator in old file

b0893282

2019-07-11T12:12:04

patch_parse: ensure valid patch output with EOFNL

3f855fe8

2019-07-05T11:06:33

patch_parse: handle missing newline indicator in old file When either the old or new file contents have no newline at the end of the file, then git-diff(1) will print out a "\ No newline at end of file" indicator. While we do correctly handle this in the case where the new file has this indcator, we fail to parse patches where the old file is missing a newline at EOF. Fix this bug by handling and missing newline indicators in the old file. Add tests to verify that we can parse such files.

dedf70ad

2019-07-05T09:35:43

patch_parse: do not depend on parsed buffer's lifetime When parsing a patch from a buffer, we let the patch lines point into the original buffer. While this is efficient use of resources, this also ties the lifetime of the parsed patch to the parsed buffer. As this behaviour is not documented anywhere in our API it is very surprising to its users. Untie the lifetime by duplicating the lines into the parsed patch. Add a test that verifies that lifetimes are indeed independent of each other.

f1d73189

2019-05-20T07:02:50

patch: use size_t for size when parsing

30c06b60

2019-03-22T23:56:10

patch_parse.c: Handle CRLF in parse_header_start

b3497344

2019-03-29T12:15:20

patch_parse: fix parsing addition/deletion of file with space The diff header format is a strange beast in that it is inherently unparseable in an unambiguous way. While parsing a/file.txt b/file.txt is obvious and trivially doable, parsing a diff header of a/file b/file ab.txt b/file b/file ab.txt is not (but in fact valid and created by git.git). Due to that, we have relaxed our diff header parser in commit 80226b5f6 (patch_parse: allow parsing ambiguous patch headers, 2017-09-22), so that we started to bail out when seeing diff headers with spaces in their file names. Instead, we try to use the "---" and "+++" lines, which are unambiguous. In some cases, though, we neither have a useable file name from the header nor from the "---" or "+++" lines. This is the case when we have a deletion or addition of a file with spaces: the header is unparseable and the other lines will simply show "/dev/null". This trips our parsing logic when we try to extract the prefix (the "a/" part) that is being used in the path line, where we unconditionally try to dereference a NULL pointer in such a scenario. We can fix this by simply not trying to parse the prefix in cases where we have no useable path name. That'd leave the parsed patch without either `old_prefix` or `new_prefix` populated. But in fact such cases are already handled by users of the patch object, which simply opt to use the default prefixes in that case.

131cd9b1

2019-03-29T11:58:50

patch_parse: improve formatting

80c3867b

2019-01-20T19:20:12

patch: explicitly cast down in parse_header_percent Quiet down a warning from MSVC about how we're potentially losing data. This is safe since we've explicitly tested that it's within the range of 0-100.

f673e232

2018-12-27T13:47:34

git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.

4b84db6a

2018-11-14T12:33:38

patch_parse: remove unused function `parse_number` The function `parse_number` was replaced by `git_parse_advance_digit` which is provided by the parser interface in commit 252f2eeee (parse: implement and use `git_parse_advance_digit`, 2017-07-14). As there are no remaining callers, remove it.

f9e28026

2018-06-18T20:37:18

patch_parse: populate line numbers while parsing diffs

ecf4f33a

2018-02-08T11:14:48

Convert usage of `git_buf_free` to new `git_buf_dispose`

06b8a40f

2018-02-16T11:29:46

Explicitly mark fallthrough cases with comments A lot of compilers nowadays generate warnings when there are cases in a switch statement which implicitly fall through to the next case. To avoid this warning, the last line in the case that is falling through can have a comment matching a regular expression, where one possible comment body would be `/* fall through */`. An alternative to the comment would be an explicit attribute like e.g. `[[clang::fallthrough]` or `__attribute__ ((fallthrough))`. But GCC only introduced support for such an attribute recently with GCC 7. Thus, and also because the fallthrough comment is supported by most compilers, we settle for using comments instead. One shortcoming of that method is that compilers are very strict about that. Most interestingly, that comment _really_ has to be the last line. In case a closing brace follows the comment, the heuristic will fail.

4110fc84

2017-12-23T23:30:29

Merge pull request #4285 from pks-t/pks/patches-with-whitespace patch_parse: fix parsing unquoted filenames with spaces

585b5dac

2017-11-18T15:43:11

refcount: make refcounting conform to aliasing rules Strict aliasing rules dictate that for most data types, you are not allowed to cast them to another data type and then access the casted pointers. While this works just fine for most compilers, technically we end up in undefined behaviour when we hurt that rule. Our current refcounting code makes heavy use of casting and thus violates that rule. While we didn't have any problems with that code, Travis started spitting out a lot of warnings due to a change in their toolchain. In the refcounting case, the code is also easy to fix: as all refcounting-statements are actually macros, we can just access the `rc` field directly instead of casting. There are two outliers in our code where that doesn't work. Both the `git_diff` and `git_patch` structures have specializations for generated and parsed diffs/patches, which directly inherit from them. Because of that, the refcounting code is only part of the base structure and not of the children themselves. We can help that by instead passing their base into `GIT_REFCOUNT_INC`, though.

80226b5f

2017-09-22T13:39:05

patch_parse: allow parsing ambiguous patch headers The git patch format allows for having unquoted paths with whitespaces inside. This format becomes ambiguous to parse, e.g. in the following example: diff --git a/file b/with spaces.txt b/file b/with spaces.txt While we cannot parse this in a correct way, we can instead use the "---" and "+++" lines to retrieve the file names, as the path is not followed by anything here but spans the complete remaining line. Because of this, we can simply bail outwhen parsing the "diff --git" header here without an actual error and then proceed to just take the paths from the other headers.

3892f70d

2017-09-22T13:26:47

patch_parse: treat complete line after "---"/"+++" as path When parsing the "---" and "+++" line, we stop after the first whitespace inside of the filename. But as files containing whitespaces do not need to be quoted, we should instead use the complete line here. This fixes parsing patches with unquoted paths with whitespaces.

7bdfc0a6

2017-07-14T15:33:32

parse: always initialize line pointer Upon initializing the parser context, we do not currently initialize the current line, line length and line number. Do so in order to make the interface easier to use and more obvious for future consumers of the parsing API.

e72cb769

2017-07-14T14:37:07

parse: implement `git_parse_peek` Some code parts need to inspect the next few bytes without actually consuming it yet, for example to examine what content it has to expect next. Create a new function `git_parse_peek` which returns the next byte without modifying the parsing context and use it at multiple call sites.

252f2eee

2017-07-14T13:45:05

parse: implement and use `git_parse_advance_digit` The patch parsing code has multiple recurring patterns where we want to parse an actual number. Create a new function `git_parse_advance_digit` and use it to avoid code duplication.

65dcb645

2017-07-14T13:29:29

patch_parse: use git_parse_contains_s Instead of manually checking the parsing context's remaining length and comparing the leading bytes with a specific string, we can simply re-use the function `git_parse_ctx_contains_s`. Do so to avoid code duplication and to further decouple patch parsing from the parsing context's struct members.

ef1395f3

2017-11-11T15:30:43

parse: extract parse module The `git_patch_parse_ctx` encapsulates both parser state as well as options specific to patch parsing. To advance this state and keep it consistent, we provide a few functions which handle advancing the current position and accessing bytes of the patch contents. In fact, these functions are quite generic and not related to patch-parsing by themselves. Seeing that we have similar logic inside of other modules, it becomes quite enticing to extract this functionality into its own parser module. To do so, we create a new module `parse` with a central struct called `git_parse_ctx`. It encapsulates both the content that is to be parsed as well as its lengths and the current position. `git_patch_parse_ctx` now only contains this `parse_ctx` only, which is then accessed whenever we need to touch the current parser. This is the first step towards re-using this functionality across other modules which require parsing functionality and remove code-duplication.

cc4c44a9

2017-09-01T09:37:05

patch_parse: fix parsing patches only containing exact renames Patches which contain exact renames only will not contain an actual diff body, but only a list of files that were renamed. Thus, the patch header is immediately followed by the terminating sequence "-- ". We currently do not recognize this character sequence as a possible terminating sequence. Add it and create a test to catch the failure.

57bc9dab

2017-07-14T10:57:49

patch_parse: implement state machine for parsing patch headers Our code parsing Git patch headers is rather lax in parsing headers of a Git-style patch. Most notably, we do not care for the exact order in which header lines appear and as such, we may parse patch files which are not really valid after all. Furthermore, the state transitions inside of the parser are not as obvious as they could be, making it harder than required to follow its logic. To improve upon this situation, this patch introduces a real state machine to parse the patches. Instead of simply parsing each line without caring for previous state and the exact ordering, we define a set of states with their allowed transitions. This makes the patch parser more strict in only allowing valid successions of header lines. As the transition table is defined inside of a single structure with the expected line, required state as well as the state that we end up in, all state transitions are immediately obvious from just having a look at this structure. This improves both maintainability and eases reasoning about the patch parser.

0c7f49dd

2017-06-30T13:39:01

Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.

723bdf48

2017-03-20T09:35:23

patch_parse: check if advancing over header newline succeeds While parsing patch header lines, we iterate over each line and check if the line has trailing garbage. What we do not check though is that the line is actually a line ending with a trailing newline. Fix this by checking the return code of `parse_advance_expected_str`.

ad5a909c

2017-03-14T09:39:37

patch_parse: fix parsing minimal trailing diff line In a diff, the shortest possible hunk with a modification (that is, no deletion) results from a file with only one line with a single character which is removed. Thus the following hunk @@ -1 +1 @@ -a + is the shortest valid hunk modifying a line. The function parsing the hunk body though assumes that there must always be at least 4 bytes present to make up a valid hunk, which is obviously wrong in this case. The absolute minimum number of bytes required for a modification is actually 2 bytes, that is the "+" and the following newline. Note: if there is no trailing newline, the assumption will not be offended as the diff will have a line "\ No trailing newline" at its end. This patch fixes the issue by lowering the amount of bytes required.

613381fc

2016-11-15T13:33:05

patch_parse: fix memory leak

c77a55a9

2016-11-14T10:05:31

common: use PRIuZ for size_t in `giterr_set` calls

adedac5a

2016-09-02T02:03:45

diff: treat binary patches with no data special When creating and printing diffs, deal with binary deltas that have binary data specially, versus diffs that have a binary file but lack the actual binary data.

b859faa6

2016-08-23T23:38:39

Teach `git_patch_from_diff` about parsed diffs Ensure that `git_patch_from_diff` can return the patch for parsed diffs, not just generate a patch for a generated diff.

002c8e29

2016-08-03T17:09:41

git_diff_file: move `id_abbrev` Move `id_abbrev` to a more reasonable place where it packs more nicely (before anybody starts using it).

c065f6a1

2016-07-14T23:04:47

apply: check allocation properly

1a79cd95

2016-04-26T01:18:01

patch: show copy information for identical copies When showing copy information because we are duplicating contents, for example, when performing a `diff --find-copies-harder -M100 -B100`, then show copy from/to lines in a patch, and do not show context. Ensure that we can also parse such patches.

38a347ea

2016-04-25T17:52:39

patch::parse: handle patches with no hunks Patches may have no hunks when there's no modifications (for example, in a rename). Handle them.

853e585f

2016-04-25T16:32:30

patch: zero id and abbrev length for empty files

33ae8762

2016-04-25T13:07:18

patch: identify non-binary patches as `NOT_BINARY`

7166bb16

2016-04-25T00:35:48

introduce `git_diff_from_buffer` to parse diffs Parse diff files into a `git_diff` structure.

94e488a0

2016-04-24T16:14:25

patch: differentiate not found and invalid patches

17572f67

2016-04-21T00:04:14

git_patch_parse_ctx: refcount the context

aa4bfb32

2016-02-07T15:08:16

parse: introduce parse_ctx_contains_s

440e3bae

2015-11-21T12:27:03

patch: `git_patch_from_patchfile` -> `git_patch_from_buffer`

00e63b36

2015-11-21T12:37:01

patch: provide static string `advance_expected`

4117a235

2015-09-24T10:32:15

patch parse: dup the patch from the callers

6278fbc5

2015-09-24T09:40:42

patch parsing: squash some memory leaks

f941f035

2015-09-24T09:25:10

patch: drop some warnings

82175084

2015-09-23T13:40:12

Introduce git_patch_options, handle prefixes Handle prefixes (in terms of number of path components) for patch parsing.

19e46645

2015-09-23T11:07:04

patch printing: include rename information

d536ceac

2015-09-23T10:47:34

patch_parse: don't set new mode when deleted

28f70443

2015-09-23T10:38:51

patch_parse: use names from `diff --git` header When a text file is added or deleted, use the file names from the `diff --git` header instead of the `---` or `+++` lines. This is for compatibility with git.

1462c95a

2015-09-23T09:54:25

patch_parse: set binary flag We may have parsed binary data, set the `SHOW_BINARY` flag which indicates that we have actually computed a binary diff.

bc6a31c9

2015-09-22T18:29:14

patch: when parsing, set nfiles correctly in delta

d68cb736

2015-09-22T18:25:03

diff: include oid length in deltas Now that `git_diff_delta` data can be produced by reading patch file data, which may have an abbreviated oid, allow consumers to know that the id is abbreviated.

e7ec327d

2015-09-22T17:56:42

patch parse: unset path prefix

b85bd8ce

2015-09-16T11:37:03

patch: use delta's old_file/new_file members No need to replicate the old_file/new_file members, or plumb them strangely up.

804d5fe9

2015-09-11T08:37:12

patch: abstract patches into diff'ed and parsed Patches can now come from a variety of sources - either internally generated (from diffing two commits) or as the results of parsing some external data.

thodg/libgit2/src/patch_parse.c

src/patch_parse.c

Log