tests/patch


Log

Author Commit Date CI Message
Edward Thomson ca14942e 2021-11-11T13:28:08 tests: declare functions statically where appropriate
Edward Thomson f0e693b1 2021-09-07T17:53:49 str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
Patrick Steinhardt 5f47cb48 2020-03-26T14:16:41 patch: correctly handle mode changes for renames When generating a patch for a renamed file whose mode bits have changed in addition to the rename, then we currently fail to parse the generated patch. Furthermore, when generating a diff we output mode bits after the similarity metric, which is different to how upstream git handles it. Fix both issues by adding another state transition that allows similarity indices after mode changes and by printing mode changes before the similarity index.
Patrick Steinhardt 33e6c402 2019-11-28T15:26:36 patch_parse: fix out-of-bounds reads caused by integer underflow The patch format for binary files is a simple Base85 encoding with a length byte as prefix that encodes the current line's length. For each line, we thus check whether the line's actual length matches its expected length in order to not faultily apply a truncated patch. This also acts as a check to verify that we're not reading outside of the line's string: if (encoded_len > ctx->parse_ctx.line_len - 1) { error = git_parse_err(...); goto done; } There is the possibility for an integer underflow, though. Given a line with a single prefix byte, only, `line_len` will be zero when reaching this check. As a result, subtracting one from that will result in an integer underflow, causing us to assume that there's a wealth of bytes available later on. Naturally, this may result in an out-of-bounds read. Fix the issue by checking both `encoded_len` and `line_len` for a non-zero value. The binary format doesn't make use of zero-length lines anyway, so we need to know that there are both encoded bytes and remaining characters available at all. This patch also adds a test that works based on the last error message. Checking error messages is usually too tightly coupled, but in fact parsing the patch failed even before the change. Thus the only possibility is to use e.g. Valgrind, but that'd result in us not catching issues when run without Valgrind. As a result, using the error message is considered a viable tradeoff as we know that we didn't start decoding Base85 in the first place.
Gregory Herrero ece5bb5e 2019-11-07T14:10:00 diff: make patchid computation work with all types of commits. Current implementation of patchid is not computing a correct patchid when given a patch where, for example, a new file is added or removed. Some more corner cases need to be handled to have same behavior as git patch-id command. Add some more tests to cover those corner cases. Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
Gregory Herrero 048e94ad 2019-11-07T14:13:14 patch_parse: correct parsing of patch containing not shown binary data. When not shown binary data is added or removed in a patch, patch parser is currently returning 'error -1 - corrupt git binary header at line 4'. Fix it by correctly handling case where binary data is added/removed. Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
Patrick Steinhardt de7659cc 2019-11-10T18:44:56 patch_parse: use paths from "---"/"+++" lines for binary patches For some patches, it is not possible to derive the old and new file paths from the patch header's first line, most importantly when they contain spaces. In such a case, we derive both paths from the "---" and "+++" lines, which allow for non-ambiguous parsing. We fail to use these paths when parsing binary patches without data, though, as we always expect the header paths to be filled in. Fix this by using the "---"/"+++" paths by default and only fall back to header paths if they aren't set. If neither of those paths are set, we just return an error. Add two tests to verify this behaviour, one of which would have previously caused a segfault.
Patrick Steinhardt de543e29 2019-11-05T22:44:27 patch_parse: fix segfault when header path contains whitespace only When parsing header paths from a patch, we reject any patches with empty paths as malformed patches. We perform the check whether a path is empty before sanitizing it, though, which may lead to a path becoming empty after the check, e.g. if we have trimmed whitespace. This may lead to a segfault later when any part of our patching logic actually references such a path, which may then be a `NULL` pointer. Fix the issue by performing the check after sanitizing. Add tests to catch the issue as they would have produced a segfault previosuly.
Patrick Steinhardt 37141ff7 2019-10-21T18:56:59 patch_parse: detect overflow when calculating old/new line position When the patch contains lines close to INT_MAX, then it may happen that we end up with an integer overflow when calculating the line of the current diff hunk. Reject such patches as unreasonable to avoid the integer overflow. As the calculation is performed on integers, we introduce two new helpers `git__add_int_overflow` and `git__sub_int_overflow` that perform the integer overflow check in a generic way.
Patrick Steinhardt 468e3ddc 2019-10-19T16:48:11 patch_parse: fix out-of-bounds read with No-NL lines We've got two locations where we copy lines into the patch. The first one is when copying normal " ", "-" or "+" lines, while the second location gets executed when we copy "\ No newline at end of file" lines. While the first one correctly uses `git__strndup` to copy only until the newline, the other one doesn't. Thus, if the line occurs at the end of the patch and if there is no terminating NUL character, then it may result in an out-of-bounds read. Fix the issue by using `git__strndup`, as was already done in the other location. Furthermore, add allocation checks to both locations to detect out-of-memory situations.
Patrick Steinhardt 6c6c15e9 2019-10-19T15:52:35 patch_parse: reject empty path names When parsing patch headers, we currently accept empty path names just fine, e.g. a line "--- \n" would be parsed as the empty filename. This is not a valid patch format and may cause `NULL` pointer accesses at a later place as `git_buf_detach` will return `NULL` in that case. Reject such patches as malformed with a nice error message.
Patrick Steinhardt 223e7e43 2019-10-19T15:42:54 patch_parse: reject patches with multiple old/new paths It's currently possible to have patches with multiple old path name headers. As we didn't check for this case, this resulted in a memory leak when overwriting the old old path with the new old path because we simply discarded the old pointer. Instead of fixing this by free'ing the old pointer, we should reject such patches altogether. It doesn't make any sense for the "---" or "+++" markers to occur multiple times within a patch n the first place. This also implicitly fixes the memory leak.
Denis Laxalde 11de594f 2019-10-16T22:11:33 patch_parse: handle patches without extended headers Extended header lines (especially the "index <hash>..<hash> <mode>") are not required by "git apply" so it import patches. So we allow the from-file/to-file lines (--- a/file\n+++ b/file) to directly follow the git diff header. This fixes #5267.
Max Kostyukevich 585fbd74 2019-08-28T23:18:31 apply: Test for EOFNL mishandling when several hunks are processed Introduce an unit test to validate that git_apply__patch() properly handles EOFNL changes in case of patches with several hunks.
Edward Thomson fd7a384b 2019-07-20T11:24:37 Merge pull request #5159 from pks-t/pks/patch-parse-old-missing-nl patch_parse: handle missing newline indicator in old file
Erik Aigner b0893282 2019-07-11T12:12:04 patch_parse: ensure valid patch output with EOFNL
Patrick Steinhardt 3f855fe8 2019-07-05T11:06:33 patch_parse: handle missing newline indicator in old file When either the old or new file contents have no newline at the end of the file, then git-diff(1) will print out a "\ No newline at end of file" indicator. While we do correctly handle this in the case where the new file has this indcator, we fail to parse patches where the old file is missing a newline at EOF. Fix this bug by handling and missing newline indicators in the old file. Add tests to verify that we can parse such files.
Patrick Steinhardt dedf70ad 2019-07-05T09:35:43 patch_parse: do not depend on parsed buffer's lifetime When parsing a patch from a buffer, we let the patch lines point into the original buffer. While this is efficient use of resources, this also ties the lifetime of the parsed patch to the parsed buffer. As this behaviour is not documented anywhere in our API it is very surprising to its users. Untie the lifetime by duplicating the lines into the parsed patch. Add a test that verifies that lifetimes are indeed independent of each other.
Drew DeVault 30c06b60 2019-03-22T23:56:10 patch_parse.c: Handle CRLF in parse_header_start
Erik Aigner 9d65360b 2019-03-29T12:30:37 tests: diff: test parsing diffs with a new file with spaces in its path Add a test that verifies that we are able to parse patches which add a new file that has spaces in its path.
Jason Haslam 72630572 2017-03-30T22:40:47 patch: add support for partial patch application Add hunk callback parameter to git_apply__patch to allow hunks to be skipped.
Patrick Steinhardt ecf4f33a 2018-02-08T11:14:48 Convert usage of `git_buf_free` to new `git_buf_dispose`
Patrick Steinhardt 80226b5f 2017-09-22T13:39:05 patch_parse: allow parsing ambiguous patch headers The git patch format allows for having unquoted paths with whitespaces inside. This format becomes ambiguous to parse, e.g. in the following example: diff --git a/file b/with spaces.txt b/file b/with spaces.txt While we cannot parse this in a correct way, we can instead use the "---" and "+++" lines to retrieve the file names, as the path is not followed by anything here but spans the complete remaining line. Because of this, we can simply bail outwhen parsing the "diff --git" header here without an actual error and then proceed to just take the paths from the other headers.
Patrick Steinhardt 89a34828 2017-06-16T13:34:43 diff: implement function to calculate patch ID The upstream git project provides the ability to calculate a so-called patch ID. Quoting from git-patch-id(1): A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a patch, with whitespace and line numbers ignored." Patch IDs can be used to identify two patches which are probably the same thing, e.g. when a patch has been cherry-picked to another branch. This commit implements a new function `git_diff_patchid`, which gets a patch and derives an OID from the diff. Note the different terminology here: a patch in libgit2 are the differences in a single file and a diff can contain multiple patches for different files. The implementation matches the upstream implementation and should derive the same OID for the same diff. In fact, some code has been directly derived from the upstream implementation. The upstream implementation has two different modes to calculate patch IDs, which is the stable and unstable mode. The old way of calculating the patch IDs was unstable in a sense that a different ordering the diffs was leading to different results. This oversight was fixed in git 1.9, but as git tries hard to never break existing workflows, the old and unstable way is still default. The newer and stable way does not care for ordering of the diff hunks, and in fact it is the mode that should probably be used today. So right now, we only implement the stable way of generating the patch ID.
Edward Thomson adedac5a 2016-09-02T02:03:45 diff: treat binary patches with no data special When creating and printing diffs, deal with binary deltas that have binary data specially, versus diffs that have a binary file but lack the actual binary data.
Edward Thomson 94e488a0 2016-04-24T16:14:25 patch: differentiate not found and invalid patches
Edward Thomson 17572f67 2016-04-21T00:04:14 git_patch_parse_ctx: refcount the context
Edward Thomson 440e3bae 2015-11-21T12:27:03 patch: `git_patch_from_patchfile` -> `git_patch_from_buffer`
Edward Thomson 0ff723cc 2015-09-25T12:09:50 apply: test postimages that grow/shrink original Test with some postimages that actually grow/shrink from the original, adding new lines or removing them. (Also do so without context to ensure that we can add/remove from a non-zero part of the line vector.)
Edward Thomson 82175084 2015-09-23T13:40:12 Introduce git_patch_options, handle prefixes Handle prefixes (in terms of number of path components) for patch parsing.
Edward Thomson 2f3b922f 2015-09-22T18:54:10 patch_parse: test roundtrip patch parsing -> print
Edward Thomson 42b34428 2015-09-22T18:54:10 patch_parse: ensure we can parse a patch
Edward Thomson 8bca8b9e 2015-09-16T14:40:44 apply: move patch data to patch_common.h