|
c6f9ad73
|
2019-12-13T13:18:54
|
|
patch_parse: fix undefined behaviour due to arithmetic on NULL pointers
Doing arithmetic with NULL pointers is undefined behaviour in the C
standard. We do so regardless when parsing patches, as we happily add a
potential prefix length to prefixed paths. While this works out just
fine as the prefix length is always equal to zero in these cases, thus
resulting in another NULL pointer, it still is undefined behaviour and
was pointed out to us by OSSfuzz.
Fix the issue by checking whether paths are NULL, avoiding the
arithmetic if they are.
|
|
33e6c402
|
2019-11-28T15:26:36
|
|
patch_parse: fix out-of-bounds reads caused by integer underflow
The patch format for binary files is a simple Base85 encoding with a
length byte as prefix that encodes the current line's length. For each
line, we thus check whether the line's actual length matches its
expected length in order to not faultily apply a truncated patch. This
also acts as a check to verify that we're not reading outside of the
line's string:
if (encoded_len > ctx->parse_ctx.line_len - 1) {
error = git_parse_err(...);
goto done;
}
There is the possibility for an integer underflow, though. Given a line
with a single prefix byte, only, `line_len` will be zero when reaching
this check. As a result, subtracting one from that will result in an
integer underflow, causing us to assume that there's a wealth of bytes
available later on. Naturally, this may result in an out-of-bounds read.
Fix the issue by checking both `encoded_len` and `line_len` for a
non-zero value. The binary format doesn't make use of zero-length lines
anyway, so we need to know that there are both encoded bytes and
remaining characters available at all.
This patch also adds a test that works based on the last error message.
Checking error messages is usually too tightly coupled, but in fact
parsing the patch failed even before the change. Thus the only
possibility is to use e.g. Valgrind, but that'd result in us not
catching issues when run without Valgrind. As a result, using the error
message is considered a viable tradeoff as we know that we didn't start
decoding Base85 in the first place.
|
|
fb439c97
|
2019-11-28T14:41:58
|
|
Merge pull request #5306 from herrerog/patchid
diff: complete support for git patchid
|
|
fefefd1d
|
2019-06-23T16:42:14
|
|
odb: use `git_object_size_t` for object size
Instead of using a signed type (`off_t`) use a new `git_object_size_t`
for the sizes of objects.
|
|
048e94ad
|
2019-11-07T14:13:14
|
|
patch_parse: correct parsing of patch containing not shown binary data.
When not shown binary data is added or removed in a patch, patch parser
is currently returning 'error -1 - corrupt git binary header at line 4'.
Fix it by correctly handling case where binary data is added/removed.
Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
|
|
de7659cc
|
2019-11-10T18:44:56
|
|
patch_parse: use paths from "---"/"+++" lines for binary patches
For some patches, it is not possible to derive the old and new file
paths from the patch header's first line, most importantly when they
contain spaces. In such a case, we derive both paths from the "---" and
"+++" lines, which allow for non-ambiguous parsing. We fail to use these
paths when parsing binary patches without data, though, as we always
expect the header paths to be filled in.
Fix this by using the "---"/"+++" paths by default and only fall back to
header paths if they aren't set. If neither of those paths are set, we
just return an error. Add two tests to verify this behaviour, one of
which would have previously caused a segfault.
|
|
de543e29
|
2019-11-05T22:44:27
|
|
patch_parse: fix segfault when header path contains whitespace only
When parsing header paths from a patch, we reject any patches with empty
paths as malformed patches. We perform the check whether a path is empty
before sanitizing it, though, which may lead to a path becoming empty
after the check, e.g. if we have trimmed whitespace. This may lead to a
segfault later when any part of our patching logic actually references
such a path, which may then be a `NULL` pointer.
Fix the issue by performing the check after sanitizing. Add tests to
catch the issue as they would have produced a segfault previosuly.
|
|
37141ff7
|
2019-10-21T18:56:59
|
|
patch_parse: detect overflow when calculating old/new line position
When the patch contains lines close to INT_MAX, then it may happen that
we end up with an integer overflow when calculating the line of the
current diff hunk. Reject such patches as unreasonable to avoid the
integer overflow.
As the calculation is performed on integers, we introduce two new
helpers `git__add_int_overflow` and `git__sub_int_overflow` that perform
the integer overflow check in a generic way.
|
|
468e3ddc
|
2019-10-19T16:48:11
|
|
patch_parse: fix out-of-bounds read with No-NL lines
We've got two locations where we copy lines into the patch. The first
one is when copying normal " ", "-" or "+" lines, while the second
location gets executed when we copy "\ No newline at end of file" lines.
While the first one correctly uses `git__strndup` to copy only until the
newline, the other one doesn't. Thus, if the line occurs at the end of
the patch and if there is no terminating NUL character, then it may
result in an out-of-bounds read.
Fix the issue by using `git__strndup`, as was already done in the other
location. Furthermore, add allocation checks to both locations to detect
out-of-memory situations.
|
|
6c6c15e9
|
2019-10-19T15:52:35
|
|
patch_parse: reject empty path names
When parsing patch headers, we currently accept empty path names just
fine, e.g. a line "--- \n" would be parsed as the empty filename. This
is not a valid patch format and may cause `NULL` pointer accesses at a
later place as `git_buf_detach` will return `NULL` in that case.
Reject such patches as malformed with a nice error message.
|
|
223e7e43
|
2019-10-19T15:42:54
|
|
patch_parse: reject patches with multiple old/new paths
It's currently possible to have patches with multiple old path name
headers. As we didn't check for this case, this resulted in a memory
leak when overwriting the old old path with the new old path because we
simply discarded the old pointer.
Instead of fixing this by free'ing the old pointer, we should reject
such patches altogether. It doesn't make any sense for the "---" or
"+++" markers to occur multiple times within a patch n the first place.
This also implicitly fixes the memory leak.
|
|
11de594f
|
2019-10-16T22:11:33
|
|
patch_parse: handle patches without extended headers
Extended header lines (especially the "index <hash>..<hash> <mode>") are
not required by "git apply" so it import patches. So we allow the
from-file/to-file lines (--- a/file\n+++ b/file) to directly follow the
git diff header.
This fixes #5267.
|
|
b61810bf
|
2019-09-28T15:52:25
|
|
patch_parse: handle patches with new empty files
Patches containing additions of empty files will not contain diff data
but will end with the index header line followed by the terminating
sequence "-- ". We follow the same logic as in cc4c44a and allow "-- "
to immediately follow the index header.
|
|
27b8b31e
|
2019-08-01T11:57:03
|
|
parse: remove use of variadic macros which are not C89 compliant
The macro `git_parse_error` is implemented in a variadic way so
that it's possible to pass printf-style parameters.
Unfortunately, variadic macros are not defined by C89 and thus we
cannot use that functionality. But as we have implemented
`git_error_vset` in the previous commit, we can now just use that
instead.
Convert `git_parse_error` to a variadic function and use
`git_error_vset` to fix the compliance violation. While at it,
move the function to "patch_parse.c".
|
|
a613832e
|
2019-07-20T18:49:48
|
|
patch_parse: fix segfault due to line containing static contents
With commit dedf70ad2 (patch_parse: do not depend on parsed buffer's
lifetime, 2019-07-05), all lines of the patch are allocated with
`strdup` to make lifetime of the parsed patch independent of the buffer
that is currently being parsed. In patch b08932824 (patch_parse: ensure
valid patch output with EOFNL, 2019-07-11), we introduced another
code location where we add lines to the parsed patch. But as that one
was implemented via a separate pull request, it wasn't converted to use
`strdup`, as well. As a consequence, we generate a segfault when trying
to deallocate the potentially static buffer that's now in some of the
lines.
Use `git__strdup` to fix the issue.
|
|
fd7a384b
|
2019-07-20T11:24:37
|
|
Merge pull request #5159 from pks-t/pks/patch-parse-old-missing-nl
patch_parse: handle missing newline indicator in old file
|
|
b0893282
|
2019-07-11T12:12:04
|
|
patch_parse: ensure valid patch output with EOFNL
|
|
3f855fe8
|
2019-07-05T11:06:33
|
|
patch_parse: handle missing newline indicator in old file
When either the old or new file contents have no newline at the end of
the file, then git-diff(1) will print out a "\ No newline at end of
file" indicator. While we do correctly handle this in the case where the
new file has this indcator, we fail to parse patches where the old file
is missing a newline at EOF.
Fix this bug by handling and missing newline indicators in the old file.
Add tests to verify that we can parse such files.
|
|
dedf70ad
|
2019-07-05T09:35:43
|
|
patch_parse: do not depend on parsed buffer's lifetime
When parsing a patch from a buffer, we let the patch lines point into
the original buffer. While this is efficient use of resources, this also
ties the lifetime of the parsed patch to the parsed buffer. As this
behaviour is not documented anywhere in our API it is very surprising to
its users.
Untie the lifetime by duplicating the lines into the parsed patch. Add a
test that verifies that lifetimes are indeed independent of each other.
|
|
f1d73189
|
2019-05-20T07:02:50
|
|
patch: use size_t for size when parsing
|
|
30c06b60
|
2019-03-22T23:56:10
|
|
patch_parse.c: Handle CRLF in parse_header_start
|
|
b3497344
|
2019-03-29T12:15:20
|
|
patch_parse: fix parsing addition/deletion of file with space
The diff header format is a strange beast in that it is inherently
unparseable in an unambiguous way. While parsing
a/file.txt b/file.txt
is obvious and trivially doable, parsing a diff header of
a/file b/file ab.txt b/file b/file ab.txt
is not (but in fact valid and created by git.git).
Due to that, we have relaxed our diff header parser in commit 80226b5f6
(patch_parse: allow parsing ambiguous patch headers, 2017-09-22), so
that we started to bail out when seeing diff headers with spaces in
their file names. Instead, we try to use the "---" and "+++" lines,
which are unambiguous.
In some cases, though, we neither have a useable file name from the
header nor from the "---" or "+++" lines. This is the case when we have
a deletion or addition of a file with spaces: the header is unparseable
and the other lines will simply show "/dev/null". This trips our parsing
logic when we try to extract the prefix (the "a/" part) that is being
used in the path line, where we unconditionally try to dereference a
NULL pointer in such a scenario.
We can fix this by simply not trying to parse the prefix in cases where
we have no useable path name. That'd leave the parsed patch without
either `old_prefix` or `new_prefix` populated. But in fact such cases
are already handled by users of the patch object, which simply opt to
use the default prefixes in that case.
|
|
131cd9b1
|
2019-03-29T11:58:50
|
|
patch_parse: improve formatting
|
|
80c3867b
|
2019-01-20T19:20:12
|
|
patch: explicitly cast down in parse_header_percent
Quiet down a warning from MSVC about how we're potentially losing data.
This is safe since we've explicitly tested that it's within the range of
0-100.
|
|
f673e232
|
2018-12-27T13:47:34
|
|
git_error: use new names in internal APIs and usage
Move to the `git_error` name in the internal API for error-related
functions.
|
|
4b84db6a
|
2018-11-14T12:33:38
|
|
patch_parse: remove unused function `parse_number`
The function `parse_number` was replaced by `git_parse_advance_digit`
which is provided by the parser interface in commit 252f2eeee (parse:
implement and use `git_parse_advance_digit`, 2017-07-14). As there are
no remaining callers, remove it.
|
|
f9e28026
|
2018-06-18T20:37:18
|
|
patch_parse: populate line numbers while parsing diffs
|
|
ecf4f33a
|
2018-02-08T11:14:48
|
|
Convert usage of `git_buf_free` to new `git_buf_dispose`
|
|
06b8a40f
|
2018-02-16T11:29:46
|
|
Explicitly mark fallthrough cases with comments
A lot of compilers nowadays generate warnings when there are cases in a
switch statement which implicitly fall through to the next case. To
avoid this warning, the last line in the case that is falling through
can have a comment matching a regular expression, where one possible
comment body would be `/* fall through */`.
An alternative to the comment would be an explicit attribute like e.g.
`[[clang::fallthrough]` or `__attribute__ ((fallthrough))`. But GCC only
introduced support for such an attribute recently with GCC 7. Thus, and
also because the fallthrough comment is supported by most compilers, we
settle for using comments instead.
One shortcoming of that method is that compilers are very strict about
that. Most interestingly, that comment _really_ has to be the last line.
In case a closing brace follows the comment, the heuristic will fail.
|
|
4110fc84
|
2017-12-23T23:30:29
|
|
Merge pull request #4285 from pks-t/pks/patches-with-whitespace
patch_parse: fix parsing unquoted filenames with spaces
|
|
585b5dac
|
2017-11-18T15:43:11
|
|
refcount: make refcounting conform to aliasing rules
Strict aliasing rules dictate that for most data types, you are not
allowed to cast them to another data type and then access the casted
pointers. While this works just fine for most compilers, technically we
end up in undefined behaviour when we hurt that rule.
Our current refcounting code makes heavy use of casting and thus
violates that rule. While we didn't have any problems with that code,
Travis started spitting out a lot of warnings due to a change in their
toolchain. In the refcounting case, the code is also easy to fix:
as all refcounting-statements are actually macros, we can just access
the `rc` field directly instead of casting.
There are two outliers in our code where that doesn't work. Both the
`git_diff` and `git_patch` structures have specializations for generated
and parsed diffs/patches, which directly inherit from them. Because of
that, the refcounting code is only part of the base structure and not of
the children themselves. We can help that by instead passing their base
into `GIT_REFCOUNT_INC`, though.
|
|
80226b5f
|
2017-09-22T13:39:05
|
|
patch_parse: allow parsing ambiguous patch headers
The git patch format allows for having unquoted paths with whitespaces
inside. This format becomes ambiguous to parse, e.g. in the following
example:
diff --git a/file b/with spaces.txt b/file b/with spaces.txt
While we cannot parse this in a correct way, we can instead use the
"---" and "+++" lines to retrieve the file names, as the path is not
followed by anything here but spans the complete remaining line. Because
of this, we can simply bail outwhen parsing the "diff --git" header here
without an actual error and then proceed to just take the paths from the
other headers.
|
|
3892f70d
|
2017-09-22T13:26:47
|
|
patch_parse: treat complete line after "---"/"+++" as path
When parsing the "---" and "+++" line, we stop after the first
whitespace inside of the filename. But as files containing whitespaces
do not need to be quoted, we should instead use the complete line here.
This fixes parsing patches with unquoted paths with whitespaces.
|
|
7bdfc0a6
|
2017-07-14T15:33:32
|
|
parse: always initialize line pointer
Upon initializing the parser context, we do not currently initialize the
current line, line length and line number. Do so in order to make the
interface easier to use and more obvious for future consumers of the
parsing API.
|
|
e72cb769
|
2017-07-14T14:37:07
|
|
parse: implement `git_parse_peek`
Some code parts need to inspect the next few bytes without actually
consuming it yet, for example to examine what content it has to expect
next. Create a new function `git_parse_peek` which returns the next byte
without modifying the parsing context and use it at multiple call sites.
|
|
252f2eee
|
2017-07-14T13:45:05
|
|
parse: implement and use `git_parse_advance_digit`
The patch parsing code has multiple recurring patterns where we want to
parse an actual number. Create a new function `git_parse_advance_digit`
and use it to avoid code duplication.
|
|
65dcb645
|
2017-07-14T13:29:29
|
|
patch_parse: use git_parse_contains_s
Instead of manually checking the parsing context's remaining length and
comparing the leading bytes with a specific string, we can simply re-use
the function `git_parse_ctx_contains_s`. Do so to avoid code duplication
and to further decouple patch parsing from the parsing context's struct
members.
|
|
ef1395f3
|
2017-11-11T15:30:43
|
|
parse: extract parse module
The `git_patch_parse_ctx` encapsulates both parser state as well as
options specific to patch parsing. To advance this state and keep it
consistent, we provide a few functions which handle advancing the
current position and accessing bytes of the patch contents. In fact,
these functions are quite generic and not related to patch-parsing by
themselves. Seeing that we have similar logic inside of other modules,
it becomes quite enticing to extract this functionality into its own
parser module.
To do so, we create a new module `parse` with a central struct called
`git_parse_ctx`. It encapsulates both the content that is to be parsed
as well as its lengths and the current position. `git_patch_parse_ctx`
now only contains this `parse_ctx` only, which is then accessed whenever
we need to touch the current parser. This is the first step towards
re-using this functionality across other modules which require parsing
functionality and remove code-duplication.
|
|
cc4c44a9
|
2017-09-01T09:37:05
|
|
patch_parse: fix parsing patches only containing exact renames
Patches which contain exact renames only will not contain an actual diff
body, but only a list of files that were renamed. Thus, the patch header
is immediately followed by the terminating sequence "-- ". We currently
do not recognize this character sequence as a possible terminating
sequence. Add it and create a test to catch the failure.
|
|
57bc9dab
|
2017-07-14T10:57:49
|
|
patch_parse: implement state machine for parsing patch headers
Our code parsing Git patch headers is rather lax in parsing headers of a
Git-style patch. Most notably, we do not care for the exact order in
which header lines appear and as such, we may parse patch files which
are not really valid after all. Furthermore, the state transitions
inside of the parser are not as obvious as they could be, making it
harder than required to follow its logic.
To improve upon this situation, this patch introduces a real state
machine to parse the patches. Instead of simply parsing each line
without caring for previous state and the exact ordering, we define a
set of states with their allowed transitions. This makes the patch
parser more strict in only allowing valid successions of header lines.
As the transition table is defined inside of a single structure with
the expected line, required state as well as the state that we end up
in, all state transitions are immediately obvious from just having a
look at this structure. This improves both maintainability and eases
reasoning about the patch parser.
|
|
0c7f49dd
|
2017-06-30T13:39:01
|
|
Make sure to always include "common.h" first
Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.
This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.
This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.
|
|
723bdf48
|
2017-03-20T09:35:23
|
|
patch_parse: check if advancing over header newline succeeds
While parsing patch header lines, we iterate over each line and check if
the line has trailing garbage. What we do not check though is that the
line is actually a line ending with a trailing newline.
Fix this by checking the return code of `parse_advance_expected_str`.
|
|
ad5a909c
|
2017-03-14T09:39:37
|
|
patch_parse: fix parsing minimal trailing diff line
In a diff, the shortest possible hunk with a modification (that is, no
deletion) results from a file with only one line with a single character
which is removed. Thus the following hunk
@@ -1 +1 @@
-a
+
is the shortest valid hunk modifying a line. The function parsing the
hunk body though assumes that there must always be at least 4 bytes
present to make up a valid hunk, which is obviously wrong in this case.
The absolute minimum number of bytes required for a modification is
actually 2 bytes, that is the "+" and the following newline. Note: if
there is no trailing newline, the assumption will not be offended as the
diff will have a line "\ No trailing newline" at its end.
This patch fixes the issue by lowering the amount of bytes required.
|
|
613381fc
|
2016-11-15T13:33:05
|
|
patch_parse: fix memory leak
|
|
c77a55a9
|
2016-11-14T10:05:31
|
|
common: use PRIuZ for size_t in `giterr_set` calls
|
|
adedac5a
|
2016-09-02T02:03:45
|
|
diff: treat binary patches with no data special
When creating and printing diffs, deal with binary deltas that have
binary data specially, versus diffs that have a binary file but lack the
actual binary data.
|
|
b859faa6
|
2016-08-23T23:38:39
|
|
Teach `git_patch_from_diff` about parsed diffs
Ensure that `git_patch_from_diff` can return the patch for parsed diffs,
not just generate a patch for a generated diff.
|
|
002c8e29
|
2016-08-03T17:09:41
|
|
git_diff_file: move `id_abbrev`
Move `id_abbrev` to a more reasonable place where it packs more nicely
(before anybody starts using it).
|
|
c065f6a1
|
2016-07-14T23:04:47
|
|
apply: check allocation properly
|
|
1a79cd95
|
2016-04-26T01:18:01
|
|
patch: show copy information for identical copies
When showing copy information because we are duplicating contents,
for example, when performing a `diff --find-copies-harder -M100 -B100`,
then show copy from/to lines in a patch, and do not show context.
Ensure that we can also parse such patches.
|
|
38a347ea
|
2016-04-25T17:52:39
|
|
patch::parse: handle patches with no hunks
Patches may have no hunks when there's no modifications (for example,
in a rename). Handle them.
|
|
853e585f
|
2016-04-25T16:32:30
|
|
patch: zero id and abbrev length for empty files
|
|
33ae8762
|
2016-04-25T13:07:18
|
|
patch: identify non-binary patches as `NOT_BINARY`
|
|
7166bb16
|
2016-04-25T00:35:48
|
|
introduce `git_diff_from_buffer` to parse diffs
Parse diff files into a `git_diff` structure.
|
|
94e488a0
|
2016-04-24T16:14:25
|
|
patch: differentiate not found and invalid patches
|
|
17572f67
|
2016-04-21T00:04:14
|
|
git_patch_parse_ctx: refcount the context
|
|
aa4bfb32
|
2016-02-07T15:08:16
|
|
parse: introduce parse_ctx_contains_s
|
|
440e3bae
|
2015-11-21T12:27:03
|
|
patch: `git_patch_from_patchfile` -> `git_patch_from_buffer`
|
|
00e63b36
|
2015-11-21T12:37:01
|
|
patch: provide static string `advance_expected`
|
|
4117a235
|
2015-09-24T10:32:15
|
|
patch parse: dup the patch from the callers
|
|
6278fbc5
|
2015-09-24T09:40:42
|
|
patch parsing: squash some memory leaks
|
|
f941f035
|
2015-09-24T09:25:10
|
|
patch: drop some warnings
|
|
82175084
|
2015-09-23T13:40:12
|
|
Introduce git_patch_options, handle prefixes
Handle prefixes (in terms of number of path components) for patch
parsing.
|
|
19e46645
|
2015-09-23T11:07:04
|
|
patch printing: include rename information
|
|
d536ceac
|
2015-09-23T10:47:34
|
|
patch_parse: don't set new mode when deleted
|
|
28f70443
|
2015-09-23T10:38:51
|
|
patch_parse: use names from `diff --git` header
When a text file is added or deleted, use the file names from the
`diff --git` header instead of the `---` or `+++` lines. This is
for compatibility with git.
|
|
1462c95a
|
2015-09-23T09:54:25
|
|
patch_parse: set binary flag
We may have parsed binary data, set the `SHOW_BINARY` flag which
indicates that we have actually computed a binary diff.
|
|
bc6a31c9
|
2015-09-22T18:29:14
|
|
patch: when parsing, set nfiles correctly in delta
|
|
d68cb736
|
2015-09-22T18:25:03
|
|
diff: include oid length in deltas
Now that `git_diff_delta` data can be produced by reading patch
file data, which may have an abbreviated oid, allow consumers to
know that the id is abbreviated.
|
|
e7ec327d
|
2015-09-22T17:56:42
|
|
patch parse: unset path prefix
|
|
b85bd8ce
|
2015-09-16T11:37:03
|
|
patch: use delta's old_file/new_file members
No need to replicate the old_file/new_file members, or plumb them
strangely up.
|
|
804d5fe9
|
2015-09-11T08:37:12
|
|
patch: abstract patches into diff'ed and parsed
Patches can now come from a variety of sources - either internally
generated (from diffing two commits) or as the results of parsing
some external data.
|