kmx git

Commit	Date	Message
f9e28026	2018-06-18T20:37:18	patch_parse: populate line numbers while parsing diffs
ecf4f33a	2018-02-08T11:14:48	Convert usage of `git_buf_free` to new `git_buf_dispose`
06b8a40f	2018-02-16T11:29:46	Explicitly mark fallthrough cases with comments A lot of compilers nowadays generate warnings when there are cases in a switch statement which implicitly fall through to the next case. To avoid this warning, the last line in the case that is falling through can have a comment matching a regular expression, where one possible comment body would be `/* fall through */`. An alternative to the comment would be an explicit attribute like e.g. `[[clang::fallthrough]]` or `__attribute__ ((fallthrough))`. But GCC only introduced support for such an attribute recently with GCC 7. Thus, and also because the fallthrough comment is supported by most compilers, we settle for using comments instead. One shortcoming of that method is that compilers are very strict about that. Most interestingly, that comment _really_ has to be the last line. In case a closing brace follows the comment, the heuristic will fail.
4110fc84	2017-12-23T23:30:29	Merge pull request #4285 from pks-t/pks/patches-with-whitespace patch_parse: fix parsing unquoted filenames with spaces
585b5dac	2017-11-18T15:43:11	refcount: make refcounting conform to aliasing rules Strict aliasing rules dictate that for most data types, you are not allowed to cast them to another data type and then access the casted pointers. While this works just fine for most compilers, technically we end up in undefined behaviour when we hurt that rule. Our current refcounting code makes heavy use of casting and thus violates that rule. While we didn't have any problems with that code, Travis started spitting out a lot of warnings due to a change in their toolchain. In the refcounting case, the code is also easy to fix: as all refcounting-statements are actually macros, we can just access the `rc` field directly instead of casting. There are two outliers in our code where that doesn't work. Both the `git_diff` and `git_patch` structures have specializations for generated and parsed diffs/patches, which directly inherit from them. Because of that, the refcounting code is only part of the base structure and not of the children themselves. We can help that by instead passing their base into `GIT_REFCOUNT_INC`, though.
80226b5f	2017-09-22T13:39:05	patch_parse: allow parsing ambiguous patch headers The git patch format allows for having unquoted paths with whitespaces inside. This format becomes ambiguous to parse, e.g. in the following example: `diff --git a/file b/with spaces.txt b/file b/with spaces.txt` While we cannot parse this in a correct way, we can instead use the "---" and "+++" lines to retrieve the file names, as the path is not followed by anything here but spans the complete remaining line. Because of this, we can simply bail out when parsing the "diff --git" header here without an actual error and then proceed to just take the paths from the other headers.
3892f70d	2017-09-22T13:26:47	patch_parse: treat complete line after "---"/"+++" as path When parsing the "---" and "+++" line, we stop after the first whitespace inside of the filename. But as files containing whitespaces do not need to be quoted, we should instead use the complete line here. This fixes parsing patches with unquoted paths with whitespaces.
7bdfc0a6	2017-07-14T15:33:32	parse: always initialize line pointer Upon initializing the parser context, we do not currently initialize the current line, line length and line number. Do so in order to make the interface easier to use and more obvious for future consumers of the parsing API.
e72cb769	2017-07-14T14:37:07	parse: implement `git_parse_peek` Some code parts need to inspect the next few bytes without actually consuming it yet, for example to examine what content it has to expect next. Create a new function `git_parse_peek` which returns the next byte without modifying the parsing context and use it at multiple call sites.
252f2eee	2017-07-14T13:45:05	parse: implement and use `git_parse_advance_digit` The patch parsing code has multiple recurring patterns where we want to parse an actual number. Create a new function `git_parse_advance_digit` and use it to avoid code duplication.
65dcb645	2017-07-14T13:29:29	patch_parse: use git_parse_contains_s Instead of manually checking the parsing context's remaining length and comparing the leading bytes with a specific string, we can simply re-use the function `git_parse_ctx_contains_s`. Do so to avoid code duplication and to further decouple patch parsing from the parsing context's struct members.
ef1395f3	2017-11-11T15:30:43	parse: extract parse module The `git_patch_parse_ctx` encapsulates both parser state as well as options specific to patch parsing. To advance this state and keep it consistent, we provide a few functions which handle advancing the current position and accessing bytes of the patch contents. In fact, these functions are quite generic and not related to patch-parsing by themselves. Seeing that we have similar logic inside of other modules, it becomes quite enticing to extract this functionality into its own parser module. To do so, we create a new module `parse` with a central struct called `git_parse_ctx`. It encapsulates both the content that is to be parsed as well as its lengths and the current position. `git_patch_parse_ctx` now only contains this `parse_ctx` only, which is then accessed whenever we need to touch the current parser. This is the first step towards re-using this functionality across other modules which require parsing functionality and remove code-duplication.
cc4c44a9	2017-09-01T09:37:05	patch_parse: fix parsing patches only containing exact renames Patches which contain exact renames only will not contain an actual diff body, but only a list of files that were renamed. Thus, the patch header is immediately followed by the terminating sequence "-- ". We currently do not recognize this character sequence as a possible terminating sequence. Add it and create a test to catch the failure.
57bc9dab	2017-07-14T10:57:49	patch_parse: implement state machine for parsing patch headers Our code parsing Git patch headers is rather lax in parsing headers of a Git-style patch. Most notably, we do not care for the exact order in which header lines appear and as such, we may parse patch files which are not really valid after all. Furthermore, the state transitions inside of the parser are not as obvious as they could be, making it harder than required to follow its logic. To improve upon this situation, this patch introduces a real state machine to parse the patches. Instead of simply parsing each line without caring for previous state and the exact ordering, we define a set of states with their allowed transitions. This makes the patch parser more strict in only allowing valid successions of header lines. As the transition table is defined inside of a single structure with the expected line, required state as well as the state that we end up in, all state transitions are immediately obvious from just having a look at this structure. This improves both maintainability and eases reasoning about the patch parser.
0c7f49dd	2017-06-30T13:39:01	Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
723bdf48	2017-03-20T09:35:23	patch_parse: check if advancing over header newline succeeds While parsing patch header lines, we iterate over each line and check if the line has trailing garbage. What we do not check though is that the line is actually a line ending with a trailing newline. Fix this by checking the return code of `parse_advance_expected_str`.
ad5a909c	2017-03-14T09:39:37	patch_parse: fix parsing minimal trailing diff line In a diff, the shortest possible hunk with a modification (that is, no deletion) results from a file with only one line with a single character which is removed. Thus the following hunk (three lines: `@@ -1 +1 @@`, `-a`, `+`) is the shortest valid hunk modifying a line. The function parsing the hunk body though assumes that there must always be at least 4 bytes present to make up a valid hunk, which is obviously wrong in this case. The absolute minimum number of bytes required for a modification is actually 2 bytes, that is the "+" and the following newline. Note: if there is no trailing newline, the assumption will not be offended as the diff will have a line "\ No trailing newline" at its end. This patch fixes the issue by lowering the amount of bytes required.
613381fc	2016-11-15T13:33:05	patch_parse: fix memory leak
c77a55a9	2016-11-14T10:05:31	common: use PRIuZ for size_t in `giterr_set` calls
adedac5a	2016-09-02T02:03:45	diff: treat binary patches with no data special When creating and printing diffs, deal with binary deltas that have binary data specially, versus diffs that have a binary file but lack the actual binary data.
b859faa6	2016-08-23T23:38:39	Teach `git_patch_from_diff` about parsed diffs Ensure that `git_patch_from_diff` can return the patch for parsed diffs, not just generate a patch for a generated diff.
002c8e29	2016-08-03T17:09:41	git_diff_file: move `id_abbrev` Move `id_abbrev` to a more reasonable place where it packs more nicely (before anybody starts using it).
c065f6a1	2016-07-14T23:04:47	apply: check allocation properly
1a79cd95	2016-04-26T01:18:01	patch: show copy information for identical copies When showing copy information because we are duplicating contents, for example, when performing a `diff --find-copies-harder -M100 -B100`, then show copy from/to lines in a patch, and do not show context. Ensure that we can also parse such patches.
38a347ea	2016-04-25T17:52:39	patch::parse: handle patches with no hunks Patches may have no hunks when there's no modifications (for example, in a rename). Handle them.
853e585f	2016-04-25T16:32:30	patch: zero id and abbrev length for empty files
33ae8762	2016-04-25T13:07:18	patch: identify non-binary patches as `NOT_BINARY`
7166bb16	2016-04-25T00:35:48	introduce `git_diff_from_buffer` to parse diffs Parse diff files into a `git_diff` structure.
94e488a0	2016-04-24T16:14:25	patch: differentiate not found and invalid patches
17572f67	2016-04-21T00:04:14	git_patch_parse_ctx: refcount the context
aa4bfb32	2016-02-07T15:08:16	parse: introduce parse_ctx_contains_s
440e3bae	2015-11-21T12:27:03	patch: `git_patch_from_patchfile` -> `git_patch_from_buffer`
00e63b36	2015-11-21T12:37:01	patch: provide static string `advance_expected`
4117a235	2015-09-24T10:32:15	patch parse: dup the patch from the callers
6278fbc5	2015-09-24T09:40:42	patch parsing: squash some memory leaks
f941f035	2015-09-24T09:25:10	patch: drop some warnings
82175084	2015-09-23T13:40:12	Introduce git_patch_options, handle prefixes Handle prefixes (in terms of number of path components) for patch parsing.
19e46645	2015-09-23T11:07:04	patch printing: include rename information
d536ceac	2015-09-23T10:47:34	patch_parse: don't set new mode when deleted
28f70443	2015-09-23T10:38:51	patch_parse: use names from `diff --git` header When a text file is added or deleted, use the file names from the `diff --git` header instead of the `---` or `+++` lines. This is for compatibility with git.
1462c95a	2015-09-23T09:54:25	patch_parse: set binary flag We may have parsed binary data, set the `SHOW_BINARY` flag which indicates that we have actually computed a binary diff.
bc6a31c9	2015-09-22T18:29:14	patch: when parsing, set nfiles correctly in delta
d68cb736	2015-09-22T18:25:03	diff: include oid length in deltas Now that `git_diff_delta` data can be produced by reading patch file data, which may have an abbreviated oid, allow consumers to know that the id is abbreviated.
e7ec327d	2015-09-22T17:56:42	patch parse: unset path prefix
b85bd8ce	2015-09-16T11:37:03	patch: use delta's old_file/new_file members No need to replicate the old_file/new_file members, or plumb them strangely up.
804d5fe9	2015-09-11T08:37:12	patch: abstract patches into diff'ed and parsed Patches can now come from a variety of sources - either internally generated (from diffing two commits) or as the results of parsing some external data.
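
The transition table described in commit 57bc9dab (expected line, required state, resulting state) can be illustrated with a minimal sketch. This is not the libgit2 implementation: the state names, the `header_transition` struct, the table contents and the `header_step` function are all hypothetical, chosen only to show how a single table makes the allowed ordering of header lines explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical states; the real parser defines its own set. */
typedef enum {
	STATE_START,
	STATE_DIFF,
	STATE_INDEX,
	STATE_PATH,
	STATE_END,
	STATE_INVALID
} header_state;

/* One row of the table: which line is expected in which state. */
typedef struct {
	const char *prefix;    /* text the header line must start with */
	header_state required; /* state we must be in to accept the line */
	header_state next;     /* state we end up in after accepting it */
} header_transition;

/* Assumed, simplified ordering: diff --git, optional index, ---, +++ */
static const header_transition transitions[] = {
	{ "diff --git ", STATE_START, STATE_DIFF  },
	{ "index ",      STATE_DIFF,  STATE_INDEX },
	{ "--- ",        STATE_DIFF,  STATE_PATH  },
	{ "--- ",        STATE_INDEX, STATE_PATH  },
	{ "+++ ",        STATE_PATH,  STATE_END   },
};

/*
 * Return the state reached by feeding `line` to the machine in state
 * `current`, or STATE_INVALID if no table row allows that line here.
 */
header_state header_step(header_state current, const char *line)
{
	size_t i;

	assert(line != NULL);

	for (i = 0; i < sizeof(transitions) / sizeof(transitions[0]); i++) {
		const header_transition *t = &transitions[i];

		if (t->required == current &&
		    strncmp(line, t->prefix, strlen(t->prefix)) == 0)
			return t->next;
	}

	return STATE_INVALID;
}
```

Because a line is only accepted when its row's required state matches the current one, a header whose lines appear out of order (for example a "+++" line before "diff --git") is rejected instead of being parsed leniently.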

thodg/libgit2/src/patch_parse.c
