src/diff_generate.c


Log

Author Commit Date CI Message
Edward Thomson f0e693b1 2021-09-07T17:53:49 str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
Edward Thomson 91156a0f 2021-04-04T10:44:29 diff: use git_repository_workdir_path The new git_repository_workdir_path function does error checking on working directory inputs on Windows; use it to construct paths within working directories.
lhchavez 6cd0c853 2020-12-11T05:08:45 Small refactor to make things tidier Also repurposed an unused function and deleted another one.
lhchavez 41da4e16 2020-12-10T19:52:01 Cache the parsed submodule config when diffing This change ensures that anything that calls `git_diff__from_iterators` (any of the `git_diff_xxx` functions) only needs to parse the `.gitmodules` file once. Callers could already avoid the repeated parsing by calling `git_repository_submodule_cache_all(...)`, but we can do that safely for the user with no change in semantics. Fixes: #5725
Edward Thomson 79b0c8c8 2020-11-21T23:29:29 iterator: use GIT_ASSERT
Edward Thomson 5d6c2f26 2020-04-05T14:59:54 diff: use GIT_ASSERT
Patrick Steinhardt a6c9e0b3 2020-06-08T12:40:47 tree-wide: mark local functions as static We've accumulated quite some functions which are never used outside of their respective code unit, but which are lacking the `static` keyword. Add it to reduce their linkage scope and allow the compiler to optimize better.
Edward Thomson 0f35efeb 2020-05-23T10:15:51 git_pool_init: handle failure cases Propagate failures caused by pool initialization errors.
Edward Thomson b59c71d8 2020-01-18T14:11:01 iterator: update enum type name for consistency libgit2 does not use `type_t` suffixes as it's redundant; thus, rename `git_iterator_type_t` to `git_iterator_t` for consistency.
Edward Thomson 4334b177 2019-06-23T15:43:38 blob: use `git_object_size_t` for object size Instead of using a signed type (`off_t`) use a new `git_object_size_t` for the sizes of objects.
romkatv 1886478d 2019-11-05T07:45:11 fix a bug introduced in 8a23597b
Patrick Steinhardt fe241071 2019-08-27T10:36:19 diff_generate: detect memory allocation errors when preparing opts When preparing options for the two iterators that are about to be diffed, we allocate a common prefix for both iterators depending on the options passed by the user. We do not check whether the allocation was successful, though. In fact, this isn't much of a problem, as using a `NULL` prefix is perfectly fine. But in the end, we probably want to detect that the system doesn't have any memory left, as we're unlikely to be able to continue afterwards anyway. While the issue is being fixed in the newly created function `diff_prepare_iterator_opts`, it has been previously existing in the previous macro `DIFF_FROM_ITERATORS` already.
Patrick Steinhardt 8a23597b 2019-08-27T10:36:18 diff_generate: refactor `DIFF_FROM_ITERATORS` macro of doom While the `DIFF_FROM_ITERATORS` macro does make it shorter to implement the various `git_diff_foo_to_bar` functions, it is a complex and unreadable beast that implicitly assumes certain local variable names. This is not something desirable to have at all and obstructs understanding and, more importantly, debugging the code by quite a bit. The `DIFF_FROM_ITERATORS` macro basically removed the burden of having to derive the options for both iterators from a pair of iterator flags and the diff options. This patch introduces a new function that does exactly that and refactors all callers to manage the iterators by themselves. As we potentially need to allocate a shared prefix for the iterators, we need to tell the caller to free that prefix as soon as the options aren't required anymore. Thus, the function has a `char **prefix` out pointer that will get set to the allocated string and subsequently be free'd by the caller. While this patch increases the line count, I personally deem this to be an acceptable tradeoff for increased readability.
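The `char **prefix` out-pointer convention described in 8a23597b can be sketched as follows. This is a minimal illustration and not the libgit2 code: `example_iter_opts` and `example_prepare_opts` are hypothetical names, and the two `start`/`end` fields are only an assumption about what the iterator options hold. The allocation check reflects the follow-up fix in fe241071.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-iterator options. */
struct example_iter_opts {
	const char *start;
	const char *end;
};

/*
 * Fill the options for both iterators. If a common prefix is given, a
 * single heap copy is shared by both option structs and handed back
 * through `prefix`; the caller frees it once the options are no longer
 * needed. Returns 0 on success, -1 if the allocation fails.
 */
static int example_prepare_opts(
	struct example_iter_opts *a,
	struct example_iter_opts *b,
	const char *common,
	char **prefix)
{
	*prefix = NULL;

	if (common) {
		size_t len = strlen(common) + 1;
		char *copy = malloc(len);

		if (!copy)
			return -1;

		memcpy(copy, common, len);
		*prefix = copy;
	}

	a->start = a->end = *prefix;
	b->start = b->end = *prefix;
	return 0;
}
```

A caller would declare `char *prefix = NULL;`, call `example_prepare_opts(&a, &b, "src/", &prefix)`, build its iterators from `a` and `b`, and then call `free(prefix)`.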
Patrick Steinhardt e54343a4 2019-06-29T09:17:32 fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.
Patrick Steinhardt 658022c4 2019-07-18T13:53:41 configuration: cvar -> configmap `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.
Edward Thomson 5d92e547 2019-06-08T17:28:35 oid: `is_zero` instead of `iszero` The only function that is named `issomething` (without underscore) was `git_oid_iszero`. Rename it to `git_oid_is_zero` for consistency with the rest of the library.
Edward Thomson 89bd4ddb 2019-01-21T11:32:53 diff_generate: validate oid file size Index entries are 32 bit unsigned ints, not `size_t`s.
Edward Thomson c6cac733 2019-01-20T22:40:38 blob: validate that blob sizes fit in a size_t Our blob size is a `git_off_t`, which is a signed 64 bit int. This may be erroneously negative or larger than `SIZE_MAX`. Ensure that the blob size fits into a `size_t` before casting.
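The range check described in c6cac733 might look roughly like this. It is a sketch under assumptions: `int64_t` stands in for `git_off_t`, and `example_size_from_off` is a hypothetical helper, not a libgit2 function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a signed 64-bit size to size_t only when it is representable.
 * Returns 0 and stores the converted value on success, or -1 if the
 * size is negative or larger than SIZE_MAX.
 */
static int example_size_from_off(size_t *out, int64_t size)
{
	if (size < 0 || (uint64_t)size > (uint64_t)SIZE_MAX)
		return -1;

	*out = (size_t)size;
	return 0;
}
```

On platforms where `size_t` is 64 bits wide the upper-bound comparison can never be true; it matters on 32-bit targets, where a blob larger than 4 GiB would otherwise be silently truncated by the cast.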
Edward Thomson f673e232 2018-12-27T13:47:34 git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
Edward Thomson 168fe39b 2018-11-28T14:26:57 object_type: use new enumeration names Use the new object_type enumeration names within the codebase.
Edward Thomson 18e71e6d 2018-11-28T13:31:06 index: use new enum and structure names Use the new-style index names throughout our own codebase.
Edward Thomson 3be73011 2018-06-11T18:26:22 Merge pull request #4436 from pks-t/pks/packfile-stream-free pack: rename `git_packfile_stream_free`
Patrick Steinhardt ecf4f33a 2018-02-08T11:14:48 Convert usage of `git_buf_free` to new `git_buf_dispose`
David Turner 5a7d454b 2018-06-04T12:56:08 Fix stash save bug with fast path index check If the index contains stat data for a modified file, and the file is not racily dirty, and there exists an untracked working tree directory alphabetically after that file, and there are no other changes to the repo, then git_stash_save would fail. It would confuse the untracked working tree directory for the modified file, because they have the same sha: zero. The wt directory has a sha of zero because it's a directory, and the file would have a zero sha because we wouldn't read the file -- we would just know that it doesn't match the index. To fix this confusion, we simply check mode as well as SHA.
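The idea behind the fix in 5a7d454b, comparing the mode in addition to the object id, can be illustrated with a small sketch. The struct and function below are hypothetical and much simpler than libgit2's index entries; the 20-byte id and the octal mode values are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry: a raw 20-byte object id plus a file mode. */
struct example_entry {
	unsigned char oid[20];
	uint32_t mode;
};

/*
 * Two entries are treated as the same item only if both the mode and the
 * id agree. Comparing the id alone would confuse an untracked directory
 * (zero id) with a modified file whose id was never computed (also zero).
 */
static int example_entries_match(
	const struct example_entry *a,
	const struct example_entry *b)
{
	return a->mode == b->mode &&
		memcmp(a->oid, b->oid, sizeof(a->oid)) == 0;
}
```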
Patrick Steinhardt d8896bda 2018-01-03T16:07:36 diff_generate: avoid excessive stats of .gitattribute files When generating a diff between two trees, for each file that is to be diffed we have to determine whether it shall be treated as a text or as a binary file. While git has heuristics to determine which kind of diff to generate, users can also override that default behaviour by setting or unsetting the 'diff' attribute for specific files. Because of that, we have to query gitattributes in order to determine how to diff the current files. Instead of hitting the '.gitattributes' file every time we need to query an attribute, which can get expensive especially on networked file systems, we try to cache them instead. This works perfectly fine for every '.gitattributes' file that is found, but we hit cache invalidation problems when we determine that an attributes file is _not_ existing. We do create an entry in the cache for missing '.gitattributes' files, but as soon as we hit that file again we invalidate it and stat it again to see if it has now appeared. In the case of diffing large trees with each other, this behaviour is very suboptimal. For each pair of files that is to be diffed, we will repeatedly query every directory component leading towards their respective location for an attributes file. This leads to thousands or even hundreds of thousands of wasted syscalls. The attributes cache already has a mechanism to help in that scenario in form of the `git_attr_session`. As long as the same attributes session is still active, we will not try to re-query the gitattributes files at all but simply retain our currently cached results. To fix our problem, we can create a session at the top-most level, which is the initialization of the `git_diff` structure, and use it in order to look up the correct diff driver. As the `git_diff` structure is used to generate patches for multiple files at once, this neatly solves our problem by retaining the session until patches for all files have been generated. The fix has been tested with linux.git by calling `git_diff_tree_to_tree` and `git_diff_to_buf` with v4.10^{tree} and v4.14^{tree}. Results (time, number of .gitattributes stats): without fix: 33.201s, 844614; with fix: 30.327s, 4441. While execution only improved by roughly 10%, the stat(3) syscalls for .gitattributes files decreased by 99.5%. The benchmarks were quite simple with best-of-three timings on Linux ext4 systems. One can assume that for network based file systems the performance gain will be a lot larger due to a much higher latency.
Patrick Steinhardt 5ca3f115 2017-11-30T15:12:48 diff_generate: fix unsetting diff flags The macro `DIFF_FLAG_SET` can be used to set or unset a flag by modifying the diff's bitmask. While the case of setting the flag is handled correctly, the case of unsetting the flag was not. Instead of inverting the flags, we are inverting the value which is used to decide whether we want to set or unset the bits. The value being used here is a simple `bool` which is `false`. As that is being uplifted to `int` when getting the bitwise-complement, we will end up retaining all bits inside of the bitmask. As that's only ever used to set `GIT_DIFF_IGNORE_CASE`, we were actually always ignoring case for generated diffs. Fix that by instead getting the bitwise-complement of `FLAG`, not `VAL`.
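The bug described in 5ca3f115 can be reproduced in isolation. The sketch below uses plain functions instead of the `DIFF_FLAG_SET` macro, and `EXAMPLE_IGNORE_CASE` is a hypothetical bit standing in for `GIT_DIFF_IGNORE_CASE`; the real macro operates on the diff's option flags rather than on a bare integer.

```c
#include <stdint.h>

/* Hypothetical flag bit, standing in for GIT_DIFF_IGNORE_CASE. */
#define EXAMPLE_IGNORE_CASE (1u << 10)

/*
 * Pre-fix shape: the "unset" branch complements the value, not the flag.
 * In the original code the value was a bool promoted to int; `val` is
 * declared as int here to model that. When val is 0, ~val has every bit
 * set, so the mask keeps all bits and nothing is cleared.
 */
static uint32_t example_flag_set_buggy(uint32_t bits, uint32_t flag, int val)
{
	return val ? (bits | flag) : (bits & ~val);
}

/* Fixed shape: complement the flag so that exactly its bits are cleared. */
static uint32_t example_flag_set_fixed(uint32_t bits, uint32_t flag, int val)
{
	return val ? (bits | flag) : (bits & ~flag);
}
```

With `bits` starting at `EXAMPLE_IGNORE_CASE`, asking the buggy variant to unset the flag leaves it set, while the fixed variant returns 0.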
Patrick Steinhardt 585b5dac 2017-11-18T15:43:11 refcount: make refcounting conform to aliasing rules Strict aliasing rules dictate that for most data types, you are not allowed to cast them to another data type and then access the casted pointers. While this works just fine for most compilers, technically we end up in undefined behaviour when we violate that rule. Our current refcounting code makes heavy use of casting and thus violates that rule. While we didn't have any problems with that code, Travis started spitting out a lot of warnings due to a change in their toolchain. In the refcounting case, the code is also easy to fix: as all refcounting-statements are actually macros, we can just access the `rc` field directly instead of casting. There are two outliers in our code where that doesn't work. Both the `git_diff` and `git_patch` structures have specializations for generated and parsed diffs/patches, which directly inherit from them. Because of that, the refcounting code is only part of the base structure and not of the children themselves. We can work around that by instead passing their base into `GIT_REFCOUNT_INC`, though.
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Edward Thomson b859faa6 2016-08-23T23:38:39 Teach `git_patch_from_diff` about parsed diffs Ensure that `git_patch_from_diff` can return the patch for parsed diffs, not just generate a patch for a generated diff.
Edward Thomson 7166bb16 2016-04-25T00:35:48 introduce `git_diff_from_buffer` to parse diffs Parse diff files into a `git_diff` structure.
Edward Thomson 9be638ec 2016-04-19T15:12:18 git_diff_generated: abstract generated diffs