src/merge.c


Log

Author Commit Date CI Message
Patrick Steinhardt 4dfcc50f 2020-04-01T15:16:18 merge: cache negative cache results for similarity metrics When computing renames, we cache the hash signatures for each of the potentially conflicting entries so that we do not need to repeatedly read the file and can at least halfway efficiently determine whether two files are similar enough to be deemed a rename. In order to make the hash signatures meaningful, we require at least four lines of data to be present, resulting in at least four different hashes that can be compared. Files that are deemed too small are not cached at all and will thus be repeatedly re-hashed, which is usually not a huge issue. The issue with above heuristic is in case a file does _not_ have at least four lines, where a line is anything separated by a consecutive run of "\n" or "\0" characters. For example "a\nb" is two lines, but "a\0\0b" is also just two lines. Taken to the extreme, a file that has megabytes of consecutive space- or NUL-only may also be deemed as too small and thus not get cached. As a result, we will repeatedly load its blob, calculate its hash signature just to finally throw it away as we notice it's not of any value. When you've got a comparitively big file that you compare against a big set of potentially renamed files, then the cost simply expodes. The issue can be trivially fixed by introducing negative cache entries. Whenever we determine that a given blob does not have a meaningful representation via a hash signature, we store this negative cache marker and will from then on not hash it again, but also ignore it as a potential rename target. This should help the "normal" case already where you have a lot of small files as rename candidates, but in the above scenario it's savings are extraordinarily high. To verify we do not hit the issue anymore with described solution, this commit adds a test that uses the exact same setup described above with one 50 megabyte blob of '\0' characters and 1000 other files that get renamed. Without the negative cache: $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null real 11m48.377s user 11m11.576s sys 0m35.187s And with the negative cache: $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null real 0m1.972s user 0m1.851s sys 0m0.118s So this represents a ~350-fold performance improvement, but it obviously depends on how many files you have and how big the blob is. The test number were chosen in a way that one will immediately notice as soon as the bug resurfaces.
Edward Thomson 4334b177 2019-06-23T15:43:38 blob: use `git_object_size_t` for object size Instead of using a signed type (`off_t`) use a new `git_object_size_t` for the sizes of objects.
Sebastian Henke 3335a034 2019-10-10T15:28:46 refs: fix locks getting forcibly removed The flag GIT_FILEBUF_FORCE currently does two things: 1. It will cause the filebuf to create non-existing leading directories for the file that is about to be written. 2. It will forcibly remove any pre-existing locks. While most call sites actually do want (1), they do not want to remove pre-existing locks, as that renders the locking mechanisms effectively useless. Introduce a new flag `GIT_FILEBUF_CREATE_LEADING_DIRS` to separate both behaviours cleanly from each other and convert callers to use it instead of `GIT_FILEBUF_FORCE` to have them honor locked files correctly. As this conversion removes all current users of `GIT_FILEBUF_FORCE`, this commit removes the flag altogether.
Patrick Steinhardt d4fe402b 2019-08-08T10:36:33 merge: check return value of `git_commit_list_insert` The function `git_commit_list_insert` dynamically allocates memory and may thus fail to insert a given commit, but we didn't check for that in several places in "merge.c". Convert surrounding functions to return error codes and check whether `git_commit_list_insert` was successful, returning an error if not.
Edward Thomson 9a6992c4 2019-05-20T06:46:10 merge: safely cast size of merged file for index Explicitly truncate the file size to a `uint32_t`.
Edward Thomson 0b5ba0d7 2019-06-06T16:36:23 Rename opt init functions to `options_init` In libgit2 nomenclature, when we need to verb a direct object, we name a function `git_directobject_verb`. Thus, if we need to init an options structure named `git_foo_options`, then the name of the function that does that should be `git_foo_options_init`. The previous names of `git_foo_init_options` is close - it _sounds_ as if it's initializing the options of a `foo`, but in fact `git_foo_options` is its own noun that should be respected. Deprecate the old names; they'll now call directly to the new ones.
Robert Coup 6d2ab2cf 2019-03-19T23:43:10 merge: analysis support for bare repositories
Patrick Steinhardt 2e0a3048 2019-01-23T10:48:55 oidmap: introduce high-level setter for key/value pairs Currently, one would use either `git_oidmap_insert` to insert key/value pairs into a map or `git_oidmap_put` to insert a key only. These function have historically been macros, which is why their syntax is kind of weird: instead of returning an error code directly, they instead have to be passed a pointer to where the return value shall be stored. This does not match libgit2's common idiom of directly returning error codes.Furthermore, `git_oidmap_put` is tightly coupled with implementation details of the map as it exposes the index of inserted entries. Introduce a new function `git_oidmap_set`, which takes as parameters the map, key and value and directly returns an error code. Convert all trivial callers of `git_oidmap_insert` and `git_oidmap_put` to make use of it.
Patrick Steinhardt 9694ef20 2018-12-17T09:01:53 oidmap: introduce high-level getter for values The current way of looking up an entry from a map is tightly coupled with the map implementation, as one first has to look up the index of the key and then retrieve the associated value by using the index. As a caller, you usually do not care about any indices at all, though, so this is more complicated than really necessary. Furthermore, it invites for errors to happen if the correct error checking sequence is not being followed. Introduce a new high-level function `git_oidmap_get` that takes a map and a key and returns a pointer to the associated value if such a key exists. Otherwise, a `NULL` pointer is returned. Adjust all callers that can trivially be converted.
Patrick Steinhardt 351eeff3 2019-01-23T10:42:46 maps: use uniform lifecycle management functions Currently, the lifecycle functions for maps (allocation, deallocation, resize) are not named in a uniform way and do not have a uniform function signature. Rename the functions to fix that, and stick to libgit2's naming scheme of saying `git_foo_new`. This results in the following new interface for allocation: - `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an error code if we ran out of memory - `void git_<t>map_free(git_<t>map *map)` to free a map - `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map This commit also fixes all existing callers.
Edward Thomson f673e232 2018-12-27T13:47:34 git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
Edward Thomson 168fe39b 2018-11-28T14:26:57 object_type: use new enumeration names Use the new object_type enumeration names within the codebase.
Patrick Steinhardt 0ddc6094 2018-11-30T09:46:14 Merge pull request #4770 from tiennou/feature/merge-analysis-any-branch Allow merge analysis against any reference
Patrick Steinhardt 852bc9f4 2018-11-23T19:26:24 khash: remove intricate knowledge of khash types Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types, simply use `size_t` instead. This decouples code from the khash stuff and makes it possible to move the khash includes into the implementation files.
Edward Thomson a2f9f94b 2018-10-20T20:18:04 Merge branch 'issue-4203'
Edward Thomson 32b81661 2018-10-20T20:16:32 merge: don't leak the index during reloads
Etienne Samson cb71a9ce 2018-08-26T18:34:46 merge: assert that we're passed sane parameters
Etienne Samson 6e9fb040 2018-08-25T01:47:39 merge: make analysis possible against a non-HEAD reference This moves the current merge analysis code into a more generic version that can work against any reference. Also change the tests to check returned analysis values exactly.
Patrick Steinhardt ecf4f33a 2018-02-08T11:14:48 Convert usage of `git_buf_free` to new `git_buf_dispose`
Tyrie Vella 1403c612 2018-01-22T14:44:31 merge: virtual commit should be last argument to merge-base Our virtual commit must be the last argument to merge-base: since our algorithm pushes _both_ parents of the virtual commit, it needs to be the last argument, since merge-base: > Given three commits A, B and C, git merge-base A B C will compute the > merge base between A and a hypothetical commit M We want to calculate the merge base between the actual commit ("two") and the virtual commit ("one") - since one actually pushes its parents to the merge-base calculation, we need to calculate the merge base of "two" and the parents of one.
Edward Thomson b924df1e 2018-01-21T18:05:45 merge: reverse merge bases for recursive merge When the commits being merged have multiple merge bases, reverse the order when creating the virtual merge base. This is for compatibility with git's merge-recursive algorithm, and ensures that we build identical trees. Git does this to try to use older merge bases first. Per 8918b0c: > It seems to be the only sane way to do it: when a two-head merge is > done, and the merge-base and one of the two branches agree, the > merge assumes that the other branch has something new. > > If we start creating virtual commits from newer merge-bases, and go > back to older merge-bases, and then merge with newer commits again, > chances are that a patch is lost, _because_ the merge-base and the > head agree on it. Unlikely, yes, but it happened to me.
Edward Thomson 185b0d08 2018-01-20T19:41:28 merge: recursive uses larger conflict markers Git uses longer conflict markers in the recursive merge base - two more than the default (thus, 9 character long conflict markers). This allows users to tell the difference between the recursive merge conflicts and conflicts between the ours and theirs branches. This was introduced in git d694a17986a28bbc19e2a6c32404ca24572e400f. Update our tests to expect this as well.
Etiene Dalcol e8d373c4 2017-11-11T17:39:25 merge: add error handling for index reload Cleans up should git_repository_index or git_index_read fail
Greg Collinge bb9e3797 2017-11-11T17:20:16 merge: reload index before git_merge If the index in memory is different from the index on the disk, previously merge would abort with GIT_ECONFLICT. Reload the index before merging to fix this. Fixes #4203
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Patrick Steinhardt 4dc87e72 2017-06-21T13:35:46 merge: fix potential free of uninitialized memory The function `merge_diff_mark_similarity_exact` may error our early and, when it does so, free the `ours_deletes_by_oid` and `theirs_deletes_by_oid` variables. While the first one can never be uninitialized due to the first call actually assigning to it, the second variable can be freed without being initialized. Fix the issue by initializing both variables to `NULL`.
Michael Tesch cee1e7af 2017-04-12T14:38:30 merge: perform exact rename detection in linear time The current exact rename detection has order n^2 complexity. We can do better by using a map to first aggregate deletes and using that to match deletes to adds. This results in a substantial performance improvement for merges with a large quantity of adds and deletes.
Edward Thomson 69873685 2017-03-23T09:49:09 Merge branch 'pr/3957'
Edward Thomson b53d834f 2017-03-23T09:46:22 merge: indentation fixup
Patrick Steinhardt 84f56cb0 2016-11-04T11:59:52 repository: rename `path_repository` and `path_gitlink` The `path_repository` variable is actually confusing to think about, as it is not always clear what the repository actually is. It may either be the path to the folder containing worktree and .git directory, the path to .git itself, a worktree or something entirely different. Actually, the intent of the variable is to hold the path to the gitdir, which is either the .git directory or the bare repository. Rename the variable to `gitdir` to avoid confusion. While at it, also rename `path_gitlink` to `gitlink` to improve consistency.
Edward Thomson 95367366 2017-02-09T16:57:22 merge: don't do rename detection on submodules
Carlos Martín Nieto 2854e619 2017-01-14T17:12:23 Merge pull request #4061 from libgit2/ethomson/merge_opts merge: set default rename threshold
Edward Thomson 19ed4d0c 2017-01-01T22:19:23 merge: set default rename threshold When `GIT_MERGE_FIND_RENAMES` is set, provide a default for `rename_threshold` when it is unset.
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Patrick Steinhardt c77a55a9 2016-11-14T10:05:31 common: use PRIuZ for size_t in `giterr_set` calls
Arthur Schreiber 6d354747 2016-10-18T08:20:41 Perf: Don't perform merge operations for trivial merges. When one side of a merge is treesame to the ancestor, we can take the other side and skip all the expensive merge operations. This optimization can only be performed when the generation of REUC extension data is skipped.
Edward Thomson 9be638ec 2016-04-19T15:12:18 git_diff_generated: abstract generated diffs
Jason Haslam c864b4ab 2016-05-12T13:18:07 Ignore submodules when checking for merge conflicts in the workdir.
Edward Thomson d953c450 2016-02-28T21:30:00 merge drivers: handle configured but not found driver
Edward Thomson 6d8b2cdb 2016-02-28T09:34:11 merge driver: remove `check` callback Since the `apply` callback can defer, the `check` callback is not necessary. Removing the `check` callback further makes the `payload` unnecessary along with the `cleanup` callback.
Edward Thomson 967e073d 2016-02-27T16:42:02 merge driver: correct global initialization
Edward Thomson 7a3ab14f 2016-02-07T15:58:34 merge driver: get a pointer to favor
Edward Thomson 46625836 2016-02-07T15:19:43 merge driver: correct indentation
Edward Thomson 30a94ab7 2015-12-24T22:52:23 merge driver: allow custom default driver Allow merge users to configure a custom default merge driver via `git_merge_options`. Similarly, honor the `merge.default` configuration option.
Edward Thomson 3f04219f 2015-12-23T10:23:08 merge driver: introduce custom merge drivers Consumers can now register custom merged drivers with `git_merge_driver_register`. This allows consumers to support the merge drivers, as configured in `.gitattributes`. Consumers will be asked to perform the file-level merge when a custom driver is configured.
Stan Hu 7a74590d 2015-12-03T09:57:56 Fix rebase bug and include test for merge=union
Stan Hu f8787098 2015-10-31T18:50:13 Support union merges via .gitattributes file
Arthur Schreiber 3679ebae 2016-02-11T23:37:52 Horrible fix for #3173.
Patrick Steinhardt fac42ff9 2016-02-08T16:58:08 merge: fix memory leak
Vicent Marti 879ebab3 2015-12-16T12:30:52 merge: Use `git_index__fill` to populate the index Instead of calling `git_index_add` in a loop, use the new `git_index_fill` internal API to fill the index with the initial staged entries. The new `fill` helper assumes that all the entries will be unique and valid, so it can append them at the end of the entries vector and only sort it once at the end. It performs no validation checks. This prevents the quadratic behavior caused by having to sort the entries list once after every insertion.
Edward Thomson 5b9c63c3 2015-11-20T19:01:42 recursive merge: add a recursion limit
Edward Thomson 78859c63 2015-11-20T17:33:49 merge: handle conflicts in recursive base building When building a recursive merge base, allow conflicts to occur. Use the file (with conflict markers) as the common ancestor. The user has already seen and dealt with this conflict by virtue of having a criss-cross merge. If they resolved this conflict identically in both branches, then there will be no conflict in the result. This is the best case scenario. If they did not resolve the conflict identically in the two branches, then we will generate a new conflict. If the user is simply using standard conflict output then the results will be fairly sensible. But if the user is using a mergetool or using diff3 output, then the common ancestor will be a conflict file (itself with diff3 output, haha!). This is quite terrible, but it matches git's behavior.
Edward Thomson 76ade3a0 2015-11-10T21:21:26 merge: use annotated commits for recursion Use annotated commits to act as our virtual bases, instead of regular commits, to avoid polluting the odb with virtual base commits and trees. Instead, build an annotated commit with an index and pointers to the commits that it was merged from.
Edward Thomson 7730fe8e 2015-11-09T13:01:48 merge: merge annotated commits instead of regular commits
Edward Thomson 3f2bb387 2015-10-28T11:00:55 merge: octopus merge common ancestors when >2 When there are more than two common ancestors, continue merging the virtual base with the additional common ancestors, effectively octopus merging a new virtual base.
Edward Thomson 1b82f7b6 2015-10-27T14:24:51 merge: compute octopus merge bases
Edward Thomson 75dee59c 2015-10-26T10:37:58 merge: build virtual base of multiple merge bases When the commits to merge have multiple common ancestors, build a "virtual" base tree by merging the common ancestors.
Edward Thomson fa78782f 2015-10-22T17:00:09 merge: rename `git_merge_tree_flags_t` -> `git_merge_flags_t`
Vicent Marti 1d0bed9d 2015-10-30T14:02:01 merge-base: Style
Vicent Marti 4cacf5b5 2015-10-30T11:50:43 merge-base: Do not read parents from the root
Vicent Marti 136a71f4 2015-10-30T11:45:52 merge-base: Remove redundant merge bases
Vicent Marti d845abe6 2015-10-28T14:49:28 merge: Do not mallocz unecessary entries
Vicent Marti d3416dfe 2015-10-28T10:50:25 pool: Dot not assume mallocs are zeroed out
Vicent Marti 1e5e02b4 2015-10-27T17:26:04 pool: Simplify implementation
Vicent Marti 7a02e93e 2015-10-27T22:42:40 merge: Plug memory leak
Vicent Marti a1f5d691 2015-10-27T22:42:15 merge: Implement `GIT_MERGE_TREE_SKIP_REUC`
Edward Thomson 8683d31f 2015-10-22T14:39:20 merge: add GIT_MERGE_TREE_FAIL_ON_CONFLICT Provide a new merge option, GIT_MERGE_TREE_FAIL_ON_CONFLICT, which will stop on the first conflict and fail the merge operation with GIT_EMERGECONFLICT.
Edward Thomson 6c014bcc 2015-09-29T12:18:17 diff: don't feed large files to xdiff
Edward Thomson e4352066 2015-09-28T18:25:24 merge_file: treat large files as binary xdiff craps the bed on large files. Treat very large files as binary, so that it doesn't even have to try. Refactor our merge binary handling to better match git.git, which looks for a NUL in the first 8000 bytes.
Edward Thomson 56ed415a 2015-08-30T19:10:00 diff: drop `FILELIST_MATCH` Now that non-pathspec matching diffs are implemented at the iterator level, drop `FILELIST_MATCH`ing.
Edward Thomson 4a0dbeb0 2015-08-30T17:06:26 diff: use new iterator pathlist handling When using literal pathspecs in diff with `GIT_DIFF_DISABLE_PATHSPEC_MATCH` turn on the faster iterator pathlist handling. Updates iterator pathspecs to include directory prefixes (eg, `foo/`) for compatibility with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`.
Edward Thomson ef206124 2015-07-28T19:55:37 Move filelist into the iterator handling itself.
Edward Thomson ed1c6446 2015-07-28T11:41:27 iterator: use an options struct instead of args
Matthew Plough 768f8be3 2015-06-30T19:00:41 Fix #3094 - improve use of portable size_t/ssize_t format specifiers. The header src/cc-compat.h defines portable format specifiers PRIuZ, PRIdZ, and PRIxZ. The original report highlighted the need to use these specifiers in examples/network/fetch.c. For this commit, I checked all C source and header files not in deps/ and transitioned to the appropriate format specifier where appropriate.
Edward Thomson 8960dc1e 2015-06-24T18:10:30 iterator: provide git_iterator_walk Provide `git_iterator_walk` to walk each iterator in lockstep, returning each iterator's idea of the contents of the next path.
Carlos Martín Nieto ca2857d8 2015-06-10T10:30:08 merge: actually increment the counts, not the pointers `merge_diff_list_count_candidates()` takes pointers to the source and target counts, but when it comes time to increase them, we're increasing the pointer, rather than the value it's pointing to. Dereference the value to increase.
Edward Thomson 885b94aa 2015-05-28T15:26:13 Rename GIT_EMERGECONFLICT to GIT_ECONFLICT We do not error on "merge conflicts"; on the contrary, merge conflicts are a normal part of merging. We only error on "checkout conflicts", where a change exists in the index or the working directory that would otherwise be overwritten by performing the checkout. This *may* happen during merge (after the production of the new index that we're going to checkout) but it could happen during any checkout.
Edward Thomson 9f545b9d 2015-05-19T11:23:59 introduce `git_index_entry_is_conflict` It's not always obvious the mapping between stage level and conflict-ness. More importantly, this can lead otherwise sane people to write constructs like `if (!git_index_entry_stage(entry))`, which (while technically correct) is unreadable. Provide a nice method to help avoid such messy thinking.
Edward Thomson 9ebb5a3f 2015-02-18T22:53:40 merge: merge iterators
Edward Thomson 89ba9f1a 2015-03-18T13:17:04 Merge pull request #2967 from jacquesg/merge-whitespace Allow merges of files (and trees) with whitespace problems/fixes
Jeff Hostetler fea24c53 2015-03-16T15:54:53 PERF: In MERGE, lazily compute is_binary
Jacques Germishuys 13de9363 2015-03-12T12:36:09 Collapse whitespace flags into git_merge_file_flags_t
Jacques Germishuys f29dde68 2015-03-12T12:29:47 Renamed git_merge_options 'flags' to 'tree_flags'
Jacques Germishuys 45a86bbf 2015-03-09T17:02:52 Allow for merges with whitespace discrepancies
Carlos Martín Nieto a291790a 2015-02-15T05:18:01 Merge pull request #2831 from ethomson/merge_lock merge: lock index during the merge (not just checkout)
Edward Thomson 41fae48d 2015-02-03T22:31:10 indexwriter: an indexwriter for repo operations Provide git_indexwriter_init_for_operation for the common locking pattern in merge, rebase, revert and cherry-pick.
Edward Thomson 8b0ddd5d 2015-01-17T23:28:53 merge: lock the index at the start of the merge Always lock the index when we begin the merge, before we write any of the metdata files. This prevents a race where another client may run a commit after we have written the MERGE_HEAD but before we have updated the index, which will produce a merge commit that is treesame to one parent. The merge will finish and update the index and the resultant commit would not be a merge at all.
Edward Thomson f1453c59 2015-02-12T12:19:37 Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
Edward Thomson 392702ee 2015-02-09T23:41:13 allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
Edward Thomson 85880693 2015-01-14T10:19:28 Merge branch 'pr/2740'
Pierre-Olivier Latour b3837d4d 2014-12-02T05:47:32 Always use GIT_HASHSIG_SMART_WHITESPACE when diffing for merges git_merge_tree_flag_t cannot contain any GIT_DIFF_FIND_xxx flags so there's not point in checking for them
Jacques Germishuys 6f73e026 2014-12-24T11:42:50 Plug some leaks
Edward Thomson 18b00406 2014-10-03T19:02:29 s/git_merge_head/git_annotated_commit Rename git_merge_head to git_annotated_commit, as it becomes used in more operations than just merge.
Edward Thomson 867a36f3 2014-07-14T14:35:01 Introduce git_rebase to set up a rebase session Introduce `git_rebase` to set up a rebase session that can then be continued. Immediately, only merge-type rebase is supported.
Arthur Schreiber 917f85a1 2014-10-09T14:16:10 Extract shared functionality.
Arthur Schreiber eca07bcd 2014-10-09T13:58:23 Add git_merge_bases_many.
Vicent Marti 737b5051 2014-10-01T12:03:24 hashsig: Export as a `sys` header
Jacques Germishuys dc68ee8d 2014-09-12T22:37:15 Remove local unused index_repo variable
Jacques Germishuys a565f364 2014-09-12T22:53:56 Only check for workdir conflicts if the index has merged files Passing 0 as the length of the paths to check to git_diff_index_to_workdir results in all files being treated as conflicting, that is, all untracked or modified files in the worktree is reported as conflicting
Vicent Marti 46a13f32 2014-08-29T18:19:56 Merge pull request #2481 from libgit2/cmn/oidarray merge: expose multiple merge bases