src/diff_driver.c


Log

Author Commit Date CI Message
Peter Pettersson 7dcc29fc 2021-10-22T22:51:59 Make enum in src,tests and examples C90 compliant by removing trailing comma.
Edward Thomson f0e693b1 2021-09-07T17:53:49 str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
lhchavez 74708a81 2020-12-20T12:45:01 Homogenize semantics for atomic-related functions There were some subtle semantic differences between the various implementations of atomic functions. Now they behave the same, have tests and are better documented to avoid this from happening again in the future. Of note: * The semantics chosen for `git_atomic_compare_and_swap` match `InterlockedCompareExchangePointer`/`__sync_cal_compare_and_swap` now. * The semantics chosen for `git_atomic_add` match `InterlockedAdd`/`__atomic_add_fetch`. * `git_atomic_swap` and `git_atomic_load` still have a bit of semantic difference with the gcc builtins / msvc interlocked operations, since they require an l-value (not a pointer). If desired, this can be homogenized.
Peter Pettersson 7f1dd703 2021-08-25T20:08:58 array: fix dereference from void * type
Edward Thomson d525e063 2021-05-10T23:04:59 buf: remove internal `git_buf_text` namespace The `git_buf_text` namespace is unnecessary and strange. Remove it, just keep the functions prefixed with `git_buf`.
Edward Thomson ab772974 2020-12-05T15:49:30 threads: give atomic functions the git_atomic prefix
Edward Thomson 5d6c2f26 2020-04-05T14:59:54 diff: use GIT_ASSERT
Patrick Steinhardt 7aacf027 2019-09-13T08:55:33 global: convert all users of POSIX regex to use our new regexp API The old POSIX regex API has been superseded by our new regexp API. Convert all users to make use of the new one.
Edward Thomson 91a300b7 2019-06-16T00:46:30 attr: rename constants and macros for consistency Our enumeration values are not generally suffixed with `T`. Further, our enumeration names are generally more descriptive.
Patrick Steinhardt 31f8f82a 2018-03-02T12:18:59 diff_driver: detect memory allocation errors when loading diff driver When searching for a configuration key for the diff driver, we construct the config key by modifying a buffer and then passing it to `git_config_get_multivar_foreach`. We do not check though whether the modification of the buffer actually succeded, so we could in theory end up passing the OOM buffer to the config function. Fix that by checking return codes. While at it, switch to use `git_buf_PUTS` to avoid repetition of the appended string to calculate its length.
Edward Thomson 02683b20 2019-01-12T23:06:39 regexec: prefix all regexec function calls with p_ Prefix all the calls to the the regexec family of functions with `p_`. This allows us to swap out all the regular expression functions with our own implementation. Move the declarations to `posix_regex.h` for simpler inclusion.
Patrick Steinhardt 03555830 2019-01-23T10:44:33 strmap: introduce high-level setter for key/value pairs Currently, one would use the function `git_strmap_insert` to insert key/value pairs into a map. This function has historically been a macro, which is why its syntax is kind of weird: instead of returning an error code directly, it instead has to be passed a pointer to where the return value shall be stored. This does not match libgit2's common idiom of directly returning error codes. Introduce a new function `git_strmap_set`, which takes as parameters the map, key and value and directly returns an error code. Convert all callers of `git_strmap_insert` to make use of it.
Patrick Steinhardt ef507bc7 2019-01-23T10:44:02 strmap: introduce `git_strmap_get` and use it throughout the tree The current way of looking up an entry from a map is tightly coupled with the map implementation, as one first has to look up the index of the key and then retrieve the associated value by using the index. As a caller, you usually do not care about any indices at all, though, so this is more complicated than really necessary. Furthermore, it invites for errors to happen if the correct error checking sequence is not being followed. Introduce a new high-level function `git_strmap_get` that takes a map and a key and returns a pointer to the associated value if such a key exists. Otherwise, a `NULL` pointer is returned. Adjust all callers that can trivially be converted.
Patrick Steinhardt 351eeff3 2019-01-23T10:42:46 maps: use uniform lifecycle management functions Currently, the lifecycle functions for maps (allocation, deallocation, resize) are not named in a uniform way and do not have a uniform function signature. Rename the functions to fix that, and stick to libgit2's naming scheme of saying `git_foo_new`. This results in the following new interface for allocation: - `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an error code if we ran out of memory - `void git_<t>map_free(git_<t>map *map)` to free a map - `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map This commit also fixes all existing callers.
Edward Thomson f673e232 2018-12-27T13:47:34 git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
Patrick Steinhardt 852bc9f4 2018-11-23T19:26:24 khash: remove intricate knowledge of khash types Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types, simply use `size_t` instead. This decouples code from the khash stuff and makes it possible to move the khash includes into the implementation files.
Patrick Steinhardt ecf4f33a 2018-02-08T11:14:48 Convert usage of `git_buf_free` to new `git_buf_dispose`
Patrick Steinhardt d8896bda 2018-01-03T16:07:36 diff_generate: avoid excessive stats of .gitattribute files When generating a diff between two trees, for each file that is to be diffed we have to determine whether it shall be treated as text or as binary files. While git has heuristics to determine which kind of diff to generate, users can also that default behaviour by setting or unsetting the 'diff' attribute for specific files. Because of that, we have to query gitattributes in order to determine how to diff the current files. Instead of hitting the '.gitattributes' file every time we need to query an attribute, which can get expensive especially on networked file systems, we try to cache them instead. This works perfectly fine for every '.gitattributes' file that is found, but we hit cache invalidation problems when we determine that an attribuse file is _not_ existing. We do create an entry in the cache for missing '.gitattributes' files, but as soon as we hit that file again we invalidate it and stat it again to see if it has now appeared. In the case of diffing large trees with each other, this behaviour is very suboptimal. For each pair of files that is to be diffed, we will repeatedly query every directory component leading towards their respective location for an attributes file. This leads to thousands or even hundreds of thousands of wasted syscalls. The attributes cache already has a mechanism to help in that scenario in form of the `git_attr_session`. As long as the same attributes session is still active, we will not try to re-query the gitmodules files at all but simply retain our currently cached results. To fix our problem, we can create a session at the top-most level, which is the initialization of the `git_diff` structure, and use it in order to look up the correct diff driver. As the `git_diff` structure is used to generate patches for multiple files at once, this neatly solves our problem by retaining the session until patches for all files have been generated. The fix has been tested with linux.git by calling `git_diff_tree_to_tree` and `git_diff_to_buf` with v4.10^{tree} and v4.14^{tree}. | time | .gitattributes stats without fix | 33.201s | 844614 with fix | 30.327s | 4441 While execution only improved by roughly 10%, the stat(3) syscalls for .gitattributes files decreased by 99.5%. The benchmarks were quite simple with best-of-three timings on Linux ext4 systems. One can assume that for network based file systems the performance gain will be a lot larger due to a much higher latency.
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Patrick Steinhardt 13c3bc9a 2017-01-27T14:32:23 strmap: remove GIT__USE_STRMAP macro
Patrick Steinhardt 73028af8 2017-01-27T14:20:24 khash: avoid using macro magic to get return address
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Arthur Schreiber ab96ca55 2016-10-06T13:15:31 Make sure we use the `C` locale for `regcomp` on macOS.
Edward Thomson 20302aa4 2016-06-25T23:33:05 Merge pull request #3223 from ethomson/apply Reading patch files
Patrick Steinhardt 8fd74c08 2016-02-09T12:18:28 Avoid old-style function definitions Avoid declaring old-style functions without any parameters. Functions not accepting any parameters should be declared with `void fn(void)`. See ISO C89 $3.5.4.3.
Edward Thomson 804d5fe9 2015-09-11T08:37:12 patch: abstract patches into diff'ed and parsed Patches can now come from a variety of sources - either internally generated (from diffing two commits) or as the results of parsing some external data.
Carlos Martín Nieto e451cd5c 2015-08-15T18:46:38 diff: don't error out on an invalid regex When parsing user-provided regex patterns for functions, we must not fail to provide a diff just because a pattern is not well formed. Ignore it instead.
Patrick Steinhardt 129022ee 2015-04-10T09:36:38 Fix checking of return value for regcomp. The regcomp function returns a non-zero value if compilation of a regular expression fails. In most places we only check for negative values, but positive values indicate an error, as well. Fix this tree-wide, fixing a segmentation fault when calling git_config_iterator_glob_new with an invalid regexp.
Carlos Martín Nieto 9a97f49e 2014-12-21T15:31:03 config: borrow refcounted references This changes the get_entry() method to return a refcounted version of the config entry, which you have to free when you're done. This allows us to avoid freeing the memory in which the entry is stored on a refresh, which may happen at any time for a live config. For this reason, get_string() has been forbidden on live configs and a new function get_string_buf() has been added, which stores the string in a git_buf which the user then owns. The functions which parse the string value takea advantage of the borrowing to parse safely and then release the entry.
Edward Thomson d4cf1675 2015-02-19T10:05:33 buffer: introduce git_buf_attach_notowned Provide a convenience function that creates a buffer that can be provided to callers but will not be freed via `git_buf_free`, so the buffer creator maintains the allocation lifecycle of the buffer's contents.
Stefan Widgren c8e02b87 2015-02-15T21:07:05 Remove extra semicolon outside of a function Without this change, compiling with gcc and pedantic generates warning: ISO C does not allow extra ‘;’ outside of a function.
Edward Thomson f1453c59 2015-02-12T12:19:37 Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
Edward Thomson 392702ee 2015-02-09T23:41:13 allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
Russell Belfer d0f00de4 2014-05-16T11:08:19 Increase binary detection len to 8k
Russell Belfer a37aa82e 2014-05-13T15:54:23 Some coverity inspired cleanups
Russell Belfer b1914c36 2014-05-12T10:24:46 Minor fixes for warnings and error propagation
Carlos Martín Nieto ac99d86b 2014-05-07T11:34:32 repository: introduce a convenience config snapshot method Accessing the repository's config and immediately taking a snapshot of it is a common operation, so let's provide a convenience function for it.
Carlos Martín Nieto 29c4cb09 2014-03-15T03:53:36 Use config snapshotting This way we can assume we have a consistent view of the config situation when we're looking up remote, branch, pack-objects, etc.
Russell Belfer 40ed4990 2014-02-11T14:45:37 Add diff threading tests and attr file cache locks This adds a basic test of doing simultaneous diffs on multiple threads and adds basic locking for the attr file cache because that was the immediate problem that arose from these tests.
Russell Belfer 082e82db 2014-01-27T11:45:06 Update Javascript userdiff driver and tests Writing a sample Javascript driver pointed out some extra whitespace handling that needed to be done in the diff driver. This adds some tests with some sample javascript code that I pulled off of GitHub just to see what would happen. Also, to clean up the userdiff test data, I did a "git gc" and packed up the test objects.
Russell Belfer c7c260a5 2014-01-23T16:12:39 Got some permission to use userdiff patterns I contacted a number of Git authors and lined up their permission to relicense their work for use in libgit2 and copied over their code for diff driver xfuncname patterns. At this point, the code I've copied is taken verbatim from core Git although Thomas Rast warned me that the C++ patterns, at least, really need an update. I've left off patterns where I don't feel like I have permission at this point until I hear from more authors.
Russell Belfer b8e86c62 2014-01-21T12:00:08 Implement matched pattern extract for fn headers
Russell Belfer 2c65602e 2014-01-21T10:39:27 Import git drivers and test HTML driver Reorganize the builtin driver table slightly so that core Git builtin definitions can be imported verbatim. Then take a few of the core Git drivers and pull them in. This also creates a test of diffs with the builtin HTML driver which led to some small error handling fixes in the driver selection logic.
Russell Belfer a5a38643 2014-01-20T14:53:59 Initial take on builtin drivers with multiline This extends the diff driver parser to support multiline driver definitions along with ! prefixing for negated matches. This brings the driver function pattern parsing in line with core Git. This also adds an internal table of driver definitions and a fallback code path that will look in that table for diff drivers that are set with attributes without having a definition in the config file. Right now, I just populated the table with a kind of simple HTML definition that is similar to the core Git def.
Russell Belfer 9f77b3f6 2013-11-25T14:21:34 Add config read fns with controlled error behavior This adds `git_config__lookup_entry` which will look up a key in a config and return either the entry or NULL if the key was not present. Optionally, it can either suppress all errors or can return them (although not finding the key is not an error for this function). Unlike other accessors, this does not normalize the config key string, so it must only be used when the key is known to be in normalized form (i.e. all lower-case before the first dot and after the last dot, with no invalid characters). This also adds three high-level helper functions to look up config values with no errors and a fallback value. The three functions are for string, bool, and int values, and will resort to the fallback value for any error that arises. They are: * `git_config__get_string_force` * `git_config__get_bool_force` * `git_config__get_int_force` None of them normalize the config `key` either, so they can only be used for internal cases where the key is known to be in normal format.
Carlos Martín Nieto 4efa3290 2013-08-08T13:41:18 config: get_multivar -> get_multivar_foreach The plain function will return an iterator, so move this one out of the way.
Russell Belfer 584f2d30 2013-07-11T11:04:42 Fix warnings on Win64
Russell Belfer a5f9b5f8 2013-07-05T16:59:38 Diff hunk context off by one on long lines The diff hunk context string that is returned to xdiff need not be NUL terminated because the xdiff code just copies the number of bytes that you report directly into the output. There was an off by one in the diff driver code when the header context was longer than the output buffer size, the output buffer length included the NUL byte which was copied into the hunk header. Fixes #1710
Russell Belfer 37f66e82 2013-06-12T15:21:21 Fix Windows warnings This fixes problems with missing function prototypes and 64-bit data issues on Windows.
Russell Belfer ef3374a8 2013-06-12T13:46:44 Improvements to git_array This changes the size data to uint32_t, fixes the array growth logic to use a simple 1.5x multiplier, and uses a generic inline function for growing the array to make the git_array_alloc API feel more natural (i.e. it returns a pointer to the new item).
Russell Belfer 54faddd2 2013-06-12T11:54:11 Fix some diff driver memory leaks
Russell Belfer 42e6cf78 2013-06-11T17:45:14 Add diff drivers tests (and fix bugs) This adds real tests for user-configured diff drivers and in the process found a bunch of bugs.
Russell Belfer 5dc98298 2013-06-11T11:22:22 Implement regex pattern diff driver This implements the loading of regular expression pattern lists for diff drivers that search for function context in that way. This also changes the way that diff drivers update options and interface with xdiff APIs to make them a little more flexible.
Russell Belfer 3eadfecd 2013-06-10T15:24:20 start implementing diff driver registry
Russell Belfer 114f5a6c 2013-06-10T10:10:39 Reorganize diff and add basic diff driver This is a significant reorganization of the diff code to break it into a set of more clearly distinct files and to document the new organization. Hopefully this will make the diff code easier to understand and to extend. This adds a new `git_diff_driver` object that looks of diff driver information from the attributes and the config so that things like function content in diff headers can be provided. The full driver spec is not implemented in the commit - this is focused on the reorganization of the code and putting the driver hooks in place. This also removes a few #includes from src/repository.h that were overbroad, but as a result required extra #includes in a variety of places since including src/repository.h no longer results in pulling in the whole world.