src/commit_list.c


Log

Author Commit Date CI Message
lhchavez 6f544140 2021-01-05T19:45:23 commit-graph: Introduce `git_commit_list_generation_cmp` This change makes calculations of merge-bases a bit faster when there are complex graphs and the commit times cause visiting nodes multiple times. This is done by visiting the nodes in the graph in reverse generation order when the generation number is available instead of commit timestamp. If the generation number is missing in any pair of commits, it can safely fall back to the old heuristic with no negative side-effects. Part of: #5757
lhchavez 25b75cd9 2021-03-10T07:06:15 commit-graph: Create `git_commit_graph` as an abstraction for the file This change does a medium-size refactor of the git_commit_graph_file and the interaction with the ODB. Now instead of the ODB owning a direct reference to the git_commit_graph_file, there will be an intermediate git_commit_graph. The main advantage of that is that now end users can explicitly set a git_commit_graph that is eagerly checked for errors, while still being able to lazily use the commit-graph in a regular ODB, if the file is present.
lhchavez 248606eb 2021-01-05T17:20:27 commit-graph: Use the commit-graph in revwalks This change makes revwalks a bit faster by using the `commit-graph` file (if present). This is thanks to the `commit-graph` allow much faster parsing of the commit information by requiring near-zero I/O (aside from reading a few dozen bytes off of a `mmap(2)`-ed file) for each commit, instead of having to read the ODB, inflate the commit, and parse it. This is done by modifying `git_commit_list_parse()` and letting it use the ODB-owned commit-graph file. Part of: #5757
Patrick Steinhardt f04a58b0 2019-10-03T12:55:48 Merge pull request #4445 from tiennou/shallow/dry-commit-parsing DRY commit parsing
Patrick Steinhardt 5cf17e0f 2019-10-03T09:39:42 commit_list: store in/out-degrees as uint16_t The commit list's in- and out-degrees are currently stored as `unsigned short`. When assigning it the value of `git_array_size`, which returns an `size_t`, this generates a warning on some Win32 platforms due to loosing precision. We could just cast the returned value of `git_array_size`, which would work fine for 99.99% of all cases as commits typically have less than 2^16 parents. For crafted commits though we might end up with a wrong value, and thus we should definitely check whether the array size actually fits into the field. To ease the check, let's convert the fields to store the degrees as `uint16_t`. We shouldn't rely on such unspecific types anyway, as it may lead to different behaviour across platforms. Furthermore, this commit introduces a new `git__is_uint16` function to check whether it actually fits -- if not, we return an error.
Etienne Samson 5988cf34 2017-12-15T18:11:51 commit_list: unify commit information parsing
Patrick Steinhardt 57a9ccd5 2019-06-21T15:53:54 commit_list: fix possible buffer overflow in `commit_quick_parse` The function `commit_quick_parse` provides a way to quickly parse parts of a commit without storing or verifying most of its metadata. The first thing it does is calculating the number of parents by skipping "parent " lines until it finds the first non-parent line. Afterwards, this parent count is passed to `alloc_parents`, which will allocate an array to store all the parent. To calculate the amount of storage required for the parents array, `alloc_parents` simply multiplicates the number of parents with the respective elements's size. This already screams "buffer overflow", and in fact this problem is getting worse by the result being cast to an `uint32_t`. In fact, triggering this is possible: git-hash-object(1) will happily write a commit with multiple millions of parents for you. I've stopped at 67,108,864 parents as git-hash-object(1) unfortunately soaks up the complete object without streaming anything to disk and thus will cause an OOM situation at a later point. The point here is: this commit was about 4.1GB of size but compressed down to 24MB and thus easy to distribute. The above doesn't yet trigger the buffer overflow, thus. As the array's elements are all pointers which are 8 bytes on 64 bit, we need a total of 536,870,912 parents to trigger the overflow to `0`. The effect is that we're now underallocating the array and do an out-of-bound writes. As the buffer is kindly provided by the adversary, this may easily result in code execution. Extrapolating from the test file with 67m commits to the one with 536m commits results in a factor of 8. Thus the uncompressed contents would be about 32GB in size and the compressed ones 192MB. While still easily distributable via the network, only servers will have that amount of RAM and not cause an out-of-memory condition previous to triggering the overflow. This at least makes this attack not an easy vector for client-side use of libgit2.
Edward Thomson d103f008 2019-05-21T13:44:47 pool: use `size_t` for sizes
Edward Thomson f673e232 2018-12-27T13:47:34 git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
Edward Thomson 168fe39b 2018-11-28T14:26:57 object_type: use new enumeration names Use the new object_type enumeration names within the codebase.
Patrick Steinhardt 1a3fa1f5 2018-10-18T11:25:59 commit_list: avoid use of strtol64 without length limit When quick-parsing a commit, we use `git__strtol64` to parse the commit's time. The buffer that's passed to `commit_quick_parse` is the raw data of an ODB object, though, whose data may not be properly formatted and also does not have to be `NUL` terminated. This may lead to out-of-bound reads. Use `git__strntol64` to avoid this problem.
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Carlos Martín Nieto 5e2a29a7 2016-09-27T13:11:47 commit_list: fix the date comparison function This returns the integer-cast truth value comparing the dates. What we want instead of a (-1, 0, 1) output depending on how they compare.
Vicent Marti d3416dfe 2015-10-28T10:50:25 pool: Dot not assume mallocs are zeroed out
Carlos Martín Nieto 5ffdea6f 2015-10-14T16:49:01 revwalk: make commit list use 64 bits for time We moved the "main" parsing to use 64 bits for the timestamp, but the quick parsing for the revwalk did not. This means that for large timestamps we fail to parse the time and thus the walk. Move this parser to use 64 bits as well.
Will Stamper b874629b 2014-12-04T21:06:59 Spelling fixes
Russell Belfer 4075e060 2014-02-03T21:02:08 Replace pqueue with code from hashsig heap I accidentally wrote a separate priority queue implementation when I was working on file rename detection as part of the file hash signature calculation code. To simplify licensing terms, I just adapted that to a general purpose priority queue and replace the old priority queue implementation that was borrowed from elsewhere. This also removes parts of the COPYING document that no longer apply to libgit2.
Arthur Schreiber 3736b64f 2013-06-25T18:36:37 Prefer younger merge bases over older ones. git-core prefers younger merge bases over older ones in case that multiple valid merge bases exists.
Russell Belfer badd85a6 2013-04-10T17:10:17 Use git_odb_object_data/_size whereever possible This uses the odb object accessors so we can change the internals more easily...
Vicent Marti 8842c75f 2013-04-03T22:30:07 What has science done.
Edward Thomson 359fc2d2 2013-01-08T17:07:25 update copyrights
Scott J. Goldman bdf3e6df 2012-11-29T17:34:41 Fix error condition typo
Russell Belfer d5e44d84 2012-11-29T17:02:27 Fix function name and add real error check `revwalk.h:commit_lookup()` -> `git_revwalk__commit_lookup()` and make `git_commit_list_parse()` do real error checking that the item in the list is an actual commit object. Also fixed an apparent typo in a test name.
Ben Straub 4ff192d3 2012-11-26T19:47:47 Move merge functions to merge.c In so doing, promote commit_list to git_commit_list, with its own internal API header.