src/iterator.h


Log

Author Commit Date CI Message
Peter Pettersson 7dcc29fc 2021-10-22T22:51:59 Make enum in src,tests and examples C90 compliant by removing trailing comma.
Edward Thomson f0e693b1 2021-09-07T17:53:49 str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
Edward Thomson 79b0c8c8 2020-11-21T23:29:29 iterator: use GIT_ASSERT
Edward Thomson b59c71d8 2020-01-18T14:11:01 iterator: update enum type name for consistency libgit2 does not use `type_t` suffixes as it's redundant; thus, rename `git_iterator_type_t` to `git_iterator_t` for consistency.
Edward Thomson d54aa9ae 2018-06-26T15:25:30 iterator: introduce `git_iterator_foreach` Introduce a `git_iterator_foreach` helper function which invokes a callback on all files for a given iterator.
Edward Thomson 2b12dcf6 2018-03-19T19:45:11 iterator: optionally hash filesystem iterators Optionally hash the contents of files encountered in the filesystem or working directory iterators. This is not expected to be used in production code paths, but may allow us to simplify some test contexts. For working directory iterators, apply filters as appropriate, since we have the context able to do it.
Edward Thomson 9e94b6af 2017-12-30T00:12:46 iterator: cleanups with symlink dir handling Perform some error checking when examining symlink directories.
Andy Doan e9628e7b 2017-10-30T11:38:33 branches: Check symlinked subdirectories Native Git allows symlinked directories under .git/refs. This change allows libgit2 to also look for references that live under symlinked directories. Signed-off-by: Andy Doan <andy@opensourcefoundries.com>
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Edward Thomson 9eb9e5fa 2016-03-21T17:19:24 iterator: cleanups Remove some unused functions, refactor some ugliness.
Edward Thomson 247e3b43 2016-03-21T16:51:45 iterator: mandate `advance_over` Since the three iterators implement `advance_over` differently, mandate it and implement each.
Edward Thomson 82a1aab6 2016-03-18T12:59:35 iterator: move the index into the iterator itself
Edward Thomson 0a2e1032 2016-03-17T15:19:45 iterator: drop `advance_into_or_over` Now that iterators do not return `GIT_ENOTFOUND` when advancing into an empty directory, we do not need a special `advance_into_or_over` function.
Edward Thomson 0e0589fc 2016-03-10T00:04:26 iterator: combine fs+workdir iterators more completely Drop some of the layers of indirection between the workdir and the filesystem iterators. This makes the code a little bit easier to follow, and reduces the number of unnecessary allocations a bit as well. (Prior to this, when we filter entries, we would allocate them, filter them and then free them; now we do the filtering before allocation.) Also, rename `git_iterator_advance_over_with_status` to just `git_iterator_advance_over`. Mostly because it's a fucking long-ass function name otherwise.
Edward Thomson be30387e 2016-02-25T16:05:18 iterators: refactored tree iterator Refactored the tree iterator to never recurse; simply process the next entry in order in `advance`. Additionally, reduce the number of allocations and sorting as much as possible to provide a ~30% speedup on case-sensitive iteration. (The gains for case-insensitive iteration are less majestic.)
Edward Thomson 684b35c4 2016-02-25T15:11:14 iterator: disambiguate reset and reset_range Disambiguate the reset and reset_range functions. Now reset_range with a NULL path will clear the start or end; reset will leave the existing start and end unchanged.
Edward Thomson ac05086c 2016-02-25T14:51:23 iterator: drop unused/unimplemented `seek`
Arthur Schreiber 3679ebae 2016-02-11T23:37:52 Horrible fix for #3173.
Edward Thomson d53c8880 2015-08-30T19:25:47 iterator: saner pathlist matching for idx iterator Some nicer refactoring for index iteration walks. The index iterator doesn't binary search through the pathlist space, since it lacks directory entries, and would have to binary search each index entry and all its parents (eg, when presented with an index entry of `foo/bar/file.c`, you would have to look in the pathlist for `foo/bar/file.c`, `foo/bar` and `foo`). Since the index entries and the pathlist are both nicely sorted, we walk the index entries in lockstep with the pathlist like we do for other iteration/diff/merge walks.
Edward Thomson 4a0dbeb0 2015-08-30T17:06:26 diff: use new iterator pathlist handling When using literal pathspecs in diff with `GIT_DIFF_DISABLE_PATHSPEC_MATCH` turn on the faster iterator pathlist handling. Updates iterator pathspecs to include directory prefixes (eg, `foo/`) for compatibility with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`.
Edward Thomson ef206124 2015-07-28T19:55:37 Move filelist into the iterator handling itself.
Edward Thomson ed1c6446 2015-07-28T11:41:27 iterator: use an options struct instead of args
Edward Thomson 8960dc1e 2015-06-24T18:10:30 iterator: provide git_iterator_walk Provide `git_iterator_walk` to walk each iterator in lockstep, returning each iterator's idea of the contents of the next path.
Carlos Martín Nieto ff475375 2015-06-17T14:34:10 diff: check files with the same or newer timestamps When a file on the workdir has the same or a newer timestamp than the index, we need to perform a full check of the contents, as the update of the file may have happened just after we wrote the index. The iterator changes are such that we can reach inside the workdir iterator from the diff, though it may be better to have an accessor instead of moving these structs into the header.
Edward Thomson aa3af01d 2015-05-13T15:52:21 index iterator: optionally include conflicts
Carlos Martín Nieto 62a617dc 2014-11-06T16:16:46 iterator: submodules are determined by an index or tree We cannot know from looking at .gitmodules whether a directory is a submodule or not. We need the index or tree we are comparing against to tell us. Otherwise we have to assume the entry in .gitmodules is stale or otherwise invalid. Thus we pass the index of the repository into the workdir iterator, even if we do not want to compare against it. This follows what git does, which even for `git diff <tree>`, it will consider staged submodules as such.
Russell Belfer f554611a 2014-05-06T12:41:26 Improve checks for ignore containment The diff code was using an "ignored_prefix" directory to track if a parent directory was ignored that contained untracked files alongside tracked files. Unfortunately, when negative ignore rules were used for directories inside ignored parents, the wrong rules were applied to untracked files inside the negatively ignored child directories. This commit moves the logic for ignore containment into the workdir iterator (which is a better place for it), so the ignored-ness of a directory is contained in the frame stack during traversal. This allows a child directory to override with a negative ignore and yet still restore the ignored state of the parent when we traverse out of the child. Along with this, there are some problems with "directory only" ignore rules on container directories. Given "a/*" and "!a/b/c/" (where the second rule is a directory rule but the first rule is just a generic prefix rule), then the directory only constraint was having "a/b/c/d/file" match the first rule and not the second. This was fixed by having ignore directory-only rules test a rule against the prefix of a file with LEADINGDIR enabled. Lastly, spot checks for ignores using `git_ignore_path_is_ignored` were tested from the top directory down to the bottom to deal with the containment problem, but this is wrong. We have to test bottom to top so that negative subdirectory rules will be checked before parent ignore rules. This does change the behavior of some existing tests, but it seems only to bring us more in line with core Git, so I think those changes are acceptable.
Russell Belfer 9c8ed499 2014-04-29T15:05:58 Remove trace / add git_diff_perfdata struct + api
Russell Belfer cd424ad5 2014-04-28T16:39:53 Add GIT_STATUS_OPT_UPDATE_INDEX and use trace API This adds an option to refresh the stat cache while generating status. It also rips out the GIT_PERF stuff I had and makes use of the trace API to keep statistics about what happens during diff.
Russell Belfer 240f4af3 2014-04-28T14:04:29 Add build option for diff internal statistics
Russell Belfer 219c89d1 2014-04-23T16:28:45 Treat ignored, empty, and untracked dirs different In the iterator, distinguish between ignores and empty directories so that diff and status can ignore empty directories, but checkout and stash can treat them as untracked items.
Russell Belfer 37da3685 2014-04-22T21:51:54 Make checkout match diff for untracked/ignored dir When diff finds an untracked directory, it emulates Git behavior by looking inside the directory to see if there are any untracked items inside it. If there are only ignored items inside the dir, then diff considers it ignored, even if there is no direct ignore rule for it. Checkout was not copying this behavior - when it found an untracked directory, it just treated it as untracked. Unfortunately, when combined with GIT_CHECKOUT_REMOVE_UNTRACKED, this made it seem that checkout (and stash, which uses checkout) was removing ignored items when you had only asked it to remove untracked ones. This commit moves the logic for advancing past an untracked dir while scanning for non-ignored items into an iterator helper fn, and uses that for both diff and checkout.
Russell Belfer 2fe54afa 2013-09-30T16:58:33 Put hooks in place for precompose in dirload fn This doesn't actually do string precompose but it puts the hooks in place into the iterators and the git_path_dirload function so that the actual precompose work is ready to go.
Russell Belfer 9094ae5a 2013-06-21T11:51:16 Add target directory to checkout This adds the ability for checkout to write to a target directory instead of having to use the working directory of the repository. This makes it easier to do exports of repository data and the like. This is similar to, but not quite the same as, the --prefix option to `git checkout-index` (this will always be treated as a directory name, not just as a simple text prefix). As part of this, the workdir iterator was extended to take the path to the working directory as a parameter and fallback on the git_repository_workdir result only if it's not specified. Fixes #1332
Russell Belfer cee695ae 2013-05-31T12:18:43 Make iterators use GIT_ITEROVER & smart advance 1. internal iterators now return GIT_ITEROVER when you go past the last item in the iteration. 2. git_iterator_advance will "advance" to the first item in the iteration if it is called immediately after creating the iterator, which allows a simpler idiom for basic iteration. 3. if git_iterator_advance encounters an error reading data (e.g. a missing tree or an unreadable file), it returns the error but also attempts to advance past the invalid data to prevent an infinite loop. Updated all tests and internal usage of iterators to account for these new behaviors.
Russell Belfer ff0ddfa4 2013-04-17T15:56:31 Add filesystem iterator variant This adds a new variant iterator that is a raw filesystem iterator for scanning directories from a root. There is still more work to do to blend this with the working directory iterator.
Russell Belfer 9bea03ce 2013-03-06T15:16:34 Add INCLUDE_TREES, DONT_AUTOEXPAND iterator flags This standardizes iterator behavior across all three iterators (index, tree, and working directory). Previously the working directory iterator behaved differently from the other two. Each iterator can now operate in one of three modes: 1. *No tree results, auto expand trees* means that only non-tree items will be returned and when a tree/directory is encountered, we will automatically descend into it. 2. *Tree results, auto expand trees* means that results will be given for every item found, including trees, but you only need to call normal git_iterator_advance to yield every item (i.e. trees returned with pre-order iteration). 3. *Tree results, no auto expand* means that calling the normal git_iterator_advance when looking at a tree will not descend into the tree, but will skip over it to the next entry in the parent. Previously, behavior 1 was the only option for index and tree iterators, and behavior 3 was the only option for workdir. The main public API implications of this are that the `git_iterator_advance_into()` call is now valid for all iterators, not just working directory iterators, and all the existing uses of working directory iterators explicitly use the GIT_ITERATOR_DONT_AUTOEXPAND (for now). Interestingly, the majority of the implementation was in the index iterator, since there are no tree entries there and now have to fake them. The tree and working directory iterators only required small modifications.
Russell Belfer cc216a01 2013-03-05T16:29:04 Retire spoolandsort iterator Since the case sensitivity is moved into the respective iterators, this removes the spoolandsort iterator code.
Russell Belfer 169dc616 2013-03-05T16:10:05 Make iterator APIs consistent with standards The iterator APIs are not currently consistent with the parameter ordering of the rest of the codebase. This rearranges the order of parameters, simplifies the naming of a number of functions, and makes somewhat better use of macros internally to clean up the iterator code. This also expands the test coverage of iterator functionality, making sure that case sensitive range-limited iteration works correctly.
Russell Belfer 25423d03 2013-01-09T16:07:54 Support case insensitive tree iterators and status This makes tree iterators directly support case insensitivity by using a secondary index that can be sorted by icase. Also, this fixes the ambiguity check in the git_status_file API to also be case insensitive. Lastly, this adds new test cases for case insensitive range boundary checking for all types of iterators. With this change, it should be possible to deprecate the spool and sort iterator, but I haven't done that yet.
Russell Belfer 134d8c91 2013-01-08T15:53:13 Update iterator API with flags for ignore_case This changes the iterator API so that flags can be passed in to the constructor functions to control the ignore_case behavior. At this point, the flags are not supported on tree iterators (i.e. there is no functional change over the old API), but the API changes are all made to accommodate this. By the way, I went with a flags parameter because in the future I have a couple of other ideas for iterator flags that will make it easier to fix some diff/status/checkout bugs.
Russell Belfer 4b181037 2013-01-08T13:39:15 Minor iterator API cleanups In preparation for further iterator changes, this cleans up a few small things in the iterator API: * removed the git_iterator_for_repo_index_range API * made git_iterator_free not be inlined * minor param name and test function name tweaks
Edward Thomson 359fc2d2 2013-01-08T17:07:25 update copyrights
Russell Belfer 546d65a8 2013-01-02T17:01:34 Fix up spoolandsort iterator usage The spoolandsort iterator changes got sort-of cherry picked out of this branch and so I dropped the commit when rebasing; however, there were a few small changes that got dropped as well (since the version merged upstream wasn't quite the same as what I dropped).
Russell Belfer 5cf9875a 2012-12-18T15:19:24 Add index updating to checkout Make checkout update entries in the index for all files that are updated and/or removed, unless flag GIT_CHECKOUT_DONT_UPDATE_INDEX is given. To do this, iterators were extended to allow a little more introspection into the index being iterated over, etc.
Russell Belfer f616a36b 2012-12-27T22:25:52 Make spoolandsort a pushable iterator behavior An earlier change to `git_diff_from_iterators` introduced a memory leak where the allocated spoolandsort iterator was not returned to the caller and thus not freed. One proposal changes all iterator APIs to use git_iterator** so we can reallocate the iterator at will, but that seems unexpected. This commit makes it so that an iterator can be changed in place. The callbacks are isolated in a separate structure and a pointer to that structure can be reassigned by the spoolandsort extension. This means that spoolandsort doesn't create a new iterator; it just allocates a new block of callbacks (along with space for its own extra data) and swaps that into the iterator. Additionally, since spoolandsort is only needed to switch the case sensitivity of an iterator, this simplifies the API to only take the ignore_case boolean and to be a no-op if the iterator already matches the requested case sensitivity.
Russell Belfer 91e7d263 2012-12-10T15:29:44 Fix iterator reset and add reset ranges The `git_iterator_reset` command has not been working in all cases particularly when there is a start and end range. This fixes it and adds tests for it, and also extends it with the ability to update the start/end range strings when an iterator is reset.
Russell Belfer 9950d27a 2012-12-06T13:26:58 Clean up iterator APIs This removes the need to explicitly pass the repo into iterators where the repo is implied by the other parameters. This moves the repo to be owned by the parent struct. Also, this has some iterator related updates to the internal diff API to lay the groundwork for checkout improvements.
Russell Belfer bad68c0a 2012-11-13T14:02:59 Add iterator for git_index object The index iterator could previously only be created from a repo object, but this allows creating an iterator from a `git_index` object instead (while keeping, though renaming, the old function).
Russell Belfer 0d64bef9 2012-10-05T15:56:57 Add complex checkout test and then fix checkout This started as a complex new test for checkout going through the "typechanges" test repository, but that revealed numerous issues with checkout, including: * complete failure with submodules * failure to create blobs with exec bits * problems when replacing a tree with a blob because the tree "example/" sorts after the blob "example" so the delete was being processed after the single file blob was created This fixes most of those problems and includes a number of other minor changes that made it easier to do that, including improving the TYPECHANGE support in diff/status, etc.
Russell Belfer dfbff793 2012-10-08T15:14:12 Fix a few diff bugs with directory content There are a few cases where diff should leave directories in the diff list if we want to match core git, such as when the directory contains a .git dir. That feature was lost when I introduced some of the new submodule handling. This restores that and then fixes a couple of related issues in diff output that are triggered by having diffs with directories in them. Also, this adds a new flag that can be passed to diff if you want diff output to actually include the file content of any untracked files.
Philip Kelley f08c60a5 2012-09-17T16:10:42 Minor fixes for ignorecase support
Philip Kelley ec40b7f9 2012-09-17T15:42:41 Support for core.ignorecase
Russell Belfer 41a82592 2012-05-15T14:17:39 Ranged iterators and rewritten git_status_file The goal of this work is to rewrite git_status_file to use the same underlying code as git_status_foreach. This is done in 3 phases: 1. Extend iterators to allow ranged iteration with start and end prefixes for the range of file names to be covered. 2. Improve diff so that when there is a pathspec and there is a common non-wildcard prefix of the pathspec, it will use ranged iterators to minimize excess iteration. 3. Rewrite git_status_file to call git_status_foreach_ext with a pathspec that covers just the one file being checked. Since ranged iterators underlie the status & diff implementation, this is actually fairly efficient. The workdir iterator does end up loading the contents of all the directories down to the single file, which should ideally be avoided, but it is pretty good.
nulltoken 87fe3507 2012-05-13T19:09:25 iterator: prevent git_iterator_free() from segfaulting when being passed a NULL iterator
Russell Belfer 7e000ab2 2012-05-08T15:03:59 Add support for diffing index with no HEAD When a repo is first created, there is no HEAD yet and attempting to diff files in the index was showing nothing because a tree iterator could not be constructed. This adds an "empty" iterator and falls back on that when the head cannot be looked up.
Russell Belfer 74fa4bfa 2012-02-28T16:14:47 Update diff to use iterators This is a major reorganization of the diff code. This changes the diff functions to use the iterators for traversing the content. This allowed a lot of code to be simplified. Also, this moved the functions relating to outputting a diff into a new file (diff_output.c). This includes a number of other changes - adding utility functions, extending iterators, etc. plus more tests for the diff code. This also takes the example diff.c program much further in terms of emulating git-diff command line options.
Russell Belfer da337c80 2012-02-22T11:22:33 Iterator improvements from diff implementation This makes two changes to iterator behavior: first, advance can optionally do the work of returning the new current value. This is such a common pattern that it really cleans up usage. Second, for workdir iterators, this removes automatically iterating into directories. That seemed like a good idea, but when an entirely new directory hierarchy is introduced into the workdir, there is no reason to iterate into it if there are no corresponding entries in the tree/index that it is being compared to. This second change actually wasn't a lot of code because not descending into directories was already the behavior for ignored directories. This just extends that to all directories.
Russell Belfer b6c93aef 2012-02-21T14:46:24 Uniform iterators for trees, index, and workdir This creates a new git_iterator type of object that provides a uniform interface for iterating over the index, an arbitrary tree, or the working directory of a repository. As part of this, git ignore support was extended to support push and pop of directory-based ignore files as the working directory is being traversed (so the array of ignores does not have to be recreated at each directory during traversal). There are a number of other small utility functions in buffer, path, vector, and fileops that are included in this patch that made the iterator implementation cleaner.
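
Several entries in this log (8960dc1e, d53c8880) describe walking sorted sequences "in lockstep". The following is a minimal, self-contained sketch of that idea over two sorted arrays of path strings. It is not libgit2's implementation and does not use the libgit2 API: `lockstep_walk`, `walk_cb`, `walk_counts` and `count_cb` are hypothetical names introduced only for illustration, and plain `strcmp` ordering is an assumption (the real iterators also handle case-insensitive ordering and directory entries).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: receives the current path from each list,
 * or NULL for a list that has no entry at this position. */
typedef int (*walk_cb)(const char *a, const char *b, void *payload);

/* Walk two sorted path arrays in lockstep. For each distinct path, the
 * callback sees which of the two lists contain it. A non-zero return
 * from the callback stops the walk and is passed back to the caller. */
static int lockstep_walk(
	const char **a, size_t a_len,
	const char **b, size_t b_len,
	walk_cb cb, void *payload)
{
	size_t i = 0, j = 0;

	assert(cb != NULL);

	while (i < a_len || j < b_len) {
		int cmp, error;

		if (i >= a_len)
			cmp = 1;        /* only b has entries left */
		else if (j >= b_len)
			cmp = -1;       /* only a has entries left */
		else
			cmp = strcmp(a[i], b[j]);

		/* The smaller path is reported alone; equal paths together. */
		error = cb(cmp <= 0 ? a[i] : NULL, cmp >= 0 ? b[j] : NULL, payload);
		if (error != 0)
			return error;

		/* Advance whichever side(s) were just reported. */
		if (cmp <= 0)
			i++;
		if (cmp >= 0)
			j++;
	}

	return 0;
}

/* Hypothetical payload: tallies where each path was found. */
struct walk_counts {
	size_t only_a;
	size_t only_b;
	size_t both;
};

/* Example callback that classifies each reported path. */
static int count_cb(const char *a, const char *b, void *payload)
{
	struct walk_counts *counts = payload;

	if (a && b)
		counts->both++;
	else if (a)
		counts->only_a++;
	else
		counts->only_b++;

	return 0;
}
```

Because both inputs are already sorted, each list is traversed exactly once and no per-entry binary search is needed, which is the motivation given in d53c8880. A caller would pass two sorted arrays and a `struct walk_counts` initialized to zero, then read the tallies after `lockstep_walk` returns 0.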