src/iterator.c


Log

Author Commit Date CI Message
Edward Thomson 91246ee5 2021-11-01T20:14:34 path: use new length validation functions
Edward Thomson 95117d47 2021-10-31T09:45:46 path: separate git-specific path functions from util Introduce `git_fs_path`, which operates on generic filesystem paths. `git_path` will be kept for only git-specific path functionality (for example, checking for `.git` in a path).
Edward Thomson f0e693b1 2021-09-07T17:53:49 str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
Edward Thomson b457fe27 2021-04-04T22:18:55 iterator: validate workdir paths Supply the repository for the filesystem and workdir iterators - for workdir iterators, this is non-null and we can lookup the core.longpaths configuration option. (For regular filesystem iterators, this is NULL, so core.longpaths does not apply.)
Edward Thomson 79b0c8c8 2020-11-21T23:29:29 iterator: use GIT_ASSERT
Edward Thomson 0f35efeb 2020-05-23T10:15:51 git_pool_init: handle failure cases Propagate failures caused by pool initialization errors.
Edward Thomson b59c71d8 2020-01-18T14:11:01 iterator: update enum type name for consistency libgit2 does not use `type_t` suffixes as it's redundant; thus, rename `git_iterator_type_t` to `git_iterator_t` for consistency.
Patrick Steinhardt 699de9c5 2019-08-27T10:36:17 iterator: remove duplicate memset When allocating new tree iterator frames, we zero out the allocated memory twice. Remove one of the `memset` calls.
Patrick Steinhardt 9ca7a60e 2019-08-27T10:36:20 iterator: avoid leaving partially initialized frame on stack When allocating tree iterator entries, we use `GIT_ERROR_ALLOC_CHECK` to check whether the allocation has failed. The macro will cause the function to immediately return, though, leaving behind a partially initialized iterator frame. Fix the issue by manually checking for memory allocation errors and using `goto done` in case of an error, popping the iterator frame.
Patrick Steinhardt 658022c4 2019-07-18T13:53:41 configuration: cvar -> configmap `cvar` is an unhelpful name. Refactor its usage to `configmap` for more clarity.
Edward Thomson d103f008 2019-05-21T13:44:47 pool: use `size_t` for sizes
Edward Thomson b205f538 2019-05-20T06:38:51 iterator: sanity-check path length and safely cast
Etienne Samson 431601f2 2019-04-05T15:05:10 iterator: make use of the `GIT_CONTAINER_OF` macro
Edward Thomson 1d4ddb8e 2019-01-20T23:42:08 iterator: cast filesystem iterator entry values explicitly The filesystem iterator takes `stat` data from disk and puts them into index entries, which use 32 bit ints for time (the seconds portion) and filesize. However, on most systems these are not 32 bit, thus will typically invoke a warning. Most users ignore these fields entirely. Diff and checkout code do use the values, however only for the cache to determine if they should check file modification. Thus, this is not a critical error (and will cause a hash recomputation at worst).
Edward Thomson f673e232 2018-12-27T13:47:34 git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
Edward Thomson 168fe39b 2018-11-28T14:26:57 object_type: use new enumeration names Use the new object_type enumeration names within the codebase.
Patrick Steinhardt b2af13f2 2018-11-21T12:07:23 iterator: remove unused function `tree_iterator_entry_cmp` The function `tree_iterator_entry_cmp` has been introduced in commit be30387e8 (iterators: refactored tree iterator, 2016-02-25), but in fact it has never been used at all. Remove it to avoid unused function warnings as soon as we re-enable "-Wunused-functions".
Edward Thomson d54aa9ae 2018-06-26T15:25:30 iterator: introduce `git_iterator_foreach` Introduce a `git_iterator_foreach` helper function which invokes a callback on all files for a given iterator.
Edward Thomson 2b12dcf6 2018-03-19T19:45:11 iterator: optionally hash filesystem iterators Optionally hash the contents of files encountered in the filesystem or working directory iterators. This is not expected to be used in production code paths, but may allow us to simplify some test contexts. For working directory iterators, apply filters as appropriate, since we have the context able to do it.
Patrick Steinhardt ecf4f33a 2018-02-08T11:14:48 Convert usage of `git_buf_free` to new `git_buf_dispose`
Tomás Pollak 054e4c08 2018-01-31T14:28:25 Set ctime/mtime nanosecs to 0 if USE_NSEC is not defined
Tomás Pollak 752006dd 2018-01-30T23:21:19 Honor 'GIT_USE_NSEC' option in `filesystem_iterator_set_current` This should have been part of PR #3638. Without this we still get nsec-related errors, even when using -DGIT_USE_NSEC: error: ‘struct stat’ has no member named ‘st_mtime_nsec’
Edward Thomson 9e94b6af 2017-12-30T00:12:46 iterator: cleanups with symlink dir handling Perform some error checking when examining symlink directories.
Andy Doan e9628e7b 2017-10-30T11:38:33 branches: Check symlinked subdirectories Native Git allows symlinked directories under .git/refs. This change allows libgit2 to also look for references that live under symlinked directories. Signed-off-by: Andy Doan <andy@opensourcefoundries.com>
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Patrick Steinhardt c77a55a9 2016-11-14T10:05:31 common: use PRIuZ for size_t in `giterr_set` calls
Jason Haslam 7a3f1de5 2016-08-22T09:27:47 filesystem_iterator: fixed double free on error
Edward Thomson db22a91b 2016-04-21T10:58:22 iterator: ignore submodule in has_ended
Edward Thomson d47f7e1c 2016-04-02T13:03:09 iterator: support trailing `/` in start for submod Allow callers to specify a start path with a trailing slash to match a submodule, instead of just a directory. This is for some legacy behavior that's sort of dumb, but there it is.
Carlos Martín Nieto 1cac688d 2016-04-01T00:29:51 Merge pull request #3719 from libgit2/ethomson/submodule_status WD iterator: properly identify submodules
Edward Thomson 4df6ddaa 2016-03-31T15:05:34 iterator: use correct search function
Edward Thomson 97054833 2016-03-30T17:41:08 leaks: fix some iterator leaks
Carlos Martín Nieto f5c874a4 2016-03-29T14:47:31 Plug a few leaks
Marc Strapetz d6713ec6 2016-03-22T10:30:07 iterator: comment fixed
Marc Strapetz f4777058 2016-03-22T10:29:41 iterator: unused includes removed
Edward Thomson 9eb9e5fa 2016-03-21T17:19:24 iterator: cleanups Remove some unused functions, refactor some ugliness.
Edward Thomson 35877463 2016-03-21T17:03:00 iterator: refactor empty iterator to new style
Edward Thomson 247e3b43 2016-03-21T16:51:45 iterator: mandate `advance_over` Since the three iterators implement `advance_over` differently, mandate it and implement each.
Edward Thomson 0ef0b71c 2016-03-21T12:54:47 iterator: refactor index iterator
Edward Thomson 82a1aab6 2016-03-18T12:59:35 iterator: move the index into the iterator itself
Edward Thomson 4c88198a 2016-03-16T10:17:20 iterator: test that we're at the end of iteration Ensure that we have hit the end of iteration; previously we tested that we saw all the values that we expected to see. We did not then ensure that we were at the end of the iteration (and that there were subsequently values in the iteration that we did *not* expect.)
Edward Thomson 0e0589fc 2016-03-10T00:04:26 iterator: combine fs+workdir iterators more completely Drop some of the layers of indirection between the workdir and the filesystem iterators. This makes the code a little bit easier to follow, and reduces the number of unnecessary allocations a bit as well. (Prior to this, when we filter entries, we would allocate them, filter them and then free them; now we do the filtering before allocation.) Also, rename `git_iterator_advance_over_with_status` to just `git_iterator_advance_over`. Mostly because it's a fucking long-ass function name otherwise.
Edward Thomson be30387e 2016-02-25T16:05:18 iterators: refactored tree iterator Refactored the tree iterator to never recurse; simply process the next entry in order in `advance`. Additionally, reduce the number of allocations and sorting as much as possible to provide a ~30% speedup on case-sensitive iteration. (The gains for case-insensitive iteration are less majestic.)
Edward Thomson f0224772 2016-02-17T18:04:19 git_object_dup: introduce typesafe versions
Edward Thomson 684b35c4 2016-02-25T15:11:14 iterator: disambiguate reset and reset_range Disambiguate the reset and reset_range functions. Now reset_range with a NULL path will clear the start or end; reset will leave the existing start and end unchanged.
Edward Thomson ac05086c 2016-02-25T14:51:23 iterator: drop unused/unimplemented `seek`
Carlos Martín Nieto 60a194aa 2016-03-20T11:00:12 tree: re-use the id and filename in the odb object Instead of copying over the data into the individual entries, point to the originals, which are already in a format we can use.
Carlos Martín Nieto 594a5d12 2016-02-18T12:28:06 Merge pull request #3619 from ethomson/win32_forbidden win32: allow us to read indexes with forbidden paths on win32
Edward Thomson 4fea9cff 2016-02-16T13:08:55 iterator: assert tree_iterator has a frame Although a `tree_iterator` that failed to be properly created does not have a frame, all other `tree_iterator`s should. Do not call `pop` in the failure case, but assert that in all other cases there is a frame.
Colin Xu a218b2f6 2016-01-22T16:03:37 Validate pointer before accessing the member. When a Git repository is at a network location, git_iterator_for_tree sometimes fails at iterator__update_ignore_case, so it goes to git_iterator_free. A null pointer will crash the process if it is not checked. Signed-off-by: Colin Xu <colin.xu@gmail.com>
Arthur Schreiber 3679ebae 2016-02-11T23:37:52 Horrible fix for #3173.
Vicent Marti 1e5e02b4 2015-10-27T17:26:04 pool: Simplify implementation
Edward Thomson 26d7cf6e 2015-09-11T18:27:04 iterator: loop fs_iterator advance (don't recurse)
Edward Thomson a1859e21 2015-09-11T17:38:28 iterator: advance the tree iterator smartly While advancing the tree iterator, if we advance over things that we aren't interested in, then call `current`. Which may *itself* call advance. While advancing the tree iterator, if we advance over things that we aren't interested in, then call `current`. Which may *itself* call advance. While advancing the tree iterator, if we advance over things that we aren't interested in, then call `current`. Which may *itself* call advance. While advancing the tree iterator, if we advance over things that we aren't interested in, then call `current`. Which may *itself* call advance. While advancing the tree iterator, if we advance over things that we aren't interested in, then call `current`. Which may *itself* call advance. Error: stack overflow.
Edward Thomson d53c8880 2015-08-30T19:25:47 iterator: saner pathlist matching for idx iterator Some nicer refactoring for index iteration walks. The index iterator doesn't binary search through the pathlist space, since it lacks directory entries, and would have to binary search each index entry and all its parents (eg, when presented with an index entry of `foo/bar/file.c`, you would have to look in the pathlist for `foo/bar/file.c`, `foo/bar` and `foo`). Since the index entries and the pathlist are both nicely sorted, we walk the index entries in lockstep with the pathlist like we do for other iteration/diff/merge walks.
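The lockstep walk described in d53c8880 can be sketched as a merge join over two sorted string arrays. This is a minimal illustration and not libgit2's implementation: `match_in_lockstep` and its parameters are hypothetical names, both inputs are assumed to be sorted with `strcmp`, and only exact path matches are handled (directory prefixes such as `foo/` are left out).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: walk sorted `entries` and sorted `pathlist`
 * together, copying every entry that also appears in the pathlist
 * into `out` (which must have room for `entries_len` pointers).
 * Returns the number of matches.
 */
static size_t match_in_lockstep(
	const char **entries, size_t entries_len,
	const char **pathlist, size_t pathlist_len,
	const char **out)
{
	size_t i = 0, j = 0, matched = 0;

	while (i < entries_len && j < pathlist_len) {
		int cmp = strcmp(entries[i], pathlist[j]);

		if (cmp == 0) {
			out[matched++] = entries[i];
			i++;
			j++;
		} else if (cmp < 0) {
			i++; /* entry sorts before the wanted path: skip it */
		} else {
			j++; /* wanted path is not present: move on */
		}
	}

	return matched;
}
```

Because each list is only ever advanced, the walk is linear in the combined length of the two lists, with no per-entry binary search.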
Edward Thomson 1af84271 2015-08-30T18:35:57 tree_iterator: use a pathlist
Edward Thomson 4a0dbeb0 2015-08-30T17:06:26 diff: use new iterator pathlist handling When using literal pathspecs in diff with `GIT_DIFF_DISABLE_PATHSPEC_MATCH` turn on the faster iterator pathlist handling. Updates iterator pathspecs to include directory prefixes (eg, `foo/`) for compatibility with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`.
Edward Thomson 6c9352bf 2015-08-28T18:30:39 iterator: sort subdirs properly with pathlist When given a pathlist, don't assume that directories sort before files. Walk through any list of entries sorting before us to make sure that we've exhausted all entries that *aren't* directories. Eg, if we're searching for 'foo/bar', and we have a 'foo.c', keep advancing the pathlist to keep looking for an entry prefixed with 'foo/'.
Edward Thomson ef206124 2015-07-28T19:55:37 Move filelist into the iterator handling itself.
Edward Thomson ed1c6446 2015-07-28T11:41:27 iterator: use an options struct instead of args
Edward Thomson ef4857c2 2015-08-03T16:50:27 errors: tighten up git_error_state OOMs a bit more When an error state is an OOM, make sure that we treat it specially and do not try to free it.
Carlos Martín Nieto 12786e0f 2015-07-26T17:19:22 iterator: skip over errors in diriter init An error here will typically mean that the directory was removed between the time we iterated the parent and the time we wanted to visit it, in which case we should ignore it. Other kinds of errors such as permissions (or transient errors) are also better dealt with by pretending we didn't see it.
Edward Thomson dd6b24b1 2015-07-02T10:36:15 iterator_walk: cast away constness for free
Edward Thomson ded4ccab 2015-06-29T15:16:22 iterator_walk: drop unused variable
Carlos Martín Nieto 24fa21f3 2015-06-26T18:59:53 index, iterator, fetchhead: plug leaks
Edward Thomson 8960dc1e 2015-06-24T18:10:30 iterator: provide git_iterator_walk Provide `git_iterator_walk` to walk each iterator in lockstep, returning each iterator's idea of the contents of the next path.
Carlos Martín Nieto ff475375 2015-06-17T14:34:10 diff: check files with the same or newer timestamps When a file on the workdir has the same or a newer timestamp than the index, we need to perform a full check of the contents, as the update of the file may have happened just after we wrote the index. The iterator changes are such that we can reach inside the workdir iterator from the diff, though it may be better to have an accessor instead of moving these structs into the header.
Carlos Martín Nieto 82a7a24c 2015-06-08T15:22:01 Merge pull request #3165 from ethomson/downcase Downcase
Edward Thomson 75a4636f 2015-05-29T16:56:38 git__tolower: a tolower() that isn't dumb Some brain damaged tolower() implementations appear to want to take the locale into account, and this may require taking some insanely aggressive lock on the locale and slowing down what should be the most trivial of trivial calls for people who just want to downcase ASCII.
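A locale-independent downcase of the kind 75a4636f describes can be sketched in a few lines. The name `ascii_tolower` is hypothetical and this is not the body of `git__tolower`; it only illustrates the idea of mapping ASCII letters without consulting the locale.

```c
/*
 * Hypothetical stand-in for a locale-independent tolower(): only
 * ASCII 'A'..'Z' is mapped; every other value is returned unchanged.
 */
static int ascii_tolower(int c)
{
	return (c >= 'A' && c <= 'Z') ? (c + ('a' - 'A')) : c;
}
```

Since nothing here touches global locale state, there is no lock to take and the call stays trivial.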
Edward Thomson 9f545b9d 2015-05-19T11:23:59 introduce `git_index_entry_is_conflict` It's not always obvious the mapping between stage level and conflict-ness. More importantly, this can lead otherwise sane people to write constructs like `if (!git_index_entry_stage(entry))`, which (while technically correct) is unreadable. Provide a nice method to help avoid such messy thinking.
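The readability point in 9f545b9d can be illustrated with a small predicate. This is a sketch under stated assumptions: `struct example_entry` and `example_entry_is_conflict` are hypothetical, and the stage is stored as a plain `int` here purely to keep the example short.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified entry. Stage 0 is a normal entry; stages
 * 1, 2 and 3 are the ancestor, "ours" and "theirs" sides of a conflict.
 */
struct example_entry {
	const char *path;
	int stage;
};

/* Names the intent instead of making callers test the stage number. */
static bool example_entry_is_conflict(const struct example_entry *entry)
{
	return entry->stage > 0;
}
```

A caller can then write `if (example_entry_is_conflict(&e))` rather than negating a stage lookup.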
Edward Thomson aa3af01d 2015-05-13T15:52:21 index iterator: optionally include conflicts
Edward Thomson f63a1b72 2015-04-29T17:23:02 git_path_diriter: use FindFirstFile in win32 Using FindFirstFile and FindNextFile in win32 allows us to use the directory information that is returned, instead of us having to get the file attributes all over again, which is a distinct cost savings on win32.
Edward Thomson 5c387b6c 2015-04-29T14:31:59 git_path_diriter: next shouldn't take path ptr The _next method shouldn't take a path pointer (and a path_len pointer) as 100% of current users use the full path and ignore the filename. Plus let's add some docs and a unit test.
Edward Thomson 7ef005f1 2015-04-29T14:04:01 git_path_dirload_with_stat: moved to fs_iterator
Edward Thomson 35c1d207 2015-04-29T14:03:20 git_win32_path_dirload_with_stat: removed
J Wyman 1920ee4e 2015-03-26T18:10:24 Improvements to status performance on Windows. Changed win32/path_w32.c to utilize NTFS' FindFirst..FindNext data instead of doing an lstat per file. Avoiding unnecessary directory opens and file scans reduces IO, improving overall performance. Effect is magnified due to NTFS being a kernel mode file system (as opposed to user mode).
J Wyman 4c09e19a 2015-03-30T14:07:44 Improvements to ignore performance on Windows. Minimizing the number of directory and file opens minimizes the amount of IO, thus reducing the overall cost of performing ignore operations.
Edward Thomson f1453c59 2015-02-12T12:19:37 Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler intrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
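The shape of check that f1453c59 describes, one that reports overflow through its return value and writes the result through a mandatory out parameter, might look like the sketch below. `add_size_overflow` is a hypothetical name and not libgit2's macro; it follows the convention of gcc and clang's `__builtin_add_overflow` of returning true when the addition overflows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if a + b would overflow size_t.
 * Otherwise stores the sum in *out and returns false. `out` must not
 * be NULL, and it is left untouched when overflow is reported.
 */
static bool add_size_overflow(size_t *out, size_t a, size_t b)
{
	if (a > SIZE_MAX - b)
		return true;

	*out = a + b;
	return false;
}
```

Requiring a non-NULL `out` is what would let a body like this be swapped for a compiler intrinsic, since the intrinsic always writes through its result pointer.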
Edward Thomson 392702ee 2015-02-09T23:41:13 allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
Carlos Martín Nieto f7fcb18f 2014-11-23T14:12:54 Plug leaks Valgrind is now clean except for libssl and libgcrypt.
Carlos Martín Nieto 62a617dc 2014-11-06T16:16:46 iterator: submodules are determined by an index or tree We cannot know from looking at .gitmodules whether a directory is a submodule or not. We need the index or tree we are comparing against to tell us. Otherwise we have to assume the entry in .gitmodules is stale or otherwise invalid. Thus we pass the index of the repository into the workdir iterator, even if we do not want to compare against it. This follows what git does, which even for `git diff <tree>`, it will consider staged submodules as such.
Russell Belfer f554611a 2014-05-06T12:41:26 Improve checks for ignore containment The diff code was using an "ignored_prefix" directory to track if a parent directory was ignored that contained untracked files alongside tracked files. Unfortunately, when negative ignore rules were used for directories inside ignored parents, the wrong rules were applied to untracked files inside the negatively ignored child directories. This commit moves the logic for ignore containment into the workdir iterator (which is a better place for it), so the ignored-ness of a directory is contained in the frame stack during traversal. This allows a child directory to override with a negative ignore and yet still restore the ignored state of the parent when we traverse out of the child. Along with this, there are some problems with "directory only" ignore rules on container directories. Given "a/*" and "!a/b/c/" (where the second rule is a directory rule but the first rule is just a generic prefix rule), then the directory only constraint was having "a/b/c/d/file" match the first rule and not the second. This was fixed by having ignore directory-only rules test a rule against the prefix of a file with LEADINGDIR enabled. Lastly, spot checks for ignores using `git_ignore_path_is_ignored` were tested from the top directory down to the bottom to deal with the containment problem, but this is wrong. We have to test bottom to top so that negative subdirectory rules will be checked before parent ignore rules. This does change the behavior of some existing tests, but it seems only to bring us more in line with core Git, so I think those changes are acceptable.
Russell Belfer cd424ad5 2014-04-28T16:39:53 Add GIT_STATUS_OPT_UPDATE_INDEX and use trace API This adds an option to refresh the stat cache while generating status. It also rips out the GIT_PERF stuff I had and makes use of the trace API to keep statistics about what happens during diff.
Russell Belfer 9c8ed499 2014-04-29T15:05:58 Remove trace / add git_diff_perfdata struct + api
Russell Belfer b23b112d 2014-04-29T11:29:49 Add payloads, bitmaps to trace API This is a proposed adjustment to the trace APIs. This makes the trace levels into a bitmask so that they can be selectively enabled and adds a callback-level payload, plus a message-level payload. This makes it easier for me to add GIT_TRACE_PERF callbacks that are simply bypassed if the PERF level is not set.
Russell Belfer 240f4af3 2014-04-28T14:04:29 Add build option for diff internal statistics
Russell Belfer 8ef4e11a 2014-04-28T14:16:26 Skip diff oid calc when size definitely changed When we think the stat cache in the index seems valid and the size or mode of a file has definitely changed, then don't bother trying to recalculate the OID of the workdir bits to confirm that it is modified - just accept that it is modified. This can result in files that show as modified with no actual diff, but the behavior actually appears to match Git on the command line. This also includes a minor optimization to not perform a submodule lookup on the ".git" directory itself.
Russell Belfer a409acef 2014-04-24T11:59:50 Handle explicitly ignored dir slightly differently When considering status of untracked directories, if we find an explicitly ignored item, even if it is a directory, treat the parent as an IGNORED item. It was accidentally being treated as an EMPTY item because we were not looking into the ignored subdir.
Russell Belfer 219c89d1 2014-04-23T16:28:45 Treat ignored, empty, and untracked dirs different In the iterator, distinguish between ignores and empty directories so that diff and status can ignore empty directories, but checkout and stash can treat them as untracked items.
Russell Belfer 37da3685 2014-04-22T21:51:54 Make checkout match diff for untracked/ignored dir When diff finds an untracked directory, it emulates Git behavior by looking inside the directory to see if there are any untracked items inside it. If there are only ignored items inside the dir, then diff considers it ignored, even if there is no direct ignore rule for it. Checkout was not copying this behavior - when it found an untracked directory, it just treated it as untracked. Unfortunately, when combined with GIT_CHECKOUT_REMOVE_UNTRACKED, this made it seem that checkout (and stash, which uses checkout) was removing ignored items when you had only asked it to remove untracked ones. This commit moves the logic for advancing past an untracked dir while scanning for non-ignored items into an iterator helper fn, and uses that for both diff and checkout.
Russell Belfer 52bb0476 2014-03-14T13:53:15 Clean up index snapshot function naming Clear up some of the various "find" functions and the snapshot API naming to be things I like more.
Russell Belfer 3b4c401a 2014-02-10T13:20:08 Decouple index iterator sort from index This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the index data itself. To take advantage of this, I had to export a number of the internal index entry comparison functions. I also wrote some new tests to exercise the capability.
Russell Belfer 54edbb98 2014-02-07T16:48:27 Add index snapshot and use it for iterator
Russell Belfer 3dbee456 2014-02-07T14:10:35 Some index internals refactoring Again, laying groundwork for some index iterator changes, this contains a bunch of code refactorings for index internals that should make it easier down the line to add locking around index modifications. Also this removes the redundant prefix_position function and fixes some potential memory leaks.
Russell Belfer 7dcd42a5 2014-03-31T13:31:01 Cleanups
Russell Belfer c856f8c5 2014-03-31T12:27:05 Fix submodule sorting in workdir iterator With the changes to how git_path_dirload_with_stat handles things that look like submodules, submodules could end up sorted in the wrong order with the workdir iterator. This moves the submodule check earlier in the iterator processing of a new directory so that the submodule name updates will happen immediately and the sort order will be correct.
Russell Belfer d3bc95fd 2014-03-25T12:37:05 Update behavior for untracked sub-repos When a directory containing a .git directory (or even just a plain gitlink) was found, libgit2 was going out of its way to treat it specially. This seemed like it was necessary because the diff code was not originally emulating Git's behavior for untracked directories correctly (i.e. scanning for ignored vs untracked items inside). Now that libgit2 diff mimics Git's untracked directory behavior, the special handling for contained Git repos is actually incorrect and this commit rips it out.
Carlos Martín Nieto d541170c 2014-01-24T11:36:41 index: rename an entry's id to 'id' This was not converted when we converted the rest, so do it now.
Russell Belfer 9cfce273 2013-12-12T12:11:38 Cleanups, renames, and leak fixes This renames git_vector_free_all to the better git_vector_free_deep and also contains a couple of memory leak fixes based on valgrind checks. The fixes are specifically: failure to free global dir path variables when not compiled with threading on and failure to free filters from the filter registry that had not been initialized fully.
Russell Belfer fcd324c6 2013-12-06T15:04:31 Add git_vector_free_all There are a lot of places that we call git__free on each item in a vector and then call git_vector_free on the vector itself. This just wraps that up into one convenient helper function.