src/tree.c


Log

Author Commit Date CI Message
Patrick Steinhardt 06abbb7f 2017-03-27T13:14:48 treebuilder: exit early if running OOM in `write_with_buffer` While writing the tree inside of a buffer, we check whether the buffer runs out of memory after each tree entry. While we set the error code as soon as we detect the OOM situation, we happily proceed iterating over the entries. This is not useful at all, as we will try to write into the buffer repeatedly, which cannot work. Fix this by exiting as soon as we are OOM.
Patrick Steinhardt 8d1e71f5 2017-03-27T13:14:05 treebuilder: remove shadowing variable in `write_with_buffer` The `git_tree_entry *entry` variable is defined twice inside of this function. While this is not a problem currently, remove the shadowing variable to avoid future confusion.
Patrick Steinhardt 4f9327fa 2017-03-27T13:11:38 treebuilder: fix memory leaks in `write_with_buffer` While we detect errors in `git_treebuilder_write_with_buffer`, we just exit directly instead of freeing allocated memory. Fix this by remembering error codes and skipping forward to the function's cleanup code.
Patrick Steinhardt 13c3bc9a 2017-01-27T14:32:23 strmap: remove GIT__USE_STRMAP macro
Patrick Steinhardt 73028af8 2017-01-27T14:20:24 khash: avoid using macro magic to get return address
Edward Thomson 44e8af8f 2017-01-21T22:51:50 Merge pull request #3892 from mitesch/shared_buffer Use a shared buffer in calls of git_treebuilder_write to avoid heap contention
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Michael Tesch 87aaefe2 2016-08-09T12:23:19 write_tree: use shared buffer for writing trees The function to write trees allocates a new buffer for each tree. This causes problems with performance when performing a lot of actions involving writing trees, e.g. when doing many merges. Fix the issue by instead handing in a shared buffer, which is then re-used across the calls without having to re-allocate between calls.
Carlos Martín Nieto 89776585 2016-11-14T12:44:52 tree: look for conflicts in the new tree when updating We look at whether we're trying to replace a blob with a tree during the update phase, but we fail to look at whether we've just inserted a blob where we're now trying to insert a tree. Update the check to look at both places. The test for this was previously succeeding due to the bu where we did not look at the sorted output.
Carlos Martín Nieto b85929c5 2016-11-14T12:44:01 tree: use the sorted update list in our loop The loop is made with the assumption that the inputs are sorted and not using it leads to bad outputs.
Patrick Steinhardt 901434b0 2016-11-14T10:07:37 common: cast precision specifiers to int
Patrick Steinhardt 4974e3a5 2016-10-07T09:18:55 tree: validate filename and OID length when parsing object When parsing tree entries from raw object data, we do not verify that the tree entry actually has a filename as well as a valid object ID. Fix this by asserting that the filename length is non-zero as well as asserting that there are at least `GIT_OID_RAWSZ` bytes left when parsing the OID.
Edward Thomson fdf14637 2016-05-26T00:58:43 Merge pull request #3792 from edquist/misc Fix comment for GIT_FILEMODE_LINK
Carlos Martín Nieto a2cb4713 2016-05-24T14:30:43 tree: handle removal of all entries in the updater When we remove all entries in a tree, we should remove that tree from its parent rather than include the empty tree.
Carlos Martín Nieto 53412305 2016-05-19T15:29:53 tree: plug leaks in the tree updater
Carlos Martín Nieto 6ee08d2c 2016-05-19T15:22:02 tree: use the basename for the entry removal When we want to remove the file, use the basename as the name of the entry to remove, instead of the full one, which includes the directories we've inserted into the stack.
Carl Edquist c8fb2e15 2016-05-18T16:00:01 Fix comment for GIT_FILEMODE_LINK 0120000 is symbolic link, not commit
Carlos Martín Nieto 9464f9eb 2016-05-02T17:36:58 Introduce a function to create a tree based on a different one Instead of going through the usual steps of reading a tree recursively into an index, modifying it and writing it back out as a tree, introduce a function to perform simple updates more efficiently. `git_tree_create_updated` avoids reading trees which are not modified and supports upsert and delete operations. It is not as versatile as modifying the index, but it makes some common operations much more efficient.
Carlos Martín Nieto f5c874a4 2016-03-29T14:47:31 Plug a few leaks
Edward Thomson e2e4bae9 2016-03-22T00:18:44 tree: drop the now-unnecessary entries vector Remove the now-unnecessary entries vector. Add `git_array_search` to binary search through an array to accomplish this.
Carlos Martín Nieto 4ed9e939 2016-03-20T12:01:45 tree: store the entries in a growable array Take advantage of the constant size of tree-owned arrays and store them in an array instead of a pool. This still lets us free them all at once but lets the system allocator do the work of fitting them in.
Carlos Martín Nieto 60a194aa 2016-03-20T11:00:12 tree: re-use the id and filename in the odb object Instead of copying over the data into the individual entries, point to the originals, which are already in a format we can use.
Carlos Martín Nieto ea5bf6bb 2016-03-04T12:34:38 treebuilder: don't try to verify submodules exist in the odb Submodules don't exist in the objectdb and the code is making us try to look for a blob with its commit id, which is obviously not going to work. Skip the test if the user wants to insert a submodule.
Edward Thomson 2bbc7d3e 2016-02-23T15:00:27 treebuilder: validate tree entries (optionally) When `GIT_OPT_ENABLE_STRICT_OBJECT_CREATION` is turned on, validate the tree and parent ids given to treebuilder insertion.
Edward Thomson aadad405 2016-02-11T14:28:31 tree: zap warnings around `size_t` vs `uint16_t`
Carlos Martín Nieto fc436469 2015-12-06T22:51:00 tree: mark a tree as already sorted The trees are sorted on-disk, so we don't have to go over them again. This cuts almost a fifth of time spent parsing trees.
Carlos Martín Nieto 0174f21b 2015-12-02T18:56:31 tree: use a specialised mode parse function Instead of going out to strtol, which is made to parse generic numbers, copy a parse function from git which is specialised for file modes.
Patrick Steinhardt 9487585d 2015-12-01T14:19:29 tree: mark cloned tree entries as un-pooled When duplicating a `struct git_tree_entry` with `git_tree_entry_dup` the resulting structure is not allocated inside a memory pool. As we do a 1:1 copy of the original struct, though, we also copy the `pooled` field, which is set to `true` for pooled entries. This results in a huge memory leak as we never free tree entries that were duplicated from a pooled tree entry. Fix this by marking the newly duplicated entry as un-pooled.
Carlos Martín Nieto 95ae3520 2015-11-30T17:32:18 tree: ensure the entry filename fits in 16 bits Return an error in case the length is too big. Also take this opportunity to have a single allocating function for the size and overflow logic.
Carlos Martín Nieto ee42bb0e 2015-11-28T19:18:29 tree: make path len uint16_t and avoid holes This reduces the size of the struct from 32 to 26 bytes, and leaves a single padding byte at the end of the struct (which comes from the zero-length array).
Carlos Martín Nieto 2580077f 2015-11-15T00:44:02 tree: calculate the filename length once We already know the size due to the `memchr()` so use that information instead of calling `strlen()` on it.
Carlos Martín Nieto ed970748 2015-11-14T23:50:06 tree: pool the entry memory allocations These are rather small allocations, so we end up spending a non-trivial amount of time asking the OS for memory. Since these entries are tied to the lifetime of their tree, we can give the tree a pool so we speed up the allocations.
Carlos Martín Nieto 7132150d 2015-11-14T23:46:21 tree: avoid advancing over the filename multiple times We've already looked at the filename with `memchr()` and then used `strlen()` to allocate the entry. We already know how much we have to advance to get to the object id, so add the filename length instead of looking at each byte again.
Carlos Martín Nieto 84511143 2015-03-12T01:49:07 tree: add more correct error messages for not found Don't use the full path, as that's not what we are asserting does not exist, but just the subpath we were looking up.
Stefan Widgren c8e02b87 2015-02-15T21:07:05 Remove extra semicolon outside of a function Without this change, compiling with gcc and pedantic generates warning: ISO C does not allow extra ‘;’ outside of a function.
Edward Thomson f1453c59 2015-02-12T12:19:37 Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
Edward Thomson 2884cc42 2015-02-11T09:39:38 overflow checking: don't make callers set oom Have the ALLOC_OVERFLOW testing macros also simply set_oom in the case where a computation would overflow, so that callers don't need to.
Edward Thomson 392702ee 2015-02-09T23:41:13 allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
Carlos Martín Nieto 208a2c8a 2014-12-27T12:09:11 treebuilder: rename _create() to _new() This function is a constructor, so let's name it like one and leave _create() for the reference functions, which do create/write the reference.
Edward Thomson dce7b1a4 2014-12-16T19:24:04 treebuilder: take a repository for path validation Path validation may be influenced by `core.protectHFS` and `core.protectNTFS` configuration settings, thus treebuilders can take a repository to influence their configuration.
Vicent Marti 62155257 2014-11-25T00:14:52 tree: Check for `.git` with case insensitivy
Carlos Martín Nieto 7465e873 2014-09-29T09:07:41 index: fill the tree cache on write-tree An obvious place to fill the tree cache is on write-tree, as we're guaranteed to be able to fill in the whole tree cache. The way this commit does this is not the most efficient, as we read the root tree from the odb instead of filling in the cache as we go along, but it fills the cache such that successive operations (and persisting the index to disk) will be able to take advantage of the cache, and it reuses the code we already have for filling the cache. Filling in the cache as we create the trees would require some reallocation of the children vector, which is currently not possible with out pool implementation. A different data structure would likely allow us to perform this operation at a later date.
Carlos Martín Nieto c2f8b215 2014-09-28T07:00:49 index: write out the tree cache extension Keeping the cache around after read-tree is only one part of the optimisation opportunities. In order to share the cache between program instances, we need to write the TREE extension to the index. Do so, taking the opportunity to rename 'entries' to 'entry_count' to match the name given in the format description. The included test is rather trivial, but works as a sanity check.
Carlos Martín Nieto 966fb207 2014-06-25T21:25:44 tree: free in error conditions As reported by coverity, we would leak some memory in error conditions.
Carlos Martín Nieto fcc60066 2014-06-09T22:59:32 treentry: no need for manual size book-keeping We can simply ask the hasmap.
Carlos Martín Nieto 978fbb4c 2014-06-09T22:45:23 treebuilder: don't keep removed entries around If the user wants to keep a copy for themselves, they should make a copy. It adds unnecessary complexity to make sure the returned entries are valid until the builder is cleared.
Carlos Martín Nieto 4d3f1f97 2014-06-09T04:38:22 treebuilder: use a map instead of vector to store the entries Finding a filename in a vector means we need to resort it every time we want to read from it, which includes every time we want to write to it as well, as we want to find duplicate keys. A hash-map fits what we want to do much more accurately, as we do not care about sorting, but just the particular filename. We still keep removed entries around, as the interface let you assume they were going to be around until the treebuilder is cleared or freed, but in this case that involves an append to a vector in the filter case, which can now fail. The only time we care about sorting is when we write out the tree, so let's make that the only time we do any sorting.
Carlos Martín Nieto 2c11d2ee 2014-06-09T23:23:53 treebuilder: insert sorted By inserting in the right position, we can keep the vector sorted, making entry insertion almost twice as fast.
Russell Belfer 882c7742 2014-02-04T10:01:37 Convert pqueue to just be a git_vector This updates the git_pqueue to simply be a set of specialized init/insert/pop functions on a git_vector. To preserve the pqueue feature of having a fixed size heap, I converted the "sorted" field in git_vectors to a more general "flags" field so that pqueue could mix in it's own flag. This had a bunch of ramifications because a number of places were directly looking at the vector "sorted" field - I added a couple new git_vector helpers (is_sorted, set_sorted) so the specific representation of this information could be abstracted.
Carlos Martín Nieto f000ee4e 2014-01-24T18:23:46 tree: remove legacy 'oid' naming Rename git_tree_entry_byoid() to _byid() as per the convention.
Carlos Martín Nieto d541170c 2014-01-24T11:36:41 index: rename an entry's id to 'id' This was not converted when we converted the rest, so do it now.
Arthur Schreiber 529f342a 2014-01-14T21:33:59 Align git_tree_entry_dup.
Russell Belfer 26c1cb91 2013-12-09T09:44:03 One more rename/cleanup for callback err functions
Russell Belfer 25e0b157 2013-12-06T15:07:57 Remove converting user error to GIT_EUSER This changes the behavior of callbacks so that the callback error code is not converted into GIT_EUSER and instead we propagate the return value through to the caller. Instead of using the giterr_capture and giterr_restore functions, we now rely on all functions to pass back the return value from a callback. To avoid having a return value with no error message, the user can call the public giterr_set_str or some such function to set an error message. There is a new helper 'giterr_set_callback' that functions can invoke after making a callback which ensures that some error message was set in case the callback did not set one. In places where the sign of the callback return value is meaningful (e.g. positive to skip, negative to abort), only the negative values are returned back to the caller, obviously, since the other values allow for continuing the loop. The hardest parts of this were in the checkout code where positive return values were overloaded as meaningful values for checkout. I fixed this by adding an output parameter to many of the internal checkout functions and removing the overload. This added some code, but it is probably a better implementation. There is some funkiness in the network code where user provided callbacks could be returning a positive or a negative value and we want to rely on that to cancel the loop. There are still a couple places where an user error might get turned into GIT_EUSER there, I think, though none exercised by the tests.
Russell Belfer dab89f9b 2013-12-04T21:22:57 Further EUSER and error propagation fixes This continues auditing all the places where GIT_EUSER is being returned and making sure to clear any existing error using the new giterr_user_cancel helper. As a result, places that relied on intercepting GIT_EUSER but having the old error preserved also needed to be cleaned up to correctly stash and then retrieve the actual error. Additionally, as I encountered places where error codes were not being propagated correctly, I tried to fix them up. A number of those fixes are included in the this commit as well.
Carlos Martín Nieto 13f670a5 2013-04-15T09:07:57 tree: allow retrieval of raw attributes When a tool needs to recreate the tree object (for example an interface to another VCS), it needs to use the raw attributes, forgoing any normalization.
wilke d7fc2eb2 2013-09-13T21:36:39 Fix memory leak in git_tree_walk on error or when stopping the walk from the supplied callback
wilke 4e01e302 2013-09-13T21:21:33 Prevent git_tree_walk 'skip entry' callback return code from leaking through as the return value of git_tree_walk
Russell Belfer a7fcc44d 2013-09-05T16:14:32 Better macro name for is-exec-bit-set test
Russell Belfer f240acce 2013-09-05T11:20:12 Add more file mode permissions macros This adds some more macros for some standard operations on file modes, particularly related to permissions, and then updates a number of places around the code base to use the new macros.
Russell Belfer 114f5a6c 2013-06-10T10:10:39 Reorganize diff and add basic diff driver This is a significant reorganization of the diff code to break it into a set of more clearly distinct files and to document the new organization. Hopefully this will make the diff code easier to understand and to extend. This adds a new `git_diff_driver` object that looks of diff driver information from the attributes and the config so that things like function content in diff headers can be provided. The full driver spec is not implemented in the commit - this is focused on the reorganization of the code and putting the driver hooks in place. This also removes a few #includes from src/repository.h that were overbroad, but as a result required extra #includes in a variety of places since including src/repository.h no longer results in pulling in the whole world.
Russell Belfer 58206c9a 2013-05-16T10:38:27 Add cat-file example and increase const use in API This adds an example implementation that emulates git cat-file. It is a convenient and relatively simple example of getting data out of a repository. Implementing this also revealed that there are a number of APIs that are still not using const pointers to objects that really ought to be. The main cause of this is that `git_vector_bsearch` may need to call `git_vector_sort` before doing the search, so a const pointer to the vector is not allowed. However, for tree objects, with a little care, we can ensure that the vector of tree entries is always sorted and allow lookups to take a const pointer. Also, the missing const in commit objects just looks like an oversight.
Russell Belfer b60d95c7 2013-05-01T15:55:54 clarify error propogation
Vicent Marti 0b726701 2013-04-30T13:13:38 object: Explicitly define helper API methods for all obj types
Russell Belfer 203d5b0e 2013-04-29T18:20:58 Some cleanups Removed useless prototype and renamed object typecast functions declaration macro.
Russell Belfer d7761102 2013-04-29T14:22:06 Standardize cast versions of git_object accessors This removes the GIT_INLINE versions of the simple git_object accessors and standardizes them with a helper macro in src/object.h to build the function bodies.
Russell Belfer 116bbdf0 2013-04-16T12:08:21 clean up tree pointer casting
Russell Belfer 3f27127d 2013-04-16T11:51:02 Simplify object table parse functions This unifies the object parse functions into one signature that takes an odb_object.
Russell Belfer 78606263 2013-04-15T00:05:44 Add callback to git_objects_table This adds create and free callback to the git_objects_table so that more of the creation and destruction of objects can be table driven instead of using switch statements. This also makes the semantics of certain object creation functions consistent so that we can make better use of function pointers. This also fixes a theoretical error case where an object allocation fails and we end up storing NULL into the cache.
Russell Belfer badd85a6 2013-04-10T17:10:17 Use git_odb_object_data/_size whereever possible This uses the odb object accessors so we can change the internals more easily...
Vicent Marti 8842c75f 2013-04-03T22:30:07 What has science done.
Carlos Martín Nieto f90391ea 2013-04-18T14:47:54 treebuilder: don't overwrite the error message
Russell Belfer 0c468633 2013-03-14T13:40:15 Improved tree iterator internals This updates the tree iterator internals to be more efficient. The tree_iterator_entry objects are now kept as pointers that are allocated from a git_pool, so that we may use git__tsort_r for sorting (which is better than qsort, given that the tree is likely mostly ordered already). Those tree_iterator_entry objects now keep direct pointers to the data they refer to instead of keeping indirect index values. This simplifies a lot of the data structure traversal code. This also adds bsearch to find the start item position for range- limited tree iterators, and is more explicit about using git_path_cmp instead of reimplementing it. The git_path_cmp changed a bit to make it easier for tree_iterators to use it (but it was barely being used previously, so not a big deal). This adds a git_pool_free_array function that efficiently frees a list of pool allocated pointers (which the tree_iterator keeps). Also, added new tests for the git_pool free list functionality that was not previously being tested (or used).
Philip Kelley cb53669e 2013-03-01T16:38:13 Rename function to __ prefix
Philip Kelley 3f0d0c85 2013-03-01T15:44:18 Disable ignore_case when writing the index to a tree
Russell Belfer e2237179 2013-02-20T10:58:56 Some code cleanups in tree.c This replaces most of the explicit vector iteration with calls to git_vector_foreach, adds in some git__free and giterr_clear calls to clean up during some error paths, and a couple of other code simplifications.
Russell Belfer 93ab370b 2013-02-20T10:50:01 Store treebuilder length separately from entries vec The treebuilder entries vector flags removed items which means we can't rely on the entries vector length to accurately get the number of entries. This adds an entrycount value and maintains it while updating the treebuilder entries.
nulltoken 3ad05221 2013-02-05T16:52:56 Fix MSVC compilation warnings Fix #1308
Russell Belfer 4657fc1c 2013-01-29T13:54:08 Merge pull request #1285 from phkelley/vector Vector improvements and their fallout
John Wiegley 5fb98206 2013-01-28T15:56:04 Added git_treebuilder_entrycount Conflicts: src/tree.c
Philip Kelley 11d9f6b3 2013-01-27T14:17:07 Vector improvements and their fallout
Russell Belfer 98527b5b 2013-01-09T16:03:35 Add git_tree_entry_cmp and git_tree_entry_icmp This adds a new external API git_tree_entry_cmp and a new internal API git_tree_entry_icmp for sorting tree entries. The case insensitive one is internal only because general users should never be seeing case-insensitively sorted trees.
Edward Thomson 359fc2d2 2013-01-08T17:07:25 update copyrights
Russell Belfer 91e7d263 2012-12-10T15:29:44 Fix iterator reset and add reset ranges The `git_iterator_reset` command has not been working in all cases particularly when there is a start and end range. This fixes it and adds tests for it, and also extends it with the ability to update the start/end range strings when an iterator is reset.
Russell Belfer 9950d27a 2012-12-06T13:26:58 Clean up iterator APIs This removes the need to explicitly pass the repo into iterators where the repo is implied by the other parameters. This moves the repo to be owned by the parent struct. Also, this has some iterator related updates to the internal diff API to lay the groundwork for checkout improvements.
Carlos Martín Nieto f1c75b94 2012-12-07T15:16:41 tree: relax the filemode parser There are many different broken filemodes in the wild so we need to protect against them and give something useful up the chain. Don't fail when reading a tree from the ODB but normalize the mode as best we can. As 664 is no longer a mode that we consider to be valid and gets normalized to 644, we can stop accepting it in the treebuilder. The library won't expose it to the user, so any invalid modes are a bug.
Vicent Martí e2934db2 2012-11-29T02:05:46 Merge pull request #1090 from arrbee/ignore-invalid-by-default Ignore invalid entries by default
Russell Belfer 16248ee2 2012-11-21T11:03:07 Fix up some missing consts in tree & index This fixes some missed places where we can apply const-ness to various public APIs. There are still some index and tree APIs that cannot take const pointers because we sort our `git_vectors` lazily and so we can't reliably bsearch the index and tree content without applying a `git_vector_sort()` first. This also fixes some missed places where size_t can be used and where const can be applied to a couple internal functions.
Russell Belfer a8122b5d 2012-11-21T15:39:03 Fix warnings on Win64 build
Ben Straub f45d51ff 2012-11-20T19:57:46 API updates for index.h
Russell Belfer e120123e 2012-11-20T14:01:46 API review / update for tree.h
Russell Belfer cfeef7ce 2012-11-19T13:40:08 Minor optimization to tree entry validity check This checks for a leading '.' before looking for the invalid tree entry names. Even on pretty high levels of optimization, this seems to make a measurable improvement. I accidentally used && in the check initially instead of || and while debugging ended up improving the error reporting of issues with adding tree entries. I thought I'd leave those changes, too.
Scott J. Goldman 0d778b1a 2012-11-18T16:52:04 Catch invalid filenames in append_entry() This prevents the index api from calling write_tree() with a bogus tree.
Scott J. Goldman 19af78bb 2012-11-18T15:15:24 Prevent creating `..`, `.`, and `.git` with tree builder As per core git.
nulltoken f92bcaea 2012-11-08T17:39:23 index: prevent tree creation from a non merged state Fix libgit2/libgit2sharp#243
Vicent Marti 43eeca04 2012-11-01T20:24:43 index: Fix tests
Vicent Marti 276ea401 2012-11-01T20:15:53 index: Add git_index_write_tree
Edward Thomson f45ec1a0 2012-10-29T20:04:21 index refactoring
Russell Belfer 0d64bef9 2012-10-05T15:56:57 Add complex checkout test and then fix checkout This started as a complex new test for checkout going through the "typechanges" test repository, but that revealed numerous issues with checkout, including: * complete failure with submodules * failure to create blobs with exec bits * problems when replacing a tree with a blob because the tree "example/" sorts after the blob "example" so the delete was being processed after the single file blob was created This fixes most of those problems and includes a number of other minor changes that made it easier to do that, including improving the TYPECHANGE support in diff/status, etc.
nulltoken 9d7ac675 2012-08-21T11:45:16 tree entry: rename git_tree_entry_attributes() into git_tree_entry_filemode()