src/tree.c


Log

Author Commit Date CI Message
Luc Bertrand 8f643ce8 2011-08-03T13:44:28 Remove duplicated sort
Kirill A. Shutemov 0cbbdc26 2011-07-15T17:56:48 tree: fix cast warnings /home/kas/git/public/libgit2/src/tree.c: In function ‘entry_search_cmp’: /home/kas/git/public/libgit2/src/tree.c:47:36: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual] /home/kas/git/public/libgit2/src/tree.c: In function ‘git_treebuilder_remove’: /home/kas/git/public/libgit2/src/tree.c:443:31: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual] Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
nulltoken f4ad64c1 2011-07-13T07:58:17 tree: fix insertion of entries with invalid filenames
Vicent Marti e6629d83 2011-07-13T03:36:03 tree: More accurate matching on entries The old matcher was returning fake matches when given stupid entry names. E.g. `git2` could be matched by `git2 /`, `git2/foobar`, git2/////` and other stupid stuff
Vicent Marti 761aa2aa 2011-07-13T02:49:47 tree: Fix wrong sort order when querying entries Fixes #127 (that was quite an outstanding issue). Rationale: The tree objects on Git are stored and read following a very specific sorting algorithm that places folders before files. That original sort was the sort we were storing on memory, but this sort was being queried with a binary search that used a simple `strcmp` for comparison, so there were many instances where the search was failing. Obviously, the most straightforward way to fix this is changing the binary search CB to use the same comparison method as the sorting CB. The problem with this is that the binary search callback compares a path and an entry, so there is no way to know if the given path is a folder or a standard file. How do we work around this? Instead of splitting the `entry_byname` method in two (one for searching directories and one for searching normal files), we just assume that the path we are searching for is of the same kind as the path it's being compared at the moment. return git_futils_cmp_path( ksearch->filename, ksearch->filename_len, entry->attr & 040000, entry->filename, entry->filename_len, entry->attr & 040000); Since there cannot be a folder and a regular file with the same name on the same tree, the most basic equality check will always fail for all comparsions, until our path is compared with the actual entry we are looking for; in this case, the matching will succeed with the file type of the entry -- whatever it was initially. I hope that makes sense. PS: While I was at it, I switched the cmp methods to use cached values for the length of each filename. That makes searches and sorts retardedly fast -- I was wondering the reason of the performance hiccups on massive trees; it's because of 2*strlen for each comparsion call.
Vicent Marti afeecf4f 2011-07-09T02:10:46 odb: Direct writes are back DIRECT WRITES ARE BACK AND FASTER THAN EVER. The streaming writer to the ODB was an overkill for the smaller objects like Commit and Tags; most of the streaming logic was taking too long. This commit makes Commits, Tags and Trees to be built-up in memory, and then written to disk in 2 pushes (header + data), instead of streaming everything. This is *always* faster, even for big files (since the git_filebuf class still does streaming writes when the memory cache overflows). This is also a gazillion lines of code smaller, because we don't have to precompute the final size of the object before starting the stream (this was kind of defeating the point of streaming, anyway). Blobs are still written with full streaming instead of loading them in memory, since this is still the fastest way. A new `git_buf` class has been added. It's missing some features, but it'll get there.
Vicent Marti de18f276 2011-07-07T01:46:20 vector: Timsort all of the things Drop the GLibc implementation of Merge Sort and replace it with Timsort. The algorithm has been tuned to work on arrays of pointers (void **), so there's no longer a need to abstract the byte-width of each element in the array. All the comparison callbacks now take pointers-to-elements, not pointers-to-pointers, so there's now one less level of dereferencing. E.g. int index_cmp(const void *a, const void *b) { - const git_index_entry *entry_a = *(const git_index_entry **)(a); + const git_index_entry *entry_a = (const git_index_entry *)(a); The result is up to a 40% speed-up when sorting vectors. Memory usage remains lineal. A new `bsearch` implementation has been added, whose callback also supplies pointer-to-elements, to uniform the Vector API again.
Vicent Marti f79026b4 2011-07-04T11:43:34 fileops: Cleanup Cleaned up the structure of the whole OS-abstraction layer. fileops.c now contains a set of utility methods for file management used by the library. These are abstractions on top of the original POSIX calls. There's a new file called `posix.c` that contains emulations/reimplementations of all the POSIX calls the library uses. These are prefixed with `p_`. There's a specific posix file for each platform (win32 and unix). All the path-related methods have been moved from `utils.c` to `path.c` and have their own prefix.
Kirill A. Shutemov 932d1baf 2011-06-30T19:52:34 cleanup: remove trailing spaces Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Vicent Marti fa48608e 2011-06-16T02:36:21 oid: Rename methods Yeah. Finally. Fuck the old names, this ain't POSIX and they don't make any sense at all.
Vicent Martí 1097dacd 2011-06-06T18:33:38 Merge pull request #240 from Romain-Geissler/tree-object-type Tree: Added a function that returns the type of a tree entry.
Romain Geissler ff9a4c13 2011-06-06T17:14:30 Tree: Added a function that returns the type of a tree entry.
Romain Geissler c5d8745f 2011-06-06T10:55:54 Tree: Some more size_t to unsigned int type change.
Romain Geissler e5c80097 2011-06-05T21:18:05 Tree: API uniformasation: Use unsigned int for all index number.
Jakob Pfender bc06a4ee 2011-05-19T15:10:54 tree.c: Move to new error handling mechanism
schu d6de92b6 2011-05-11T12:40:04 Move tree.c to the new error handling Signed-off-by: schu <schu-github@schulog.org>
Sergey Nikishin 555ce568 2011-04-26T13:22:45 Fix tree-entry attribute convertion (fix corrupted trees) Magic constant replaced by direct to-string covertion because of: 1) with value length 6 (040000 - subtree) final tree will be corrupted; 2) for wrong values length <6 final tree will be corrupted too.
Vicent Marti c6e65aca 2011-04-09T15:22:11 Properly check `strtol` for errors We are now using a custom `strtol` implementation to make sure we're not missing any overflow errors.
Shuhei Tanuma 98ac6780 2011-04-06T02:22:24 fix git_treebuilder_insert probrem. couldn't add new entry when inserting new one with `git_treebuilder_insert`.
Vicent Marti 0ad6efa1 2011-04-04T19:24:19 Build & write custom trees in memory
Vicent Marti 29e1789b 2011-04-04T12:14:03 Fix the git_tree_write implementation
Sarath Lakshman 47d8ec56 2011-04-03T17:18:56 New external API method: `git_tree_create` Creates a tree by scanning the index file. The method handles recursive creation of trees for subdirectories and adds them to the parent tree.
Vicent Marti 720d5472 2011-04-02T12:42:04 Change `parse` methods to const buffer Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 72a3fe42 2011-03-18T19:38:49 I broke your bindings Hey. Apologies in advance -- I broke your bindings. This is a major commit that includes a long-overdue redesign of the whole object-database structure. This is expected to be the last major external API redesign of the library until the first non-alpha release. Please get your bindings up to date with these changes. They will be included in the next minor release. Sorry again! Major features include: - Real caching and refcounting on parsed objects - Real caching and refcounting on objects read from the ODB - Streaming writes & reads from the ODB - Single-method writes for all object types - The external API is now partially thread-safe The speed increases are significant in all aspects, specially when reading an object several times from the ODB (revwalking) and when writing big objects to the ODB. Here's a full changelog for the external API: blob.h ------ - Remove `git_blob_new` - Remove `git_blob_set_rawcontent` - Remove `git_blob_set_rawcontent_fromfile` - Rename `git_blob_writefile` -> `git_blob_create_fromfile` - Change `git_blob_create_fromfile`: The `path` argument is now relative to the repository's working dir - Add `git_blob_create_frombuffer` commit.h -------- - Remove `git_commit_new` - Remove `git_commit_add_parent` - Remove `git_commit_set_message` - Remove `git_commit_set_committer` - Remove `git_commit_set_author` - Remove `git_commit_set_tree` - Add `git_commit_create` - Add `git_commit_create_v` - Add `git_commit_create_o` - Add `git_commit_create_ov` tag.h ----- - Remove `git_tag_new` - Remove `git_tag_set_target` - Remove `git_tag_set_name` - Remove `git_tag_set_tagger` - Remove `git_tag_set_message` - Add `git_tag_create` - Add `git_tag_create_o` tree.h ------ - Change `git_tree_entry_2object`: New signature is `(git_object **object_out, git_repository *repo, git_tree_entry *entry)` - Remove `git_tree_new` - Remove `git_tree_add_entry` - Remove `git_tree_remove_entry_byindex` - Remove `git_tree_remove_entry_byname` - Remove `git_tree_clearentries` - Remove `git_tree_entry_set_id` - Remove `git_tree_entry_set_name` - Remove `git_tree_entry_set_attributes` object.h ------------ - Remove `git_object_new - Remove `git_object_write` - Change `git_object_close`: This method is now *mandatory*. Not closing an object causes a memory leak. odb.h ----- - Remove type `git_rawobj` - Remove `git_rawobj_close` - Rename `git_rawobj_hash` -> `git_odb_hash` - Change `git_odb_hash`: New signature is `(git_oid *id, const void *data, size_t len, git_otype type)` - Add type `git_odb_object` - Add `git_odb_object_close` - Change `git_odb_read`: New signature is `(git_odb_object **out, git_odb *db, const git_oid *id)` - Change `git_odb_read_header`: New signature is `(size_t *len_p, git_otype *type_p, git_odb *db, const git_oid *id)` - Remove `git_odb_write` - Add `git_odb_open_wstream` - Add `git_odb_open_rstream` odb_backend.h ------------- - Change type `git_odb_backend`: New internal signatures are as follows int (* read)(void **, size_t *, git_otype *, struct git_odb_backend *, const git_oid *) int (* read_header)(size_t *, git_otype *, struct git_odb_backend *, const git_oid *) int (* writestream)(struct git_odb_stream **, struct git_odb_backend *, size_t, git_otype) int (* readstream)( struct git_odb_stream **, struct git_odb_backend *, const git_oid *) - Add type `git_odb_stream` - Add enum `git_odb_streammode` Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti b5c5f0f8 2011-03-16T23:59:09 Fix headers for the new Revision Walker The "oid.h" header is now included instead of "object.h". The old "revwalk.h" header has been removed; it was empty.
Vicent Marti 545a6915 2011-03-05T13:45:05 Change interface for Tree Index attr (always unsigned) Signed-off-by: Vicent Marti <tanoku@gmail.com>
Sakari Jokinen 9de27ad0 2011-02-25T19:05:29 Check for valid range of attributes for tree entry
Vicent Marti 86d7e1ca 2011-02-28T12:46:13 Fix searching in git_vector We now store only one sorting callback that does entry comparison. This is used when sorting the entries using a quicksort, and when looking for a specific entry with the new search methods. The following search methods now exist: git_vector_search(vector, entry) git_vector_search2(vector, custom_search_callback, key) git_vector_bsearch(vector, entry) git_vector_bsearch2(vector, custom_search_callback, key) The sorting state of the vector is now stored internally. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 48c27f86 2011-02-28T16:51:17 Implement reference counting for git_objects All `git_object` instances looked up from the repository are reference counted. User is expected to use the new `git_object_close` when an object is no longer needed to force freeing it. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 5de079b8 2011-02-28T12:12:26 Change the object creation/lookup API The methods previously known as git_repository_lookup git_repository_newobject git_repository_lookup_ref are now part of their respective namespaces: git_object_lookup git_object_new git_reference_lookup This makes the API more consistent with the new references API. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti ccef1c9d 2011-02-27T22:09:36 Move the path comparison method to fileops.c Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 122c3405 2011-02-07T18:25:42 Git trees are now always lazily sorted Removed `git_tree_add_entry_unsorted`. Now the `git_tree_add_entry` method doesn't sort the entries array by default; entries are only sorted lazily when required. This is done automatically by the library (the `git_tree_sort_entries` call has been removed). This should improve performance. No point on sorting entries all the time, anyway. Signed-off-by: Vicent Marti <tanoku@gmail.com>
John Wiegley e769e025 2011-02-07T00:11:17 Git does not like zero padded file attributes (git fsck)
John Wiegley 5bf42916 2011-02-07T00:11:00 Further correction to tree entry sorting (for git fsck)
nulltoken 40be9ae0 2011-02-05T15:03:48 Fixes a Win32/MSVC compilation issue.
Vicent Marti 4569bfa5 2011-02-05T09:11:17 Keep the tree entries always internally sorted Don't allow access to any tree entries whilst the entries array is unsorted. We keep track on when the array is unsorted, and any methods that access the array while it is unsorted now sort the array before accessing it. Signed-off-by: Vicent Marti <tanoku@gmail.com>
John Wiegley 35786cb7 2011-02-02T19:00:26 Use Git's own tree entry sorting algorithm If plain strcmp is used, as this code did before, the final sorting may end up different from what git-add would do (for example, 'boost' appearing before 'boost-build.jam', because Git sorts as if it were spelled 'boost/'). If the sorting is incorrect like this, Git 1.7.4 insists that unmodified files have been modified. For example, my test repository has these four entries: drwxr-xr-x 199 johnw wheel 6766 Feb 2 17:21 boost -rw-r--r-- 1 johnw wheel 849 Feb 2 17:22 boost-build.jam -rw-r--r-- 1 johnw wheel 989 Feb 2 17:21 boost.css -rw-r--r-- 1 johnw wheel 6308 Feb 2 17:21 boost.png Here is the output from git-ls-tree for these files, in a commit tree created using git-add and git-commit: 100644 blob 8b8775433aef73e9e12609610ae2e35cf1e7ec2c boost-build.jam 100644 blob 986c4050fa96d825a1311c8e871cdcc9a3e0d2c3 boost.css 100644 blob b4d51fcd5c9149fd77f5ca6ed2b6b1b70e8fe24f boost.png 040000 tree 46537eeaa4d577010f19b1c9e940cae9a670ff5c boost Here is the output for the same commit produced using libgit2: 040000 tree c27c0fd1436f28a6ba99acd0a6c17d178ed58288 boost 100644 blob 8b8775433aef73e9e12609610ae2e35cf1e7ec2c boost-build.jam 100644 blob 986c4050fa96d825a1311c8e871cdcc9a3e0d2c3 boost.css 100644 blob b4d51fcd5c9149fd77f5ca6ed2b6b1b70e8fe24f boost.png Due to this reordering, git-status claims the three blobs are always modified, no matter what I do using git-read-tree or git-reset or git-checkout to update the index.
John Wiegley 89f9fc6f 2011-01-28T02:41:59 Make git_tree_clear_entries visible to the user
John Wiegley 75e051c6 2011-01-27T14:20:23 Added git_tree_add_entry_unsorted and git_tree_sort_entries
Vicent Marti b29e8f19 2011-01-29T02:12:59 Return the created entry in git_tree_add_entry() Yes, we are breaking the API. Alpha software, deal with it. We need a way of getting a pointer to each newly added entry to the index, because manually looking up the entry after creation is outrageously expensive. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti c8f5ff8f 2011-01-20T14:43:27 Fix initialization of in-memory trees In-memory tree objects were not being properly initialized, because the internal entries vector was created on the 'parse' method. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 44908fe7 2010-12-06T23:03:16 Change the library include file Libgit2 is now officially include as #include "<git2.h>" or indidividual files may be included as #include <git2/index.h> Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti d12299fe 2010-12-03T22:22:10 Change include structure for the project The maze with include dependencies has been fixed. There is now a global include: #include <git.h> The git_odb_backend API has been exposed. Signed-off-by: Vicent Marti <tanoku@gmail.com>
nulltoken 6f02c3ba 2010-12-05T20:18:56 Small source code readability improvements. Replaced magic number "0" with GIT_SUCCESS constant wherever it made sense.
Vicent Marti c4034e63 2010-12-02T04:31:54 Refactor all 'vector' functions into common code All the operations on the 'git_index_entry' array and the 'git_tree_entry' array have been refactored into common code in the src/vector.c file. The new vector methods support: - insertion: O(1) (avg) - deletion: O(n) - searching: O(logn) - sorting: O(logn) - r. access: O(1) Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 1795f879 2010-11-05T03:20:17 Improve error handling All initialization functions now return error codes instead of pointers. Error codes are now properly propagated on most functions. Several new and more specific error codes have been added in common.h Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 58519018 2010-10-28T02:07:18 Fix internal memory management on the library String mememory is now managed in a much more sane manner. Fixes include: - git_person email and name is no longer limited to 64 characters - git_tree_entry filename is no longer limited to 255 characters - raw objects are properly opened & closed the minimum amount of times required for parsing - unit tests no longer leak - removed 5 other misc memory leaks as reported by Valgrind - tree writeback no longer segfaults on rare ocassions The git_person struct is no longer public. It is now managed by the library, and getter methods are in place to access its internal attributes. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti e4def81a 2010-10-08T13:52:17 Fix issue 3 (memory corruption resize_tree_array) The tree array wasn't being initialized when instantiating a tree object in memory instead of loading it from disk. New unit tests added to check for the problem. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti c4b5bedc 2010-10-07T00:07:32 Fix possible segfaults in src/tree.c (issue 1) git_tree_entry_byname was dereferencing a NULL pointer when the searched file couldn't be found on the tree. New test cases have been added to check for entry access methods. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 2a884588 2010-09-21T17:17:10 Add write-back support for git_tree All the setter methods for git_tree have been added, including the setters for attributes on each git_tree_entry and methods to add/remove entries of the tree. Modified trees and trees created in-memory from scratch can be written back to the repository using git_object_write(). Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti d45b4a9a 2010-09-20T21:39:11 Add support for in-memory objects All repository objects can now be created from scratch in memory using either the git_object_new() method, or the corresponding git_XXX_new() for each object. So far, only git_commits can be written back to disk once created in memory. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti f49a2e49 2010-09-19T03:21:06 Give object structures more descriptive names The 'git_obj' structure is now called 'git_rawobj', since it represents a raw object read from the ODB. The 'git_repository_object' structure is now called 'git_object', since it's the base object class for all objects. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti a7a7ddbe 2010-09-18T19:16:04 Add generic methods for object writeback git_repository_object has now several internal methods to write back the object information in the repository. - git_repository__dbo_prepare_write() Prepares the DBO object to be modified - git_repository__dbo_write() Writes new bytes to the DBO object - git_repository__dbo_writeback() Writes back the changes to the repository Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 370ce569 2010-08-14T20:35:10 Fix: do not export custom types in the extern API Some compilers give linking problems when exporting 'uint32_t' as a return type in the external API. Use generic types instead. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti f2408cc2 2010-08-12T19:59:32 Fix object handling in git_repository All loaded objects through git_repository_lookup are properly parsed & free'd on failure. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 003c2690 2010-08-12T18:45:31 Finish the tree object API The interface for loading and parsing tree objects from a repository has been completed with all the required accesor methods for attributes, support for manipulating individual tree entries and a new unit test t0901-readtree which tries to load and parse a tree object from a repository. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 3315782c 2010-08-08T14:12:17 Redesigned the walking/object lookup interface The old 'git_revpool' object has been removed and split into two distinct objects with separate functionality, in order to have separate methods for object management and object walking. * A new object 'git_repository' does the high-level management of a repository's objects (commits, trees, tags, etc) on top of a 'git_odb'. Eventually, it will also manage other repository attributes (e.g. tag resolution, references, etc). See: src/git/repository.h * A new external method 'git_repository_lookup(repo, oid, type)' has been added to the 'git_repository' API. All object lookups (git_XXX_lookup()) are now wrappers to this method, and duplicated code has been removed. The method does automatic type checking and returns a generic 'git_revpool_object' that can be cast to any specific object. See: src/git/repository.h * The external methods for object parsing of repository objects (git_XXX_parse()) have been removed. Loading objects from the repository is now managed through the 'lookup' functions. These objects are loaded with minimal information, and the relevant parsing is done automatically when the user requests any of the parsed attributes through accessor methods. An attribute has been added to 'git_repository' in order to force the parsing of all the repository objects immediately after lookup. See: src/git/commit.h See: src/git/tag.h See: src/git/tree.h * The previous walking functionality of the revpool is now found in 'git_revwalk', which does the actual revision walking on a repository; the attributes when walking through commits in a database have been decoupled from the actual commit objects. This increases performance when accessing commits during the walk and allows to have several 'git_revwalk' instances working at the same time on top of the same repository, without having to load commits in memory several times. See: src/git/revwalk.h * The old 'git_revpool_table' has been renamed to 'git_hashtable' and now works as a generic hashtable with support for any kind of object and custom hash functions. See: src/hashtable.h * All the relevant unit tests have been updated, renamed and grouped accordingly. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti d8603ed9 2010-07-10T16:51:15 Add parsing of tree file contents. The basic information (pointed trees and blobs) of each tree object in a revision pool can now be parsed and queried. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 225fe215 2010-06-18T13:06:34 Add support for tree objects in revision pools Commits now store pointers to their tree objects. Tree objects now work as separate git_revpool_object entities. Tree objects can be loaded and parsed inedependently from commits. Signed-off-by: Vicent Marti <tanoku@gmail.com>