src/index.c


Log

Author Commit Date CI Message
Vicent Marti de18f276 2011-07-07T01:46:20 vector: Timsort all of the things Drop the GLibc implementation of Merge Sort and replace it with Timsort. The algorithm has been tuned to work on arrays of pointers (void **), so there's no longer a need to abstract the byte-width of each element in the array. All the comparison callbacks now take pointers-to-elements, not pointers-to-pointers, so there's now one less level of dereferencing. E.g. int index_cmp(const void *a, const void *b) { - const git_index_entry *entry_a = *(const git_index_entry **)(a); + const git_index_entry *entry_a = (const git_index_entry *)(a); The result is up to a 40% speed-up when sorting vectors. Memory usage remains linear. A new `bsearch` implementation has been added, whose callback also supplies pointers-to-elements, to make the Vector API uniform again.
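The callback convention described in this commit can be illustrated with a minimal sketch. The names `sketch_entry`, `sketch_entry_cmp` and `sketch_sort` are hypothetical and not part of libgit2, and a plain insertion sort stands in for Timsort purely to keep the example short. What it shows is only that the comparison callback receives the elements themselves, so no extra dereference is needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type; only the path matters for ordering. */
typedef struct {
    const char *path;
} sketch_entry;

/* New-style callback: a and b are the elements themselves,
 * not pointers to slots of the array. */
static int sketch_entry_cmp(const void *a, const void *b)
{
    const sketch_entry *entry_a = (const sketch_entry *)a;
    const sketch_entry *entry_b = (const sketch_entry *)b;
    return strcmp(entry_a->path, entry_b->path);
}

/* Stable insertion sort over an array of pointers (void **).
 * This is a stand-in for Timsort, not the library's algorithm. */
static void sketch_sort(void **items, size_t length,
                        int (*cmp)(const void *, const void *))
{
    size_t i, j;

    for (i = 1; i < length; i++) {
        void *current = items[i];

        /* Shift larger elements one slot to the right. */
        for (j = i; j > 0 && cmp(items[j - 1], current) > 0; j--)
            items[j] = items[j - 1];
        items[j] = current;
    }
}
```

By contrast, the standard `qsort()` hands its callback pointers to the array slots, which is the extra level of indirection the commit removes.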
Kirill A. Shutemov 8cc16e29 2011-06-30T23:22:42 index: speedup git_index_append()/git_index_append2() git_index_find() in index_insert() is useless if replace is not requested (append). Do not call it in this case. This speeds up git_index_append() *dramatically* on large indexes. $ cat index_test.c int main(int argc, char **argv) { git_index *index; git_repository *repo; git_odb *odb; struct git_index_entry entry; git_oid tree_oid; char tree_hex[41]; int i; git_repository_init(&repo, "/tmp/myrepo", 0); odb = git_repository_database(repo); git_repository_index(&index, repo); memset(&entry, 0, sizeof(entry)); git_odb_write(&entry.oid, odb, "", 0, GIT_OBJ_BLOB); entry.path = "test.file"; for (i = 0; i < 50000; i++) git_index_append2(index, &entry); git_tree_create_fromindex(&tree_oid, index); git_oid_fmt(tree_hex, &tree_oid); tree_hex[40] = '\0'; printf("tree: %s\n", tree_hex); git_index_free(index); git_repository_free(repo); return 0; } Before: $ time ./index_test tree: 43f73659c43b651588cc81459d9e25b08721b95d ./index_test 151.19s user 0.05s system 99% cpu 2:31.78 total After: $ time ./index_test tree: 43f73659c43b651588cc81459d9e25b08721b95d ./index_test 0.05s user 0.00s system 94% cpu 0.059 total About a 2573x speedup on this test :) Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Kirill A. Shutemov 245adf4f 2011-07-02T01:08:42 index: introduce git_index_uniq() function It removes all entries with an equal path except the last one added. On large indexes, git_index_append() + git_index_uniq() before writing is *much* faster than git_index_add(). Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
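The "keep only the last added entry per path" rule can be sketched as a single pass over a sorted array. This is an illustration under assumptions, not the library's implementation: `sketch_entry` and `sketch_uniq` are hypothetical names, and the array is assumed to be sorted by path with a stable sort, so that entries sharing a path still appear in insertion order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type; `id` only serves to tell duplicates apart. */
typedef struct {
    const char *path;
    int id;
} sketch_entry;

/* Compacts `entries` in place and returns the new length.
 * Precondition (assumed): entries are sorted by path, stably. */
static size_t sketch_uniq(sketch_entry **entries, size_t length)
{
    size_t read, write = 0;

    for (read = 0; read < length; read++) {
        /* An entry followed by one with the same path is superseded. */
        if (read + 1 < length &&
            strcmp(entries[read]->path, entries[read + 1]->path) == 0)
            continue;
        entries[write++] = entries[read];
    }
    return write;
}
```

Because the pass is linear, appending many entries and deduplicating once avoids a lookup on every insertion, which matches the motivation given in the commit message.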
Vicent Marti f79026b4 2011-07-04T11:43:34 fileops: Cleanup Cleaned up the structure of the whole OS-abstraction layer. fileops.c now contains a set of utility methods for file management used by the library. These are abstractions on top of the original POSIX calls. There's a new file called `posix.c` that contains emulations/reimplementations of all the POSIX calls the library uses. These are prefixed with `p_`. There's a specific posix file for each platform (win32 and unix). All the path-related methods have been moved from `utils.c` to `path.c` and have their own prefix.
Kirill A. Shutemov 932d1baf 2011-06-30T19:52:34 cleanup: remove trailing spaces Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Vicent Marti fa48608e 2011-06-16T02:36:21 oid: Rename methods Yeah. Finally. Fuck the old names, this ain't POSIX and they don't make any sense at all.
Sebastian Schuberth c1802641 2011-06-10T13:56:24 Prefer to use file mode defines instead of raw numbers
Vicent Marti ae496955 2011-06-08T17:03:41 windows: Fix Symlink issues Handle Symlinks if they can be handled in Win32. This is not even compiled. Needs review. The lstat implementation is modified from core Git. The readlink implementation is modified from PHP.
Jakob Pfender fdd1e04c 2011-06-07T14:10:06 fileops: Allow differentiation between deep and shallow exists() When calling gitfo_exists() on a symbolic link, sometimes we need to simply check whether the link exists and sometimes we need to check whether the file pointed to by the symlink exists. Introduce a new function gitfo_shallow_exists that only checks if the link exists and revert gitfo_exists to the original functionality of checking whether the file pointed to by the link exists.
Jakob Pfender b74c867a 2011-06-07T11:11:09 blob: Stat path inside git_blob_create_fromfile 00582bc introduced a change that required the caller of git_blob_create_fromfile() to pass a struct stat with the stat information for the file. Several developers pointed out that this would make life hard for the bindings developers as struct stat isn't widely supported by other languages. Make git_blob_create_fromfile() stat the path itself, eliminating the need for the file to be stat'ed by the caller. This makes index_init_entry() more costly as the file will be stat'ed twice but makes life easier for everyone else.
Jakob Pfender c1a2a14e 2011-05-25T16:16:41 index: Correctly write entry mode The entry mode flags for an entry created from a path name were not correctly written if the entry was a symlink. The st_mode of a statted symlink is 0120777, however git requires the mode to read 0120000, because it does not care about permissions of symlinks. Introduce index_create_mode() that correctly writes the mode flags in the form expected by git.
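The mode normalization described here can be sketched as follows. `sketch_create_mode` and the `SKETCH_*` constants are hypothetical stand-ins, not the actual `index_create_mode()`. The symlink rule (0120777 becomes 0120000) comes from the commit message; the regular-file rule (only 0644 or 0755 is recorded) is git's general convention and is an assumption added for illustration.

```c
/* File-type bits written out explicitly so the sketch does not
 * depend on <sys/stat.h>; the values match the usual POSIX ones. */
#define SKETCH_IFMT  0170000u
#define SKETCH_IFLNK 0120000u
#define SKETCH_IFREG 0100000u

static unsigned int sketch_create_mode(unsigned int st_mode)
{
    unsigned int type = st_mode & SKETCH_IFMT;

    /* Symlinks: drop the permission bits entirely. */
    if (type == SKETCH_IFLNK)
        return SKETCH_IFLNK;

    /* Regular files (assumed convention): keep only whether the
     * owner-executable bit is set. */
    if (type == SKETCH_IFREG)
        return SKETCH_IFREG | ((st_mode & 0100u) ? 0755u : 0644u);

    /* Other file types are left untouched in this sketch. */
    return st_mode;
}
```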
Jakob Pfender 1869b31e 2011-05-25T16:11:57 index/fileops: Correctly process symbolic links gitfo_exists() used to error out if the given file was a symbolic link, due to access() returning an error code. This is not expected behaviour, as gitfo_exists() should only check whether the file itself exists, not its link target if it is a symbolic link. Fix this by calling gitfo_lstat() instead, which is just a wrapper for lstat(). Also fix the same error in index_init_entry().
Jakob Pfender 4d7905c5 2011-05-25T16:04:29 blob: Require stat information for git_blob_create_fromfile() In order to be able to write symlinks with git_blob_create_fromfile(), we need to check whether the file to be written is a symbolic link or not. Since the calling function of git_blob_create_fromfile() is likely to have stat'ed the file before calling, we make it pass the stat. The reason for this is that writing symbolic link blobs is significantly different from writing ordinary files - we do not want to open the link destination but instead want to write the link itself, regardless of whether it exists or not. Previously, index_init_entry() used to error out if the file to be added was a symlink that pointed to a nonexistent file. Fix this behaviour to add the file regardless of whether it exists. This mimics git.git's behaviour.
Romain Geissler f11e0797 2011-06-05T21:19:03 Index: API uniformisation: Use unsigned int for all index numbers. Feature Added: Search an unmerged entry by path (git_index_get_unmerged renamed to git_index_get_unmerged_bypath) or by index (git_index_get_unmerged_byindex).
Vicent Marti 3a42e0a3 2011-06-03T21:38:55 index: Add `git_index_entry_stage` method As suggested by Romain-Geissler
Vicent Martí a7fdce62 2011-06-01T12:53:16 Merge pull request #223 from carlosmn/valgrind Plug a leak in the index unmerged entries vector
Vicent Marti 786ad84f 2011-06-01T18:51:54 index: Cleanup tree parsing
Carlos Martín Nieto a02fc2cd 2011-05-24T15:24:45 index: correctly parse invalidated TREE extensions A TREE extension with an entry count of -1 means that it was invalidated and we should ignore it. Do so instead of returning an error. This fixes issue #202 Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Carlos Martín Nieto cdd9fd47 2011-05-24T14:55:34 Allow read_tree_internal to return an error code There are two reasons why read_tree_internal might return a NULL tree. The first one is a corrupt index, but the second one is an invalidated TREE extension. Up to now, its only way to communicate with its caller was through the return value being NULL or not. Allow read_tree_internal to report its exit status independently from the tree pointer. Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Vicent Marti f7e59c4d 2011-06-01T18:34:21 index: Change the memory management for repo indexes The `git_repository_index` call now returns a brand new index that must be manually free'd.
Carlos Martín Nieto 71da57ae 2011-05-31T16:49:15 Plug a leak in the index unmerged entries vector Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Vicent Marti 8146fe7c 2011-05-23T21:41:13 index: Fix unused error messages
Jakob Pfender d320c52d 2011-05-19T15:59:18 index.c: Move to new error handling mechanism
Vicent Marti f4e2aca2 2011-05-19T20:38:17 index: Fix issues in the unmerged entries API
Jakob Pfender 9d27fd3b 2011-05-17T16:51:37 index.c: Fix typo git__rethrow was missing an underscore.
Jakob Pfender c90bfec7 2011-05-17T16:18:56 Move index.c to new error handling mechanism
Jakob Pfender 050e8877 2011-05-17T15:31:05 Merge branch 'development' into unmerged
Jason R. McNeil 773bc20d 2011-05-03T22:22:42 Fix misspelling of git_index_append2 (was git_index_apppend2).
Vicent Marti 1648fbd3 2011-05-02T01:12:53 Re-apply missing patches
Jakob Pfender e3c7786b 2011-04-28T17:31:13 index.c: Remove duplicate function declaration read_unmerged_internal() was present twice.
Vicent Marti f7a5058a 2011-04-24T00:31:43 index: Refactor add/replace methods Removed the optional `replace` argument, we now have 4 add methods: `git_index_add`: add or update from path `git_index_add2`: add or update from struct `git_index_append`: add without replacing from path `git_index_append2`: add without replacing from struct Yes, this breaks the bindings.
Jakob Pfender 4c0b6a6d 2011-04-21T10:54:54 index: Add API for unmerged entries New external functions: - git_index_unmerged_entrycount: Counts the unmerged entries in the index - git_index_get_unmerged: Gets an unmerged entry from the index by name New internal functions: - read_unmerged: Wrapper for read_unmerged_internal - read_unmerged_internal: Reads unmerged entries from the index if the index has the INDEX_EXT_UNMERGED_SIG set - unmerged_srch: Search function for unmerged vector - unmerged_cmp: Compare function for unmerged vector New data structures: - git_index now contains a git_vector unmerged that stores unmerged entries - git_index_entry_unmerged: Representation of an unmerged file entry. It represents all three versions of the file at the same time, with one name, three modes and three OIDs
Jakob Pfender 729b6f49 2011-04-21T10:40:54 index: Allow user to toggle whether to replace an index entry When in the middle of a merge, the index needs to contain several files with the same name. git_index_insert() used to prevent this by not adding a new entry if an entry with the same name already existed.
Jakob Pfender 1eb0f68e 2011-04-11T12:38:50 merge branch development
Vicent Marti c6e65aca 2011-04-09T15:22:11 Properly check `strtol` for errors We are now using a custom `strtol` implementation to make sure we're not missing any overflow errors.
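For context, the checks that are easy to miss when calling `strtol` can be sketched with the standard library alone. This is not the custom implementation the commit refers to; `sketch_parse_long` is a hypothetical helper showing the two conditions a caller has to test: that at least one digit was consumed, and that `errno` was not set to `ERANGE`.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 0 and stores the parsed value in *out on success,
 * or -1 if no number was found or the value is out of range. */
static int sketch_parse_long(long *out, const char *text, int base)
{
    char *end;
    long value;

    errno = 0; /* strtol only sets errno on failure, so clear it first */
    value = strtol(text, &end, base);

    if (end == text)
        return -1; /* no digits were consumed */
    if (errno == ERANGE)
        return -1; /* overflow or underflow; value was clamped */

    *out = value;
    return 0;
}
```

Trailing characters after the number are deliberately accepted here; a stricter caller could also inspect `*end`.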
Jakob Pfender fd279b26 2011-04-07T16:58:42 index.c: Correctly check whether index contains extended entries Although write_index() supports writing extended header versions for index, this was never done as there was no check for extended index entries. Introduce function is_index_extended() that checks whether an index contains extended entries and check whether an index is extended before writing it to disk, adjusting its version number if necessary.
schu 683581a3 2011-03-28T17:59:13 index.c: Fix tiny typos
Jakob Pfender 3bdc0d4c 2011-03-24T15:32:24 index.c: Read index after initialization The current behaviour of git_index_open{bare,inrepo}() is unexpected. When an index is opened, an in-memory index object is created that is linked to the index discovered by git_repository_open(). However, this index object is empty, as the on-disk index is not read. To fully open the on-disk index file, git_index_read() has to be called. This leads to confusing behaviour. Consider the following code: git_index *idx; git_index_open_inrepo(&idx, repo); git_index_write(idx); You would expect this to have no effect, as the index is never ostensibly manipulated. However, what actually happens is that the index entries are removed from the on-disk index because the empty in-memory index object created by open_inrepo() is written back to the disk. This patch reads the index after opening it.
nulltoken 56d8ca26 2011-03-20T18:36:25 Switch from time_t to git_time_t git_time_t is defined as a signed 64-bit integer. This allows truly predictable multiplatform behavior.
Vicent Marti 72a3fe42 2011-03-18T19:38:49 I broke your bindings Hey. Apologies in advance -- I broke your bindings. This is a major commit that includes a long-overdue redesign of the whole object-database structure. This is expected to be the last major external API redesign of the library until the first non-alpha release. Please get your bindings up to date with these changes. They will be included in the next minor release. Sorry again! Major features include: - Real caching and refcounting on parsed objects - Real caching and refcounting on objects read from the ODB - Streaming writes & reads from the ODB - Single-method writes for all object types - The external API is now partially thread-safe The speed increases are significant in all aspects, especially when reading an object several times from the ODB (revwalking) and when writing big objects to the ODB. Here's a full changelog for the external API: blob.h ------ - Remove `git_blob_new` - Remove `git_blob_set_rawcontent` - Remove `git_blob_set_rawcontent_fromfile` - Rename `git_blob_writefile` -> `git_blob_create_fromfile` - Change `git_blob_create_fromfile`: The `path` argument is now relative to the repository's working dir - Add `git_blob_create_frombuffer` commit.h -------- - Remove `git_commit_new` - Remove `git_commit_add_parent` - Remove `git_commit_set_message` - Remove `git_commit_set_committer` - Remove `git_commit_set_author` - Remove `git_commit_set_tree` - Add `git_commit_create` - Add `git_commit_create_v` - Add `git_commit_create_o` - Add `git_commit_create_ov` tag.h ----- - Remove `git_tag_new` - Remove `git_tag_set_target` - Remove `git_tag_set_name` - Remove `git_tag_set_tagger` - Remove `git_tag_set_message` - Add `git_tag_create` - Add `git_tag_create_o` tree.h ------ - Change `git_tree_entry_2object`: New signature is `(git_object **object_out, git_repository *repo, git_tree_entry *entry)` - Remove `git_tree_new` - Remove `git_tree_add_entry` - Remove `git_tree_remove_entry_byindex` - Remove `git_tree_remove_entry_byname` - Remove `git_tree_clearentries` - Remove `git_tree_entry_set_id` - Remove `git_tree_entry_set_name` - Remove `git_tree_entry_set_attributes` object.h ------------ - Remove `git_object_new` - Remove `git_object_write` - Change `git_object_close`: This method is now *mandatory*. Not closing an object causes a memory leak. odb.h ----- - Remove type `git_rawobj` - Remove `git_rawobj_close` - Rename `git_rawobj_hash` -> `git_odb_hash` - Change `git_odb_hash`: New signature is `(git_oid *id, const void *data, size_t len, git_otype type)` - Add type `git_odb_object` - Add `git_odb_object_close` - Change `git_odb_read`: New signature is `(git_odb_object **out, git_odb *db, const git_oid *id)` - Change `git_odb_read_header`: New signature is `(size_t *len_p, git_otype *type_p, git_odb *db, const git_oid *id)` - Remove `git_odb_write` - Add `git_odb_open_wstream` - Add `git_odb_open_rstream` odb_backend.h ------------- - Change type `git_odb_backend`: New internal signatures are as follows int (* read)(void **, size_t *, git_otype *, struct git_odb_backend *, const git_oid *) int (* read_header)(size_t *, git_otype *, struct git_odb_backend *, const git_oid *) int (* writestream)(struct git_odb_stream **, struct git_odb_backend *, size_t, git_otype) int (* readstream)(struct git_odb_stream **, struct git_odb_backend *, const git_oid *) - Add type `git_odb_stream` - Add enum `git_odb_streammode` Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 971c90be 2011-02-28T16:54:13 Do not free the index if it's owned by a repository Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 86d7e1ca 2011-02-28T12:46:13 Fix searching in git_vector We now store only one sorting callback that does entry comparison. This is used when sorting the entries using a quicksort, and when looking for a specific entry with the new search methods. The following search methods now exist: git_vector_search(vector, entry) git_vector_search2(vector, custom_search_callback, key) git_vector_bsearch(vector, entry) git_vector_bsearch2(vector, custom_search_callback, key) The sorting state of the vector is now stored internally. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 817c2820 2011-02-21T17:05:16 Rewrite all file IO for more performance The new `git_filebuf` structure provides atomic high-performance writes to disk by using a write cache, and optionally a double-buffered scheme through a worker thread (not enabled yet). Writes can be done 3-layered, like in git.git (user code -> write cache -> disk), or 2-layered, by writing directly on the cache. This makes index writing considerably faster. The `git_filebuf` structure contains all the old functionality of `git_filelock` for atomic file writes and reads. The `git_filelock` structure has been removed. Additionally, the `git_filebuf` API can automatically hash (SHA1) all the data as it is written to disk (hashing is done smartly on big chunks to improve performance). Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti e822508a 2011-02-18T10:29:55 Disable threaded index writing by default The interlocking on the write threads was not being done properly (index entries were sometimes written out of order). With proper interlocking, the threaded write is only marginally faster on big index files, and slower on the smaller ones because of the overhead when creating threads. The threaded index writing has been temporarily disabled; after more accurate benchmarks, it might be possible to enable it again only when writing very large index files (> 1000 entries). Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 084c1935 2011-02-17T23:32:22 Fix type truncation in index entries 64-bit types stored in memory have to be truncated into 32 bits when writing to disk. Was causing warnings in MSVC. Signed-off-by: Vicent Marti <tanoku@gmail.com>
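The truncation mentioned in this commit can be made explicit with a small sketch. `sketch_put_be32` is a hypothetical helper, not a libgit2 function: it keeps the low 32 bits of a 64-bit in-memory value and stores them in network (big-endian) byte order, which is the order the index file format uses on disk. The explicit cast is what silences the narrowing warning.

```c
#include <stdint.h>

/* Writes the low 32 bits of `value` into out[0..3], most
 * significant byte first. The upper 32 bits are discarded. */
static void sketch_put_be32(unsigned char *out, uint64_t value)
{
    uint32_t truncated = (uint32_t)value; /* explicit 64 -> 32 bit truncation */

    out[0] = (unsigned char)(truncated >> 24);
    out[1] = (unsigned char)(truncated >> 16);
    out[2] = (unsigned char)(truncated >> 8);
    out[3] = (unsigned char)truncated;
}
```

Writing byte by byte keeps the result independent of the host's endianness.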
Vicent Marti 348c7335 2011-02-17T21:32:00 Improve the performance when writing Index files In response to issue #60 (git_index_write really slow), the write_index function has been rewritten to improve its performance -- it should now be on par with the performance of git.git. On top of that, if Posix Threads are available when compiling libgit2, a new threaded writing system will be used (3 separate threads take care of solving byte-endianness, hashing the contents of the index and writing to disk, respectively). For very long Index files, this method is up to 3x faster than git.git. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 995f9c34 2011-02-09T12:43:19 Use the new git__joinpath to build paths in methods The `git__joinpath` function has been changed to use a statically allocated buffer; we assume the buffer to be 4096 bytes, because fuck you. The new method also supports an arbitrary number of paths to join, which may come in handy in the future. Some methods which were manually joining paths with `strcpy` now use the new function, namely those in `index.c` and `refs.c`. Based on Emeric Fermas' original patch, which was using the old `git__joinpath` because I'm stupid. Thanks! Signed-off-by: Vicent Marti <tanoku@gmail.com>
Alex Budovski f0bde7fa 2011-01-11T16:07:45 Revised platform types to use 'best supported' size. This will allow graceful migration to 64 bit file sizes and timestamps should git's binary interface be extended to allow this.
Alex Budovski 0a3bcad0 2011-01-10T14:57:06 Fix Windows build with forced bit truncation. Windows uses a 64 bit time_t by default and assigning to unsigned int causes a 64 -> 32 bit truncation warning. This change forces the truncation, acknowledging the implications detailed in the file comments. Also, blobs are limited to 32 bit file sizes for the same reason (on all platforms).
Vicent Marti a44fc1d4 2010-12-06T23:13:00 Fix type-conversion warnings The types in the git_index_entry struct are now system-defaults, and get truncated to uint32_t's when written back on the index. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 44908fe7 2010-12-06T23:03:16 Change the library include file Libgit2 is now officially included as #include <git2.h>, or individual files may be included as #include <git2/index.h> Signed-off-by: Vicent Marti <tanoku@gmail.com>
nulltoken 6f02c3ba 2010-12-05T20:18:56 Small source code readability improvements. Replaced magic number "0" with GIT_SUCCESS constant wherever it made sense.
Vicent Marti c4034e63 2010-12-02T04:31:54 Refactor all 'vector' functions into common code All the operations on the 'git_index_entry' array and the 'git_tree_entry' array have been refactored into common code in the src/vector.c file. The new vector methods support: - insertion: O(1) (avg) - deletion: O(n) - searching: O(log n) - sorting: O(n log n) - r. access: O(1) Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 91e88941 2010-11-29T18:06:22 Properly write Index Entry 'flags_extended' Always write the 'flags_extended' attribute to disk if it's available. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti c3dd69a9 2010-11-17T04:59:11 Fix resizing the index array No longer segfaults when resizing an empty array. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti c3a20d5c 2010-11-14T22:11:46 Add support for 'index add' Actually add files to the index by creating their corresponding blob and storing it on the repository, then getting the hash and updating the index file. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Scott Chacon 0be42199 2010-11-10T13:43:55 accessor for index entry count
Vicent Marti 3f43678e 2010-11-07T01:24:45 Make the Index API public Several private methods of the Index API are now public, including the methods to remove, get and add index entries. All the methods only take an integer value for the position of the entry to get/remove. To get or remove entries based on their path names, look them up first using the git_index_find method. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 1795f879 2010-11-05T03:20:17 Improve error handling All initialization functions now return error codes instead of pointers. Error codes are now properly propagated on most functions. Several new and more specific error codes have been added in common.h Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 6fd195d7 2010-11-02T18:42:42 Change git_repository initialization to use a path The constructor to git_repository is now called 'git_repository_open(path)' and takes a path to a git repository instead of an existing ODB object. Unit tests have been updated accordingly and the two test repositories have been merged into one. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 68535125 2010-07-09T20:19:56 Add support for git index files The new 'git_index' structure is an in-memory representation of a git index on disk; the 'git_index_entry' structures represent each one of the file entries on the index. The following calls for index instantiation have been added: git_index_alloc(): instantiate a new index structure git_index_free(): free an existing index git_index_clear(): clear all the entries in an existing file The following calls for index reading and writing have been added: git_index_read(): update the contents of the index structure from its file on disk. Internally implemented through: git_index__parse() Index files are stored on disk in network byte order; all integer fields inside them are properly converted to the machine's byte order when loading them in memory. The parsing engine also distinguishes between normal index entries and extended entries with 2 extra bytes of flags. The 'TREE' extension for index entries is also loaded into memory: Tree caches stored in Index files are loaded into the 'git_index_tree' structure pointed by the 'tree' pointer inside 'git_index'. 'index->tree' points to the root node of the tree cache; the full tree can be traversed through each of the node's 'tree->children'. Index files can be written back to disk through: git_index_write(): atomic writing of existing index objects backed by internal method git_index__write() The following calls for entry manipulation have been added: git_index_add(): insert an empty entry to the index git_index_find(): search an entry by its path name git_index__append(): appends a new index entry to the end of the list, resizing the entries array if required New index entries are always inserted at the end of the array; since the index entries must be sorted for it to be internally consistent, the index object is only sorted once, and if required, before accessing the whole entries array (e.g. before writing to disk, before traversing, etc). git_index__remove_pos(): remove an index entry in a specific position git_index__sort(): sort the entries in the array by path name The entries array is sorted stably and in place using an insertion sort, which ought to be the most efficient approach since the entries array is always mostly-sorted. Signed-off-by: Vicent Marti <tanoku@gmail.com>