src/cache.c


Log

Author Commit Date CI Message
Patrick Steinhardt 7d1b1774 2020-02-07T12:50:39 cache: fix invalid memory access in case updating cache entry fails When adding a new entry to our cache where an entry with the same OID exists already, then we only update the existing entry in case it is unparsed and the new entry is parsed. Currently, we do not check the return value of `git_oidmap_set` though when updating the existing entry. As a result, we will _not_ have updated the existing entry if `git_oidmap_set` fails, but have decremented its refcount and incremented the new entry's refcount. Later on, this may likely lead to dereferencing invalid memory. Fix the issue by checking the return value of `git_oidmap_set`. In case it fails, we will simply keep the existing stored instead, even though it's unparsed.
brian m. carlson 770b91b1 2019-07-17T15:59:54 cache: evict items more efficiently When our object cache is full, we pick eight items (or the whole cache, if there are fewer) and evict them. For small cache sizes, this is fine, but when we're dealing with a large number of objects, we can repeatedly exhaust the cache and spend a large amount of time in git_oidmap_iterate trying to find items to evict. Instead, let's assume that if the cache gets full, we have a large number of objects that we're handling, and be more aggressive about evicting items. Let's remove one item for every 2048 items, but not less than 8. This causes us to scale our evictions in proportion to the size of the cache and significantly reduces the time we spend in git_oidmap_iterate. Before this change, a full pack of all the non-blob objects in the Linux repository took in excess of 30 minutes and spent 62.3% of total runtime in odb_read_1 and its children, and 44.3% of the time in git_oidmap_iterate. With this change, the same operation now takes 14 minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total time, whereas git_oidmap_iterate consists of 6.2%. Note that we do spend a little more time inflating objects and a decent amount more time in memcmp. However, overall, the time taken is significantly improved, and time in pack building is now dominated by git_delta_create_from_index (33.7%), which is what we would expect.
Patrick Steinhardt add17435 2019-05-24T15:24:26 cache: fix cache eviction using deallocated key When evicting cache entries, we first retrieve the object that is to be evicted, delete the object and then finally delete the key from the cache. In case where the cache eviction caused us to free the cached object, though, its key will point to invalid memory now when trying to remove it from the cache map. On my system, this causes us to not properly remove the key from the map, as its memory has been overwritten already and thus the key lookup it will fail and we cannot delete it. Fix this by only decrementing the refcount of the evictee after we have removed it from our cache map. Add a test that caused a segfault previous to that change.
Edward Thomson 4069f924 2019-02-22T10:56:08 Merge pull request #4901 from pks-t/pks/uniform-map-api High-level map APIs
Patrick Steinhardt bbdcd450 2019-02-20T10:40:06 cache: fix misnaming of `git_cache_free` Functions that free a structure's contents but not the structure itself shall be named `dispose` in the libgit2 project, but the function `git_cache_free` does not follow this naming pattern. Fix this by renaming it to `git_cache_dispose` and adjusting all callers to make use of the new name.
Patrick Steinhardt 6a9117f5 2018-12-01T10:18:42 cache: use iteration interface for cache eviction To relieve us from memory pressure, we may regularly call `cache_evict_entries` to remove some entries from it. Unfortunately, our cache does not support a least-recently-used mode or something similar, which is why we evict entries completeley at random right now. Thing is, this is only possible due to the map interfaces exposing the entry indices, and we intend to completely remove those to decouple map users from map implementations. As soon as that is done, we are unable to do this random eviction anymore. Convert this to make use of an iterator for now. Obviously, there is no random eviction possible like that anymore, but we'll always start by evicting from the beginning of the map. Due to hashing, one may hope that the selected buckets will be evicted at least in some way unpredictably. But more likely than not, this will not be the case. But let's see what happens and if any users complain about degraded performance. If so, we might come up with a different scheme than random removal, e.g. by using an LRU cache.
Patrick Steinhardt 2e0a3048 2019-01-23T10:48:55 oidmap: introduce high-level setter for key/value pairs Currently, one would use either `git_oidmap_insert` to insert key/value pairs into a map or `git_oidmap_put` to insert a key only. These function have historically been macros, which is why their syntax is kind of weird: instead of returning an error code directly, they instead have to be passed a pointer to where the return value shall be stored. This does not match libgit2's common idiom of directly returning error codes.Furthermore, `git_oidmap_put` is tightly coupled with implementation details of the map as it exposes the index of inserted entries. Introduce a new function `git_oidmap_set`, which takes as parameters the map, key and value and directly returns an error code. Convert all trivial callers of `git_oidmap_insert` and `git_oidmap_put` to make use of it.
Patrick Steinhardt 9694ef20 2018-12-17T09:01:53 oidmap: introduce high-level getter for values The current way of looking up an entry from a map is tightly coupled with the map implementation, as one first has to look up the index of the key and then retrieve the associated value by using the index. As a caller, you usually do not care about any indices at all, though, so this is more complicated than really necessary. Furthermore, it invites for errors to happen if the correct error checking sequence is not being followed. Introduce a new high-level function `git_oidmap_get` that takes a map and a key and returns a pointer to the associated value if such a key exists. Otherwise, a `NULL` pointer is returned. Adjust all callers that can trivially be converted.
Patrick Steinhardt 351eeff3 2019-01-23T10:42:46 maps: use uniform lifecycle management functions Currently, the lifecycle functions for maps (allocation, deallocation, resize) are not named in a uniform way and do not have a uniform function signature. Rename the functions to fix that, and stick to libgit2's naming scheme of saying `git_foo_new`. This results in the following new interface for allocation: - `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an error code if we ran out of memory - `void git_<t>map_free(git_<t>map *map)` to free a map - `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map This commit also fixes all existing callers.
Edward Thomson f673e232 2018-12-27T13:47:34 git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
Edward Thomson 168fe39b 2018-11-28T14:26:57 object_type: use new enumeration names Use the new object_type enumeration names within the codebase.
Patrick Steinhardt 852bc9f4 2018-11-23T19:26:24 khash: remove intricate knowledge of khash types Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types, simply use `size_t` instead. This decouples code from the khash stuff and makes it possible to move the khash includes into the implementation files.
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
Patrick Steinhardt 0d716905 2017-01-27T15:23:15 oidmap: remove GIT__USE_OIDMAP macro
Patrick Steinhardt 73028af8 2017-01-27T14:20:24 khash: avoid using macro magic to get return address
Patrick Steinhardt 85d2748c 2017-01-27T14:05:10 khash: avoid using `kh_key`/`kh_val` as lvalue
Patrick Steinhardt a8cd560b 2017-01-25T14:41:17 khash: avoid using `kh_del` directly
Patrick Steinhardt 71a54317 2017-01-25T14:32:23 khash: avoid using `kh_key` directly
Patrick Steinhardt cb18386f 2017-01-25T14:26:58 khash: avoid using `kh_val`/`kh_value` directly
Patrick Steinhardt 76e671a6 2017-01-25T14:20:56 khash: avoid using `kh_exist` directly
Patrick Steinhardt c37b069b 2017-01-25T14:16:35 khash: avoid using `kh_clear` directly
Patrick Steinhardt a853c527 2017-01-25T14:14:32 khash: avoid using `kh_get` directly
Patrick Steinhardt 64e46dc3 2017-01-25T14:14:12 khash: avoid using `kh_end` directly
Patrick Steinhardt 9694d9ba 2017-01-25T14:09:17 khash: avoid using `kh_foreach`/`kh_foreach_value` directly
Patrick Steinhardt 63e914cb 2017-01-25T14:05:24 khash: avoid using `kh_size` directly
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Matthew Plough 768f8be3 2015-06-30T19:00:41 Fix #3094 - improve use of portable size_t/ssize_t format specifiers. The header src/cc-compat.h defines portable format specifiers PRIuZ, PRIdZ, and PRIxZ. The original report highlighted the need to use these specifiers in examples/network/fetch.c. For this commit, I checked all C source and header files not in deps/ and transitioned to the appropriate format specifier where appropriate.
Carlos Martín Nieto 2d73075a 2015-06-10T10:23:08 cache: add a check for a failed allocation Rather minimal change, but it's the kind of thing we should do.
Justin Spahr-Summers 6a211d7c 2014-08-26T15:12:43 Refactor git_cache to use an rwlock This significantly reduces contention when many threads are trying to read from the cache simultaneously.
Vicent Marti 6de9b2ee 2013-06-12T21:10:33 util: It's called `memzero`
Russell Belfer 3e9e6cda 2013-06-07T09:54:33 Add safe memset and use it This adds a `git__memset` routine that will not be optimized away and updates the places where I memset() right before a free() call to use it.
Russell Belfer 1a42dd17 2013-05-31T14:13:11 Mutex init can fail It is obviously quite a serious problem if this happens, but mutex initialization can fail and we should detect it. It's a bit like a memory allocation failure, in that you're probably pretty screwed if this occurs, but at least we'll catch it.
Russell Belfer f658dc43 2013-05-31T14:09:58 Zero memory for major objects before freeing By zeroing out the memory when we free larger objects (i.e. those that serve as collections of other data, such as repos, odb, refdb), I'm hoping that it will be easier for libgit2 bindings to find errors in their object management code.
Russell Belfer 2e62e7c2 2013-05-24T10:33:41 Docs for git_libgit2_opts and cache disable tweak This adds docs for the cache control options to git_libgit2_opts and also tweaks the cache code so that if the cache is disabled, then the next time we attempt to insert something into the cache in question, we will actually clear any old cached objects.
Edward Thomson eb63fda2 2013-04-25T11:52:17 git_atomic_ssize for 64-bit atomics only on 64-bit platforms
Vicent Marti 879458e7 2013-04-24T15:52:33 repo: Add `git_repository__cleanup`
Vicent Marti a2378ae4 2013-04-23T20:42:29 opts: Add getter for cached memory
Vicent Marti 920cbc98 2013-04-22T17:31:47 cache: More aggressive default
Vicent Marti a14163a7 2013-04-22T17:30:49 cache: Shared meter for memory usage
Vicent Marti d8771592 2013-04-22T17:04:52 cache: Max cache size, and evict when the cache fills up
Russell Belfer 78606263 2013-04-15T00:05:44 Add callback to git_objects_table This adds create and free callback to the git_objects_table so that more of the creation and destruction of objects can be table driven instead of using switch statements. This also makes the semantics of certain object creation functions consistent so that we can make better use of function pointers. This also fixes a theoretical error case where an object allocation fails and we end up storing NULL into the cache.
Russell Belfer b12b72ea 2013-04-12T12:44:51 Add range checking around cache opts Add a git_cache_set_max_object_size method that does more checking around setting the max object size. Also add a git_cache_size to read the number of objects currently in the cache. This makes it easier to write tests.
Vicent Marti ee12272d 2013-04-05T22:48:39 Global option setters
Vicent Marti e183e375 2013-04-05T22:38:14 Clear the cache when there are too many items to expire
Vicent Marti d9d423e4 2013-04-03T23:53:32 Some stats
Vicent Marti 064236ca 2013-04-03T23:39:42 Per-object max size
Vicent Marti c4e91d45 2013-04-03T20:57:30 Random eviction
Vicent Marti 6b90e244 2013-04-01T19:53:49 Per-object filtering
Vicent Marti 5df18424 2013-04-01T19:38:23 lol this worked first try wtf
Vicent Marti 8842c75f 2013-04-03T22:30:07 What has science done.
Edward Thomson 359fc2d2 2013-01-08T17:07:25 update copyrights
Justin Spahr-Summers 1d009603 2012-12-09T02:40:16 orite C89
Justin Spahr-Summers a35b3864 2012-12-09T02:31:39 Always check the result of git_mutex_lock
Philip Kelley fcd03beb 2012-11-09T15:57:32 Fix a mutex/critical section leak
Michael Schubert 6ee68611 2012-09-10T21:29:07 cache: fix race condition Example: a cached node is owned only by the cache (refcount == 1). Thread A holds the lock and determines that the entry which should get cached equals the node (git_oid_cmp(&node->oid, &entry->oid) == 0). It frees the given entry to instead return the cached node to the user (entry = node). Now, before Thread A happens to increment the refcount of the node *outside* the cache lock, Thread B tries to store another entry and hits the slot of the node before, decrements its refcount and frees it *before* Thread A gets a chance to increment for the user. git_cached_obj_incref(entry); git_mutex_lock(&cache->lock); { git_cached_obj *node = cache->nodes[hash & cache->size_mask]; if (node == NULL) { cache->nodes[hash & cache->size_mask] = entry; } else if (git_oid_cmp(&node->oid, &entry->oid) == 0) { git_cached_obj_decref(entry, cache->free_obj); entry = node; } else { git_cached_obj_decref(node, cache->free_obj); // Thread B is here cache->nodes[hash & cache->size_mask] = entry; } } git_mutex_unlock(&cache->lock); // Thread A is here /* increase the refcount again, because we are * returning it to the user */ git_cached_obj_incref(entry);
Vicent Marti c07d9c95 2012-08-09T15:33:04 oid: Explicitly include `oid.h` for the inlined CMP
Vicent Martí 3f035860 2012-06-07T22:43:03 misc: Fix warnings from PVS Studio trial
Russell Belfer 25f258e7 2012-04-23T09:21:15 Moving power-of-two bit utilities into util.h
Russell Belfer 0d0fa7c3 2012-03-16T15:56:01 Convert attr, ignore, mwindow, status to new errors Also cleaned up some previously converted code that still had little things to polish.
schu 5e0de328 2012-02-13T17:10:24 Update Copyright header Signed-off-by: schu <schu-github@schulog.org>
Vicent Martí e4b4da14 2012-01-27T18:28:02 cache: Simplify locking mechanics The object cache is mostly IO-bound, so it makes no sense to have a lock per node.
Vicent Marti 3286c408 2011-10-28T14:51:13 global: Properly use `git__` memory wrappers Ensure that all memory related functions (malloc, calloc, strdup, free, etc) are using their respective `git__` wrappers.
Vicent Martí 71a4c1f1 2011-09-18T20:07:59 Merge pull request #384 from kiryl/warnings Add more -W flags to CFLAGS
Vicent Marti bb742ede 2011-09-19T01:54:32 Cleanup legal data 1. The license header is technically not valid if it doesn't have a copyright signature. 2. The COPYING file has been updated with the different licenses used in the project. 3. The full GPLv2 header in each file annoys me.
Kirill A. Shutemov 0b2c4061 2011-08-30T23:06:04 CMakefile: add -Wstrict-aliasing=2 and fix warnings Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Vicent Marti 335d6c99 2011-05-17T01:46:07 cache: Drop cuckoo hashing Now we use a simple closed-addressing cache. Cuckoo hashing was creating too many issues with race conditions. Fuck that. Let's see what happens performance wise, we may have to roll back or come up with another way to implement an efficient multi-threaded cache.
Vicent Marti 3de79280 2011-05-17T00:51:52 cache: Fix deadlock Do not try to adquire the same node lock twice when the cuckoo hashing resolves to the same node.
Vicent Marti 4edf3e09 2011-05-15T23:45:24 Return success code on `git_cache_init`
nulltoken 81201a4c 2011-05-15T06:57:34 Move cache.c to the new error handling
nulltoken 3abe3bba 2011-05-14T16:05:33 Move repository.c to the new error handling
Vicent Marti 99baacfb 2011-03-21T19:27:45 Fix MSVC warnings
Vicent Marti 72a3fe42 2011-03-18T19:38:49 I broke your bindings Hey. Apologies in advance -- I broke your bindings. This is a major commit that includes a long-overdue redesign of the whole object-database structure. This is expected to be the last major external API redesign of the library until the first non-alpha release. Please get your bindings up to date with these changes. They will be included in the next minor release. Sorry again! Major features include: - Real caching and refcounting on parsed objects - Real caching and refcounting on objects read from the ODB - Streaming writes & reads from the ODB - Single-method writes for all object types - The external API is now partially thread-safe The speed increases are significant in all aspects, specially when reading an object several times from the ODB (revwalking) and when writing big objects to the ODB. Here's a full changelog for the external API: blob.h ------ - Remove `git_blob_new` - Remove `git_blob_set_rawcontent` - Remove `git_blob_set_rawcontent_fromfile` - Rename `git_blob_writefile` -> `git_blob_create_fromfile` - Change `git_blob_create_fromfile`: The `path` argument is now relative to the repository's working dir - Add `git_blob_create_frombuffer` commit.h -------- - Remove `git_commit_new` - Remove `git_commit_add_parent` - Remove `git_commit_set_message` - Remove `git_commit_set_committer` - Remove `git_commit_set_author` - Remove `git_commit_set_tree` - Add `git_commit_create` - Add `git_commit_create_v` - Add `git_commit_create_o` - Add `git_commit_create_ov` tag.h ----- - Remove `git_tag_new` - Remove `git_tag_set_target` - Remove `git_tag_set_name` - Remove `git_tag_set_tagger` - Remove `git_tag_set_message` - Add `git_tag_create` - Add `git_tag_create_o` tree.h ------ - Change `git_tree_entry_2object`: New signature is `(git_object **object_out, git_repository *repo, git_tree_entry *entry)` - Remove `git_tree_new` - Remove `git_tree_add_entry` - Remove `git_tree_remove_entry_byindex` - Remove `git_tree_remove_entry_byname` - Remove `git_tree_clearentries` - Remove `git_tree_entry_set_id` - Remove `git_tree_entry_set_name` - Remove `git_tree_entry_set_attributes` object.h ------------ - Remove `git_object_new - Remove `git_object_write` - Change `git_object_close`: This method is now *mandatory*. Not closing an object causes a memory leak. odb.h ----- - Remove type `git_rawobj` - Remove `git_rawobj_close` - Rename `git_rawobj_hash` -> `git_odb_hash` - Change `git_odb_hash`: New signature is `(git_oid *id, const void *data, size_t len, git_otype type)` - Add type `git_odb_object` - Add `git_odb_object_close` - Change `git_odb_read`: New signature is `(git_odb_object **out, git_odb *db, const git_oid *id)` - Change `git_odb_read_header`: New signature is `(size_t *len_p, git_otype *type_p, git_odb *db, const git_oid *id)` - Remove `git_odb_write` - Add `git_odb_open_wstream` - Add `git_odb_open_rstream` odb_backend.h ------------- - Change type `git_odb_backend`: New internal signatures are as follows int (* read)(void **, size_t *, git_otype *, struct git_odb_backend *, const git_oid *) int (* read_header)(size_t *, git_otype *, struct git_odb_backend *, const git_oid *) int (* writestream)(struct git_odb_stream **, struct git_odb_backend *, size_t, git_otype) int (* readstream)( struct git_odb_stream **, struct git_odb_backend *, const git_oid *) - Add type `git_odb_stream` - Add enum `git_odb_streammode` Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti bb3de0c4 2011-03-16T21:35:51 Thread safe cache