|
770b91b1
|
2019-07-17T15:59:54
|
|
cache: evict items more efficiently
When our object cache is full, we pick eight items (or the whole cache,
if there are fewer) and evict them. For small cache sizes, this is fine,
but when we're dealing with a large number of objects, we can repeatedly
exhaust the cache and spend a large amount of time in git_oidmap_iterate
trying to find items to evict.
Instead, let's assume that if the cache gets full, we have a large
number of objects that we're handling, and be more aggressive about
evicting items. Let's remove one item for every 2048 items, but not less
than 8. This causes us to scale our evictions in proportion to the size
of the cache and significantly reduces the time we spend in
git_oidmap_iterate.
Before this change, a full pack of all the non-blob objects in the Linux
repository took in excess of 30 minutes and spent 62.3% of total runtime
in odb_read_1 and its children, and 44.3% of the time in
git_oidmap_iterate. With this change, the same operation now takes 14
minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total
time, whereas git_oidmap_iterate consists of 6.2%.
Note that we do spend a little more time inflating objects and a decent
amount more time in memcmp. However, overall, the time taken is
significantly improved, and time in pack building is now dominated by
git_delta_create_from_index (33.7%), which is what we would expect.
|
|
add17435
|
2019-05-24T15:24:26
|
|
cache: fix cache eviction using deallocated key
When evicting cache entries, we first retrieve the object that is
to be evicted, delete the object and then finally delete the key
from the cache. In case where the cache eviction caused us to
free the cached object, though, its key will point to invalid
memory now when trying to remove it from the cache map. On my
system, this causes us to not properly remove the key from the
map, as its memory has been overwritten already and thus the key
lookup it will fail and we cannot delete it.
Fix this by only decrementing the refcount of the evictee after
we have removed it from our cache map. Add a test that caused a
segfault previous to that change.
|
|
4069f924
|
2019-02-22T10:56:08
|
|
Merge pull request #4901 from pks-t/pks/uniform-map-api
High-level map APIs
|
|
bbdcd450
|
2019-02-20T10:40:06
|
|
cache: fix misnaming of `git_cache_free`
Functions that free a structure's contents but not the structure
itself shall be named `dispose` in the libgit2 project, but the
function `git_cache_free` does not follow this naming pattern.
Fix this by renaming it to `git_cache_dispose` and adjusting all
callers to make use of the new name.
|
|
6a9117f5
|
2018-12-01T10:18:42
|
|
cache: use iteration interface for cache eviction
To relieve us from memory pressure, we may regularly call `cache_evict_entries`
to remove some entries from it. Unfortunately, our cache does not support a
least-recently-used mode or something similar, which is why we evict entries
completeley at random right now. Thing is, this is only possible due to the map
interfaces exposing the entry indices, and we intend to completely remove those
to decouple map users from map implementations. As soon as that is done, we are
unable to do this random eviction anymore.
Convert this to make use of an iterator for now. Obviously, there is no random
eviction possible like that anymore, but we'll always start by evicting from the
beginning of the map. Due to hashing, one may hope that the selected buckets
will be evicted at least in some way unpredictably. But more likely than not,
this will not be the case. But let's see what happens and if any users complain
about degraded performance. If so, we might come up with a different scheme than
random removal, e.g. by using an LRU cache.
|
|
2e0a3048
|
2019-01-23T10:48:55
|
|
oidmap: introduce high-level setter for key/value pairs
Currently, one would use either `git_oidmap_insert` to insert key/value pairs
into a map or `git_oidmap_put` to insert a key only. These function have
historically been macros, which is why their syntax is kind of weird: instead of
returning an error code directly, they instead have to be passed a pointer to
where the return value shall be stored. This does not match libgit2's common
idiom of directly returning error codes.Furthermore, `git_oidmap_put` is tightly
coupled with implementation details of the map as it exposes the index of
inserted entries.
Introduce a new function `git_oidmap_set`, which takes as parameters the map,
key and value and directly returns an error code. Convert all trivial callers of
`git_oidmap_insert` and `git_oidmap_put` to make use of it.
|
|
9694ef20
|
2018-12-17T09:01:53
|
|
oidmap: introduce high-level getter for values
The current way of looking up an entry from a map is tightly coupled with the
map implementation, as one first has to look up the index of the key and then
retrieve the associated value by using the index. As a caller, you usually do
not care about any indices at all, though, so this is more complicated than
really necessary. Furthermore, it invites for errors to happen if the correct
error checking sequence is not being followed.
Introduce a new high-level function `git_oidmap_get` that takes a map and a key
and returns a pointer to the associated value if such a key exists. Otherwise,
a `NULL` pointer is returned. Adjust all callers that can trivially be
converted.
|
|
351eeff3
|
2019-01-23T10:42:46
|
|
maps: use uniform lifecycle management functions
Currently, the lifecycle functions for maps (allocation, deallocation, resize)
are not named in a uniform way and do not have a uniform function signature.
Rename the functions to fix that, and stick to libgit2's naming scheme of saying
`git_foo_new`. This results in the following new interface for allocation:
- `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an
error code if we ran out of memory
- `void git_<t>map_free(git_<t>map *map)` to free a map
- `void git_<t>map_clear(git<t>map *map)` to remove all entries from a map
This commit also fixes all existing callers.
|
|
f673e232
|
2018-12-27T13:47:34
|
|
git_error: use new names in internal APIs and usage
Move to the `git_error` name in the internal API for error-related
functions.
|
|
168fe39b
|
2018-11-28T14:26:57
|
|
object_type: use new enumeration names
Use the new object_type enumeration names within the codebase.
|
|
852bc9f4
|
2018-11-23T19:26:24
|
|
khash: remove intricate knowledge of khash types
Instead of using the `khiter_t`, `git_strmap_iter` and `khint_t` types,
simply use `size_t` instead. This decouples code from the khash stuff
and makes it possible to move the khash includes into the implementation
files.
|
|
0c7f49dd
|
2017-06-30T13:39:01
|
|
Make sure to always include "common.h" first
Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.
This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.
This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.
|
|
0d716905
|
2017-01-27T15:23:15
|
|
oidmap: remove GIT__USE_OIDMAP macro
|
|
73028af8
|
2017-01-27T14:20:24
|
|
khash: avoid using macro magic to get return address
|
|
85d2748c
|
2017-01-27T14:05:10
|
|
khash: avoid using `kh_key`/`kh_val` as lvalue
|
|
a8cd560b
|
2017-01-25T14:41:17
|
|
khash: avoid using `kh_del` directly
|
|
71a54317
|
2017-01-25T14:32:23
|
|
khash: avoid using `kh_key` directly
|
|
cb18386f
|
2017-01-25T14:26:58
|
|
khash: avoid using `kh_val`/`kh_value` directly
|
|
76e671a6
|
2017-01-25T14:20:56
|
|
khash: avoid using `kh_exist` directly
|
|
c37b069b
|
2017-01-25T14:16:35
|
|
khash: avoid using `kh_clear` directly
|
|
a853c527
|
2017-01-25T14:14:32
|
|
khash: avoid using `kh_get` directly
|
|
64e46dc3
|
2017-01-25T14:14:12
|
|
khash: avoid using `kh_end` directly
|
|
9694d9ba
|
2017-01-25T14:09:17
|
|
khash: avoid using `kh_foreach`/`kh_foreach_value` directly
|
|
63e914cb
|
2017-01-25T14:05:24
|
|
khash: avoid using `kh_size` directly
|
|
909d5494
|
2016-12-29T12:25:15
|
|
giterr_set: consistent error messages
Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
|
|
768f8be3
|
2015-06-30T19:00:41
|
|
Fix #3094 - improve use of portable size_t/ssize_t format specifiers.
The header src/cc-compat.h defines portable format specifiers PRIuZ, PRIdZ, and PRIxZ. The original report highlighted the need to use these specifiers in examples/network/fetch.c. For this commit, I checked all C source and header files not in deps/ and transitioned to the appropriate format specifier where appropriate.
|
|
2d73075a
|
2015-06-10T10:23:08
|
|
cache: add a check for a failed allocation
Rather minimal change, but it's the kind of thing we should do.
|
|
6a211d7c
|
2014-08-26T15:12:43
|
|
Refactor git_cache to use an rwlock
This significantly reduces contention when many threads are trying to
read from the cache simultaneously.
|
|
6de9b2ee
|
2013-06-12T21:10:33
|
|
util: It's called `memzero`
|
|
3e9e6cda
|
2013-06-07T09:54:33
|
|
Add safe memset and use it
This adds a `git__memset` routine that will not be optimized away
and updates the places where I memset() right before a free() call
to use it.
|
|
1a42dd17
|
2013-05-31T14:13:11
|
|
Mutex init can fail
It is obviously quite a serious problem if this happens, but mutex
initialization can fail and we should detect it. It's a bit like
a memory allocation failure, in that you're probably pretty screwed
if this occurs, but at least we'll catch it.
|
|
f658dc43
|
2013-05-31T14:09:58
|
|
Zero memory for major objects before freeing
By zeroing out the memory when we free larger objects (i.e. those
that serve as collections of other data, such as repos, odb, refdb),
I'm hoping that it will be easier for libgit2 bindings to find
errors in their object management code.
|
|
2e62e7c2
|
2013-05-24T10:33:41
|
|
Docs for git_libgit2_opts and cache disable tweak
This adds docs for the cache control options to git_libgit2_opts
and also tweaks the cache code so that if the cache is disabled,
then the next time we attempt to insert something into the cache
in question, we will actually clear any old cached objects.
|
|
eb63fda2
|
2013-04-25T11:52:17
|
|
git_atomic_ssize for 64-bit atomics only on 64-bit platforms
|
|
879458e7
|
2013-04-24T15:52:33
|
|
repo: Add `git_repository__cleanup`
|
|
a2378ae4
|
2013-04-23T20:42:29
|
|
opts: Add getter for cached memory
|
|
920cbc98
|
2013-04-22T17:31:47
|
|
cache: More aggressive default
|
|
a14163a7
|
2013-04-22T17:30:49
|
|
cache: Shared meter for memory usage
|
|
d8771592
|
2013-04-22T17:04:52
|
|
cache: Max cache size, and evict when the cache fills up
|
|
78606263
|
2013-04-15T00:05:44
|
|
Add callback to git_objects_table
This adds create and free callback to the git_objects_table so
that more of the creation and destruction of objects can be table
driven instead of using switch statements. This also makes the
semantics of certain object creation functions consistent so that
we can make better use of function pointers. This also fixes a
theoretical error case where an object allocation fails and we
end up storing NULL into the cache.
|
|
b12b72ea
|
2013-04-12T12:44:51
|
|
Add range checking around cache opts
Add a git_cache_set_max_object_size method that does more checking
around setting the max object size. Also add a git_cache_size to
read the number of objects currently in the cache. This makes it
easier to write tests.
|
|
e183e375
|
2013-04-05T22:38:14
|
|
Clear the cache when there are too many items to expire
|
|
d9d423e4
|
2013-04-03T23:53:32
|
|
Some stats
|
|
064236ca
|
2013-04-03T23:39:42
|
|
Per-object max size
|
|
ee12272d
|
2013-04-05T22:48:39
|
|
Global option setters
|
|
8842c75f
|
2013-04-03T22:30:07
|
|
What has science done.
|
|
c4e91d45
|
2013-04-03T20:57:30
|
|
Random eviction
|
|
5df18424
|
2013-04-01T19:38:23
|
|
lol this worked first try wtf
|
|
6b90e244
|
2013-04-01T19:53:49
|
|
Per-object filtering
|
|
359fc2d2
|
2013-01-08T17:07:25
|
|
update copyrights
|
|
1d009603
|
2012-12-09T02:40:16
|
|
orite C89
|
|
a35b3864
|
2012-12-09T02:31:39
|
|
Always check the result of git_mutex_lock
|
|
fcd03beb
|
2012-11-09T15:57:32
|
|
Fix a mutex/critical section leak
|
|
6ee68611
|
2012-09-10T21:29:07
|
|
cache: fix race condition
Example: a cached node is owned only by the cache (refcount == 1).
Thread A holds the lock and determines that the entry which should get
cached equals the node (git_oid_cmp(&node->oid, &entry->oid) == 0).
It frees the given entry to instead return the cached node to the user
(entry = node). Now, before Thread A happens to increment the refcount
of the node *outside* the cache lock, Thread B tries to store another
entry and hits the slot of the node before, decrements its refcount and
frees it *before* Thread A gets a chance to increment for the user.
git_cached_obj_incref(entry);
git_mutex_lock(&cache->lock);
{
git_cached_obj *node = cache->nodes[hash & cache->size_mask];
if (node == NULL) {
cache->nodes[hash & cache->size_mask] = entry;
} else if (git_oid_cmp(&node->oid, &entry->oid) == 0) {
git_cached_obj_decref(entry, cache->free_obj);
entry = node;
} else {
git_cached_obj_decref(node, cache->free_obj);
// Thread B is here
cache->nodes[hash & cache->size_mask] = entry;
}
}
git_mutex_unlock(&cache->lock);
// Thread A is here
/* increase the refcount again, because we are
* returning it to the user */
git_cached_obj_incref(entry);
|
|
c07d9c95
|
2012-08-09T15:33:04
|
|
oid: Explicitly include `oid.h` for the inlined CMP
|
|
3f035860
|
2012-06-07T22:43:03
|
|
misc: Fix warnings from PVS Studio trial
|
|
25f258e7
|
2012-04-23T09:21:15
|
|
Moving power-of-two bit utilities into util.h
|
|
0d0fa7c3
|
2012-03-16T15:56:01
|
|
Convert attr, ignore, mwindow, status to new errors
Also cleaned up some previously converted code that still had
little things to polish.
|
|
5e0de328
|
2012-02-13T17:10:24
|
|
Update Copyright header
Signed-off-by: schu <schu-github@schulog.org>
|
|
e4b4da14
|
2012-01-27T18:28:02
|
|
cache: Simplify locking mechanics
The object cache is mostly IO-bound, so it makes no sense to have a lock
per node.
|
|
3286c408
|
2011-10-28T14:51:13
|
|
global: Properly use `git__` memory wrappers
Ensure that all memory related functions (malloc, calloc, strdup, free,
etc) are using their respective `git__` wrappers.
|
|
71a4c1f1
|
2011-09-18T20:07:59
|
|
Merge pull request #384 from kiryl/warnings
Add more -W flags to CFLAGS
|
|
bb742ede
|
2011-09-19T01:54:32
|
|
Cleanup legal data
1. The license header is technically not valid if it doesn't have a
copyright signature.
2. The COPYING file has been updated with the different licenses used in
the project.
3. The full GPLv2 header in each file annoys me.
|
|
0b2c4061
|
2011-08-30T23:06:04
|
|
CMakefile: add -Wstrict-aliasing=2 and fix warnings
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
|
|
335d6c99
|
2011-05-17T01:46:07
|
|
cache: Drop cuckoo hashing
Now we use a simple closed-addressing cache. Cuckoo hashing was creating
too many issues with race conditions. Fuck that.
Let's see what happens performance wise, we may have to roll back or
come up with another way to implement an efficient multi-threaded cache.
|
|
3de79280
|
2011-05-17T00:51:52
|
|
cache: Fix deadlock
Do not try to adquire the same node lock twice when the cuckoo hashing
resolves to the same node.
|
|
4edf3e09
|
2011-05-15T23:45:24
|
|
Return success code on `git_cache_init`
|
|
81201a4c
|
2011-05-15T06:57:34
|
|
Move cache.c to the new error handling
|
|
3abe3bba
|
2011-05-14T16:05:33
|
|
Move repository.c to the new error handling
|
|
99baacfb
|
2011-03-21T19:27:45
|
|
Fix MSVC warnings
|
|
72a3fe42
|
2011-03-18T19:38:49
|
|
I broke your bindings
Hey. Apologies in advance -- I broke your bindings.
This is a major commit that includes a long-overdue redesign of the
whole object-database structure. This is expected to be the last major
external API redesign of the library until the first non-alpha release.
Please get your bindings up to date with these changes. They will be
included in the next minor release. Sorry again!
Major features include:
- Real caching and refcounting on parsed objects
- Real caching and refcounting on objects read from the ODB
- Streaming writes & reads from the ODB
- Single-method writes for all object types
- The external API is now partially thread-safe
The speed increases are significant in all aspects, specially when
reading an object several times from the ODB (revwalking) and when
writing big objects to the ODB.
Here's a full changelog for the external API:
blob.h
------
- Remove `git_blob_new`
- Remove `git_blob_set_rawcontent`
- Remove `git_blob_set_rawcontent_fromfile`
- Rename `git_blob_writefile` -> `git_blob_create_fromfile`
- Change `git_blob_create_fromfile`:
The `path` argument is now relative to the repository's working dir
- Add `git_blob_create_frombuffer`
commit.h
--------
- Remove `git_commit_new`
- Remove `git_commit_add_parent`
- Remove `git_commit_set_message`
- Remove `git_commit_set_committer`
- Remove `git_commit_set_author`
- Remove `git_commit_set_tree`
- Add `git_commit_create`
- Add `git_commit_create_v`
- Add `git_commit_create_o`
- Add `git_commit_create_ov`
tag.h
-----
- Remove `git_tag_new`
- Remove `git_tag_set_target`
- Remove `git_tag_set_name`
- Remove `git_tag_set_tagger`
- Remove `git_tag_set_message`
- Add `git_tag_create`
- Add `git_tag_create_o`
tree.h
------
- Change `git_tree_entry_2object`:
New signature is `(git_object **object_out, git_repository *repo, git_tree_entry *entry)`
- Remove `git_tree_new`
- Remove `git_tree_add_entry`
- Remove `git_tree_remove_entry_byindex`
- Remove `git_tree_remove_entry_byname`
- Remove `git_tree_clearentries`
- Remove `git_tree_entry_set_id`
- Remove `git_tree_entry_set_name`
- Remove `git_tree_entry_set_attributes`
object.h
------------
- Remove `git_object_new
- Remove `git_object_write`
- Change `git_object_close`:
This method is now *mandatory*. Not closing an object causes a
memory leak.
odb.h
-----
- Remove type `git_rawobj`
- Remove `git_rawobj_close`
- Rename `git_rawobj_hash` -> `git_odb_hash`
- Change `git_odb_hash`:
New signature is `(git_oid *id, const void *data, size_t len, git_otype type)`
- Add type `git_odb_object`
- Add `git_odb_object_close`
- Change `git_odb_read`:
New signature is `(git_odb_object **out, git_odb *db, const git_oid *id)`
- Change `git_odb_read_header`:
New signature is `(size_t *len_p, git_otype *type_p, git_odb *db, const git_oid *id)`
- Remove `git_odb_write`
- Add `git_odb_open_wstream`
- Add `git_odb_open_rstream`
odb_backend.h
-------------
- Change type `git_odb_backend`:
New internal signatures are as follows
int (* read)(void **, size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* read_header)(size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* writestream)(struct git_odb_stream **, struct git_odb_backend *, size_t, git_otype)
int (* readstream)( struct git_odb_stream **, struct git_odb_backend *, const git_oid *)
- Add type `git_odb_stream`
- Add enum `git_odb_streammode`
Signed-off-by: Vicent Marti <tanoku@gmail.com>
|
|
bb3de0c4
|
2011-03-16T21:35:51
|
|
Thread safe cache
|