Commit 770b91b114467c5e6b7d3edc884e0fef6475c0bc

brian m. carlson 2019-07-17T15:59:54

cache: evict items more efficiently

When our object cache is full, we pick eight items (or the whole cache, if there are fewer) and evict them. For small cache sizes, this is fine, but when we're dealing with a large number of objects, we can repeatedly exhaust the cache and spend a large amount of time in git_oidmap_iterate trying to find items to evict.

Instead, let's assume that if the cache gets full, we have a large number of objects that we're handling, and be more aggressive about evicting items. Let's remove one item for every 2048 items, but not less than 8. This causes us to scale our evictions in proportion to the size of the cache and significantly reduces the time we spend in git_oidmap_iterate.

Before this change, a full pack of all the non-blob objects in the Linux repository took in excess of 30 minutes and spent 62.3% of total runtime in odb_read_1 and its children, and 44.3% of the time in git_oidmap_iterate. With this change, the same operation now takes 14 minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total time, whereas git_oidmap_iterate accounts for 6.2%.

Note that we do spend a little more time inflating objects and a decent amount more time in memcmp. However, overall, the time taken is significantly improved, and time in pack building is now dominated by git_delta_create_from_index (33.7%), which is what we would expect.
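
As a rough illustration of the "one item for every 2048 items, but not less than 8" rule described above, the following standalone sketch reproduces only the arithmetic. The helper name evict_count_for is hypothetical and is not part of libgit2; the real code reads the size via git_cache_size(cache) inside cache_evict_entries.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a libgit2 function): given the number of
 * entries currently in the cache, return how many entries one eviction
 * pass would try to remove under the rule described in the commit
 * message.
 */
static size_t evict_count_for(size_t cache_size)
{
	/* One eviction per 2048 cached entries (integer division). */
	size_t evict_count = cache_size / 2048;

	/* Never evict fewer than 8, which was the old fixed count. */
	if (evict_count < 8)
		evict_count = 8;

	return evict_count;
}
```

Under this rule, any cache with fewer than 18432 entries (9 * 2048) still evicts 8 items per pass, as before, while a cache of 1,000,000 entries would evict 488 per pass.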

diff --git a/src/cache.c b/src/cache.c
index 3128e40..32ba993 100644
--- a/src/cache.c
+++ b/src/cache.c
@@ -115,9 +115,12 @@ void git_cache_dispose(git_cache *cache)
 /* Called with lock */
 static void cache_evict_entries(git_cache *cache)
 {
-	size_t evict_count = 8, i;
+	size_t evict_count = git_cache_size(cache) / 2048, i;
 	ssize_t evicted_memory = 0;
 
+	if (evict_count < 8)
+		evict_count = 8;
+
 	/* do not infinite loop if there's not enough entries to evict  */
 	if (evict_count > git_cache_size(cache)) {
 		clear_cache(cache);