|
ce2bf7b7
|
2022-05-29T17:51:33
|
|
fix a bug in findtwixt() which produced pack files with missing parent commits
The 'nskip' variable is supposed to reflect commits which are waiting on
the queue and have the 'skip' color. Only increment 'nskip' when adding
such commits to the queue.
Problem observed with got send -T and a tag pointing to a deleted branch.
Test to reproduce the bug written by op@.
|
|
d6a28ffe
|
2022-05-20T21:21:42
|
|
use random seeds for murmurhash2
change the three hardcoded seeds to fresh ones generated on demand via
arc4random. Suggested/fixed by and ok stsp@
|
|
17cfdba6
|
2022-05-20T21:19:30
|
|
include header
|
|
411cbec1
|
2022-05-20T09:31:25
|
|
shrink struct got_pack_meta a bit by removing the have_reused_delta flag
This flag can be expressed as m->reused_delta_offset != 0 because all
deltas in valid pack files will be written at a non-zero offset.
We allocate a huge number of these structs during packing, so every
little bit helps.
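A minimal sketch of the idea, not the actual got code: `struct meta_sketch` and `meta_has_reused_delta()` are hypothetical names, and the struct is trimmed to the one field that matters. A pack file begins with a 12-byte header, so no object or delta can sit at offset 0, and the offset can double as the flag.

```c
#include <sys/types.h>

/* Hypothetical, heavily trimmed stand-in for struct got_pack_meta. */
struct meta_sketch {
	off_t reused_delta_offset;	/* 0 means "no delta is reused" */
};

/*
 * Replaces a separate boolean member: a valid delta offset is never 0
 * because the pack header occupies the start of the file.
 */
int
meta_has_reused_delta(const struct meta_sketch *m)
{
	return m->reused_delta_offset != 0;
}
```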
|
|
adb4bbb2
|
2022-05-20T08:40:46
|
|
reduce the amount of memory used for caching deltas during deltification
With files sorted properly for deltification we produce better deltas
but end up consuming more memory and risk running into OpenBSD ulimits
during packing. To compensate, reduce the threshold for the amount of
delta data we store in memory, spooling more deltas into the cache file.
ok op@
|
|
f8174ca5
|
2022-05-20T08:40:46
|
|
store a path hash instead of a verbatim path in pack meta data
This reduces memory use by gotadmin pack. The goal is to sort files
which share a path next to each other for deltification. A hash of
the path is good enough for this purpose and consumes less memory
than a verbatim copy of the path. Git does something similar.
ok op@
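A rough sketch of sorting by path hash, under stated assumptions: the struct, the function names, and the FNV-1a hash are all illustrative stand-ins and are not taken from pack_create.c. Entries with the same path get the same hash, so they end up next to each other after sorting, and only 4 bytes per entry are kept instead of a copy of the path.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-object record; only the fields the sketch needs. */
struct hmeta_sketch {
	uint32_t path_hash;
	int id;
};

/* FNV-1a, used here only as a simple stand-in hash function. */
uint32_t
path_hash_fnv1a(const char *path)
{
	uint32_t h = 2166136261u;

	for (; *path != '\0'; path++) {
		h ^= (unsigned char)*path;
		h *= 16777619u;
	}
	return h;
}

/* Order records by path hash, using comparisons rather than subtraction. */
int
hmeta_cmp(const void *pa, const void *pb)
{
	const struct hmeta_sketch *a = pa, *b = pb;

	if (a->path_hash < b->path_hash)
		return -1;
	if (a->path_hash > b->path_hash)
		return 1;
	return 0;
}

/* Sort so that records sharing a path become neighbours. */
void
hmeta_sort(struct hmeta_sketch *v, size_t n)
{
	qsort(v, n, sizeof(v[0]), hmeta_cmp);
}
```

Hash collisions merely place unrelated files next to each other, which costs a poor delta candidate but not correctness.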
|
|
3e6ceea0
|
2022-05-20T08:40:46
|
|
fix paths stored in pack meta data, improving file deltification
The old code was broken and stored an empty path or filenames instead
of a repository-relative path, which means we didn't sort files for
deltification as was intended.
Fixing this provides much better deltas in large pack files written by
gotadmin pack -a. In my test case, pack size changed from 2GB to 1.5GB.
ok op@
|
|
17259bfa
|
2022-05-19T09:26:13
|
|
plug a small memleak on error in got_pack_create()
|
|
e93fb944
|
2022-05-10T11:34:16
|
|
map delta cache file into memory if possible while writing a pack file
with a fix from + ok op@
|
|
dc3fe1bf
|
2022-05-10T11:24:12
|
|
fix load_object_ids() such that packing tags works if zero commits are packed
reported by jrick and op
|
|
fae7e038
|
2022-05-07T11:50:56
|
|
run the search for deltas to reuse in got-read-pack
This significantly speeds up the deltification step of packing by
avoiding imsg traffic. gotadmin no longer requests individual raw
deltas from got-read-pack to check whether it can reuse them.
Instead, got-read-pack obtains a list of objects we want to pack,
and hands back the list of all deltas in its pack file which can be
reused. Messages are now batched such that imsg buffers are filled
as much as possible.
Another advantage is that deltas we are not going to reuse will
no longer be written to the delta cache file, saving disk space.
Before this patch, any raw delta candidate was written to the
delta cache file by got-read-pack, and the decision whether to
reuse the delta happened afterwards in the gotadmin process.
Code for reading individual raw deltas is now unused and could be
removed at some point.
ok op@
|
|
2f8438b0
|
2022-05-04T15:39:15
|
|
avoid 'remove unused' loop by storing excluded objects in a separate set
ok op@
|
|
f5e78e05
|
2022-05-04T15:39:15
|
|
avoid loop over the ID set which removes object IDs with reused deltas
ok op@
|
|
2d9e6abf
|
2022-05-04T13:43:24
|
|
store deltas in compressed form while packing, both in memory and cache file
This reduces memory and disk space consumption during packing.
with tweaks + memleak on error fix from op@
ok op@
|
|
611e8e31
|
2022-05-01T11:47:21
|
|
avoid subtraction of values larger than int in qsort(3) comparison callbacks
tweak + ok tb@
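A minimal sketch of the pitfall, assuming a hypothetical `struct entry_sketch` with a 64-bit field (the real structs and callbacks differ). Returning `a->off - b->off` from the callback truncates a 64-bit difference to int, which can flip the sign or yield zero; explicit comparisons avoid that.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record with a key wider than int. */
struct entry_sketch {
	uint64_t off;
};

/* Compare with relational operators; never return a truncated difference. */
int
entry_cmp(const void *pa, const void *pb)
{
	const struct entry_sketch *a = pa, *b = pb;

	if (a->off < b->off)
		return -1;
	if (a->off > b->off)
		return 1;
	return 0;
}

/* Sort entries by ascending offset. */
void
entry_sort(struct entry_sketch *v, size_t n)
{
	qsort(v, n, sizeof(v[0]), entry_cmp);
}
```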
|
|
d7b5a0e8
|
2022-04-20T14:00:12
|
|
inline struct got_object_id in struct got_object_qid
Saves us from doing a malloc/free call for every item on the list.
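A sketch of what inlining buys, using hypothetical stand-in types rather than the real `struct got_object_id` and `struct got_object_qid`. With the ID embedded in the list item, one allocation covers both, where a pointer member would need a second malloc/free pair.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_DIGEST_LEN 20	/* assumed SHA1-sized ID */

/* Hypothetical stand-in for an object ID. */
struct id_sketch {
	uint8_t sha1[SKETCH_DIGEST_LEN];
};

/* Hypothetical list item: the ID is embedded, not pointed to. */
struct qid_sketch {
	struct id_sketch id;
	void *data;
};

/* One calloc() per item; the ID is copied into the item itself. */
struct qid_sketch *
qid_sketch_alloc(const struct id_sketch *id)
{
	struct qid_sketch *q;

	q = calloc(1, sizeof(*q));
	if (q == NULL)
		return NULL;
	memcpy(&q->id, id, sizeof(q->id));
	return q;
}
```

Freeing an item is then a single `free(q)`.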
ok op@
|
|
cbc287dc
|
2022-04-19T20:08:41
|
|
reimplement object-ID set data structure on top of a hash table
Siphash suggested by jrick as a better alternative to murmurhash
for this use case.
with small fixes from and ok op@
|
|
70f8f24d
|
2022-04-14T15:05:19
|
|
speed up initial stage of packing by adding a "skip" commit color
The skip color marks boundary commits and their ancestors. Boundary commits
are reachable both via references which we want to exclude from the pack,
and via references which we want to include in the pack.
We continue processing commit history up to the point we are left with only
skip commits on the queue. This can speed up findtwixt() significantly and
avoids wrong results produced by the old algorithm which made no distinction
between "drop" and "skip".
This idea was first implemented by Michael Forney for git9:
https://git.9front.org/plan9front/plan9front/2e47badb88312c5c045a8042dc2ef80148e5ab47/commit.html
Michael's log message for git9 is reproduced below:
git/query: refactor graph painting algorithm (findtwixt, lca)
We now keep track of 3 sets during traversal:
- keep: commits we've reached from head commits
- drop: commits we've reached from tail commits
- skip: ancestors of commits in both 'keep' and 'drop'
Commits in 'keep' and/or 'drop' may be added later to the 'skip' set
if we discover later that they are part of a common subgraph of the
head and tail commits.
From these sets we can calculate the commits we are interested in:
lca commits are those in 'keep' and 'drop', but not in 'skip'.
findtwixt commits are those in 'keep', but not in 'drop' or 'skip'.
The "LCA" commit returned is a common ancestor such that there are no
other common ancestors that can reach that commit. Although there can
be multiple commits that meet these criteria, where one is technically
lower on the commit-graph than the other, these cases only happen in
complex merge arrangements and any choice is likely a decent merge
base.
Repainting is now done in paint() directly. When we find a boundary
commit, we switch our paint color to 'skip'. 'skip' painting does
not stop when it hits another color; we continue until we are left
with only 'skip' commits on the queue.
This fixes several mishandled cases in the current algorithm:
1. If we hit the common subgraph from tail commits first (if the tail
commit was newer than the head commit), we ended up traversing the
entire commit graph. This is because we couldn't distinguish
between 'drop' commits that were part of the common subgraph, and
those that were still looking for it.
2. If we traversed through an initial part of the common subgraph from
head commits before reaching it from tail commits, these commits
were returned from findtwixt even though they were also reachable
from tail commits.
3. In the same case as 2, we might end up choosing an incorrect
commit as the LCA, which is an ancestor of the real LCA.
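The two set calculations from the log message above can be sketched as follows. This is an illustration only: it models set membership as bit flags on a commit, which is an assumption and not necessarily how git9 or got represent colors, and all names are hypothetical.

```c
/* Hypothetical membership flags for the three sets. */
enum {
	IN_KEEP = 1 << 0,	/* reached from head commits */
	IN_DROP = 1 << 1,	/* reached from tail commits */
	IN_SKIP = 1 << 2	/* ancestor of a commit in both keep and drop */
};

/* findtwixt result: in 'keep', but not in 'drop' or 'skip'. */
int
commit_is_twixt(unsigned int sets)
{
	return (sets & IN_KEEP) != 0 && (sets & (IN_DROP | IN_SKIP)) == 0;
}

/* lca result: in both 'keep' and 'drop', but not in 'skip'. */
int
commit_is_lca(unsigned int sets)
{
	return (sets & IN_KEEP) != 0 && (sets & IN_DROP) != 0 &&
	    (sets & IN_SKIP) == 0;
}
```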
|
|
bb6672b6
|
2022-04-14T11:51:32
|
|
make sure callers of got_object_idset_add() free data.
|
|
fbafdecf
|
2022-04-10T13:03:29
|
|
revert 03c03172 "drop a commit right away if it matches an excluded commit"
This change resulted in a full history walk even when no objects will
be added to the pack file. Fix this regression by reverting the change.
|
|
14dbbf48
|
2022-04-10T12:15:46
|
|
for clarity, move the coloring loop from findtwixt() into a separate function
|
|
1d765da3
|
2022-04-10T12:13:02
|
|
remove a pointless object-id dup/free dance in findtwixt()
|
|
57bc7b6d
|
2022-04-10T12:10:52
|
|
don't forget to call the cancel callback while coloring commits in findtwixt()
|
|
03c03172
|
2022-04-10T12:08:45
|
|
in findtwixt(), drop a commit right away if it matches an excluded commit
|
|
912a163e
|
2022-04-10T11:35:53
|
|
the obj_types array in pack_create.c is no longer useful, remove it
|
|
29e0594f
|
2022-04-09T17:34:51
|
|
make gotadmin pack -x option work with tag arguments
|
|
9d34261e
|
2022-04-07T20:55:39
|
|
in load_object_ids(), process "their" commits and tags in the same loop
No functional change, the end result is the same.
|
|
6863cbf9
|
2022-03-21T19:59:03
|
|
fix pack progress object counter for loose objects
Move pack progress object accounting to a single place. This makes
it easier to account for the case where only loose objects are packed.
A wrong amount of objects was reported before when packing loose ones.
|
|
c4e796b2
|
2022-03-21T16:08:41
|
|
in pack progress output, remove excluded objects from 'found' objects counter
|
|
cdeb891a
|
2022-03-21T15:52:15
|
|
fix a bug where 'gotadmin pack' packed too many objects unless -a was used
|
|
bfc73a47
|
2022-03-19T14:53:07
|
|
explicitly include <unistd.h> for close(2)
|
|
b8af7c06
|
2022-03-15T10:45:02
|
|
print additional progress information while packing
ok op@
|
|
9b576444
|
2022-03-14T13:22:20
|
|
cache a list of known pack index files when the repository is opened
Avoids overhead due to readdir calls while searching a pack index.
ok op@
|
|
e3f86256
|
2022-02-18T20:23:32
|
|
explicitly include <endian.h> for be32toh()
|
|
28526235
|
2022-02-13T00:12:04
|
|
fix pack.sh test failure from reuse-deltas patch by tweaking progress output
|
|
67fd6849
|
2022-02-13T00:10:25
|
|
reuse existing deltas when creating pack files
tested by thomas, naddy, and myself
|
|
72840534
|
2022-01-19T12:04:58
|
|
compress delta data from delta_cache directly into pack file
|
|
402a5ec1
|
2022-01-10T13:13:16
|
|
set a cap on the amount of memory we use to store encoded deltas
|
|
5060d5a1
|
2022-01-10T11:09:25
|
|
encode short deltas in memory instead of writing them to a temporary file
|
|
64a8571e
|
2022-01-07T23:32:27
|
|
map raw object files into memory while packing if possible
|
|
59b21794
|
2022-01-07T14:33:52
|
|
only open raw objects if necessary while writing out pack file data
significantly speeds up the "writing pack: " step of gotadmin pack
|
|
211cfef0
|
2022-01-05T19:57:10
|
|
use time-based rate-limiting for gotadmin progress output
Suggested by naddy some time ago.
ok tracey
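One way time-based rate limiting can look, as a sketch under assumptions: `struct ratelimit_sketch` and `ratelimit_due()` are hypothetical and are not the API used by gotadmin. The caller passes the current time (for example from `clock_gettime(CLOCK_MONOTONIC, ...)`) and prints progress only when the function returns 1.

```c
#include <time.h>

/* Hypothetical limiter state; zero-initialize, then set interval_ms. */
struct ratelimit_sketch {
	struct timespec last;	/* time of the last permitted output */
	long interval_ms;	/* minimum gap between outputs */
	int primed;		/* 0 until the first call */
};

/*
 * Return 1 if output is due at time 'now', else 0.
 * The first call always returns 1.
 */
int
ratelimit_due(struct ratelimit_sketch *rl, const struct timespec *now)
{
	long long elapsed_ms;

	if (!rl->primed) {
		rl->primed = 1;
		rl->last = *now;
		return 1;
	}
	elapsed_ms = (long long)(now->tv_sec - rl->last.tv_sec) * 1000 +
	    (now->tv_nsec - rl->last.tv_nsec) / 1000000;
	if (elapsed_ms < rl->interval_ms)
		return 0;
	rl->last = *now;
	return 1;
}
```

Limiting by elapsed time rather than by object count keeps the output rate steady no matter how fast objects are processed.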
|
|
22edbce7
|
2021-10-24T09:41:04
|
|
use up to 128 delta chain elements again; creates smaller packs at same speed
|
|
4f4d853e
|
2021-10-24T09:41:04
|
|
try only 3 delta base candidates instead of 10 to speed up packing
Tests by kn, thomas_adam and myself made on various repositories
indicate that 3 is a good choice. Trying 10 deltas is much slower
and does not result in significantly smaller pack files.
|
|
a319ca8c
|
2021-10-15T10:36:12
|
|
move encode_delta() in pack_create.c to eliminate a forward declaration
|
|
74881701
|
2021-10-15T10:34:44
|
|
while packing, store encoded deltas in a temporary file instead of in memory
|
|
dc20764a
|
2021-10-15T09:30:29
|
|
limit delta chain length in newly created pack files to 32 deltas
Our former limit was 128 which is fairly high. Git uses 50 by default.
A smaller limit results in slightly larger pack files but makes both
packing and unpacking faster.
|
|
94dac27c
|
2021-10-15T09:24:56
|
|
raw object blocksize and read buffer were unused; remove them
|
|
d3c116bf
|
2021-10-15T09:10:14
|
|
cache raw objects in order to speed up gotadmin pack
|
|
cc7a354a
|
2021-10-15T07:15:00
|
|
reuse temporary files which were not used by got_object_raw_open()
|
|
600b755e
|
2021-10-14T20:30:26
|
|
avoid opening delta base objects in genpack() just to find their size
|
|
08347b73
|
2021-10-14T17:27:26
|
|
encode deltas in temporary files to avoid high memory usage
|
|
1d19226a
|
2021-10-13T18:48:15
|
|
fix two more error strings in pack_create.c using the wrong function name
|
|
f8b19efd
|
2021-10-13T11:09:15
|
|
use RB_TREE instead of STAILQ to manage packindex bloom filters; much faster
|
|
3af9de88
|
2021-09-22T13:32:37
|
|
fix 'got send' with tree objects which contain symlinks; reported by Omar
|
|
26960ff7
|
2021-09-14T09:52:49
|
|
make 'got send' properly send commits which are referenced only by tags
Problem reported by Omar Polo.
|
|
eca70f98
|
2021-09-03T09:51:31
|
|
fix 'got send' adding too many objects to the pack file in some cases
Load server-side tags before loading local commits. Otherwise objects
which are reachable via server-side tags will not be filtered out.
|
|
f8a36e22
|
2021-08-26T12:30:42
|
|
add 'got send' command for sending changes to remote repositories
Known to work against git-daemon and github Git server implementations.
Tests by abieber, naddy, jrick, and myself.
Man page additions reviewed by Lucas.
|
|
dc7edd42
|
2021-08-22T12:58:34
|
|
fix miscalculation of the final pack file size reported by got_pack_create()
|
|
07165b17
|
2021-07-01T14:57:10
|
|
cache object type in memory to speed up packing of objects referenced by tags
|
|
f4a2ff2d
|
2021-07-01T14:10:33
|
|
fix out-of-bounds access in 'gotadmin pack'; wrong array pointer in read_meta()
|
|
dbdddfee
|
2021-06-23T20:48:35
|
|
switch from SIMPLEQ to equivalent STAILQ macros
The singly-linked tail queue macros were added to OpenBSD 6.9 and
are more widely available on other systems.
ok stsp
|
|
08736cf9
|
2021-06-23T10:16:23
|
|
fix imsg header includes in pack_create.c
|
|
05118f5a
|
2021-06-22T19:37:20
|
|
implement gotadmin pack, indexpack, and listpack commands
|
|
e6bcace5
|
2021-06-22T19:34:53
|
|
initial port of git9's pack file creation code to gameoftrees; thank you, Ori!
|