|
9d117e38
|
2020-02-17T21:28:13
|
|
midx: Add a way to write multi-pack-index files
This change adds the git_midx_writer_* functions, which allow creating and
writing `multi-pack-index` files from `.idx`/`.pack` files.
Part of: #5399
|
|
366115e0
|
2021-08-27T04:06:31
|
|
Review feedback
|
|
fff209c4
|
2020-02-17T21:28:13
|
|
midx: Add a way to write multi-pack-index files
This change adds the git_midx_writer_* functions, which allow creating and
writing `multi-pack-index` files from `.idx`/`.pack` files.
Part of: #5399
|
|
37763d38
|
2020-12-05T15:26:59
|
|
threads: rename git_atomic to git_atomic32
Clarify the `git_atomic` type and functions now that we have a 64 bit
version as well (`git_atomic64`).
|
|
322c15ee
|
2020-08-01T18:24:41
|
|
Make the pack and mwindow implementations data-race-free
This change fixes a packfile heap corruption that can happen when
interacting with multiple packfiles concurrently across multiple
threads. This is exacerbated by setting a lower mwindow open file limit.
This change:
* Renames most of the internal methods in pack.c to clearly indicate
that they expect to be called with a certain lock held, making
reasoning about the state of locks a bit easier.
* Splits the `git_pack_file` lock in two: the one in `git_pack_file`
only protects the `index_map`. The lock protecting `git_mwindow_file` is
now in that struct.
* Explicitly checks for freshness of the `git_pack_file` in
`git_packfile_unpack_header`: this allows the mwindow implementation
to close files whenever there is enough cache pressure, and
`git_packfile_unpack_header` will reopen the packfile if needed.
* After a call to `p_munmap()`, the `data` and `len` fields are poisoned
with `NULL` to make use-after-frees more evident and crash rather than
being open to the possibility of heap corruption.
* Adds a test case to prevent this from regressing in the future.
Fixes: #5591
|
|
7cd0bf65
|
2020-04-05T18:26:52
|
|
pack: use GIT_ASSERT
|
|
005e7715
|
2020-02-23T22:28:52
|
|
multipack: Introduce a parser for multi-pack-index files
This change is the first in a series to add support for git's
multi-pack-index. This should speed up large repositories significantly.
Part of: #5399
|
|
ba59a4a2
|
2020-04-01T12:34:16
|
|
Making get_delta_base() conform to the general error-handling pattern
This makes get_delta_base() return the error code as the return value
and the delta base as an out-parameter.
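A minimal sketch of that pattern, for illustration only. The function `toy_get_delta_base`, its parameters and its validity rule are hypothetical and are not the real get_delta_base(); the point is just that the return value carries the error code while the result travels through an out-parameter that is left untouched on failure.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a delta-base lookup; not the real
 * get_delta_base(). `distance` is how far back the base lies
 * from the delta object.
 */
int toy_get_delta_base(int64_t *base_out, int64_t delta_offset, int64_t distance)
{
	assert(base_out);

	/* Toy validity rule: the base must sit at a positive offset
	 * strictly before the delta. */
	if (delta_offset <= 0 || distance <= 0 || distance >= delta_offset)
		return -1;

	/* Only write the out-parameter on success. */
	*base_out = delta_offset - distance;
	return 0;
}
```

A caller checks the return value first and reads the out-parameter only when it is 0.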
|
|
0edc26c8
|
2019-12-13T18:54:13
|
|
pack: refactor streams to use `git_zstream`
While we do have a `git_zstream` abstraction that encapsulates all the
calls to zlib as well as its error handling, we do not use it in our
pack file code. Refactor it to make the code a lot easier to understand.
|
|
6460e8ab
|
2019-06-23T18:13:29
|
|
internal: use off64_t instead of git_off_t
Prefer `off64_t` internally.
|
|
351eeff3
|
2019-01-23T10:42:46
|
|
maps: use uniform lifecycle management functions
Currently, the lifecycle functions for maps (allocation, deallocation, resize)
are not named in a uniform way and do not have a uniform function signature.
Rename the functions to fix that, and stick to libgit2's naming scheme of saying
`git_foo_new`. This results in the following new interface for allocation:
- `int git_<t>map_new(git_<t>map **out)` to allocate a new map, returning an
error code if we ran out of memory
- `void git_<t>map_free(git_<t>map *map)` to free a map
- `void git_<t>map_clear(git_<t>map *map)` to remove all entries from a map
This commit also fixes all existing callers.
|
|
168fe39b
|
2018-11-28T14:26:57
|
|
object_type: use new enumeration names
Use the new object_type enumeration names within the codebase.
|
|
c8ee5270
|
2017-12-08T09:05:58
|
|
pack: rename `git_packfile_stream_free`
The function `git_packfile_stream_free` frees all state of the packfile
stream without freeing the structure itself. This naming makes it hard
to spot whether it will try to free the pointer itself or not, causing
potential future errors. For this reason, we have decided to call a
function that frees state without freeing the actual structure a "dispose"
function.
Rename `git_packfile_stream_free` to `git_packfile_stream_dispose` as a
first example following this rule.
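The naming rule can be sketched roughly as follows. `toy_stream` and its functions are hypothetical names, not the real packfile stream: the "dispose" function releases what the structure owns but never frees the structure itself, so it works for stack-allocated values.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stream state; not libgit2's git_packfile_stream. */
struct toy_stream {
	unsigned char *buf;
	size_t len;
};

/* Allocate the state owned by the stream. Returns 0 or -1. */
int toy_stream_init(struct toy_stream *s, size_t len)
{
	assert(s && len > 0);

	s->buf = calloc(len, 1);
	if (!s->buf)
		return -1;

	s->len = len;
	return 0;
}

/* "dispose": free owned state, leave the struct itself alone. */
void toy_stream_dispose(struct toy_stream *s)
{
	if (!s)
		return;

	free(s->buf);
	s->buf = NULL;
	s->len = 0;
}
```

Because the fields are reset, disposing the same stream twice is harmless in this sketch.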
|
|
0c7f49dd
|
2017-06-30T13:39:01
|
|
Make sure to always include "common.h" first
Besides including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.
This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as their first
include, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.
This fixes the outlined problems and will become our standard practice
for header and source files inside of "src/" from now on.
|
|
bf339ab0
|
2017-01-21T14:51:31
|
|
indexer: introduce `git_packfile_close`
Encapsulation!
|
|
27051d4e
|
2016-07-22T13:34:19
|
|
odb: only freshen pack files every 2 seconds
Since multiple objects being written may all already exist in a single
packfile, avoid freshening that packfile repeatedly in a tight loop.
Instead, only freshen pack files every 2 seconds.
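A rough sketch of such a throttle, under stated assumptions: `toy_pack`, `toy_pack_needs_freshen` and the exact comparison used are hypothetical, and the caller is assumed to supply the current time in whole seconds.

```c
#include <assert.h>

#define TOY_FRESHEN_INTERVAL 2 /* seconds; the value named in the commit */

/* Hypothetical per-pack bookkeeping; not libgit2's git_pack_file. */
struct toy_pack {
	int has_freshened;   /* 0 until the first freshen */
	long last_freshen;   /* time of the last freshen, in seconds */
};

/*
 * Returns 1 if the caller should touch the pack file now, or 0 if it
 * was freshened less than TOY_FRESHEN_INTERVAL seconds ago.
 */
int toy_pack_needs_freshen(struct toy_pack *p, long now)
{
	assert(p);

	if (p->has_freshened && now - p->last_freshen < TOY_FRESHEN_INTERVAL)
		return 0;

	p->has_freshened = 1;
	p->last_freshen = now;
	return 1;
}
```

In a tight loop writing many objects, only the first call within each interval would lead to an actual timestamp update.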
|
|
b644e223
|
2016-01-13T11:02:38
|
|
Make packfile_unpack_compressed a private API
|
|
b63b76e0
|
2014-10-12T11:42:31
|
|
Reorder some khash declarations
Keep the declarations in the headers, while putting the definitions in
the C files. Putting the function definitions in headers causes
them to be duplicated if you include two headers with them.
|
|
c8e02b87
|
2015-02-15T21:07:05
|
|
Remove extra semicolon outside of a function
Without this change, compiling with gcc and `-pedantic` generates the warning:
ISO C does not allow extra ‘;’ outside of a function.
|
|
b3b66c57
|
2014-06-18T17:13:12
|
|
Share packs across repository instances
Opening the same repository multiple times will currently open the same
file multiple times, as well as map the same region of the file multiple
times. This is not necessary, as the packfile data is immutable.
Instead of opening and closing packfiles directly, introduce an
indirection and allocate packfiles globally. This does mean locking on
each packfile open, but we already use this lock for the global mwindow
list so it doesn't introduce a new contention point.
|
|
a3ffbf23
|
2014-05-11T03:50:34
|
|
pack: expose a cached delta base directly
Instead of going through a special entry in the chain, let's pass it as
an output parameter.
|
|
2acdf4b8
|
2014-05-06T19:20:33
|
|
pack: unpack using a loop
We currently make use of recursive function calls to unpack an object,
resolving the deltas as we come back down the chain. This means that we
have unbounded stack growth as we look up objects in a pack.
This is now done in two steps: first, we figure out what the dependency
chain is by looking up the delta bases until we reach a non-delta
object, pushing the information we need onto a stack; then we pop
from that stack and apply the deltas until there are no more left.
This version of the code does not make use of the delta base cache so it
is slower than what's in the mainline. A later commit will reintroduce
it.
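The two steps can be sketched on a toy model. Everything here is hypothetical (`toy_object`, `toy_unpack`, the fixed chain limit, and a "delta" that simply appends a decimal digit); it only illustrates replacing recursion with an explicit stack, not the real pack code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHAIN 64 /* arbitrary bound for this sketch */

/* Hypothetical toy object; not libgit2's real structures. */
struct toy_object {
	int base;  /* index of the delta base, or -1 for a non-delta object */
	int value; /* full content for a base object, else the digit to append */
};

/* Resolve objs[idx] without recursion. Returns 0, or -1 on a bad chain. */
int toy_unpack(int *out, const struct toy_object *objs, size_t count, size_t idx)
{
	size_t stack[TOY_MAX_CHAIN];
	size_t depth = 0, cur = idx;
	int result;

	assert(out && objs && idx < count);

	/* Step 1: walk towards the non-delta object, pushing each delta. */
	while (objs[cur].base >= 0) {
		if (depth == TOY_MAX_CHAIN || (size_t)objs[cur].base >= count)
			return -1;
		stack[depth++] = cur;
		cur = (size_t)objs[cur].base;
	}

	/* Step 2: start from the base and pop deltas, innermost first. */
	result = objs[cur].value;
	while (depth > 0)
		result = result * 10 + objs[stack[--depth]].value;

	*out = result;
	return 0;
}
```

Stack usage is bounded by `TOY_MAX_CHAIN` regardless of chain length, and a cyclic chain ends in an error rather than unbounded growth.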
|
|
a332e91c
|
2014-05-06T23:37:28
|
|
pack: use a cache for delta bases when unpacking
Bring back the use of the delta base cache for unpacking objects. When
generating the delta chain, we stop when we find a delta base in the
pack's cache and use that as the starting point.
|
|
8610487c
|
2014-01-23T23:28:28
|
|
Drop parsing of the pack filename's SHA1 part; no one cares about the filename
|
|
51a3dfb5
|
2013-11-01T16:31:02
|
|
pack: `__object_header` always returns unsigned values
|
|
3343b5ff
|
2013-10-31T22:59:42
|
|
Fix warning on win64
|
|
51e82492
|
2013-10-03T16:54:25
|
|
pack: move the object header function here
|
|
5d2d21e5
|
2013-04-16T15:00:43
|
|
Consolidate packfile allocation further
Rename git_packfile_check to git_packfile_alloc since it is now
being used more in that capacity. Fix the various places that use
it. Consolidate some repeated code in odb_pack.c related to the
allocation of a new pack_backend.
|
|
53607868
|
2013-04-15T00:09:03
|
|
Further threading fixes
This builds on the earlier thread safety work to make it so that
setting the odb, index, refdb, or config for a repository is done
in a threadsafe manner with minimized locking time. This is done
by adding a lock to the repository object and using it to guard
the assignment of the above listed pointers. The lock is only
held to assign the pointer value.
This also contains some minor fixes to the other work with pack
files to reduce the time that locks are being held and fix an
apparent memory leak.
|
|
24c70804
|
2013-04-12T12:59:38
|
|
Add mutex around mapping and unmapping pack files
When I was writing threading tests for the new cache, the main
error I kept running into was a pack file having its content
unmapped underneath the running thread. This adds a lock around
the routines that map and unmap the pack data so that threads can
effectively reload the data when they need it.
This also required reworking the error handling paths in a couple
places in the code which I tried to make consistent.
|
|
0e040c03
|
2013-03-03T14:50:47
|
|
indexer: use a hashtable for keeping track of offsets
These offsets are needed for REF_DELTA objects, which encode which
object they use as a base, but not where it lies in the packfile, so
we need a list.
These objects are mostly from older packfiles, before OFS_DELTA was
widespread. The time spent indexing these packfiles is greatly
reduced, though remains above what git is able to do.
|
|
96c9b9f0
|
2013-01-12T18:38:19
|
|
indexer: properly free the packfile resources
The indexer needs to call the packfile's free function so it takes care of
freeing the caches.
We still need to close the mwf descriptor manually so we can rename the
packfile into its final name on Windows.
|
|
80d647ad
|
2013-01-11T20:15:06
|
|
Revert "pack: packfile_free -> git_packfile_free and use it in the indexers"
This reverts commit f289f886cb81bb570bed747053d5ebf8aba6bef7, which
makes the tests fail on Windows. Revert until we can figure out a
solution.
|
|
d0b14cea
|
2013-01-11T18:21:09
|
|
pack: That declaration
|
|
c8f79c2b
|
2012-12-21T10:59:10
|
|
pack: abstract out the cache into its own functions
|
|
0ed75620
|
2012-12-21T13:46:48
|
|
pack: limit the amount of memory the base delta cache can use
Currently limited to 16MB (like git) and to objects up to 1MB in
size.
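A sketch of how two such limits might be checked before storing an entry. The names are hypothetical, whether the 1MB bound is inclusive is an assumption, and this toy version simply refuses entries that do not fit; the actual cache may well evict older entries instead.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_MEMORY_LIMIT ((size_t)16 * 1024 * 1024) /* 16MB in total */
#define TOY_CACHE_ENTRY_LIMIT  ((size_t)1024 * 1024)      /* 1MB per object */

/* Hypothetical accounting state; not libgit2's cache structure. */
struct toy_cache {
	size_t memory_used; /* invariant: never above TOY_CACHE_MEMORY_LIMIT */
};

/* Returns 1 if the entry was accounted for, 0 if it was rejected. */
int toy_cache_try_add(struct toy_cache *cache, size_t entry_size)
{
	assert(cache && cache->memory_used <= TOY_CACHE_MEMORY_LIMIT);

	/* Objects above the per-entry limit are never cached. */
	if (entry_size > TOY_CACHE_ENTRY_LIMIT)
		return 0;

	/* Reject if the entry would push the total past the limit. */
	if (entry_size > TOY_CACHE_MEMORY_LIMIT - cache->memory_used)
		return 0;

	cache->memory_used += entry_size;
	return 1;
}
```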
|
|
525d961c
|
2012-12-20T07:55:51
|
|
pack: refcount entries and add a mutex around cache access
|
|
c0f4a011
|
2012-12-19T16:48:12
|
|
pack: introduce a delta base cache
Many delta bases are re-used. Cache them to avoid inflating the same
data repeatedly.
This version doesn't limit the number of entries to store, so it can
end up using a considerable amount of memory.
|
|
359fc2d2
|
2013-01-08T17:07:25
|
|
update copyrights
|
|
0249a503
|
2012-12-07T09:40:21
|
|
Merge pull request #1091 from carlosmn/stream-object
Indexer speedup with large objects
|
|
44f9f547
|
2012-11-30T13:33:30
|
|
pack: add git_packfile_resolve_header
To paraphrase @peff:
You can get both size and type from a packed object reasonably cheaply.
If you have:
* An object that is not a delta; both type and size are available in the
packfile header.
* An object that is a delta. The packfile type will be OBJ_*_DELTA, and
you have to resolve back to the base to find the real type. That means
potentially a lot of packfile index lookups, but each one is
relatively cheap. For the size, you inflate the first few bytes of the
delta, whose header will tell you the resulting size of applying the
delta to the base.
For simplicity, we just decompress the whole delta for now.
|
|
46635339
|
2012-11-19T22:22:33
|
|
pack: introduce a streaming API for raw objects
This allows us to take objects from the packfile as a stream instead
of having to keep it all in memory.
|
|
c3fb7d04
|
2012-11-27T15:00:49
|
|
Make git_odb_foreach_cb take const param
This makes the first OID param of the ODB callback a const pointer
and also propagates that change all the way to the backends.
|
|
60ecdf59
|
2012-09-10T11:48:21
|
|
pack: iterate objects in offset order
Compute the ordering on demand and persist until the index is freed.
|
|
b8457baa
|
2012-07-24T07:57:58
|
|
portability: Improve x86/amd64 compatibility
|
|
521aedad
|
2012-06-05T14:48:51
|
|
odb: add git_odb_foreach()
Go through each backend and list every object that exists in
them. This allows fsck-like uses.
|
|
fa679339
|
2012-04-13T09:58:54
|
|
Add packfile_unpack_compressed() to the internal header
|
|
e1de726c
|
2012-03-12T22:55:40
|
|
Migrate ODB files to new error handling
This migrates odb.c, odb_loose.c, odb_pack.c and pack.c to
the new style of error handling. Also got the unix and win32
versions of map.c. There are some minor changes to other
files but no others were completely converted.
This also contains an update to filebuf so that a zeroed out
filebuf will not think that the fd (== 0) is actually open
(and inadvertently call close() on fd 0 if cleaned up).
Lastly, this was built and tested on win32 and contains a
bunch of fixes for the win32 build which was pretty broken.
|
|
5e0de328
|
2012-02-13T17:10:24
|
|
Update Copyright header
Signed-off-by: schu <schu-github@schulog.org>
|
|
01ad7b3a
|
2011-09-06T15:48:45
|
|
*: correct and codify various file permissions
The following files now have 0444 permissions:
- loose objects
- pack indexes
- pack files
- packs downloaded by fetch
- packs downloaded by the HTTP transport
And the following files now have 0666 permissions:
- config files
- repository indexes
- reflogs
- refs
This brings libgit2 more in line with Git.
Note that git_filebuf_commit() and git_filebuf_commit_at() have both
gained a new mode parameter.
The latter change fixes an important issue where filebufs created with
GIT_FILEBUF_TEMPORARY received 0600 permissions (due to mkstemp(3)
usage). Now we chmod() the file before renaming it into place.
Tests have been added to confirm that new commit, tag, and tree
objects are created with the right permissions. I don't have access to
Windows, so for now I've guarded the tests with "#ifndef GIT_WIN32".
|
|
87d9869f
|
2011-09-19T03:34:49
|
|
Tabify everything
There were quite a few places where spaces were being used instead of
tabs. Try to catch them all. This should hopefully not break anything.
Except for `git blame`. Oh well.
|
|
bb742ede
|
2011-09-19T01:54:32
|
|
Cleanup legal data
1. The license header is technically not valid if it doesn't have a
copyright signature.
2. The COPYING file has been updated with the different licenses used in
the project.
3. The full GPLv2 header in each file annoys me.
|
|
c1af5a39
|
2011-08-06T00:35:20
|
|
Implement cooperative caching
When indexing a file with ref deltas, a temporary cache for the
offsets has to be built, as we don't have an index file yet. If the
user takes the responsibility for filling the cache, the packing code
will look there first when it finds a ref delta.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
|
|
a070f152
|
2011-07-29T01:08:02
|
|
Move pack functions to their own file
|
|
7d0cdf82
|
2011-07-09T02:25:01
|
|
Make packfile_unpack_header more generic
On the way, store the fd and the size in the mwindow file.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
|
|
b5b474dd
|
2011-07-28T11:45:46
|
|
Modify the given offset in git_packfile_unpack
The callers immediately throw away the offset, so we don't need any
logical changes in any of them. This will be useful for the indexer,
as it does need to know where the compressed data ends.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
|
|
c7c9e183
|
2011-07-07T10:17:40
|
|
Move the pack structs to an internal header
|