|
a0a1b19a
|
2015-10-14T19:31:54
|
|
odb: Prioritize alternate backends
For most real use cases, repositories with alternates use them as main
object storage. Checking the alternate for objects before the main
repository should result in measurable speedups.
Because of this, we're changing the sorting algorithm to prioritize
alternates *in cases where two backends have the same priority*. This
means that the pack backend for the alternate will be checked before the
pack backend for the main repository *but* both of them will be checked
before any loose backends.
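The ordering described above can be sketched as a `qsort` comparator. This is a minimal illustration, not the code in `odb.c`: the `backend_entry` struct, its field names, and the priority values used in the test are assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the ODB's internal per-backend record. */
typedef struct {
	const char *name;
	int priority;     /* higher values are consulted first */
	int is_alternate; /* non-zero if the backend came from an alternate */
} backend_entry;

/* qsort comparator: priority descending; on a tie, alternates go first. */
static int backend_entry_cmp(const void *a, const void *b)
{
	const backend_entry *ea = a;
	const backend_entry *eb = b;

	if (ea->priority != eb->priority)
		return (eb->priority > ea->priority) - (eb->priority < ea->priority);

	/* Same priority: the alternate wins. */
	return (eb->is_alternate != 0) - (ea->is_alternate != 0);
}

static void sort_backends(backend_entry *entries, size_t count)
{
	qsort(entries, count, sizeof(*entries), backend_entry_cmp);
}
```

With pack backends at a higher priority than loose ones, sorting yields alternate pack, main pack, alternate loose, main loose.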
|
|
43820f20
|
2015-10-14T19:24:07
|
|
odb: Be smarter when refreshing backends
In the current implementation of ODB backends, each backend is tasked
with refreshing itself after a failed lookup. This is standard Git
behavior: we want to e.g. reload the packfiles on disk in case they have
changed and that's the reason we can't find the object we're looking
for.
This behavior, however, becomes pathological in repositories where
multiple alternates have been loaded. Given that each alternate counts
as a separate backend, a miss in the main repository (which can
potentially be very frequent in cases where object storage comes from
the alternate) will result in refreshing all its packfiles before we
move on to the alternate backend where the object will most likely be
found.
To fix this, the code in `odb.c` has been refactored to perform the
refresh of all the backends externally, once we've verified that the
object is nowhere to be found.
If the refresh is successful, we then perform the lookup sequentially
through all the backends, skipping the ones that we know for sure
weren't refreshed (because they have no refresh API).
The on-disk pack backend has been adjusted accordingly: it no longer
performs refreshes internally.
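The lookup flow described above can be sketched roughly as follows. This is a toy model under stated assumptions: `sketch_backend` and its fields are hypothetical, objects are plain integer ids, and a backend "knows" ids `0..loaded-1` until a refresh picks up everything that is on disk.

```c
#include <stddef.h>

/* Hypothetical backend model: ids 0..loaded-1 are known in memory,
 * ids 0..on_disk-1 would be known after a refresh. */
typedef struct {
	int loaded;
	int on_disk;
	int can_refresh;   /* 0 models a backend with no refresh API */
	int refresh_calls; /* bookkeeping for the example only */
} sketch_backend;

static int backend_exists(const sketch_backend *b, int id)
{
	return id >= 0 && id < b->loaded;
}

static void backend_refresh(sketch_backend *b)
{
	b->refresh_calls++;
	b->loaded = b->on_disk;
}

/* Returns 1 if any backend has the object, 0 otherwise. */
static int sketch_odb_exists(sketch_backend *backends, size_t count, int id)
{
	size_t i;

	/* First pass: ask every backend without refreshing anything. */
	for (i = 0; i < count; i++)
		if (backend_exists(&backends[i], id))
			return 1;

	/* Miss everywhere: refresh all refreshable backends once. */
	for (i = 0; i < count; i++)
		if (backends[i].can_refresh)
			backend_refresh(&backends[i]);

	/* Second pass: skip backends that cannot have changed. */
	for (i = 0; i < count; i++)
		if (backends[i].can_refresh && backend_exists(&backends[i], id))
			return 1;

	return 0;
}
```

In this model, a miss in the main repository no longer triggers a refresh when the alternate already has the object.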
|
|
d3b29fb9
|
2015-10-01T00:50:37
|
|
refdb and odb backends must provide `free` function
As refdb and odb backends can be allocated by client code, libgit2
can’t know whether an alternative memory allocator was used, and thus
should not try to call `git__free` on those objects.
Instead, odb and refdb backend implementations must always provide
their own `free` functions to ensure memory gets freed correctly.
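A minimal sketch of the ownership rule, assuming a simplified backend struct: `sketch_backend`, `client_backend`, and the helper names are hypothetical, and the real backend structs carry many more callbacks. The point is that the code which allocated the backend also supplies the function that releases it.

```c
#include <stdlib.h>

/* Hypothetical, pared-down backend interface. */
typedef struct sketch_backend sketch_backend;
struct sketch_backend {
	void (*free)(sketch_backend *self);
};

/* Client-side backend; `parent` must be the first member so that the
 * cast in client_backend_free() is valid. */
typedef struct {
	sketch_backend parent;
	int *release_counter; /* example bookkeeping, may be NULL */
} client_backend;

static void client_backend_free(sketch_backend *self)
{
	client_backend *backend = (client_backend *)self;

	if (backend->release_counter)
		(*backend->release_counter)++;

	/* The client allocated with calloc(), so the client calls free(). */
	free(backend);
}

static sketch_backend *client_backend_new(int *release_counter)
{
	client_backend *backend = calloc(1, sizeof(*backend));

	if (!backend)
		return NULL;

	backend->parent.free = client_backend_free;
	backend->release_counter = release_counter;
	return &backend->parent;
}

/* Library side: never frees the memory itself, only delegates. */
static void sketch_release_backend(sketch_backend *backend)
{
	if (backend)
		backend->free(backend);
}
```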
|
|
e5f9df7b
|
2015-06-29T21:45:04
|
|
odb: cast to long long for printf
|
|
9f3c18e2
|
2015-06-02T08:36:15
|
|
Fixed build warnings on Xcode 6.1
|
|
a6f2ceaf
|
2015-05-13T12:11:55
|
|
Merge pull request #3118 from libgit2/cmn/stream-size
odb: make the writestream's size a git_off_t
|
|
b0d7f329
|
2015-05-13T10:23:19
|
|
odb: reverse the default backend priorities
We currently first look in the loose object dir and then in the packs
for objects. When performing operations on recent history this has a
higher likelihood of hitting, but when we deal with operations which
look further back into the past, we start spending a large amount of
time getting ENOENT from `access`.
Reversing the priorities means that long-running operations can get to
their objects faster, as we can look at the index data we have in memory
(or rather mapped) to figure out whether we have an object, which is
faster than going out to the filesystem.
The packed backend already implements an optimistic read algorithm by
first looking at the packs we know about and only going out to disk to
refresh if the object is not found, which means that in the case where
we do have the object (which will be in the majority for anything that
traverses the graph) we can avoid going to disk entirely to determine
whether an object exists.
Operations which look at recent history may take a slight hit, but
these would be operations which look at far fewer objects and thus take
less time regardless.
|
|
77b339f7
|
2015-05-12T13:06:33
|
|
odb: make the writestream's size a git_off_t
Restricting files to size_t is a silly limitation. The loose backend
writes to a file directly, so there is no issue in using 63 bits for the
size.
We still assume that the header is going to fit in 64 bytes, which does
mean quite a bit smaller files due to the run-length encoding, but it's
still a much larger size than you would want Git to handle.
|
|
7dd22538
|
2015-05-11T10:19:25
|
|
centralizing all IO buffer size values
|
|
f1453c59
|
2015-02-12T12:19:37
|
|
Make our overflow check look more like gcc/clang's
Make our overflow checking look more like gcc and clang's, so that
we can substitute it out with the compiler intrinsics on platforms
that support them. This means dropping the ability to pass `NULL` as
an out parameter.
As a result, the macros are updated to reflect this.
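A sketch of what helpers shaped like the gcc/clang builtins might look like. The function names are hypothetical; `__builtin_add_overflow` and `__builtin_mul_overflow` are the real compiler builtins (GCC 5 and later, and clang), and the fallback branch is a plain portable check. As with the builtins, the out parameter must not be `NULL`.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns non-zero if a + b overflows size_t. On success *out holds the
 * sum; on overflow the caller should not rely on *out. */
static int sketch_add_sizet_overflow(size_t *out, size_t a, size_t b)
{
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)
	return __builtin_add_overflow(a, b, out);
#else
	if (SIZE_MAX - a < b)
		return 1;
	*out = a + b;
	return 0;
#endif
}

/* Returns non-zero if a * b overflows size_t. */
static int sketch_mul_sizet_overflow(size_t *out, size_t a, size_t b)
{
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)
	return __builtin_mul_overflow(a, b, out);
#else
	if (a != 0 && b > SIZE_MAX / a)
		return 1;
	*out = a * b;
	return 0;
#endif
}
```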
|
|
15d54fdd
|
2015-02-10T22:34:03
|
|
odb__hashlink: check st.st_size before casting
|
|
392702ee
|
2015-02-09T23:41:13
|
|
allocations: test for overflow of requested size
Introduce some helper macros to test integer overflow from arithmetic
and set error message appropriately.
|
|
c251f3bb
|
2014-12-08T16:05:47
|
|
win32: remember to cleanup our hash_ctx
|
|
e0156651
|
2014-11-21T13:50:46
|
|
odb: `git_odb_object` contents are never NULL
This is a contract that we made in the library and that we need to uphold. The
contents of a blob can never be NULL because several parts of the library (including
the filter and attributes code) expect `git_blob_rawcontent` to always return a
valid pointer.
|
|
e1ac0101
|
2014-11-08T14:40:53
|
|
odb: hardcode the empty blob and tree
git hardcodes these as objects which exist regardless of whether they
are in the odb and uses them in the shell interface as a way of
expressing the lack of a blob or tree for one side of e.g. a diff.
In the library we use each language's natural way of declaring a lack of
value which makes a workaround like this unnecessary. Since git uses it,
it does however mean each shell application would need to perform this
check themselves.
This makes it common work across a range of applications and an issue
with compatibility with git, which fits right into what the library aims
to provide.
Thus we introduce the hard-coded empty blob and tree in the odb
frontend. These hard-coded objects are checked for before going to the
backends, but after the cache check, which means the second time they're
used, they will be treated as normal cached objects instead of creating
new ones.
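The lookup order described above (cache, then hard-coded objects, then backends) can be sketched as follows. The two ids are the well-known SHA-1 ids of the empty blob and the empty tree; the enum and function names, and the use of hex strings and boolean flags instead of real cache and backend lookups, are simplifications made for the example.

```c
#include <string.h>

#define EMPTY_BLOB_HEX "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
#define EMPTY_TREE_HEX "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

typedef enum {
	SKETCH_TYPE_NONE = 0,
	SKETCH_TYPE_BLOB,
	SKETCH_TYPE_TREE
} sketch_type;

typedef enum {
	SKETCH_NOT_FOUND = 0,
	SKETCH_FROM_CACHE,
	SKETCH_FROM_HARDCODED,
	SKETCH_FROM_BACKEND
} sketch_source;

/* Returns the type of a hard-coded object, or SKETCH_TYPE_NONE. */
static sketch_type sketch_hardcoded_type(const char *hex_id)
{
	if (strcmp(hex_id, EMPTY_BLOB_HEX) == 0)
		return SKETCH_TYPE_BLOB;
	if (strcmp(hex_id, EMPTY_TREE_HEX) == 0)
		return SKETCH_TYPE_TREE;
	return SKETCH_TYPE_NONE;
}

/* in_cache and in_backend stand in for the real cache and backend lookups. */
static sketch_source sketch_lookup(const char *hex_id, int in_cache, int in_backend)
{
	if (in_cache)
		return SKETCH_FROM_CACHE;
	if (sketch_hardcoded_type(hex_id) != SKETCH_TYPE_NONE)
		return SKETCH_FROM_HARDCODED;
	return in_backend ? SKETCH_FROM_BACKEND : SKETCH_NOT_FOUND;
}
```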
|
|
530594c0
|
2014-05-23T05:53:41
|
|
odb: clear backend errors on successful read
We go through the different backends in order, so it's not an error if
at least one of the backends has the data we want.
|
|
bc91347b
|
2014-04-30T11:16:31
|
|
Fix remaining init_options inconsistencies
There were a couple of "init_opts()" functions and a few more cases
of structure initialization that I somehow missed.
|
|
48e60ae7
|
2014-04-21T11:23:29
|
|
Don't redefine the same callback types, their signatures may change
|
|
3ab57816
|
2014-03-31T23:23:32
|
|
Merge pull request #2178 from libgit2/rb/fix-short-id
Fix git_odb_short_id and git_odb_exists_prefix bugs
|
|
31a14982
|
2014-03-21T17:36:34
|
|
Fix wrong assertion
Fixes issue #2196
|
|
89499078
|
2014-03-10T10:53:39
|
|
Fix a number of git_odb_exists_prefix bugs
The git_odb_exists_prefix API did not behave correctly when a
later backend returned GIT_ENOTFOUND even if an earlier backend
had found the object.
Additionally, the unit tests were not properly exercising the API
and had a couple mistakes in checking the results.
Lastly, since the backends are not expected to behave correctly
unless all bytes of the short id are zero except for the prefix,
this makes the ODB prefix APIs explicitly clear out the extra
bytes so the user doesn't have to be as careful.
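Clearing the extra bytes of a short id might look like the sketch below. It assumes a 20-byte raw SHA-1 id whose prefix length is given in hex digits; the function and macro names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_OID_RAWSZ 20
#define SKETCH_OID_HEXSZ 40

/* Copy the first hex_len hex digits of src into dst and zero everything
 * after them. dst and src must not overlap. */
static void sketch_oid_copy_prefix(unsigned char *dst,
	const unsigned char *src, size_t hex_len)
{
	size_t full_bytes;

	if (hex_len > SKETCH_OID_HEXSZ)
		hex_len = SKETCH_OID_HEXSZ;

	full_bytes = hex_len / 2;

	memset(dst, 0, SKETCH_OID_RAWSZ);
	memcpy(dst, src, full_bytes);

	/* An odd length ends mid-byte: keep only the high nibble. */
	if (hex_len % 2)
		dst[full_bytes] = src[full_bytes] & 0xF0;
}
```

For a 5-digit prefix of an id made of `0xAB` bytes, the result starts with `AB AB A0` followed by zeros.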
|
|
b9f81997
|
2014-03-05T21:49:23
|
|
Added function-based initializers for every options struct.
The basic structure of each function is courtesy of arrbee.
|
|
a064dc2d
|
2014-03-06T00:47:05
|
|
Merge pull request #2159 from libgit2/rb/odb-exists-prefix
Add ODB API to check for existence by prefix and object id shortener
|
|
26875825
|
2014-03-05T13:06:22
|
|
Check short OID len in odb, not in backends
|
|
7bd2f401
|
2014-03-05T11:35:47
|
|
ODB writing fails gracefully when unsupported
If no ODB backends support writing, we should fail gracefully.
|
|
f5753999
|
2014-03-04T15:34:23
|
|
Add exists_prefix to ODB backend and ODB API
|
|
ae3b6d61
|
2014-01-12T23:31:13
|
|
odb: handle NULL pointers passed to git_odb_stream_free
Signed-off-by: Brodie Rao <brodie@sf.io>
|
|
dd64c71c
|
2013-11-04T14:50:25
|
|
Allow backend consumers to specify file mode
|
|
5c50f22a
|
2013-10-28T09:25:44
|
|
Merge pull request #1891 from libgit2/cmn/fix-thin-packs
Add support for thin packs
|
|
98fec8a9
|
2013-10-22T16:05:47
|
|
Implement `git_odb_object_dup`
|
|
0b33fca0
|
2013-10-02T13:39:35
|
|
indexer: fix thin packs
When given an ODB from which to read objects, the indexer will attempt
to inject the missing bases at the end of the pack and update the
header and trailer to reflect the new contents.
|
|
92d19d16
|
2013-09-21T09:34:03
|
|
Merge pull request #1840 from linquize/warning
Fix warning
|
|
66566516
|
2013-09-08T17:15:42
|
|
Fix warning
|
|
a9f51e43
|
2013-09-11T22:00:36
|
|
Merge git_buf and git_buffer
This makes the git_buf struct that was used internally into an
externally available structure and eliminates the git_buffer.
As part of that, some of the special cases that arose with the
externally used git_buffer were blended into the git_buf, such as
being careful about git_buf objects that may have a NULL ptr and
allowing for bufs with a valid ptr and size but zero asize as a
way of referring to externally owned data.
|
|
2a7d224f
|
2013-09-10T16:33:32
|
|
Extend public filter api with filter lists
This moves the git_filter_list into the public API so that users
can create, apply, and dispose of filter lists. This allows more
granular application of filters to user data outside of libgit2
internals.
This also converts all the internal usage of filters to the public
APIs along with a few small tweaks to make it easier to use the
public git_buffer stuff alongside the internal git_buf.
|
|
85d54812
|
2013-08-28T16:44:04
|
|
Create public filter object and use it
This creates include/sys/filter.h with a basic definition of a
git_filter and then converts the internal code to use it. There
are related internal objects (git_filter_list) that we will want
to publish at some point, but this is a first step.
|
|
8cf80525
|
2013-09-11T20:13:59
|
|
errors: Fix format of some error messages
|
|
031f3f80
|
2013-09-07T22:39:05
|
|
odb: Error when streaming in too [few|many] bytes
|
|
4047950f
|
2013-08-29T14:19:34
|
|
odb: Prevent stream_finalize_write() from overwriting
Now that #1785 is merged, git_odb_stream_finalize_write() calculates the object id before invoking the odb backend.
This commit gives the backend a chance to check if it already knows this object.
|
|
b1a6c316
|
2013-08-30T17:36:00
|
|
odb: Move the auto refresh logic to the pack backend
Previously, `git_object_read()`, `git_object_read_prefix()` and
`git_object_exists()` were implementing an auto refresh logic. When the
expected object couldn't be found in any backend, a call to
`git_odb_refresh()` was triggered and the lookup was once again performed
against all backends.
This commit removes this auto-refresh logic from the odb layer and pushes
it down into the pack-backend (as it's the only one currently exposing
a `refresh()` endpoint).
|
|
a12e069a
|
2013-08-30T16:31:52
|
|
odb: Honor the non refreshing capability of a backend
|
|
090a07d2
|
2013-08-17T02:12:04
|
|
odb: avoid hashing twice in an edge case
If none of the backends support direct writes and we must stream the
whole file, we already know what the object's id should be; so use the
stream's functions directly, bypassing the frontend's hashing and
overwriting of our existing id.
|
|
fe0c6d4e
|
2013-08-17T01:41:08
|
|
odb: make it clearer that the id is calculated in the frontend
The frontend is in charge of calculating the id of the objects. Thus
the backends should treat it as a read-only value. The positioning in
the function signature made it seem as though it was an output
parameter.
Make the id const and move it from the front to behind the subject
(backend or stream).
|
|
8380b39a
|
2013-08-15T14:29:39
|
|
odb: perform the stream hashing in the frontend
Hash the data as it's coming into the stream and tell the backend what
its name is when finalizing the write. This makes it consistent with
the way a plain git_odb_write() performs the write.
|
|
376e6c9f
|
2013-08-15T13:48:35
|
|
odb: wrap the stream reading and writing functions
This is in preparation for moving the hashing to the frontend, which
requires us to handle the incoming data before passing it to the
backend's stream.
|
|
e54cfb9b
|
2013-08-12T11:50:27
|
|
odb: free object data when id is ambiguous
By the time we recognise this as an ambiguous id, the object's data
has been loaded into memory. Free it when returning EAMBIGUOUS.
|
|
c6451624
|
2013-07-15T16:00:07
|
|
Fix some more memory leaks in error path
|
|
6de9b2ee
|
2013-06-12T21:10:33
|
|
util: It's called `memzero`
|
|
3e9e6cda
|
2013-06-07T09:54:33
|
|
Add safe memset and use it
This adds a `git__memset` routine that will not be optimized away
and updates the places where I memset() right before a free() call
to use it.
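One common way to write such a routine is to store through a `volatile` pointer, which compilers generally do not elide the way they may elide a plain `memset()` before `free()`. The sketch below uses a hypothetical name and shows the general technique, not necessarily the exact implementation added here.

```c
#include <stddef.h>

/* Zero `size` bytes at `data` using volatile stores, so the writes are
 * not removed as dead stores ahead of a free(). */
static void sketch_memzero(void *data, size_t size)
{
	volatile unsigned char *bytes = (volatile unsigned char *)data;

	while (size--)
		*bytes++ = 0;
}
```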
|
|
f658dc43
|
2013-05-31T14:09:58
|
|
Zero memory for major objects before freeing
By zeroing out the memory when we free larger objects (i.e. those
that serve as collections of other data, such as repos, odb, refdb),
I'm hoping that it will be easier for libgit2 bindings to find
errors in their object management code.
|
|
03c28d92
|
2013-05-06T06:45:53
|
|
Merge pull request #1526 from arrbee/cleanup-error-return-without-msg
Make sure error messages are set for most error returns
|
|
dfec726b
|
2013-05-03T23:30:54
|
|
odb: Do not error out if an alternate ODB is missing
|
|
f063f578
|
2013-05-01T14:48:35
|
|
Catch some odd odb backend corner case errors
There are some cases, particularly where no loaded ODB backends
support a particular operation, where we would return an error
code without having set an error. This catches those cases and
reports that no ODB backends support the operation in question.
|
|
cd2ed9f0
|
2013-04-30T04:02:52
|
|
Merge pull request #1518 from arrbee/export-oid-comparison
Remove most inlines from the public API
|
|
b7f167da
|
2013-04-29T13:52:12
|
|
Make git_oid_cmp public and add git_oid__cmp
|
|
c8a4e8a5
|
2013-04-29T11:14:56
|
|
don't use uninitialized struct stat in win32
|
|
78606263
|
2013-04-15T00:05:44
|
|
Add callback to git_objects_table
This adds create and free callback to the git_objects_table so
that more of the creation and destruction of objects can be table
driven instead of using switch statements. This also makes the
semantics of certain object creation functions consistent so that
we can make better use of function pointers. This also fixes a
theoretical error case where an object allocation fails and we
end up storing NULL into the cache.
|
|
8842c75f
|
2013-04-03T22:30:07
|
|
What has science done.
|
|
5df18424
|
2013-04-01T19:38:23
|
|
lol this worked first try wtf
|
|
0edad3cc
|
2013-04-22T16:41:56
|
|
Merge branch 'development' into vmg/dupe-odb-backends
Conflicts:
src/odb.c
|
|
4ef2c79c
|
2013-04-22T16:37:40
|
|
odb: Disable inode checks for Win32
|
|
83cc70d9
|
2013-04-19T12:48:33
|
|
Move odb_backend implementors stuff into git2/sys
This moves some of the odb_backend stuff that is related to the
internals of an odb_backend implementation into include/git2/sys.
Some of the stuff related to streaming I left in include/git2
because it seemed like it would be reasonably needed by a normal
user who wanted to stream objects into and out of the ODB.
Also, I added APIs for traversing the list of backends so that
some of the tests would not need to access ODB internals.
|
|
a29c6b5f
|
2013-04-19T23:51:18
|
|
odb: Do not allow duplicate on-disk backends
|
|
f5e28202
|
2013-03-25T13:38:43
|
|
opts: allow configuration of odb cache size
Currently, the odb cache has a fixed size of 128 slots as defined by
GIT_DEFAULT_CACHE_SIZE. Allow users to set the size of the cache via
git_libgit2_opts().
Fixes #1035.
|
|
10c06114
|
2013-03-17T04:46:46
|
|
Several warnings detected by static code analyzer fixed
Implicit type conversion argument of function to size_t type
Suspicious sequence of types castings: size_t -> int -> size_t
Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)'
Unsigned type is never < 0
|
|
8fe6bc5c
|
2013-01-10T15:43:08
|
|
odb: Refresh on `exists` query too
|
|
891a4681
|
2013-01-04T17:42:41
|
|
dat errorcode
|
|
4a863c06
|
2013-01-03T20:36:26
|
|
Sane refresh logic
All the ODB backends have a specific refresh interface. When reading an
object, first we attempt every single backend: if the read fails, then
we refresh all the backends and retry the read one more time to see if
the object has appeared.
|
|
359fc2d2
|
2013-01-08T17:07:25
|
|
update copyrights
|
|
4d185dd9
|
2012-12-19T14:30:06
|
|
odb: check if object exists before writing
Update the precondition of git_odb_backend::write.
It may now be assumed that the object has already been hashed.
|
|
0249a503
|
2012-12-07T09:40:21
|
|
Merge pull request #1091 from carlosmn/stream-object
Indexer speedup with large objects
|
|
c7231c45
|
2012-11-30T16:31:42
|
|
Deploy GITERR_CHECK_VERSION
|
|
55f6f21b
|
2012-11-29T19:59:18
|
|
Deploy versioned git_odb_backend structure
|
|
f56f8585
|
2012-11-19T22:23:16
|
|
indexer: use the packfile streaming API
The new API allows us to read the object bit by bit from the packfile,
instead of needing it all at once in the packfile. This also allows us
to hash the object as it comes in from the network instead of having
to try to read it all and failing repeatedly for larger objects.
This is only the first step, but it already shows huge improvements
when dealing with objects over a few megabytes in size. It reduces the
memory needs in some cases, but delta objects still need to be
completely in memory and the old inefficient method is still used for
that.
|
|
9507a434
|
2012-11-28T10:47:10
|
|
odb: Add `git_odb_add_disk_alternate`
Loads a disk alternate by path to the ODB. Mimics the
`GIT_ALTERNATE_OBJECT_DIRECTORIES` shell var.
|
|
2e76b5fc
|
2012-11-27T09:49:16
|
|
API updates for odb.h
|
|
85e7efa1
|
2012-11-14T13:35:43
|
|
odb: recursively load alternates
The maximum depth is 5, like in git
|
|
603bee07
|
2012-11-12T19:22:49
|
|
Remove git_hash_ctx_new - callers now _ctx_init()
|
|
d6fb0924
|
2012-11-05T12:37:15
|
|
Win32 CryptoAPI and CNG support for SHA1
|
|
09cc0b92
|
2012-11-05T11:33:10
|
|
create callback to handle packs from fetch, move the indexer to odb_pack
|
|
edca6c8f
|
2012-07-01T19:44:22
|
|
git_odb_object_free: don't segfault w/ arg == NULL
|
|
addc9be4
|
2012-09-26T17:21:32
|
|
Fix error hashing empty file.
|
|
e8776d30
|
2012-09-16T00:10:07
|
|
odb: don't overflow the link path buffer
Allocate a buffer large enough to store the path plus the terminator
instead of letting readlink write beyond the end.
|
|
9be2261e
|
2012-09-13T09:24:12
|
|
Merge pull request #927 from arrbee/hashfile-with-filters
Add git_repository_hashfile to hash with filters
|
|
13faa77c
|
2012-09-13T17:57:45
|
|
Fix -Wuninitialized warning
|
|
a13fb55a
|
2012-09-11T17:26:21
|
|
Add tests and improve param checks
Fixed some minor `git_repository_hashfile` issues:
- Fixed incorrect doc (saying that repo could be NULL)
- Added checking of object type value to acceptable ones
- Added more tests for various parameter permutations
|
|
c859184b
|
2012-09-11T23:05:24
|
|
Properly handle p_reads
|
|
c6ac28fd
|
2012-09-10T12:24:05
|
|
Reorg internal odb read header and object lookup
Often `git_odb_read_header` will "fail" and have to read the
entire object into memory instead of just the header. When this
happens, the object is loaded and then disposed of immediately,
which makes it difficult to efficiently use the header information
to decide if the object should be loaded (since attempting to do
so will often result in loading the object twice).
This commit takes the existing code and reorganizes it to have
two new functions:
- `git_odb__read_header_or_object` which acts just like the old
read header function except that it returns the object, too, if
it was forced to load the whole thing. It then becomes the
caller's responsibility to free the `git_odb_object`.
- `git_object__from_odb_object` which was extracted from the old
`git_object_lookup` and creates a subclass of `git_object` from
an existing `git_odb_object` (separating the ODB lookup from the
`git_object` creation). This allows you to use the first header
reading function efficiently without instantiating the
`git_odb_object` twice.
There is no net change to the behavior of any of the existing
functions, but this allows internal code to tap into the ODB
lookup and object creation to be more efficient.
|
|
60b9d3fc
|
2012-09-05T15:00:40
|
|
Implement filters for status/diff blobs
This adds support to diff and status for running filters (a la crlf)
on blobs in the workdir before computing SHAs and before generating
text diffs. This ended up being a bit more code change than I had
thought since I had to reorganize some of the diff logic to minimize
peak memory use when filtering blobs in a diff.
This also adds a cap on the maximum size of data that will be loaded
to diff. I set it at 512 MB, which should match core git. Right now
it is a #define in src/diff.h but it could be moved into the public
API if desired.
|
|
0e9f2fce
|
2012-09-06T11:35:09
|
|
odb: mark unused variable
|
|
c49d328c
|
2012-08-27T09:59:13
|
|
Expose a malloc function to 3rd party ODB backends
|
|
c07d9c95
|
2012-08-09T15:33:04
|
|
oid: Explicitly include `oid.h` for the inlined CMP
|
|
51e1d808
|
2012-08-06T12:41:08
|
|
Merge remote-tracking branch 'arrbee/tree-walk-fixes' into development
Conflicts:
src/notes.c
src/transports/git.c
src/transports/http.c
src/transports/local.c
tests-clar/odb/foreach.c
|
|
5dca2010
|
2012-08-03T17:08:01
|
|
Update iterators for consistency across library
This updates all the `foreach()` type functions across the library
that take callbacks from the user to have a consistent behavior.
The rules are:
* A callback terminates the loop by returning any non-zero value
* Once the callback returns non-zero, it will not be called again
(i.e. the loop stops all iteration regardless of state)
* If the callback returns non-zero, the parent fn returns GIT_EUSER
* Although the parent returns GIT_EUSER, no error will be set in
the library and `giterr_last()` will return NULL if called.
This commit makes those changes across the library and adds tests
for most of the iteration APIs to make sure that they follow the
above rules.
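The callback rules listed above can be sketched with a toy iterator. `sketch_foreach`, the callback type, and the `SKETCH_EUSER` constant are stand-ins defined for this example; only the control flow is meant to mirror the rules.

```c
#include <stddef.h>

#define SKETCH_EUSER (-7) /* stand-in for GIT_EUSER */

typedef int (*sketch_foreach_cb)(int item, void *payload);

/* Calls cb for each item. Any non-zero return from cb stops the loop
 * immediately and makes the iterator return SKETCH_EUSER. */
static int sketch_foreach(const int *items, size_t count,
	sketch_foreach_cb cb, void *payload)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (cb(items[i], payload) != 0)
			return SKETCH_EUSER;
	}

	return 0;
}

/* Example callback: counts the items it sees and asks the loop to stop
 * once it has seen a negative one. */
static int count_until_negative(int item, void *payload)
{
	int *seen = payload;

	(*seen)++;
	return item < 0;
}
```

The callback is not invoked again after it returns non-zero, so the count reflects exactly the items visited up to and including the one that stopped the loop.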
|
|
b8457baa
|
2012-07-24T07:57:58
|
|
portability: Improve x86/amd64 compatibility
|
|
521aedad
|
2012-06-05T14:48:51
|
|
odb: add git_odb_foreach()
Go through each backend and list every object that exists in
it. This allows fsck-like uses.
|
|
c06e0003
|
2012-06-20T01:41:30
|
|
odb: don't leak when detecting id ambiguity
If we find several objects with the same prefix, we need to free the
memory where we stored the earlier object. Keep track of the raw.data
pointer across read_prefix calls and free it if we find another
object.
|
|
904b67e6
|
2012-05-18T01:48:50
|
|
errors: Rename error codes
|
|
e172cf08
|
2012-05-18T01:21:06
|
|
errors: Rename the generic return codes
|
|
24634c6f
|
2012-05-12T15:01:39
|
|
Handle duplicate objects from different backends in git_odb_read_prefix().
|