src/odb_loose.c


Log

Author Commit Date CI Message
Roberto Tyley c51065e3 2011-10-24T14:39:03 Tolerate zlib deflation with window size < 32Kb

libgit2 currently identifies loose objects as corrupt if they've been deflated using a window size less than 32Kb, because the is_zlib_compressed_data() function doesn't recognise the header byte as a zlib header. This patch makes the method tolerant of all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice its accuracy in distinguishing the standard loose-object format from the experimental (now abandoned) format.

It's based on a patch which has been merged into C-Git master branch: https://github.com/git/git/commit/7f684a2aff636f44a506

On memory constrained systems zlib may use a much smaller window size - working on Agit, I found that Android uses a 4KB window; giving a header byte of 0x48, not 0x78. Consequently all loose objects generated by the Android platform appear 'corrupt' :(

It might appear that this patch changes isStandardFormat() to the point where it could incorrectly identify the experimental format as the standard one, but the two criteria (bitmask & checksum) can only give a false result for an experimental object where both of the following are true:

1) object size is exactly 8 bytes when uncompressed (bitmask)
2) ([single-byte in-pack git type&size header] * 256 + [1st byte of the following zlib header]) % 31 = 0 (checksum)

As it happens, for all possible combinations of valid object type (1-4) and window bits (0-7), the only time when the checksum will be divisible by 31 is for 0x1838 - ie object type *1*, a Commit - which, due to the fields all Commit objects must contain, could never be as small as 8 bytes in size.

Given this, the combination of the two criteria (bitmask & checksum) always correctly determines the buffer format, and is more tolerant than the previous version.

References:

Android uses a 4KB window for deflation: http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb258

Code snippet searching for false positives with the zlib checksum: https://gist.github.com/1118177

Change-Id: Ifd84cd2bd6b46f087c9984fb4cbd8309f483dec0
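The two criteria named in the commit above (bitmask and checksum) can be sketched in a few lines of C. This is a minimal illustration based on the zlib header layout, not necessarily the exact code in src/odb_loose.c; the function name `looks_like_zlib_header` and its `len` parameter are assumptions made for the example.

```c
#include <stddef.h>

/*
 * Hypothetical helper (name and signature are assumptions, not libgit2 API).
 * Returns 1 if the first two bytes look like a zlib stream header with any
 * valid window size, 0 otherwise.
 */
static int looks_like_zlib_header(const unsigned char *data, size_t len)
{
	unsigned int word;

	if (len < 2)
		return 0;

	/*
	 * Bitmask: the low nibble is the compression method (8 = deflate) and
	 * bit 7 must be clear, so the window-size field in bits 4-6 is 0..7
	 * (256-byte to 32KB windows). 0x78 and 0x48 both pass.
	 */
	if ((data[0] & 0x8F) != 0x08)
		return 0;

	/* Checksum: the two header bytes, read big-endian, are a multiple of 31. */
	word = ((unsigned int)data[0] << 8) | data[1];
	return (word % 31) == 0;
}
```

With this sketch, the header bytes 0x78 0x9C (32KB window) and 0x48 0x89 (4KB window) are both accepted. The pair 0x18 0x38 is also accepted, which is the single experimental-format collision the commit message argues cannot occur in practice.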
Vicent Marti c103d7b4 2011-09-29T15:49:28 odb: Pass compression settings to filebuf
Vicent Marti 8af4d074 2011-09-29T15:34:17 odb: Let users decide compression level for the loose ODB
Vicent Martí 71a4c1f1 2011-09-18T20:07:59 Merge pull request #384 from kiryl/warnings Add more -W flags to CFLAGS
Vicent Marti 87d9869f 2011-09-19T03:34:49 Tabify everything There were quite a few places where spaces were being used instead of tabs. Try to catch them all. This should hopefully not break anything. Except for `git blame`. Oh well.
Vicent Marti bb742ede 2011-09-19T01:54:32 Cleanup legal data 1. The license header is technically not valid if it doesn't have a copyright signature. 2. The COPYING file has been updated with the different licenses used in the project. 3. The full GPLv2 header in each file annoys me.
Sebastian Schuberth 1c3fac4d 2011-09-08T14:31:37 Add casts to get rid of some warnings when filling zlib structures
Kirill A. Shutemov d568d585 2011-08-30T23:55:22 CMakefile: add -Wmissing-prototypes and fix warnings Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Vicent Marti afeecf4f 2011-07-09T02:10:46 odb: Direct writes are back DIRECT WRITES ARE BACK AND FASTER THAN EVER. The streaming writer to the ODB was overkill for smaller objects like Commits and Tags; most of the streaming logic was taking too long. This commit makes Commits, Tags and Trees get built up in memory and then written to disk in 2 pushes (header + data), instead of streaming everything. This is *always* faster, even for big files (since the git_filebuf class still does streaming writes when the memory cache overflows). This is also a gazillion lines of code smaller, because we don't have to precompute the final size of the object before starting the stream (this was kind of defeating the point of streaming, anyway). Blobs are still written with full streaming instead of loading them in memory, since this is still the fastest way. A new `git_buf` class has been added. It's missing some features, but it'll get there.
Vicent Marti f79026b4 2011-07-04T11:43:34 fileops: Cleanup Cleaned up the structure of the whole OS-abstraction layer. fileops.c now contains a set of utility methods for file management used by the library. These are abstractions on top of the original POSIX calls. There's a new file called `posix.c` that contains emulations/reimplementations of all the POSIX calls the library uses. These are prefixed with `p_`. There's a specific posix file for each platform (win32 and unix). All the path-related methods have been moved from `utils.c` to `path.c` and have their own prefix.
Vicent Marti f1d01851 2011-06-16T02:48:48 oid: Uniformize ncmp methods Drop redundant methods. The ncmp method is now public
Vicent Marti fa48608e 2011-06-16T02:36:21 oid: Rename methods Yeah. Finally. Fuck the old names, this ain't POSIX and they don't make any sense at all.
Marc Pegon c09093cc 2011-06-06T10:55:36 Renamed git_oid_match to git_oid_ncmp. As suggested by carlosmn, git_oid_ncmp would probably be a better name than git_oid_match, for it does the same as git_oid_cmp but only up to a certain amount of hex digits.
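The prefix comparison described in the commit above (like a full id comparison, but only up to a given number of hex digits) can be sketched as follows. This is an illustrative stand-in, not libgit2's implementation: the type `example_oid`, the function `example_oid_ncmp`, and the size macros are all hypothetical names chosen for the example, and the return convention (0 for a match, nonzero otherwise) is an assumption.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_OID_RAWSZ 20                      /* raw SHA-1 size in bytes */
#define EXAMPLE_OID_HEXSZ (EXAMPLE_OID_RAWSZ * 2) /* hex digits in a full id */

/* Hypothetical stand-in for an object id holding the raw hash bytes. */
typedef struct {
	unsigned char id[EXAMPLE_OID_RAWSZ];
} example_oid;

/*
 * Returns 0 when the first `len` hex digits of a and b are equal,
 * nonzero otherwise. `len` is clamped to the full hex length.
 */
static int example_oid_ncmp(const example_oid *a, const example_oid *b, size_t len)
{
	size_t full_bytes;

	if (len > EXAMPLE_OID_HEXSZ)
		len = EXAMPLE_OID_HEXSZ;

	/* Every pair of hex digits is one whole byte. */
	full_bytes = len / 2;
	if (memcmp(a->id, b->id, full_bytes) != 0)
		return 1;

	/* An odd length leaves one extra digit: the high nibble of the next byte. */
	if ((len & 1) && ((a->id[full_bytes] ^ b->id[full_bytes]) & 0xF0))
		return 1;

	return 0;
}
```

Working on the raw bytes avoids formatting both ids as hex strings first; only the odd-length case needs nibble-level handling.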
Vicent Marti d0323a5f 2011-06-01T21:25:56 short-oid: Cleanup
Marc Pegon aea8a638 2011-05-29T18:00:35 Implemented read_unique_short_oid method for loose backend.
Marc Pegon ecd6fdf1 2011-05-27T18:49:09 Added a read_unique_short_oid method to backends, to make it possible to find objects from sha1 prefixes in the future. Default implementations throw GIT_ENOTIMPLEMENTED for strict prefixes (i.e. length < GIT_OID_HEXSZ).
Vicent Marti 60e1b49a 2011-05-23T21:12:18 odb_loose: Reword errors
Jakob Pfender dfb12cd5 2011-05-19T12:28:46 odb_loose.c: Move to new error handling mechanism
Jakob Pfender f93f8ec5 2011-05-19T12:27:43 odb_loose.c: Return GIT_ENOMEM when allocation fails When trying to inflate a buffer, a GIT_ERROR was returned when malloc() failed. Fix this to return GIT_ENOMEM.
Carlos Martín Nieto d8e1d038 2011-05-06T12:47:21 Fix two warnings from Clang Both are about not reading the value stored in a variable. Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Sergey Nikishin a3ced637 2011-04-22T17:36:28 Fix going into infinite loop in read_header_loose() read_header_loose causes infinite loop on this file: $ cat ../libcppgit/bin/sample-repo/test_mailbox/.git/objects/8f/e274605cbc740a2a957f44b2722a8a73915a09 | base64 eAErKUpNVTAzYzA0MDAzMVHISUxKzSlmWLgkuyN5+rxr6juMPR2EmN8s7Vl9D6oiN7UkkcHJdLbl 7Z3N/oxfE0W8wrSbuFRkAwDFfBn1
Vicent Marti f6f72d7e 2011-03-23T18:44:53 Improve the ODB writing backend Temporary files when doing streaming writes are now stored inside the Objects folder, to prevent issues when moving files between disks/partitions. Add support for block writes to the ODB again (for those backends that cannot implement streaming).
Vicent Marti 72a3fe42 2011-03-18T19:38:49 I broke your bindings

Hey. Apologies in advance -- I broke your bindings.

This is a major commit that includes a long-overdue redesign of the whole object-database structure. This is expected to be the last major external API redesign of the library until the first non-alpha release.

Please get your bindings up to date with these changes. They will be included in the next minor release. Sorry again!

Major features include:

- Real caching and refcounting on parsed objects
- Real caching and refcounting on objects read from the ODB
- Streaming writes & reads from the ODB
- Single-method writes for all object types
- The external API is now partially thread-safe

The speed increases are significant in all aspects, especially when reading an object several times from the ODB (revwalking) and when writing big objects to the ODB.

Here's a full changelog for the external API:

blob.h
------

- Remove `git_blob_new`
- Remove `git_blob_set_rawcontent`
- Remove `git_blob_set_rawcontent_fromfile`
- Rename `git_blob_writefile` -> `git_blob_create_fromfile`
- Change `git_blob_create_fromfile`: The `path` argument is now relative to the repository's working dir
- Add `git_blob_create_frombuffer`

commit.h
--------

- Remove `git_commit_new`
- Remove `git_commit_add_parent`
- Remove `git_commit_set_message`
- Remove `git_commit_set_committer`
- Remove `git_commit_set_author`
- Remove `git_commit_set_tree`
- Add `git_commit_create`
- Add `git_commit_create_v`
- Add `git_commit_create_o`
- Add `git_commit_create_ov`

tag.h
-----

- Remove `git_tag_new`
- Remove `git_tag_set_target`
- Remove `git_tag_set_name`
- Remove `git_tag_set_tagger`
- Remove `git_tag_set_message`
- Add `git_tag_create`
- Add `git_tag_create_o`

tree.h
------

- Change `git_tree_entry_2object`: New signature is `(git_object **object_out, git_repository *repo, git_tree_entry *entry)`
- Remove `git_tree_new`
- Remove `git_tree_add_entry`
- Remove `git_tree_remove_entry_byindex`
- Remove `git_tree_remove_entry_byname`
- Remove `git_tree_clearentries`
- Remove `git_tree_entry_set_id`
- Remove `git_tree_entry_set_name`
- Remove `git_tree_entry_set_attributes`

object.h
------------

- Remove `git_object_new`
- Remove `git_object_write`
- Change `git_object_close`: This method is now *mandatory*. Not closing an object causes a memory leak.

odb.h
-----

- Remove type `git_rawobj`
- Remove `git_rawobj_close`
- Rename `git_rawobj_hash` -> `git_odb_hash`
- Change `git_odb_hash`: New signature is `(git_oid *id, const void *data, size_t len, git_otype type)`
- Add type `git_odb_object`
- Add `git_odb_object_close`
- Change `git_odb_read`: New signature is `(git_odb_object **out, git_odb *db, const git_oid *id)`
- Change `git_odb_read_header`: New signature is `(size_t *len_p, git_otype *type_p, git_odb *db, const git_oid *id)`
- Remove `git_odb_write`
- Add `git_odb_open_wstream`
- Add `git_odb_open_rstream`

odb_backend.h
-------------

- Change type `git_odb_backend`: New internal signatures are as follows
    int (* read)(void **, size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
    int (* read_header)(size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
    int (* writestream)(struct git_odb_stream **, struct git_odb_backend *, size_t, git_otype)
    int (* readstream)(struct git_odb_stream **, struct git_odb_backend *, const git_oid *)
- Add type `git_odb_stream`
- Add enum `git_odb_streammode`

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 19a30a3f 2011-03-03T19:53:17 Add new move function, `gitfo_mv_force` Forces a move by creating the folder for the destination file, if it doesn't exist. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti d4b5a4e2 2011-02-09T19:49:02 Internal changes on the backend system The priority value for different backends has been removed from the public `git_odb_backend` struct. We handle that internally. The priority value is specified on the `git_odb_add_alternate`. This is convenient because it allows us to poll a backend twice with different priorities without having to instantiate it twice. We also differentiate between main backends and alternates; alternates have lower priority and cannot be written to. These changes come with some unit tests to make sure that the backend sorting is consistent. The libgit2 version has been bumped to 0.4.0. This commit changes the external API: CHANGED: struct git_odb_backend No longer has a `priority` attribute; priority for the backend is managed internally by the library. git_odb_add_backend(git_odb *odb, git_odb_backend *backend, int priority) Now takes an additional priority parameter, the priority that will be given to the backend. ADDED: git_odb_add_alternate(git_odb *odb, git_odb_backend *backend, int priority) Add a backend as an alternate. Alternate backends always have lower priority than main backends, and writing is disabled on them. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti f725931b 2011-02-05T12:42:41 Fix directory/path manipulation methods The `dirname` and `dirbase` methods have been replaced with the Android implementation, which is actually compliant with some kind of standard. A new method `topdir` has been added, which returns the topmost directory in a path. These changes fix issue #49: `gitfo_prettify_dir_path` converts "./.git/" to ".git/", so the code at src/repository.c:190 goes out of bounds when trying to find the topmost directory. The new `git__topdir` method handles this gracefully, and the fixed `git__dirname` now returns the proper value for the repository's working dir. E.g. /repo/.git/ ==> working dir '/repo/' .git/ ==> working dir '.' Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 44908fe7 2010-12-06T23:03:16 Change the library include file Libgit2 is now officially included as #include <git2.h> or individual files may be included as #include <git2/index.h> Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti d12299fe 2010-12-03T22:22:10 Change include structure for the project The maze with include dependencies has been fixed. There is now a global include: #include <git.h> The git_odb_backend API has been exposed. Signed-off-by: Vicent Marti <tanoku@gmail.com>
Vicent Marti 7d7cd885 2010-12-03T18:01:30 Decouple storage from ODB logic Comes with two default backends: loose object and packfiles. Signed-off-by: Vicent Marti <tanoku@gmail.com>