src/indexer.c


Log

Author Commit Date CI Message
Arkadiy Shapkin 10c06114 2013-03-17T04:46:46 Several warnings detected by static code analyzer fixed Implicit type conversion argument of function to size_t type Suspicious sequence of types castings: size_t -> int -> size_t Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)' Unsigned type is never < 0
Carlos Martín Nieto 0e040c03 2013-03-03T14:50:47 indexer: use a hashtable for keeping track of offsets These offsets are needed for REF_DELTA objects, which encode which object they use as a base, but not where it lies in the packfile, so we need a list. These objects are mostly from older packfiles, before OFS_DELTA was widely spread. The time spent in indexing these packfiles is greatly reduced, though remains above what git is able to do.
Carlos Martín Nieto 447ae791 2013-03-03T15:19:21 indexer: kill git_indexer This was the first implementation and its goal was simply to have something that worked. It is slow and now it's just taking up space. Remove it and switch the one known usage to use the streaming indexer.
Philip Kelley 2fe67aeb 2013-02-14T08:46:58 Fix a git_filebuf leak (fixes Win32 clone::can_cancel)
Ben Straub def60ea4 2013-02-05T13:14:48 Allow all non-zero returns to cancel transfers
Ben Straub fe95ac1b 2013-02-05T10:59:58 Allow progress callback to cancel fetch This works by having the indexer watch the return code of the callback, so will only take effect on object boundaries.
Carlos Martín Nieto 96c9b9f0 2013-01-12T18:38:19 indexer: properly free the packfile resources The indexer needs to call the packfile's free function so it takes care of freeing the caches. We still need to close the mwf descriptor manually so we can rename the packfile into its final name on Windows.
Carlos Martín Nieto 80d647ad 2013-01-11T20:15:06 Revert "pack: packfile_free -> git_packfile_free and use it in the indexers" This reverts commit f289f886cb81bb570bed747053d5ebf8aba6bef7, which makes the tests fail on Windows. Revert until we can figure out a solution.
nulltoken 090d5e1f 2013-01-11T14:40:09 Fix MSVC compilation warnings
Carlos Martín Nieto f289f886 2013-01-11T17:24:52 pack: packfile_free -> git_packfile_free and use it in the indexers It turns out the indexers have been ignoring the pack's free function and leaking data. Plug that.
Edward Thomson 359fc2d2 2013-01-08T17:07:25 update copyrights
nulltoken bdb94c21 2012-12-17T12:20:52 Fix MSVC compilation warnings
Carlos Martín Nieto 6481a68d 2012-12-07T19:23:16 indexer: move the temporary buffers into the indexer object Storing 4kB or 8kB in the stack is not very gentle. As this part has to be linear, put the buffer into the indexer object so we allocate it once in the heap.
Carlos Martín Nieto 3908c254 2012-11-30T17:25:50 indexer: correctly deal with objects larger than the window size A mmap-window is not guaranteed to give you the whole object, but the indexer currently assumes so. Loop asking for more data until we've successfully CRC'd all of the packed data.
Carlos Martín Nieto 5a3ad89d 2012-11-20T07:03:56 indexer: make use of streaming also for deltas Up to now, deltas needed to be entirely in the packfile, and we tried to decompress them in their entirety over and over again. Adjust the logic so we read them as they come, just as we do for full objects. This also allows us to simplify the logic and have less nested code. The delta resolving phase still needs to decompress the whole object into memory, as there is not yet any streaming delta-apply support, but it helps in speeding up the downloading process and reduces the amount of memory allocations we need to do.
Carlos Martín Nieto f56f8585 2012-11-19T22:23:16 indexer: use the packfile streaming API The new API allows us to read the object bit by bit from the packfile, instead of needing it all at once in the packfile. This also allows us to hash the object as it comes in from the network instead of having to try to read it all and failing repeatedly for larger objects. This is only the first step, but it already shows huge improvements when dealing with objects over a few megabytes in size. It reduces the memory needs in some cases, but delta objects still need to be completely in memory and the old inefficient method is still used for that.
Ben Straub 839c5f57 2012-11-26T12:04:07 API updates for indexer.h
Martin Woodward 826bc4a8 2012-11-23T13:31:22 Remove use of English expletives Remove words such as fuck, crap, shit etc. Remove other potentially offensive words from comments. Tidy up other geopolitical terms in comments.
Sascha Cunz 4cc7342e 2012-11-18T09:07:35 Indexer: Avoid a possible double-deletion in error case
Edward Thomson d6fb0924 2012-11-05T12:37:15 Win32 CryptoAPI and CNG support for SHA1
Edward Thomson 603bee07 2012-11-12T19:22:49 Remove git_hash_ctx_new - callers now _ctx_init()
Ben Straub 81eecc34 2012-10-29T13:34:14 Fetch: don't clobber received count This memset was being reached after the entire packfile under WinHttp, so the byte count was being lost for small repos.
Ben Straub 7d222e13 2012-10-24T13:29:14 Network progress: rename things git_indexer_stats and friends -> git_transfer_progress* Also made git_transfer_progress members more sanely named.
Ben Straub 909f6265 2012-10-18T15:28:09 Indexing progress now goes to 100%
Ben Straub 216863c4 2012-10-17T14:02:24 Fetch/indexer: progress callbacks
Michael Schubert e3f8d58d 2012-08-14T23:07:54 indexer: do not require absolute path
Carlos Martín Nieto 2b175ca9 2012-08-26T00:35:52 indexer: kill git_indexer_stats.data_received It's not really needed with the current code as we have EOS and the sideband's flush to tell us we're done. Keep the distinction between processed and received objects.
Carlos Martín Nieto 7a57ae54 2012-08-25T23:31:29 indexer: don't segfault when freeing an unused indexer Make sure that idx->pack isn't NULL before trying to free resources under it.
Carlos Martín Nieto bffa852f 2012-07-13T12:01:11 indexer: recognize and mark when all of the packfile has been downloaded We can't always rely on the network telling us when the download is finished. Recognize it from the indexer itself.
Carlos Martín Nieto d1af70b0 2012-07-13T20:43:56 indexer: delay resolving deltas Not all delta bases are available on the first try. By delaying resolving all deltas until the end, we avoid decompressing some of the data twice or even more times, saving effort and time.
Carlos Martin Nieto 1d8943c6 2012-06-28T12:05:49 mwindow: allow memory-window files to deregister Once a file is registered, there is no way to deregister it, even after the structure that contains it is no longer needed and has been freed. This may be the source of #624. Allow and use the deregister function to remove our file from the global list.
Carlos Martín Nieto 37159957 2012-06-28T09:33:08 indexer: don't use '/objects/pack/' unconditionally Not everyone who indexes a packfile wants to put it in the standard git repository location.
Michael Schubert f9fd7105 2012-06-25T15:26:38 indexer: start parsing input data immediately Currently, the first call of git_indexer_stream_add adds the data to the underlying pack file and opens it for later use, but doesn't start parsing the already available data. This means, git_indexer_stream_finalize only works if git_indexer_stream_add was called at least twice. Kill this limitation by parsing available data immediately.
Chris Young a21bb1aa 2012-06-13T23:28:51 Merge remote-tracking branch 'source/development' into development
Chris Young 2aeadb9c 2012-06-12T19:25:09 Actually do the mmap... unsurprisingly, this makes the indexer work on SFS On RAM: the .idx and .pack files become links to a .lock and the original download respectively. Assume some feature (such as record locking) supported by SFS but not JXFS or RAM: is required.
Vicent Martí 3f035860 2012-06-07T22:43:03 misc: Fix warnings from PVS Studio trial
Michael Schubert 54db1a18 2012-05-19T13:20:55 Cleanup * indexer: remove leftover printf * commit: remove unused macros COMMIT_BASIC_PARSE, COMMIT_FULL_PARSE and COMMIT_PRINT
Vicent Martí 904b67e6 2012-05-18T01:48:50 errors: Rename error codes
Vicent Martí e172cf08 2012-05-18T01:21:06 errors: Rename the generic return codes
Carlos Martín Nieto 6a9d61ef 2012-05-15T15:08:54 indexer: add more consistency checks Error out in finalize if there is junk after the packfile hash or we couldn't process all the objects.
Carlos Martín Nieto 73d87a09 2012-05-15T21:42:01 Introduce GITERR_INDEXER
Carlos Martín Nieto a640d79e 2012-05-09T13:11:50 indexer: close the pack's fd before renaming it Windows gets upset if we rename a file with an open descriptor.
nulltoken fa6420f7 2012-04-29T21:46:33 buf: deploy git_buf_len()
Russell Belfer 821f6bc7 2012-04-26T13:04:54 Fix Win32 warnings
Carlos Martín Nieto dee5515a 2012-04-14T18:34:50 transports: buffer the git requests before sending them Trying to send every single line immediately won't give us any speed improvement and duplicates the code we need for other transports. Make the git transport use the same buffer functions as HTTP.
Carlos Martín Nieto 453ab98d 2012-04-11T12:55:34 indexer: Add git_indexer_stream_finalize() Resolve any lingering deltas, write out the index file and rename the packfile.
Carlos Martín Nieto 1c9c081a 2012-04-13T19:25:06 indexer: add git_indexer_stream_free() and _hash()
Carlos Martín Nieto 3f93e16c 2012-03-29T17:49:57 indexer: start writing the stream indexer This will allow us to index a packfile as soon as we receive it from the network as well as storing it with its final name so we don't need to pass temporary file names around.
Russell Belfer 4aa7de15 2012-03-19T17:49:46 Convert indexer, notes, sha1_lookup, and signature More files moved to new error handling style.
Russell Belfer deafee7b 2012-03-14T17:36:15 Continue error conversion This converts blob.c, fileops.c, and all of the win32 files. Also, various minor cleanups throughout the code. Plus, in testing the win32 build, I cleaned up a bunch (although not all) of the warnings with the 64-bit build.
Russell Belfer e1de726c 2012-03-12T22:55:40 Migrate ODB files to new error handling This migrates odb.c, odb_loose.c, odb_pack.c and pack.c to the new style of error handling. Also got the unix and win32 versions of map.c. There are some minor changes to other files but no others were completely converted. This also contains an update to filebuf so that a zeroed out filebuf will not think that the fd (== 0) is actually open (and inadvertently call close() on fd 0 if cleaned up). Lastly, this was built and tested on win32 and contains a bunch of fixes for the win32 build which was pretty broken.
Vicent Martí cb8a7961 2012-03-07T00:02:55 error-handling: Repository This also includes dropping `git_buf_lasterror` because it makes no sense in the new system. Note that in most of the places where it has been dropped, the code needs cleanup. I.e. GIT_ENOMEM is going away, so instead it should return a generic `-1` and obviously not throw anything.
Vicent Martí 0c3bae62 2012-02-15T16:56:56 zlib: Remove custom `git2/zlib.h` header This is legacy compat stuff for when `deflateBound` is not defined, but we're not embedding zlib and that function is always available. Kill that with fire.
schu 5e0de328 2012-02-13T17:10:24 Update Copyright header Signed-off-by: schu <schu-github@schulog.org>
Vicent Martí 18e5b854 2012-02-10T19:47:02 odb: Add internal `git_odb__hashfd`
Carlos Martín Nieto d0ec3fb8 2012-01-19T17:07:49 indexer: save the pack index with the right name Truncate at the slash; otherwise we get ppack-*.idx filenames.
Russell Belfer 97769280 2011-11-30T11:27:15 Use git_buf for path storage instead of stack-based buffers This converts virtually all of the places that allocate GIT_PATH_MAX buffers on the stack for manipulating paths to use git_buf objects instead. The patch is pretty careful not to touch the public API for libgit2, so there are a few places that still use GIT_PATH_MAX. This extends and changes some details of the git_buf implementation to add a couple of extra functions and to make error handling easier. This includes serious alterations to all the path.c functions, and several of the fileops.c ones, too. Also, there are a number of new functions that parallel existing ones except that use a git_buf instead of a stack-based buffer (such as git_config_find_global_r that exists alongside git_config_find_global). This also modifies the win32 version of p_realpath to allocate whatever buffer size is needed to accommodate the realpath instead of hardcoding a GIT_PATH_MAX limit, but that change needs to be tested still.
Vicent Martí 89fb8f02 2011-10-28T19:04:23 Merge pull request #456 from brodie/perm-fixes Create objects, indexes, and directories with the right file permissions
Vicent Marti 3286c408 2011-10-28T14:51:13 global: Properly use `git__` memory wrappers Ensure that all memory related functions (malloc, calloc, strdup, free, etc) are using their respective `git__` wrappers.
Brodie Rao 01ad7b3a 2011-09-06T15:48:45 *: correct and codify various file permissions The following files now have 0444 permissions: - loose objects - pack indexes - pack files - packs downloaded by fetch - packs downloaded by the HTTP transport And the following files now have 0666 permissions: - config files - repository indexes - reflogs - refs This brings libgit2 more in line with Git. Note that git_filebuf_commit() and git_filebuf_commit_at() have both gained a new mode parameter. The latter change fixes an important issue where filebufs created with GIT_FILEBUF_TEMPORARY received 0600 permissions (due to mkstemp(3) usage). Now we chmod() the file before renaming it into place. Tests have been added to confirm that new commit, tag, and tree objects are created with the right permissions. I don't have access to Windows, so for now I've guarded the tests with "#ifndef GIT_WIN32".
Carlos Martín Nieto 72d6a20b 2011-10-05T19:59:34 indexer: NUL-terminate the filename As we no longer use the STRLEN macro, the NUL-terminator in the string was not copied over. Fix this. Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Carlos Martín Nieto 92be7908 2011-10-01T14:46:30 indexer: return immediately if passed a NULL value Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Vicent Marti bb742ede 2011-09-19T01:54:32 Cleanup legal data 1. The license header is technically not valid if it doesn't have a copyright signature. 2. The COPYING file has been updated with the different licenses used in the project. 3. The full GPLv2 header in each file annoys me.
Sebastian Schuberth 26e74c6a 2011-09-08T14:21:17 Fix some random size_t vs. int conversion warnings
Kirill A. Shutemov 932669b8 2011-08-25T14:22:57 Drop STRLEN() macros There is no need in STRLEN macros. Compilers can do this trivial optimization on its own. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Vicent Marti c85e08b1 2011-08-16T13:05:05 odb: Do not pass around a header when hashing
Carlos Martín Nieto ade3c9bb 2011-08-07T10:26:33 Assert a filename in indexer creation Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Carlos Martín Nieto c1af5a39 2011-08-06T00:35:20 Implement cooperative caching When indexing a file with ref deltas, a temporary cache for the offsets has to be built, as we don't have an index file yet. If the user takes the responsibility for filling the cache, the packing code will look there first when it finds a ref delta. Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Vicent Martí 2133c44f 2011-08-09T17:08:18 Merge pull request #355 from jdavid/fix-build Fix "redefinition of typedef git_indexer" build error
Vicent Marti f6867e63 2011-08-08T16:56:28 Fix compilation in Windows
J. David Ibáñez 2d3e417e 2011-08-05T15:11:25 Fix "redefinition of typedef git_indexer" build error Signed-off-by: J. David Ibáñez <jdavid@itaapy.com>
Carlos Martín Nieto 48b3ad4f 2011-08-01T14:02:09 Move pack index writing to a public function Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Carlos Martín Nieto b7c44096 2011-07-28T23:35:39 Implement the indexer Only v2 index files are supported. Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Carlos Martín Nieto a070f152 2011-07-29T01:08:02 Move pack functions to their own file
Carlos Martín Nieto b5b474dd 2011-07-28T11:45:46 Modify the given offset in git_packfile_unpack The callers immediately throw away the offset, so we don't need any logical changes in any of them. This will be useful for the indexer, as it does need to know where the compressed data ends. Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Carlos Martín Nieto 7d0cdf82 2011-07-09T02:25:01 Make packfile_unpack_header more generic On the way, store the fd and the size in the mwindow file. Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Carlos Martín Nieto ab525a74 2011-07-07T19:20:13 Rename stuff to git_indexer_ Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Carlos Martín Nieto f23c4a66 2011-07-07T19:08:45 Start the runner Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Carlos Martín Nieto 3412391d 2011-07-07T11:47:31 Initial indexer code