src/indexer.c


Log

Author Commit Date CI Message
Stefan Widgren c369b379 2015-07-31T16:23:11 Remove extra semicolon outside of a function Without this change, compiling with gcc and pedantic generates warning: ISO C does not allow extra ‘;’ outside of a function.
Edward Thomson 3e8c5e45 2015-06-10T16:43:48 Merge pull request #3174 from libgit2/cmn/idx-fill-hole indexer: use lseek to extend the packfile
Carlos Martín Nieto 02980bdc 2015-06-09T16:53:07 Initialize a few variables Coverity complains about the git_rawobj ones because we use a loop in which we keep remembering the old version, and we end up copying our object as the base, so we want to have the data pointer be NULL.
Carlos Martín Nieto aa57231f 2015-06-02T10:25:22 indexer: use lseek to extend the packfile We've been using `p_ftruncate()` to extend the packfile in order to mmap it and write the new data into it. This works well in the general case, but as truncation does not allocate space in the filesystem, it must do so when we write data to it. The only way the OS has to indicate a failure to allocate space is via SIGBUS which means we tried to write outside the file. This will cause everyone to crash as they don't expect to handle this signal. Switch to using `p_lseek()` and `p_write()` to extend the file in a way which tells the filesystem to allocate the space for the missing data. We can then be sure that we have space to write into.
Edward Thomson e2dd3735 2015-05-22T11:20:47 indexer: avoid loading already existent bases When thickening a pack, avoid loading already loaded bases and trying to insert them all over again.
Edward Thomson 7800048a 2015-03-17T10:06:50 Merge pull request #2972 from libgit2/cmn/pack-objects-walk [WIP] Smarter pack-building
Carlos Martín Nieto 7c63a33f 2015-03-13T19:41:40 indexer: bring back the error message on duplcate commits It turns out that erroring out on duplicate commits is the right thing to do, but git was not hitting the bug on the server-side. Bring back a descriptive error message in case of duplicate entries and error out.
Carlos Martín Nieto dccf59ad 2015-03-13T18:28:07 indexer: don't worry about duplicate objects If a packfile includes duplicate objects, we can choose to use the secon copy instead of the first by using the same logic as if it were the first. Change the error condition from 0 to -1, which indicates a bad resize, and set the OOM message in that case. This does mean we will leak the first copy of the object. We can deal with that later, but making fetches work is more important.
Carlos Martín Nieto a34692c4 2015-03-13T18:00:15 indexer: set an error message on duplicate objects in pack While this is not even close to a fix, we can at least set an error message so we know which error we are facing. Up to know we just returned an error without a message.
Carlos Martín Nieto b63b76e0 2014-10-12T11:42:31 Reorder some khash declarations Keep the definitions in the headers, while putting the declarations in the C files. Putting the function definitions in headers causes them to be duplicated if you include two headers with them.
Edward Thomson c251f3bb 2014-12-08T16:05:47 win32: remember to cleanup our hash_ctx
Ravindra Patel ec7e680c 2014-11-20T12:07:55 Fix for misleading "missing delta bases" error - Fix #2721.
Ravindra Patel 7561f98d 2014-11-19T14:54:30 Fix for memory leak issue in indexer.c, that surfaces on windows
Carlos Martín Nieto 177a29d8 2014-10-27T10:39:45 Merge commit 'refs/pull/2366/head' of github.com:libgit2/libgit2
William Swanson 01b432cf 2014-07-09T14:12:30 Properly report failure when expanding a packfile
Philip Kelley bc8a0886 2014-06-27T11:51:35 Fix assert when receiving uncommon sideband packet
Carlos Martín Nieto b3b66c57 2014-06-18T17:13:12 Share packs across repository instances Opening the same repository multiple times will currently open the same file multiple times, as well as map the same region of the file multiple times. This is not necessary, as the packfile data is immutable. Instead of opening and closing packfiles directly, introduce an indirection and allocate packfiles globally. This does mean locking on each packfile open, but we already use this lock for the global mwindow list so it doesn't introduce a new contention point.
Albert Meltzer 62e562f9 2014-05-18T07:54:41 Fix compiler warning (git_off_t cast to size_t). Use size_t for page size, instead of long. Check result of sysconf. Use size_t for page offset so no cast to size_t (second arg to p_mmap). Use mod instead div/mult pair, so no cast to size_t is necessary.
Albert Meltzer 9c4feef9 2014-05-17T12:44:21 Fix warning on uninitialized variable.
Carlos Martín Nieto 0731a5b4 2014-05-14T19:12:48 indexer: mmap fixes for Windows Windows has its own ftruncate() called _chsize_s(). p_mkstemp() is changed to use p_open() so we can make sure we open for writing; the addition of exclusive create is a good thing to do regardless, as we want a temporary path for ourselves. Lastly, MSVC doesn't quite know how to add two numbers if one of them is a void pointer, so let's alias it to unsigned char.C
Carlos Martín Nieto f7310540 2014-05-13T02:41:48 indexer: use mmap for writing Some OSs cannot keep their ideas about file content straight when mixing standard IO with file mapping. As we use mmap for reading from the packfile, let's make writing to the pack file use mmap.
Linquize b3f27c43 2014-05-13T21:08:50 Initialize local variable
Carlos Martín Nieto 2dde1e0c 2014-05-08T22:31:59 indexer: avoid memory moves Our vector does a move of the rest of the array when we remove an item. Doing this repeatedly can be expensive, and we do this a lot in the indexer. Instead, set the value to NULL and skip those entries. perf reported around 30% of `index-pack` time was going into memmove. With this change, that goes away and we spent most of the time hashing and inflating data.
Jacques Germishuys 48e60ae7 2014-04-21T11:23:29 Don't redefine the same callback types, their signatures may change
Russell Belfer e9d5e5f3 2014-01-28T16:25:42 Some fixes for Windows x64 warnings
Vicent Marti 557bd1f4 2014-01-14T10:27:57 Merge pull request #2043 from arthurschreiber/arthur/fix-memory-leaks Fix a bunch of memory leaks.
Arthur Schreiber 24953757 2014-01-14T19:08:58 Incorporate @arrbee's suggestions.
Edward Thomson c6f26b48 2013-12-13T18:26:46 Refactor zlib for easier deflate streaming
Arthur Schreiber ac44b3d2 2014-01-13T23:28:03 Incorporate @ethomson's suggestions.
Arthur Schreiber ddf1b1ff 2014-01-13T22:33:10 Fix a memory leak in `hash_and_save` and `inject_object`.
Russell Belfer 9cfce273 2013-12-12T12:11:38 Cleanups, renames, and leak fixes This renames git_vector_free_all to the better git_vector_free_deep and also contains a couple of memory leak fixes based on valgrind checks. The fixes are specifically: failure to free global dir path variables when not compiled with threading on and failure to free filters from the filter registry that had not be initialized fully.
Russell Belfer 7697e541 2013-12-11T15:02:20 Test cancel from indexer progress callback This adds tests that try canceling an indexer operation from within the progress callback. After writing the tests, I wanted to run this under valgrind and had a number of errors in that situation because mmap wasn't working. I added a CMake option to force emulation of mmap and consolidated the Amiga-specific code into that new place (so we don't actually need separate Amiga code now, just have to turn on -DNO_MMAP). Additionally, I made the indexer code propagate error codes more reliably than it used to.
Russell Belfer 26c1cb91 2013-12-09T09:44:03 One more rename/cleanup for callback err functions
Russell Belfer 25e0b157 2013-12-06T15:07:57 Remove converting user error to GIT_EUSER This changes the behavior of callbacks so that the callback error code is not converted into GIT_EUSER and instead we propagate the return value through to the caller. Instead of using the giterr_capture and giterr_restore functions, we now rely on all functions to pass back the return value from a callback. To avoid having a return value with no error message, the user can call the public giterr_set_str or some such function to set an error message. There is a new helper 'giterr_set_callback' that functions can invoke after making a callback which ensures that some error message was set in case the callback did not set one. In places where the sign of the callback return value is meaningful (e.g. positive to skip, negative to abort), only the negative values are returned back to the caller, obviously, since the other values allow for continuing the loop. The hardest parts of this were in the checkout code where positive return values were overloaded as meaningful values for checkout. I fixed this by adding an output parameter to many of the internal checkout functions and removing the overload. This added some code, but it is probably a better implementation. There is some funkiness in the network code where user provided callbacks could be returning a positive or a negative value and we want to rely on that to cancel the loop. There are still a couple places where an user error might get turned into GIT_EUSER there, I think, though none exercised by the tests.
Russell Belfer fcd324c6 2013-12-06T15:04:31 Add git_vector_free_all There are a lot of places that we call git__free on each item in a vector and then call git_vector_free on the vector itself. This just wraps that up into one convenient helper function.
Russell Belfer dab89f9b 2013-12-04T21:22:57 Further EUSER and error propagation fixes This continues auditing all the places where GIT_EUSER is being returned and making sure to clear any existing error using the new giterr_user_cancel helper. As a result, places that relied on intercepting GIT_EUSER but having the old error preserved also needed to be cleaned up to correctly stash and then retrieve the actual error. Additionally, as I encountered places where error codes were not being propagated correctly, I tried to fix them up. A number of those fixes are included in the this commit as well.
Jameson Miller db4cbfe5 2013-12-02T14:09:12 Updates to cancellation logic during download and indexing of packfile.
Edward Thomson 1e60e5f4 2013-11-07T12:03:44 Allow callers to set mode on packfile creation
Edward Thomson 1d3a8aeb 2013-11-04T18:28:57 move mode_t to filebuf_open instead of _commit
Russell Belfer 948f00b4 2013-11-01T09:38:03 Merge pull request #1933 from libgit2/vmg/gcc-warnings Warnings for Windows x64 (MSVC) and GCC on Linux
Vicent Marti 51a3dfb5 2013-11-01T16:31:02 pack: `__object_header` always returns unsigned values
Linquize 3343b5ff 2013-10-31T22:59:42 Fix warning on win64
Carlos Martín Nieto a6154f21 2013-10-30T15:00:05 indexer: remove the stream infix It was there to keep it apart from the one which read in from a file on disk. This other indexer does not exist anymore, so there is no need for anything other than git_indexer to refer to it. While here, rename _add() function to _append() and _finalize() to _commit(). The former change is cosmetic, while the latter avoids talking about "finalizing", which OO languages use to mean something completely different.
Vicent Martí 5c50f22a 2013-10-28T09:25:44 Merge pull request #1891 from libgit2/cmn/fix-thin-packs Add support for thin packs
Carlos Martín Nieto ab46b1d8 2013-10-23T15:08:18 indexer: include the delta stats The user is unable to derive the number of deltas in the pack, as that would require them to capture the stats exactly in the moment between download and final processing, which is abstracted away in the fetch. Capture these numbers for the user and expose them in the progress struct. The clone and fetch examples now also present this information to the user.
Carlos Martín Nieto 893055f2 2013-10-11T17:24:29 indexer: clearer stats for thin packs Don't increase the number of total objects, as it can produce suprising progress output. The only addition compared to pre-thin is the addition of local_objects to allow an output similar to git's "completed with %d local objects".
Carlos Martín Nieto 7fb6eb27 2013-10-08T11:54:50 indexer: inject one base at a time There may be multiple deltas referencing the same base as well as OFS deltas which rely on a thin delta. Deal with both at the same time by injecting a single object and going back up to the main delta-resolving loop.
Carlos Martín Nieto 0b33fca0 2013-10-02T13:39:35 indexer: fix thin packs When given an ODB from which to read objects, the indexer will attempt to inject the missing bases at the end of the pack and update the header and trailer to reflect the new contents.
Carlos Martín Nieto cf0582b4 2013-10-02T12:22:54 indexer: do multiple passes over the delta list Though unusual, a packfile may contain a delta whose base is a delta that comes later. In order index such a packfile, we must not give up on the first failure to resolve a delta, but keep it around. If there is a pass which makes no progress, this indicates that the packfile is broken, so fail accordingly.
Russell Belfer af302aca 2013-10-02T14:13:11 Clean up annoying warnings The indexer code was generating warnings on Windows 64-bit. I looked closely at the logic and was able to simplify it a bit. Also this fixes some other Windows and Linux warnings.
Jameson Miller 5b188225 2013-10-02T13:45:32 Support cancellation in push operation This commit adds cancellation for the push operation. This work consists of: 1) Support cancellation during push operation - During object counting phase - During network transfer phase - Propagate GIT_EUSER error code out to caller 2) Improve cancellation support during fetch - Handle cancellation request during network transfer phase - Clear error string when cancelled during indexing 3) Fix error handling in git_smart__download_pack Cancellation during push is still only handled in the pack building and network transfer stages of push (and not during packbuilding).
nulltoken 8a1e925d 2013-09-26T12:00:35 Fix warnings
Vicent Marti 5a284edc 2013-09-18T03:54:17 msvc: No void* arithmetic on Windows
Carlos Martín Nieto e0aa6fc1 2013-09-18T02:20:17 indexer: don't reiterate the class in the message
Carlos Martín Nieto 98eb2c59 2013-09-17T17:44:05 indexer: check the packfile trailer for correctness The packfile trailer gets sent over and we should check whether it's correct as part of our sanity checks of the packfile.
Rémi Duraffort 8d6ef4bf 2013-07-15T15:59:35 index: fix potential memory leaks
Russell Belfer 278ce746 2013-07-01T10:20:38 Add helpful buffer shorten function
Linquize 7026ad89 2013-05-16T21:08:55 calloc() to initialize memory
Russell Belfer b7f167da 2013-04-29T13:52:12 Make git_oid_cmp public and add git_oid__cmp
Russell Belfer 38eef611 2013-04-16T14:19:27 Make indexer use shared packfile open code The indexer was creating a packfile object separately from the code in pack.c which was a problem since I put a call to git_mutex_init into just pack.c. This commit updates the pack function for creating a new pack object (i.e. git_packfile_check()) so that it can be used in both places and then makes indexer.c use the shared initialization routine. There are also a few minor formatting and warning message fixes.
Russell Belfer 5d2d21e5 2013-04-16T15:00:43 Consolidate packfile allocation further Rename git_packfile_check to git_packfile_alloc since it is now being used more in that capacity. Fix the various places that use it. Consolidate some repeated code in odb_pack.c related to the allocation of a new pack_backend.
Arkadiy Shapkin 10c06114 2013-03-17T04:46:46 Several warnings detected by static code analyzer fixed Implicit type conversion argument of function to size_t type Suspicious sequence of types castings: size_t -> int -> size_t Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)' Unsigned type is never < 0
Carlos Martín Nieto 0e040c03 2013-03-03T14:50:47 indexer: use a hashtable for keeping track of offsets These offsets are needed for REF_DELTA objects, which encode which object they use as a base, but not where it lies in the packfile, so we need a list. These objects are mostly from older packfiles, before OFS_DELTA was widely spread. The time spent in indexing these packfiles is greatly reduced, though remains above what git is able to do.
Carlos Martín Nieto 447ae791 2013-03-03T15:19:21 indexer: kill git_indexer This was the first implementation and its goal was simply to have something that worked. It is slow and now it's just taking up space. Remove it and switch the one known usage to use the streaming indexer.
Philip Kelley 2fe67aeb 2013-02-14T08:46:58 Fix a git_filebuf leak (fixes Win32 clone::can_cancel)
Ben Straub def60ea4 2013-02-05T13:14:48 Allow all non-zero returns to cancel transfers
Ben Straub fe95ac1b 2013-02-05T10:59:58 Allow progress callback to cancel fetch This works by having the indexer watch the return code of the callback, so will only take effect on object boundaries.
Carlos Martín Nieto 96c9b9f0 2013-01-12T18:38:19 indexer: properly free the packfile resources The indexer needs to call the packfile's free function so it takes care of freeing the caches. We still need to close the mwf descriptor manually so we can rename the packfile into its final name on Windows.
Carlos Martín Nieto 80d647ad 2013-01-11T20:15:06 Revert "pack: packfile_free -> git_packfile_free and use it in the indexers" This reverts commit f289f886cb81bb570bed747053d5ebf8aba6bef7, which makes the tests fail on Windows. Revert until we can figure out a solution.
nulltoken 090d5e1f 2013-01-11T14:40:09 Fix MSVC compilation warnings
Carlos Martín Nieto f289f886 2013-01-11T17:24:52 pack: packfile_free -> git_packfile_free and use it in the indexers It turns out the indexers have been ignoring the pack's free function and leaking data. Plug that.
Edward Thomson 359fc2d2 2013-01-08T17:07:25 update copyrights
nulltoken bdb94c21 2012-12-17T12:20:52 Fix MSVC compilation warnings
Carlos Martín Nieto 6481a68d 2012-12-07T19:23:16 indexer: move the temporary buffers into the indexer object Storing 4kB or 8kB in the stack is not very gentle. As this part has to be linear, put the buffer into the indexer object so we allocate it once in the heap.
Carlos Martín Nieto 3908c254 2012-11-30T17:25:50 indexer: correctly deal with objects larger than the window size A mmap-window is not guaranteed to give you the whole object, but the indexer currently assumes so. Loop asking for more data until we've successfully CRC'd all of the packed data.
Carlos Martín Nieto f56f8585 2012-11-19T22:23:16 indexer: use the packfile streaming API The new API allows us to read the object bit by bit from the packfile, instead of needing it all at once in the packfile. This also allows us to hash the object as it comes in from the network instead of having to try to read it all and failing repeatedly for larger objects. This is only the first step, but it already shows huge improvements when dealing with objects over a few megabytes in size. It reduces the memory needs in some cases, but delta objects still need to be completely in memory and the old inefficent method is still used for that.
Carlos Martín Nieto 5a3ad89d 2012-11-20T07:03:56 indexer: make use of streaming also for deltas Up to now, deltas needed to be enterily in the packfile, and we tried to decompress then in their entirety over and over again. Adjust the logic so we read them as they come, just as we do for full objects. This also allows us to simplify the logic and have less nested code. The delta resolving phase still needs to decompress the whole object into memory, as there is not yet any streaming delta-apply support, but it helps in speeding up the downloading process and reduces the amount of memory allocations we need to do.
Ben Straub 839c5f57 2012-11-26T12:04:07 API updates for indexer.h
Martin Woodward 826bc4a8 2012-11-23T13:31:22 Remove use of English expletives Remove words such as fuck, crap, shit etc. Remove other potentially offensive words from comments. Tidy up other geopolicital terms in comments.
Sascha Cunz 4cc7342e 2012-11-18T09:07:35 Indexer: Avoid a possible double-deletion in error case
Edward Thomson d6fb0924 2012-11-05T12:37:15 Win32 CryptoAPI and CNG support for SHA1
Edward Thomson 603bee07 2012-11-12T19:22:49 Remove git_hash_ctx_new - callers now _ctx_init()
Ben Straub 81eecc34 2012-10-29T13:34:14 Fetch: don't clobber received count This memset was being reached after the entire packfile under WinHttp, so the byte count was being lost for small repos.
Ben Straub 7d222e13 2012-10-24T13:29:14 Network progress: rename things git_indexer_stats and friends -> git_transfer_progress* Also made git_transfer_progress members more sanely named.
Ben Straub 909f6265 2012-10-18T15:28:09 Indexing progress now goes to 100%
Ben Straub 216863c4 2012-10-17T14:02:24 Fetch/indexer: progress callbacks
Michael Schubert e3f8d58d 2012-08-14T23:07:54 indexer: do not require absolute path
Carlos Martín Nieto 2b175ca9 2012-08-26T00:35:52 indexer: kill git_indexer_stats.data_received It's not really needed with the current code as we have EOS and the sideband's flush to tell us we're done. Keep the distinction between processed and received objects.
Carlos Martín Nieto 7a57ae54 2012-08-25T23:31:29 indexer: don't segfault when freeing an unused indexer Make sure that idx->pack isn't NULL before trying to free resources under it.
Carlos Martín Nieto bffa852f 2012-07-13T12:01:11 indexer: recognize and mark when all of the packfile has been downloaded We can't always rely on the network telling us when the download is finished. Recognize it from the indexer itself.
Carlos Martín Nieto d1af70b0 2012-07-13T20:43:56 indexer: delay resolving deltas Not all delta bases are available on the first try. By delaying resolving all deltas until the end, we avoid decompressing some of the data twice or even more times, saving effort and time.
Carlos Martin Nieto 1d8943c6 2012-06-28T12:05:49 mwindow: allow memory-window files to deregister Once a file is registered, there is no way to deregister it, even after the structure that contains it is no longer needed and has been freed. This may be the source of #624. Allow and use the deregister function to remove our file from the global list.
Carlos Martín Nieto 37159957 2012-06-28T09:33:08 indexer: don't use '/objects/pack/' unconditionally Not everyone who indexes a packfile wants to put it in the standard git repository location.
Michael Schubert f9fd7105 2012-06-25T15:26:38 indexer: start parsing input data immediately Currently, the first call of git_indexer_stream_add adds the data to the underlying pack file and opens it for later use, but doesn't start parsing the already available data. This means, git_indexer_stream_finalize only works if git_indexer_stream_add was called at least twice. Kill this limitation by parsing available data immediately.
Chris Young a21bb1aa 2012-06-13T23:28:51 Merge remote-tracking branch 'source/development' into development
Chris Young 2aeadb9c 2012-06-12T19:25:09 Actually do the mmap... unsurprisingly, this makes the indexer work on SFS On RAM: the .idx and .pack files become links to a .lock and the original download respectively. Assume some feature (such as record locking) supported by SFS but not JXFS or RAM: is required.
Vicent Martí 3f035860 2012-06-07T22:43:03 misc: Fix warnings from PVS Studio trial
Michael Schubert 54db1a18 2012-05-19T13:20:55 Cleanup * indexer: remove leftover printf * commit: remove unused macros COMMIT_BASIC_PARSE, COMMIT_FULL_PARSE and COMMIT_PRINT
Vicent Martí 904b67e6 2012-05-18T01:48:50 errors: Rename error codes
Vicent Martí e172cf08 2012-05-18T01:21:06 errors: Rename the generic return codes