Hash: 4dfcc50f
Author:
Date: 2020-04-01T15:16:18
merge: cache negative cache results for similarity metrics

When computing renames, we cache the hash signatures for each of the potentially conflicting entries so that we do not need to repeatedly read the file and can at least halfway efficiently determine whether two files are similar enough to be deemed a rename. In order to make the hash signatures meaningful, we require at least four lines of data to be present, resulting in at least four different hashes that can be compared. Files that are deemed too small are not cached at all and will thus be repeatedly re-hashed, which is usually not a huge issue.

The issue with the above heuristic arises when a file does _not_ have at least four lines, where a line is anything separated by a consecutive run of "\n" or "\0" characters. For example, "a\nb" is two lines, but "a\0\0b" is also just two lines. Taken to the extreme, a file that consists of megabytes of consecutive space or NUL characters only may also be deemed too small and thus not get cached. As a result, we will repeatedly load its blob and calculate its hash signature, just to finally throw it away as we notice it's not of any value. When you've got a comparatively big file that you compare against a big set of potentially renamed files, then the cost simply explodes.

The issue can be trivially fixed by introducing negative cache entries. Whenever we determine that a given blob does not have a meaningful representation via a hash signature, we store this negative cache marker and will from then on not hash it again, but also ignore it as a potential rename target. This should already help the "normal" case where you have a lot of small files as rename candidates, but in the above scenario its savings are extraordinarily high.

To verify that we do not hit the issue anymore with the described solution, this commit adds a test that uses the exact same setup described above, with one 50 megabyte blob of '\0' characters and 1000 other files that get renamed.
Without the negative cache:
$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
real 11m48.377s
user 11m11.576s
sys 0m35.187s
And with the negative cache:
$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null
real 0m1.972s
user 0m1.851s
sys 0m0.118s
So this represents a ~350-fold performance improvement, but it obviously depends on how many files you have and how big the blob is. The test numbers were chosen in a way that one will immediately notice as soon as the bug resurfaces.
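The negative cache idea from the commit message can be sketched in a few lines of C. This is a simplified model under stated assumptions, not the code from the commit: count_lines, lookup_signature, struct cache_entry, the SIG_* states and compute_calls are hypothetical names, and a plain line count stands in for a real hash signature.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_LINES 4 /* the commit message requires at least four lines */

/* Count lines; any consecutive run of '\n' or '\0' bytes is one separator. */
static size_t count_lines(const char *buf, size_t len)
{
	size_t i, lines = 0;
	int in_line = 0;

	for (i = 0; i < len; i++) {
		int is_sep = (buf[i] == '\n' || buf[i] == '\0');

		if (!is_sep && !in_line)
			lines++; /* first byte of a new line */
		in_line = !is_sep;
	}
	return lines;
}

/* Hypothetical cache states; SIG_TOO_SMALL is the negative cache marker. */
enum sig_state { SIG_UNKNOWN = 0, SIG_CACHED, SIG_TOO_SMALL };

struct cache_entry {
	enum sig_state state;
	size_t lines; /* stand-in for a real hash signature */
};

static unsigned long compute_calls; /* how often the expensive path ran */

/* Returns 1 if the blob has a usable signature, 0 if it is too small. */
static int lookup_signature(struct cache_entry *e, const char *buf, size_t len)
{
	assert(e != NULL && buf != NULL);

	if (e->state == SIG_UNKNOWN) {
		/* Expensive path: runs at most once per entry. */
		compute_calls++;
		e->lines = count_lines(buf, len);
		e->state = (e->lines >= MIN_LINES) ? SIG_CACHED : SIG_TOO_SMALL;
	}
	return e->state == SIG_CACHED;
}
```

In this model, a blob consisting only of NUL bytes is scanned once, marked SIG_TOO_SMALL, and then skipped on every later lookup instead of being re-hashed for each rename candidate.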
Git HTTP | https://git.kmx.io/thodg/libgit2.git |
---|---|
Git SSH | git@git.kmx.io:thodg/libgit2.git |
Public access ? | public |
libgit2 is a portable, pure C implementation of the Git core methods provided as a linkable library with a solid API, allowing you to build Git functionality into your application. Language bindings like Rugged (Ruby), LibGit2Sharp (.NET), pygit2 (Python) and NodeGit (Node) allow you to build Git tooling in your favorite language.
libgit2 is used to power Git GUI clients like GitKraken and gmaster, and is used on Git hosting providers like GitHub, GitLab and Azure DevOps. We perform the merge every time you click “merge pull request”.
libgit2 is licensed under a very permissive license (GPLv2 with a special Linking Exception). This basically means that you can link it (unmodified) with any kind of software without having to release its source code. Additionally, the example code has been released to the public domain (see the separate license for more information).
Prerequisites for building libgit2: CMake (version 2.8 or newer), available on your PATH.
Build
mkdir build && cd build
cmake ..
cmake --build .
Trouble with these steps? Read our troubleshooting guide. More detailed build guidance is available below.
Chat with us
#libgit2
Getting Help
If you have questions about the library, please be sure to check out the API documentation. If you still have questions, reach out to us on Slack or post a question on StackOverflow (with the libgit2 tag).
Reporting Bugs
Please open a GitHub Issue and include as much information as possible. If possible, provide sample code that illustrates the problem you’re seeing. If you’re seeing a bug only on a specific repository, please provide a link to it if possible.
We ask that you not open a GitHub Issue for help, only for bug reports.
Reporting Security Issues
Please have a look at SECURITY.md.
libgit2 provides you with the ability to manage Git repositories in the programming language of your choice. It’s used in production to power many applications including GitHub.com, Plastic SCM and Azure DevOps.
It does not aim to replace the git tool or its user-facing commands. Some APIs resemble the plumbing commands as those align closely with the concepts of the Git system, but most commands a user would type are out of scope for this library to implement directly.
The library provides:
As libgit2 is purely a consumer of the Git system, we have to adjust to changes made upstream. This has two major consequences:
While the library provides git functionality without the need for dependencies, it can make use of a few libraries to add to it:
The library needs to keep track of some global state. Call
git_libgit2_init();
before calling any other libgit2 functions. You can call this function many times. A matching number of calls to
git_libgit2_shutdown();
will free the resources. Note that if you have worker threads, you should call git_libgit2_shutdown after those threads have exited. If you require assistance coordinating this, simply have the worker threads call git_libgit2_init at startup and git_libgit2_shutdown at shutdown.
See threading for information
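The "matching number of calls" rule can be pictured with a toy reference-counting model. This is not libgit2 code: model_init, model_shutdown, init_count and resources_live are hypothetical names, and the sketch is deliberately single-threaded, whereas a real implementation has to protect the counter against concurrent callers.

```c
#include <assert.h>

static int init_count;     /* outstanding initializations */
static int resources_live; /* 1 while the modelled global state exists */

/* Model of an init call: returns the number of outstanding initializations. */
static int model_init(void)
{
	if (init_count++ == 0)
		resources_live = 1; /* first caller sets up the global state */
	return init_count;
}

/* Model of a shutdown call: returns the number of remaining initializations. */
static int model_shutdown(void)
{
	assert(init_count > 0); /* every shutdown needs a matching init */
	if (--init_count == 0)
		resources_live = 0; /* last caller frees the global state */
	return init_count;
}
```

Under this model, each worker that calls the init function at startup and the shutdown function at exit keeps the global state alive for exactly as long as it runs, and the state is released only when the last matching shutdown call arrives.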
See conventions for an overview of the external and internal API/coding conventions we use.
libgit2 builds cleanly on most platforms without any external dependencies. Under Unix-like systems, like Linux, *BSD and Mac OS X, libgit2 expects pthreads to be available; they should be installed by default on all systems. Under Windows, libgit2 uses the native Windows API for threading.
The libgit2 library is built using CMake (version 2.8 or newer) on all platforms.
On most systems you can build the library using the following commands:
$ mkdir build && cd build
$ cmake ..
$ cmake --build .
Alternatively, you can point the CMake GUI tool to the CMakeLists.txt file and generate a platform-specific build project or IDE workspace.
Once built, you can run the tests from the build directory with the command
$ ctest -V
Alternatively, you can run the test suite directly using:
$ ./libgit2_clar
Invoking the test suite directly is useful because it allows you to execute individual tests, or groups of tests, using the -s flag. For example, to run the index tests:
$ ./libgit2_clar -sindex
To run a single test named index::racy::diff, which corresponds to the test function test_index_racy__diff:
$ ./libgit2_clar -sindex::racy::diff
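The naming correspondence between a -s selector and its test function can be sketched as a small helper. The mapping used here is inferred only from the examples on this page (the last "::" becomes "__", earlier ones become "_", and "test_" is prefixed); selector_to_function is a hypothetical name and is not part of clar or libgit2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: derive a test function name from a "-s" selector.
 * Returns 0 on success, -1 if the selector contains no "::" or if the
 * output buffer is too small.
 */
static int selector_to_function(const char *selector, char *out, size_t outlen)
{
	const char *last = NULL, *p = selector;
	size_t n;

	assert(selector != NULL && out != NULL);

	/* Find the last "::", which separates the suite from the test name. */
	while ((p = strstr(p, "::")) != NULL) {
		last = p;
		p += 2;
	}
	if (last == NULL || outlen < sizeof("test_"))
		return -1;

	memcpy(out, "test_", 5);
	n = 5;

	for (p = selector; *p != '\0'; ) {
		if (p[0] == ':' && p[1] == ':') {
			/* Reserve room for up to two underscores plus the NUL. */
			if (n + 3 > outlen)
				return -1;
			out[n++] = '_';
			if (p == last)
				out[n++] = '_'; /* suite/test boundary: "__" */
			p += 2;
		} else {
			/* Reserve room for this byte plus the NUL. */
			if (n + 2 > outlen)
				return -1;
			out[n++] = *p++;
		}
	}
	out[n] = '\0';
	return 0;
}
```

Passing "index::racy::diff" fills the buffer with test_index_racy__diff, which is the function name to search for in the test sources when a selected test fails.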
The test suite will print a . for every passing test, and an F for any failing test. An S indicates that a test was skipped because it is not applicable to your platform or is particularly expensive.
Note: There should be no failing tests when you build an unmodified source tree from a release, or from the master branch. Please contact us or open an issue if you see test failures.
To install the library you can specify the install prefix by setting:
$ cmake .. -DCMAKE_INSTALL_PREFIX=/install/prefix
$ cmake --build . --target install
For more advanced use or questions about CMake please read https://cmake.org/Wiki/CMake_FAQ.
The following CMake variables are declared:
CMAKE_INSTALL_BINDIR: Where to install binaries to.
CMAKE_INSTALL_LIBDIR: Where to install libraries to.
CMAKE_INSTALL_INCLUDEDIR: Where to install headers to.
BUILD_SHARED_LIBS: Build libgit2 as a Shared Library (defaults to ON)
BUILD_CLAR: Build Clar-based test suite (defaults to ON)
THREADSAFE: Build libgit2 with threading support (defaults to ON)
To list all build options and their current value, you can do the following:
# Create and set up a build directory
$ mkdir build && cd build
$ cmake ..
# List all build options and their values
$ cmake -L
CMake lets you specify a few variables to control the behavior of the compiler and linker. These flags are rarely used but can be useful for 64-bit to 32-bit cross-compilation.
CMAKE_C_FLAGS: Set your own compiler flags
CMAKE_FIND_ROOT_PATH: Override the search path for libraries
ZLIB_LIBRARY, OPENSSL_SSL_LIBRARY and OPENSSL_CRYPTO_LIBRARY: Tell CMake where to find those specific libraries
If you want to build a universal binary for Mac OS X, CMake sets it all up for you if you use -DCMAKE_OSX_ARCHITECTURES="i386;x86_64" when configuring.
Extract the toolchain from the NDK using the make-standalone-toolchain.sh script. Optionally, cross-compile and install OpenSSL inside of it. Then create a CMake toolchain file that configures the paths to your cross-compiler (substitute {PATH} with the full path to the toolchain):
SET(CMAKE_SYSTEM_NAME Linux)
SET(CMAKE_SYSTEM_VERSION Android)
SET(CMAKE_C_COMPILER {PATH}/bin/arm-linux-androideabi-gcc)
SET(CMAKE_CXX_COMPILER {PATH}/bin/arm-linux-androideabi-g++)
SET(CMAKE_FIND_ROOT_PATH {PATH}/sysroot/)
SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
Add -DCMAKE_TOOLCHAIN_FILE={pathToToolchainFile} to the cmake command when configuring.
Here are the bindings to libgit2 that are currently available:
If you start another language binding to libgit2, please let us know so we can add it to the list.
We welcome new contributors! We have a number of issues marked as “up for grabs” and “easy fix” that are good places to jump in and get started. There’s much more detailed information in our list of outstanding projects.
Please be sure to check the contribution guidelines to understand our workflow, and the libgit2 coding conventions.
libgit2 is under GPL2 with linking exception. This means you can link to and use the library from any program, proprietary or open source; paid or gratis. However, if you modify libgit2 itself, you must distribute the source to your modified version of libgit2.
See the COPYING file for the full license text.