src


Log

Author Commit Date Message
Edward Thomson 2ffa426e 2020-07-09T23:02:05 Merge pull request #5567 from lhchavez/msan Make the tests pass cleanly with MemorySanitizer
Alexander Ovchinnikov dc1deb3b 2020-07-01T15:41:38 Use __GNUC__ macro in the resource script Fix the default LIBGIT2_FILENAME for GNU windres
Alexander Ovchinnikov 71000441 2020-06-16T18:58:07 Review: Rename the stringize macro
Alexander Ovchinnikov 5c40456b 2020-06-16T13:19:02 Enable building git2.rc resource script with GCC
lhchavez 3a197ea7 2020-06-27T12:33:32 Make the tests pass cleanly with MemorySanitizer This change: * Initializes a few variables that were being read before being initialized. * Includes https://github.com/madler/zlib/pull/393. As such, it only works reliably with `-DUSE_BUNDLED_ZLIB=ON`.
Patrick Steinhardt 6256d023 2020-06-15T14:34:29 diff_print: adjust code to match current coding style
Patrick Steinhardt 490d0c9c 2020-06-15T14:26:13 diff_print: return out-of-memory situation when printing binary We currently don't check for out-of-memory situations on exiting `format_binary` and, as a result, may return a partially filled buffer. Fix this by checking the buffer via `git_buf_oom`.
Patrick Steinhardt bea5fd9f 2020-06-15T13:26:18 diff_print: do not call abort(3P) Calling abort(3P) in a library is rather rude and shouldn't happen, as we effectively prohibit any corrective actions made by the application linking to it. We thus shouldn't call it at all, but instead use our new `GIT_ASSERT` macros. Remove the call to abort(3P) in case a diff delta has an unexpected type to fix this.
Patrick Steinhardt 0cf1f444 2020-06-15T13:19:44 diff_print: handle errors when printing to file When printing the diff to a `FILE *` handle, we check neither the return value of fputc(3P) nor that of fwrite(3P). As a result, we'll silently return success even if we didn't print anything at all. Furthermore, the arguments to fwrite(3P) are reversed: we have one item of length `content_len`, not `content_len` items of one byte. Fix both issues by checking return values as well as reversing the arguments to fwrite(3P).
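A minimal sketch of the corrected write pattern described in the commit above. The helper `print_to_file` is hypothetical and not libgit2's actual code. Because the content is written as one item of `content_len` bytes, a successful `fwrite` returns exactly 1, so any other value signals a short write.

```c
#include <stdio.h>

/* Hypothetical helper: writes the origin character followed by the line
 * content. Returns 0 on success and -1 if either write fails. */
static int print_to_file(FILE *fp, char origin, const char *content, size_t content_len)
{
    /* fputc returns EOF on failure. */
    if (fputc(origin, fp) == EOF)
        return -1;

    /* One item of content_len bytes: fwrite returns the number of complete
     * items written, so anything other than 1 is an error. fwrite with a
     * size of 0 returns 0, hence the content_len guard. */
    if (content_len > 0 && fwrite(content, content_len, 1, fp) != 1)
        return -1;

    return 0;
}
```

Writing a single item keeps the success check a simple comparison against 1, rather than against `content_len`.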
Edward Thomson 74520b91 2020-06-13T19:38:11 Merge pull request #5552 from libgit2/pks/small-fixes Random code cleanups and fixes
Patrick Steinhardt 03c4f86c 2020-06-08T12:42:59 cmake: enable warnings for missing function declarations Over time, we have accumulated quite a lot of functions with missing prototypes, missing `static` keywords or which were completely unused. It's easy to miss these mistakes, but luckily GCC and Clang both have the `-Wmissing-declarations` warning. Enabling this will cause them to emit warnings for every non-static function that doesn't have a previous declaration. This is a very sane thing to enable, and with the preceding commits all these new warnings have been fixed. So let's always enable this warning so we won't introduce new instances of them.
Patrick Steinhardt fd1f0940 2020-06-08T12:42:26 refs: add missing function declaration The function `git_reference__is_note` is not declared anywhere. Let's add the declaration to avoid having non-static functions without declaration.
Patrick Steinhardt c6184f0c 2020-06-08T21:07:36 tree-wide: do not compile deprecated functions with hard deprecation When compiling libgit2 with -DDEPRECATE_HARD, we add a preprocessor definition `GIT_DEPRECATE_HARD` which causes the "git2/deprecated.h" header to be empty. As a result, no function declarations are made available to callers, but the implementations are still available to link against. This has the problem that function declarations also aren't visible to the implementations, meaning that the symbol's visibility will not be set up correctly. Consequently, the resulting library may not expose those deprecated symbols at all on some platforms and thus cause linking errors. Fix the issue by compiling the deprecated functions only when hard deprecation is disabled. While this makes it impossible to link against such a library if one uses deprecated functions, distributors of libgit2 aren't expected to pass -DDEPRECATE_HARD anyway. Instead, users of libgit2 should manually define GIT_DEPRECATE_HARD to hide deprecated functions. Using "real" hard deprecation still makes sense in CI, to test that we don't use deprecated symbols ourselves, and in case a dependent uses libgit2 in a vendored way and knows it won't ever use any of the deprecated symbols anyway.
Patrick Steinhardt 6e1efcd6 2020-06-08T12:46:04 tree-wide: add missing header includes We're missing some header includes leading to missing function prototypes. While we currently don't warn about these, we should have their respective headers included in order to detect the case where a function signature change results in an incompatibility.
Patrick Steinhardt a6c9e0b3 2020-06-08T12:40:47 tree-wide: mark local functions as static We've accumulated quite a few functions which are never used outside of their respective code unit, but which are lacking the `static` keyword. Add it to reduce their linkage scope and allow the compiler to optimize better.
Patrick Steinhardt 7c499b54 2020-06-08T12:39:09 tree-wide: remove unused functions We have some functions which aren't used anywhere. Let's remove them to get rid of unneeded baggage.
Patrick Steinhardt 46637b5e 2020-06-08T14:47:01 checkout: remove unused code for deferred removals With commit 05f690122 (checkout: remove blocking dir when FORCEd, 2015-03-31), the last case that actually queued a deferred removal was removed. This is now more than five years in the past and nobody complained, so we can rest quite assured that the deferred removal is not really needed at all. Let's remove all related code to simplify the already complicated checkout logic.
Patrick Steinhardt 45901d3e 2020-06-08T12:57:16 revparse: remove superfluous tab character
Patrick Steinhardt c146374c 2020-06-08T12:54:26 revparse: detect out-of-memory cases when parsing curly brace contents When extracting curly braces (e.g. the "upstream" part in "HEAD@{upstream}"), we put the curly braces' contents into a `git_buf` structure, but don't check the return value of `git_buf_putc`. So when we run out-of-memory, we'll use a partially filled buffer without noticing. Let's fix this issue by checking `git_buf_putc`'s return value.
Patrick Steinhardt 53a8f463 2020-06-03T07:40:59 Merge pull request #5536 from libgit2/ethomson/http httpclient: support googlesource
Edward Thomson 6de8aa7f 2020-06-02T12:21:22 Merge pull request #5532 from joshtriplett/pack-default-path git_packbuilder_write: Allow setting path to NULL to use the default path
Edward Thomson 22f9a0fc 2020-06-02T12:12:41 Merge pull request #5531 from joshtriplett/mempack-threads mempack: Use threads when building the pack
Edward Thomson 04c7bdb4 2020-06-01T22:44:14 httpclient: clear the read_buf on new requests The httpclient implementation keeps a `read_buf` that holds the data in the body of the response after the headers have been written. We store that data for subsequent calls to `git_http_client_read_body`. If we want to stop reading body data and send another request, we need to clear that cached data. Clear the cached body data on new requests, just like we read any outstanding data from the socket.
Edward Thomson aa8b2c0f 2020-06-01T23:53:55 httpclient: don't read more than the client wants When `git_http_client_read_body` is invoked, it provides the size of the buffer that can be read into. This will be set as the parser context's `output_size` member. Use this as an upper limit on our reads, and ensure that we do not read more than the client requests.
Edward Thomson 51eff5a5 2020-05-29T13:13:19 strarray: we should `dispose` instead of `free` We _dispose_ the contents of objects; we _free_ objects (and their contents). Update `git_strarray_free` to be `git_strarray_dispose`. `git_strarray_free` remains as a deprecated proxy function.
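The dispose/free distinction above can be sketched with a hypothetical `struct str_array` (not libgit2's `git_strarray`): disposing releases only what the struct owns, so it also works for stack-allocated structs, while freeing additionally releases the heap-allocated struct itself.

```c
#include <stdlib.h>

/* Hypothetical stand-in with the same shape as a string array. */
struct str_array {
    char **strings;
    size_t count;
};

/* "dispose": release the contents but not the struct itself, leaving it
 * empty so it can be reused or safely disposed again. */
static void str_array_dispose(struct str_array *arr)
{
    size_t i;

    if (!arr)
        return;
    for (i = 0; i < arr->count; i++)
        free(arr->strings[i]);
    free(arr->strings);
    arr->strings = NULL;
    arr->count = 0;
}

/* "free": dispose the contents and then release the heap-allocated struct. */
static void str_array_free(struct str_array *arr)
{
    str_array_dispose(arr);
    free(arr);
}
```

Calling `str_array_free` on a stack variable would be a bug, which is why a name that says "dispose" is the clearer contract for a function that never frees its argument.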
Edward Thomson a9746b30 2020-05-29T11:21:55 strarray: move to its own file
Edward Thomson 570f0340 2020-06-01T19:10:38 httpclient: read_body should return 0 at EOF When users call `git_http_client_read_body`, it should return 0 at the end of a message. When the `on_message_complete` callback is called, this will set `client->state` to `DONE`. In our read loop, we look for this condition and exit. Without this, when there is no data left except the end of message chunk (`0\r\n`) in the http stream, we would block by reading the three bytes off the stream but not making progress in any `on_body` callbacks. Listening to the `on_message_complete` callback allows us to stop trying to read from the socket when we've read the end of message chunk.
Patrick Steinhardt 17641f1f 2020-06-01T15:05:51 Merge pull request #5526 from libgit2/ethomson/poolinit git_pool_init: allow the function to fail
Edward Thomson 0f35efeb 2020-05-23T10:15:51 git_pool_init: handle failure cases Propagate failures caused by pool initialization errors.
Patrick Steinhardt 1bbdf15d 2020-06-01T13:57:12 Merge pull request #5527 from libgit2/ethomson/config_unreadable Handle unreadable configuration files
Wil Shipley d1409f48 2020-05-06T19:57:07 config: ignore unreadable configuration files Modified `config_file_open()` so it returns 0 if the config file is not readable, which happens on global config files under macOS sandboxing (note that for some reason `access(F_OK)` DOES work with sandboxing, but it is lying). Without this read check, sandboxed applications on macOS cannot open any repository, because `config_file_read()` will return GIT_ERROR when it cannot read the global /Users/username/.gitconfig file, and the upper layers will just completely abort on GIT_ERROR when attempting to load the global config file, so no repositories can be opened.
Patrick Wang 8c96d56d 2020-05-26T04:53:09 index: write v4: bugfix: prefix path with strip_len, not same_len According to git's index-format.txt, the path of an entry is prefixed with N, where N is the number of bytes to strip from the end of the previous entry's path.
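A minimal sketch of the v4 prefix compression, using a hypothetical helper `v4_compress_path`. `same_len` is the length of the shared prefix; the value written as N must be the number of bytes removed from the end of the previous path (`strip_len`), after which the remaining suffix is appended.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: computes the v4 prefix compression of `path`
 * relative to the previous entry's path `prev` (an empty string for the
 * first entry). Stores in *strip_len the number of bytes to remove from
 * the end of `prev` and returns the suffix to append afterwards. */
static const char *v4_compress_path(size_t *strip_len, const char *prev, const char *path)
{
    size_t same_len = 0;
    size_t prev_len = strlen(prev);

    /* Length of the common prefix shared by both paths. */
    while (prev[same_len] && prev[same_len] == path[same_len])
        same_len++;

    /* N counts the bytes stripped from the previous path, not the bytes kept. */
    *strip_len = prev_len - same_len;
    return path + same_len;
}
```

For example, going from "src/a.c" to "src/b.c" shares 4 bytes, so N is 3 (strip "a.c") and the suffix is "b.c". Writing `same_len` (4) instead would strip the wrong number of bytes when the entry is read back.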
Josh Triplett 5278a006 2020-05-23T16:07:54 git_packbuilder_write: Allow setting path to NULL to use the default path If given a NULL path, write to the object path of the repository. Add tests for the new behavior.
Josh Triplett 0bc091dd 2020-05-23T15:35:38 git_packbuilder_write: Unify cleanup path Clean up and return via a single label, to avoid duplicate error handling before each return, and to make it easier to extend the set of cleanups needed.
Josh Triplett 30285a3c 2020-05-23T15:04:19 mempack: Use threads when building the pack The mempack ODB backend creates a packbuilder internally to write out a pack; call git_packbuilder_set_threads on that packbuilder, to use threads for packing if available.
Edward Thomson 27cb4e0e 2020-05-23T11:02:07 Merge pull request #5522 from pks-t/pks/openssl-cert-memleak OpenSSL certificate memory leak
Edward Thomson abfdb8a6 2020-05-23T10:15:37 git_pool_init: return an int Let `git_pool_init` return an int so that it could fail.
Edward Thomson e4bdba56 2020-05-23T09:57:22 Merge pull request #5515 from pks-t/pks/flaky-checkout-test tests: checkout: fix flaky test due to mtime race
Edward Thomson 3b7b4d27 2020-05-23T09:40:55 Merge pull request #5523 from libgit2/pks/cmake-sort-reproducible-builds cmake: Sort source files for reproducible builds
Patrick Steinhardt 3f201f75 2020-05-16T13:48:04 checkout: fix file being treated as unmodified due to racy index When trying to determine whether a file changed, we try to avoid heavy operations by first taking a look at the index, seeing whether the index entry is modified already. This doesn't seem to cut it, though, as we currently have the racy checkout::index::can_disable_pathspec_match test case: sometimes the files get restored to their original contents, sometimes they don't. The issue is caused by a racy index [1]: in case we modify a file, add it to the index and then modify it again in-place without changing its size, we may end up with a modified file that has the same stat(3P) info as we've currently got in its corresponding index entry. The mitigation for this is to treat files with the same mtime as the index as racily modified. We already have this logic in place for the index, but not when doing a checkout. Fix the issue by only consulting the index entry in case it has an older mtime than the index. Previously, the following script reliably had at least 20 failures, while now there is no failure to be observed anymore:
```bash
j=0
for i in $(seq 100)
do
    if ! ./libgit2_clar -scheckout::index::can_disable_pathspec_match >/dev/null
    then
        j=$(($j + 1))
    fi
done
echo "Failures: $j"
```
[1]: https://git-scm.com/docs/racy-git
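The racy-index rule from the commit above reduces to a strict timestamp comparison. This is a minimal sketch with a hypothetical `struct ts` and helper names, not libgit2's checkout code: cached stat data is trusted only when the entry's mtime is strictly older than the index's own mtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified timestamp (seconds + nanoseconds). */
struct ts {
    int64_t seconds;
    int64_t nanoseconds;
};

/* Returns true if a is strictly earlier than b. */
static bool ts_older(struct ts a, struct ts b)
{
    if (a.seconds != b.seconds)
        return a.seconds < b.seconds;
    return a.nanoseconds < b.nanoseconds;
}

/* An entry whose mtime equals (or is newer than) the index's mtime may have
 * been modified after it was staged without its stat data changing, so it is
 * "racily clean": its contents must be compared instead of trusted. */
static bool can_trust_index_entry(struct ts entry_mtime, struct ts index_mtime)
{
    return ts_older(entry_mtime, index_mtime);
}
```

Treating equal timestamps as untrusted costs an occasional extra content comparison, but it never misses a change.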
Patrick Steinhardt b85eefb4 2020-05-15T19:52:40 cmake: Sort source files for reproducible builds We currently use `FILE(GLOB ...)` in most places to find source and header files. This is problematic in that the order of files returned depends on the operating system's directory iteration order and may thus not be deterministic. As a result, we link object files in unspecified order, which may cause the linker to emit different code across runs. Fix this issue by sorting all code used as input to the libgit2 library to improve the reliability of reproducible builds.
Patrick Steinhardt b43a9e66 2020-05-15T17:46:24 streams: openssl: fix memleak due to us not free'ing certs When creating a `git_cert` from the OpenSSL X509 certificate of a given stream, we do not call `X509_free()` on the certificate, leading to a memory leak as soon as the certificate is requested e.g. by the certificate check callback. Fix the issue by properly calling `X509_free()`.
Patrick Steinhardt a2eca682 2020-05-12T21:35:07 futils: fix order of declared parameters for `git_futils_fake_symlink` While the function `git_futils_fake_symlink` is declared with arguments `new, old`, the implementation uses the reverse order `old, new`. Let's fix the ordering issues to be `new, old` for both, which matches what symlink(3P) has. While at it, we also rename these parameters: `old` and `new` don't really make a lot of sense in the context of symlinks, which is why this commit renames them to be called `target` and `path`.
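A hypothetical sketch of a "fake" symlink with the `target, path` order that symlink(3P) uses. It relies on the assumption, based on Git's behaviour when symlinks are unsupported, that the link is stored as a regular file whose contents are exactly the target. This is not libgit2's implementation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: creates a regular file at `path` whose contents are
 * the link `target`. Parameter order mirrors symlink(3P): target first,
 * link location second. Returns 0 on success and -1 on failure. */
static int fake_symlink(const char *target, const char *path)
{
    FILE *fp = fopen(path, "wb");
    size_t len = strlen(target);
    int error = 0;

    if (!fp)
        return -1;

    /* No trailing newline: the file holds exactly the target string. */
    if (len > 0 && fwrite(target, len, 1, fp) != 1)
        error = -1;
    if (fclose(fp) != 0)
        error = -1;

    return error;
}
```

Naming the parameters `target` and `path` makes a swapped call site easy to spot, which `old` and `new` did not.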
Edward Thomson cbae1c21 2020-04-01T22:12:07 assert: allow non-int returning functions to assert Include GIT_ASSERT_WITH_RETVAL and GIT_ASSERT_ARG_WITH_RETVAL so that functions that do not return int (or, more precisely, where `-1` would not be an error code) can assert. This allows functions that return, e.g., NULL on error to do so by passing the return value (in this example, `NULL`) as a second parameter to the GIT_ASSERT_WITH_RETVAL macros.
Edward Thomson a95096ba 2020-01-12T10:31:07 assert: optionally fall-back to assert(3) Fall back to the system assert(3) in debug builds, which may aid in debugging. "Safe" assertions can be enabled in debug builds by setting GIT_ASSERT_HARD=0. Similarly, hard assertions can be enabled in release builds by setting GIT_ASSERT_HARD to nonzero.
Edward Thomson abe2efe1 2019-12-09T12:37:34 Introduce GIT_ASSERT macros Provide macros to replace usages of `assert`. A true `assert` is punishing as a library. Instead we should do our best to not crash. GIT_ASSERT_ARG(x) will now assert that the given argument complies with some format; if it does not, it sets an error message and returns `-1`. GIT_ASSERT(x) is for internal usage, and available as an internal consistency check. It will set an error message and return `-1` in the event of failure.
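The non-aborting assertion idea from the three commits above can be sketched as follows. The `SKETCH_*` macros and `set_error` are hypothetical simplifications, not libgit2's actual `GIT_ASSERT` definitions. A failed check records an error and returns a failure value from the calling function: `-1` by default, or a caller-chosen value such as `NULL` through the `_WITH_RETVAL` variants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's error machinery: it only
 * records the most recent message. */
static const char *last_error;

static void set_error(const char *msg)
{
    last_error = msg;
}

/* A failed check records an error and returns `fail` from the *calling*
 * function instead of aborting the process. */
#define SKETCH_ASSERT_WITH_RETVAL(expr, fail) \
    do { \
        if (!(expr)) { \
            set_error("assertion failed: " #expr); \
            return fail; \
        } \
    } while (0)

#define SKETCH_ASSERT_ARG_WITH_RETVAL(expr, fail) \
    do { \
        if (!(expr)) { \
            set_error("invalid argument: " #expr); \
            return fail; \
        } \
    } while (0)

/* int-returning functions use -1 as the failure value. */
#define SKETCH_ASSERT(expr) SKETCH_ASSERT_WITH_RETVAL(expr, -1)
#define SKETCH_ASSERT_ARG(expr) SKETCH_ASSERT_ARG_WITH_RETVAL(expr, -1)

static int count_chars(const char *s)
{
    int n = 0;

    SKETCH_ASSERT_ARG(s != NULL);
    while (s[n])
        n++;
    return n;
}

/* Pointer-returning functions pass NULL as the failure value instead. */
static const char *first_slash(const char *s)
{
    SKETCH_ASSERT_ARG_WITH_RETVAL(s != NULL, NULL);
    return strchr(s, '/');
}
```

The `do { ... } while (0)` wrapper lets each macro behave like a single statement, so it remains safe inside an unbraced `if`.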
Philip Kelley 56c95cf6 2020-05-10T21:43:38 Fix uninitialized stack memory and NULL ptr dereference in stash_to_index Caught by static analysis.
Segev Finer d62e44cb 2019-06-03T18:35:08 checkout: Fix removing untracked files by path in subdirectories The checkout code didn't iterate into a subdir if it didn't match the pathspec, but since the pathspec might match files in the subdir, we should recurse into it (in contrast to gitignore handling). Fixes #5089
Edward Thomson 63de2128 2020-02-02T20:20:19 checkout: filter pathspecs for _all_ checkout types We were previously applying the pathspec filter for the baseline iterator during checkout, as well as the target tree. This was an oversight; in fact, we should apply the pathspec filter to _all_ checkout targets, not just trees. Add a helper function to set the iterator pathspecs from the given checkout pathspecs, and call it everywhere.
Edward Thomson 898caead 2020-05-10T19:03:10 Merge pull request #5431 from libgit2/ethomson/hexdump git__hexdump: better mimic `hexdump -C`
Carl Schwan 9830ab3d 2020-01-29T02:00:04 blame: add option to ignore whitespace changes
Patrick Steinhardt e9b0cfc0 2020-04-05T13:24:13 Merge pull request #5485 from libgit2/ethomson/sysdir_unused sysdir: remove unused git_sysdir_get_str
Edward Thomson b6f18db9 2020-04-05T11:16:29 sysdir: remove unused git_sysdir_get_str
Seth Junot ce2ab78f 2020-04-04T16:35:33 Fix typo causing removal of symbol 'git_worktree_prune_init_options' Commit 0b5ba0d replaced this function with an "option_init" equivalent, but misspelled the replacement function. As a result, this symbol has been missing from libgit2.so ever since.
Patrick Steinhardt ad341eb7 2020-04-04T13:40:14 Merge pull request #5425 from lhchavez/fix-get-delta-base pack: Improve error handling for get_delta_base()
Patrick Steinhardt 966db47d 2020-04-04T13:21:02 Merge pull request #5477 from pks-t/pks/rename-detection-negative-caches merge: cache negative cache results for similarity metrics
lhchavez 4d4c8e0a 2020-04-02T07:34:55 Re-adding the "delta offset is zero" error case
Patrick Steinhardt dfd7fcc4 2020-04-02T13:26:13 Merge pull request #5388 from bk2204/repo-format-v1 Handle repository format v1
Edward Thomson b8eec0b2 2020-04-01T22:22:38 Merge pull request #5461 from pks-t/pks/refdb-fs-unused-header refdb_fs: remove unused header file
Edward Thomson 5d37128d 2020-03-01T10:34:15 git__hexdump: better mimic `hexdump -C`
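For reference, the `hexdump -C` line layout that the commit above aims to mimic can be sketched with a hypothetical `hexdump_line` helper (not libgit2's `git__hexdump`). Each line has an 8-digit hex offset, two spaces, sixteen hex bytes with an extra space after the eighth, and then the printable characters between pipes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats up to 16 bytes as one `hexdump -C` style
 * line, using '.' for non-printable bytes and padding short lines so the
 * ASCII column stays aligned. `out` needs room for at least 88 bytes
 * (enough for a 64-bit offset). */
static void hexdump_line(char *out, size_t offset, const unsigned char *data, size_t len)
{
    char *p = out;
    size_t i;

    if (len > 16)
        len = 16;

    p += sprintf(p, "%08zx  ", offset);

    for (i = 0; i < 16; i++) {
        if (i < len)
            p += sprintf(p, "%02x ", data[i]);
        else
            p += sprintf(p, "   ");   /* pad a short final line */
        if (i == 7)
            *p++ = ' ';              /* gap between the two groups of eight */
    }

    *p++ = ' ';
    *p++ = '|';
    for (i = 0; i < len; i++)
        *p++ = (data[i] >= 0x20 && data[i] < 0x7f) ? (char)data[i] : '.';
    *p++ = '|';
    *p = '\0';
}
```

Formatting the bytes "0123456789abcdef" at offset 0 yields `00000000  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 66  |0123456789abcdef|`.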
lhchavez ba59a4a2 2020-04-01T12:34:16 Making get_delta_base() conform to the general error-handling pattern This makes get_delta_base() return the error code as the return value and the delta base as an out-parameter.
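The error-handling convention adopted for `get_delta_base()` can be sketched with hypothetical names: the return value carries only success or failure, and the result goes through an out-parameter that is untouched on error. The offset arithmetic assumes an OFS_DELTA-style layout, where the base lies `delta_offset` bytes before the delta entry. It is an illustration, not libgit2's pack code.

```c
#include <stdint.h>

/* Hypothetical sketch: returns 0 and writes the base offset to *out on
 * success, or a negative value (leaving *out untouched) on failure. */
static int get_delta_base_sketch(uint64_t *out, uint64_t entry_offset, uint64_t delta_offset)
{
    /* A zero distance, or one reaching the start of the pack, is malformed. */
    if (delta_offset == 0 || delta_offset >= entry_offset)
        return -1;

    *out = entry_offset - delta_offset;
    return 0;
}

/* Callers check for a negative value and propagate it instead of
 * continuing with a garbage base offset. */
static int resolve_entry(uint64_t *base, uint64_t entry_offset, uint64_t delta_offset)
{
    int error;

    if ((error = get_delta_base_sketch(base, entry_offset, delta_offset)) < 0)
        return error;
    return 0;
}
```

Separating the error channel from the result means no offset value has to be reserved as an error sentinel.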
lhchavez f3273725 2020-02-25T20:58:09 pack: Improve error handling for get_delta_base() This change moves the responsibility of setting the error upon failures of get_delta_base() to get_delta_base() instead of its callers. That way, the caller can always check if the return value is negative and mark the whole operation as an error instead of using garbage values, which can lead to crashes if the .pack files are malformed.
Patrick Steinhardt 4dfcc50f 2020-04-01T15:16:18 merge: cache negative cache results for similarity metrics When computing renames, we cache the hash signatures for each of the potentially conflicting entries so that we do not need to repeatedly read the file and can at least halfway efficiently determine whether two files are similar enough to be deemed a rename. In order to make the hash signatures meaningful, we require at least four lines of data to be present, resulting in at least four different hashes that can be compared. Files that are deemed too small are not cached at all and will thus be repeatedly re-hashed, which is usually not a huge issue. The issue with the above heuristic arises when a file does _not_ have at least four lines, where a line is anything separated by a consecutive run of "\n" or "\0" characters. For example, "a\nb" is two lines, but "a\0\0b" is also just two lines. Taken to the extreme, a file consisting of megabytes of consecutive space or NUL characters may also be deemed too small and thus not get cached. As a result, we will repeatedly load its blob and calculate its hash signature, just to finally throw it away as we notice it's not of any value. When you've got a comparatively big file that you compare against a big set of potentially renamed files, the cost simply explodes. The issue can be trivially fixed by introducing negative cache entries. Whenever we determine that a given blob does not have a meaningful representation via a hash signature, we store this negative cache marker and will from then on not hash it again, but also ignore it as a potential rename target. This should already help the "normal" case where you have a lot of small files as rename candidates, but in the above scenario its savings are extraordinarily high.
To verify we do not hit the issue anymore with the described solution, this commit adds a test that uses the exact same setup described above with one 50 megabyte blob of '\0' characters and 1000 other files that get renamed. Without the negative cache: $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null real 11m48.377s user 11m11.576s sys 0m35.187s And with the negative cache: $ time ./libgit2_clar -smerge::trees::renames::cache_recomputation >/dev/null real 0m1.972s user 0m1.851s sys 0m0.118s So this represents a ~350-fold performance improvement, but it obviously depends on how many files you have and how big the blob is. The test numbers were chosen so that one will immediately notice if the bug resurfaces.
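The negative-cache idea above can be sketched as a three-state cache entry. All names are hypothetical and the FNV-1a hash is only a stand-in for a real similarity signature. The line counter follows the commit's definition, where a line is any run of bytes separated by one or more consecutive '\n' or '\0' characters. Once a blob is found to be too small, that result is remembered, so the expensive work runs only once.

```c
#include <stddef.h>

enum sig_state {
    SIG_UNKNOWN = 0,  /* not computed yet */
    SIG_NEGATIVE,     /* computed once: too small to be meaningful */
    SIG_VALID         /* a usable signature is cached */
};

struct sig_cache_entry {
    enum sig_state state;
    unsigned long signature;
};

static int hash_calls;  /* counts expensive computations, for illustration */

/* Counts lines separated by runs of '\n' or '\0'. */
static size_t count_lines(const char *buf, size_t len)
{
    size_t lines = 0, i = 0;

    while (i < len) {
        while (i < len && (buf[i] == '\n' || buf[i] == '\0'))
            i++;                        /* skip a separator run */
        if (i == len)
            break;
        lines++;
        while (i < len && buf[i] != '\n' && buf[i] != '\0')
            i++;                        /* consume the line itself */
    }
    return lines;
}

/* Returns 1 and fills *sig if a usable signature exists, 0 otherwise. */
static int get_signature(struct sig_cache_entry *e, const char *buf, size_t len, unsigned long *sig)
{
    unsigned long h = 2166136261UL;
    size_t i;

    if (e->state == SIG_NEGATIVE)
        return 0;                       /* known useless: skip the work */
    if (e->state == SIG_VALID) {
        *sig = e->signature;
        return 1;
    }

    hash_calls++;
    if (count_lines(buf, len) < 4) {
        e->state = SIG_NEGATIVE;        /* remember the negative result */
        return 0;
    }

    for (i = 0; i < len; i++) {         /* stand-in FNV-1a "signature" */
        h ^= (unsigned char)buf[i];
        h *= 16777619UL;
    }
    e->signature = h;
    e->state = SIG_VALID;
    *sig = h;
    return 1;
}
```

Without the `SIG_NEGATIVE` state, every comparison against the all-NUL blob would repeat the scan, which is the behaviour the timing figures above illustrate.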
Patrick Steinhardt 5f47cb48 2020-03-26T14:16:41 patch: correctly handle mode changes for renames When generating a patch for a renamed file whose mode bits have changed in addition to the rename, we currently fail to parse the generated patch. Furthermore, when generating a diff we output mode bits after the similarity metric, which is different from how upstream git handles it. Fix both issues by adding another state transition that allows similarity indices after mode changes and by printing mode changes before the similarity index.
Edward Thomson bba9599a 2020-03-26T11:56:10 Merge pull request #5445 from lhchavez/fix-5443 Fix segfault when calling git_blame_buffer()
Utkarsh Gupta e7a1fd88 2020-03-26T11:42:47 Fix spelling error Signed-off-by: Utkarsh Gupta <utkarsh@debian.org>
Patrick Steinhardt 74e0489a 2020-03-24T19:42:10 refdb_fs: remove unused header file The "refdb_fs.h" header contains a single struct `git_refcache` that is not used anywhere. As a result, we can just delete the header altogether as it doesn't have any purpose and may confuse readers.
lhchavez 62d59467 2020-03-08T02:13:11 Fix segfault when calling git_blame_buffer() This change makes sure that the hunk is not null before trying to dereference it. This avoids segfaults, especially when blaming against a modified buffer (i.e. the index). Fixes: #5443
Patrick Steinhardt a2d3316a 2020-03-13T23:01:11 refdb_fs: initialize backend version While the `git_refdb_backend` struct has a version, we do not initialize it correctly when calling `git_refdb_backend_fs()`. Fix this by adding the call to `git_refdb_init_backend()`.
Edward Thomson 9a102446 2020-03-21T16:49:44 Merge pull request #5455 from pks-t/pks/cmake-install-dirs cmake: use install directories provided via GNUInstallDirs
Patrick Steinhardt 87fc539f 2020-03-13T22:08:19 cmake: use install directories provided via GNUInstallDirs We currently hand-code logic to configure where to install our artifacts via the `LIB_INSTALL_DIR`, `INCLUDE_INSTALL_DIR` and `BIN_INSTALL_DIR` variables. This is reinventing the wheel, as CMake already provides a way to do that via `CMAKE_INSTALL_<DIR>` paths, e.g. `CMAKE_INSTALL_LIBDIR`. This requires users of libgit2 to know about the discrepancy and will require special hacks for any build systems that handle these variables in an automated way. One such example is Gentoo Linux, which sets up these paths in both the cmake and cmake-utils eclass. So let's stop doing that: the GNUInstallDirs module handles it in a better way for us, especially as the actual values are dependent on CMAKE_INSTALL_PREFIX. This commit removes our own set of variables and instead refers users to the standard ones. As a second benefit, this commit also fixes our pkgconfig generation to use the GNUInstallDirs module. We had a bug there where we ignored the CMAKE_INSTALL_PREFIX when configuring the libdir and includedir keys, so if libdir was set to "lib64", then libdir would be an invalid path. With GNUInstallDirs, we can now use `CMAKE_INSTALL_FULL_LIBDIR`, which handles the prefix for us.
Patrick Steinhardt b1f6481f 2020-03-10T22:07:35 cmake: ignore deprecation notes for Secure Transport The Secure Transport interface we're currently using has been deprecated with macOS 10.15. As we're currently in code freeze, we cannot migrate to newer interfaces. As such, let's disable deprecation warnings for our "stransport.c" stream.
Edward Thomson 43d7a42b 2020-03-08T18:14:09 win32: don't canonicalize symlink targets Don't canonicalize symlink targets; our win32 path canonicalization routines expect an absolute path. In particular, stop using the path canonicalization routines for symlink targets, as introduced in commit 7d55bee6d ("win32: fix relative symlinks pointing into dirs", 2020-01-10). Instead, use the utf8 -> utf16 relative path handling functions, so that paths like "../foo" will be translated to "..\foo".
Edward Thomson f2b114ba 2020-03-08T18:11:45 win32: introduce relative path handling function Add a function that takes a (possibly) relative UTF-8 path and emits a UTF-16 path with forward slashes translated to backslashes. If the given path is, in fact, absolute, it will be translated according to the absolute path handling rules.
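The separator translation for relative paths can be sketched as follows, using a hypothetical `to_win32_relative` helper. The sketch omits the UTF-8 to UTF-16 conversion the commit also performs, and it does not handle the absolute-path case. Unlike prefix-based canonicalization, it leaves `..` components untouched, so "../foo" becomes "..\foo".

```c
#include <stddef.h>

/* Hypothetical sketch: copies a relative path into `out`, translating '/'
 * to '\\' and leaving ".." components alone. Returns the length written,
 * or -1 if `out` is too small for the path plus its terminator. */
static long to_win32_relative(char *out, size_t out_size, const char *path)
{
    size_t i;

    if (out_size == 0)
        return -1;

    for (i = 0; path[i] != '\0'; i++) {
        if (i + 1 >= out_size)
            return -1;                /* no room left for the terminator */
        out[i] = (path[i] == '/') ? '\\' : path[i];
    }
    out[i] = '\0';
    return (long)i;
}
```

Because the function never interprets `..`, a relative symlink target keeps pointing at the same place relative to the link.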
Edward Thomson fb7da154 2020-03-08T16:34:23 win32: clarify usage of path canonicalization funcs The path canonicalization functions on win32 are intended to canonicalize absolute paths; those with prefixes. In other words, paths that start with drive letters (`C:\`), share names (`\\server\share`), or other prefixes (`\\?\`). This function removes leading `..` that occur after the prefix but before the directory/file portion (e.g., turning `C:\..\..\..\foo` into `C:\foo`). This translation is not appropriate for local paths.
Edward Thomson e23b8b44 2020-03-06T17:13:48 Merge pull request #5422 from pks-t/pks/cmake-booleans CMake booleans
Edward Thomson 8eb1fc36 2020-03-06T17:12:18 Merge pull request #5439 from ignatenkobrain/patch-2 Set proper pkg-config dependency for pcre2
Edward Thomson 502e5d51 2020-03-01T12:44:39 httpclient: use a 16kb read buffer for macOS Use a 16kb read buffer for compatibility with macOS SecureTransport. SecureTransport `SSLRead` has the following behavior: 1. It will return _at most_ one TLS packet's worth of data, and 2. It will try to give you as much data as you asked for This means that if you call `SSLRead` with a buffer size that is smaller than what _it_ reads (in other words, the maximum size of a TLS packet), then it will buffer that data for subsequent calls. However, it will also attempt to give you as much data as you requested in your SSLRead call. This means that it will guarantee a network read in the event that it has buffered data. Consider our 8kb buffer and a server sending us 12kb of data on an HTTP Keep-Alive session. Our first `SSLRead` will read the TLS packet off the network. It will return us the 8kb that we requested and buffer the remaining 4kb. Our second `SSLRead` call will see the 4kb that's buffered and decide that it could give us an additional 4kb. So it will do a network read. But there's nothing left to read; that was the end of the data. The HTTP server is waiting for us to provide a new request. The server will eventually time out, our `read` system call will return, `SSLRead` can return back to us and we can make progress. While technically correct, this is wildly inefficient. (Thanks, Tim Apple!) Moving us to use an internal buffer that is the maximum size of a TLS packet (16kb) ensures that `SSLRead` will never buffer and it will always return everything that it read (albeit decrypted).
Igor Gnatenko dd704944 2020-03-03T11:05:04 Set proper pkg-config dependency for pcre2 Signed-off-by: Igor Raits <i.gnatenko.brain@gmail.com>
Patrick Steinhardt a48da8fa 2020-02-25T22:49:16 Merge pull request #5417 from pks-t/pks/ntlmclient-htonll deps: ntlmclient: fix missing htonll symbols on FreeBSD and SunOS
Patrick Steinhardt ebade233 2020-02-24T21:49:43 transports: auth_ntlm: fix use of strdup/strndup In the NTLM authentication code, we accidentally use strdup(3P) and strndup(3P) instead of our own wrappers git__strdup and git__strndup, respectively. Fix the issue by using our own functions.
Patrick Steinhardt d8e71cb2 2020-02-24T21:07:34 cmake: fix ENABLE_TRACE parameter being too strict In order to check whether tracing support should be turned on, we check whether ENABLE_TRACE equals "ON". This is being much too strict, as CMake will also treat "on", "true", "yes" and others as true-ish, but passing them will disable tracing support now. Fix the issue by simply removing the STREQUAL, which will cause CMake to do the right thing automatically.
Sven Strickroth ff46c5d3 2020-02-20T20:47:22 Fix typo on GIT_USE_NEC Signed-off-by: Sven Strickroth <email@cs-ware.de>
Patrick Steinhardt 4f1923e8 2020-02-19T12:14:32 Merge pull request #5390 from pks-t/pks/sha1-lookup sha1_lookup: inline its only function into "pack.c"
Patrick Steinhardt 8aa04a37 2020-02-19T12:14:16 Merge pull request #5391 from pks-t/pks/coverity-fixes Coverity fixes
Patrick Steinhardt 0119e57d 2020-02-11T10:37:32 streams: openssl: switch approach to silence Valgrind errors As OpenSSL loves using uninitialized bytes as another source of entropy, we need to mark them as defined so that Valgrind won't complain about use of these bytes. Traditionally, we've been using the macro `VALGRIND_MAKE_MEM_DEFINED` provided by Valgrind, but starting with OpenSSL 1.1 the code doesn't compile anymore due to `struct SSL` having become opaque. As such, we also can't set it as defined anymore, as we have no way of knowing its size. Let's change gears instead by just swapping out the allocator functions of OpenSSL with our own ones. The twist is that instead of calling `malloc`, we just call `calloc` to have the bytes initialized automatically. Besides soothing Valgrind, this approach has the benefit of being completely agnostic of the memory sanitizer and is neatly contained in a single place. Note that we shouldn't do this for non-Valgrind builds. As we cannot set up memory functions for a single SSL context only, we need to swap them globally. Furthermore, as it's possible to call `CRYPTO_set_mem_functions` only once, we'd prevent users of libgit2 from setting up their own allocators.
Patrick Steinhardt 877054f3 2020-02-10T12:35:13 cmake: consolidate Valgrind option OpenSSL doesn't initialize bytes on purpose in order to generate additional entropy. Valgrind isn't too happy about that though, causing it to generate warnings about various issues regarding use of uninitialized bytes. We traditionally had some infrastructure to silence these errors in our OpenSSL stream implementation, where we invoke the Valgrind macro `VALGRIND_MAKE_MEM_DEFINED` in various callbacks that we provide to OpenSSL. Naturally, we only include these instructions if a preprocessor define "VALGRIND" is set, and that in turn is only set if passing "-DVALGRIND" to CMake. We do that in our usual Azure pipelines, but we in fact forgot to do this in our nightly build. As a result, we get a slew of warnings for these nightly builds, but not for our normal builds. To fix this, we could just add "-DVALGRIND" to our nightly builds. But starting with commit d827b11b6 (tests: execute leak checker via CTest directly, 2019-06-28), we do have a secondary variable that directs whether we want to use memory sanitizers for our builds. As such, every user wishing to use Valgrind for our tests needs to pass both options "VALGRIND" and "USE_LEAK_CHECKER", which is cumbersome and error prone, as can be seen by our own builds. Instead, let's consolidate this into a single option, removing the old "-DVALGRIND" one: we now add the preprocessor directive if USE_LEAK_CHECKER equals "valgrind" and remove "-DVALGRIND" from our own pipelines.
brian m. carlson 06f02300 2020-02-07T00:33:52 repository: handle format v1 Git has supported repository format version 1 for some time. This format is just like version 0, but it supports extensions. Implementations must reject extensions that they don't support. Add support for this format version and reject any extensions but extensions.noop, which is the only extension we currently support. While we're at it, also clean up an error message.
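A hypothetical sketch of the version and extension check described above. `check_repo_format` is illustrative, not libgit2's code. One assumption is labelled in the code: version 0 repositories skip the extension check. Under version 1, any extension other than `noop` causes the repository to be rejected.

```c
#include <stddef.h>
#include <string.h>

/* Extensions this sketch understands; the commit supports only "noop". */
static const char *supported_extensions[] = { "noop" };

/* Hypothetical check: returns 0 if a repository with the given
 * core.repositoryformatversion and extensions.* key names can be opened,
 * or -1 if it must be rejected. */
static int check_repo_format(int version, const char **extensions, size_t n_ext)
{
    size_t i, j;
    size_t n_supported = sizeof(supported_extensions) / sizeof(supported_extensions[0]);

    if (version < 0 || version > 1)
        return -1;          /* unknown repository format version */

    if (version == 0)
        return 0;           /* assumption: v0 does not honour extensions */

    for (i = 0; i < n_ext; i++) {
        int known = 0;

        for (j = 0; j < n_supported; j++)
            if (strcmp(extensions[i], supported_extensions[j]) == 0)
                known = 1;
        if (!known)
            return -1;      /* unsupported extension: refuse to open */
    }
    return 0;
}
```

Rejecting unknown extensions is what lets newer Git features mark a repository as unsafe for older implementations to modify.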
Patrick Steinhardt b3b92e09 2020-02-07T12:56:26 streams: openssl: ignore return value of `git_mutex_lock` OpenSSL pre-v1.1 required us to set up a locking function to properly support multithreading. The locking function signature cannot return any error codes, and as a result we can't do anything if `git_mutex_lock` fails. To silence static analysis tools, let's just explicitly ignore its return value by casting it to `void`.
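A sketch of why the `(void)` cast is the only option: the callback shape below follows OpenSSL 1.0.x's `CRYPTO_set_locking_callback` (mode, lock index, file, line) and returns `void`. The mutex type and lock function are hypothetical stand-ins for `git_mutex`, made to fail when the lock is already held.

```c
#define SKETCH_CRYPTO_LOCK 1  /* mirrors CRYPTO_LOCK in OpenSSL 1.0.x */

/* Hypothetical mutex whose lock operation can fail. */
typedef struct { int held; } sketch_mutex;

static int sketch_mutex_lock(sketch_mutex *m)
{
    if (m->held)
        return -1;  /* simulated failure */
    m->held = 1;
    return 0;
}

static void sketch_mutex_unlock(sketch_mutex *m)
{
    m->held = 0;
}

static sketch_mutex sketch_locks[4];

/* Returns void, so a failed lock cannot be reported to OpenSSL;
 * the (void) cast tells readers and static analyzers this is deliberate. */
static void sketch_locking_function(int mode, int n, const char *file, int line)
{
    (void)file;
    (void)line;

    if (mode & SKETCH_CRYPTO_LOCK)
        (void)sketch_mutex_lock(&sketch_locks[n]);
    else
        sketch_mutex_unlock(&sketch_locks[n]);
}
```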
Patrick Steinhardt 7d1b1774 2020-02-07T12:50:39 cache: fix invalid memory access in case updating cache entry fails When adding a new entry to our cache where an entry with the same OID exists already, we only update the existing entry in case it is unparsed and the new entry is parsed. Currently, we do not check the return value of `git_oidmap_set` though when updating the existing entry. As a result, we will _not_ have updated the existing entry if `git_oidmap_set` fails, but have decremented its refcount and incremented the new entry's refcount. Later on, this is likely to lead to dereferencing invalid memory. Fix the issue by checking the return value of `git_oidmap_set`. In case it fails, we will simply keep the existing entry stored instead, even though it's unparsed.
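A sketch of the corrected ordering: refcounts are only adjusted after the map update succeeds. The single-slot map and its `fail_next` flag are hypothetical stand-ins for `git_oidmap` and an allocation failure inside `git_oidmap_set`; this is not libgit2's cache code.

```c
#include <stddef.h>

typedef struct {
    int refcount;
    int parsed;
} entry;

/* Hypothetical one-slot map; set() fails while fail_next is non-zero. */
typedef struct {
    entry *value;
    int fail_next;
} slot_map;

static int slot_map_set(slot_map *map, entry *e)
{
    if (map->fail_next)
        return -1;
    map->value = e;
    return 0;
}

/* Upgrade an unparsed stored entry to a parsed one; returns the entry
 * that is stored afterwards. */
static entry *cache_update(slot_map *map, entry *incoming)
{
    entry *stored = map->value;

    if (stored->parsed || !incoming->parsed)
        return stored;              /* nothing to upgrade */

    if (slot_map_set(map, incoming) < 0)
        return stored;              /* keep the unparsed entry, refcounts untouched */

    stored->refcount--;             /* map no longer references the old entry */
    incoming->refcount++;           /* map now references the new entry */
    return incoming;
}
```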
Patrick Steinhardt 775af015 2020-02-07T12:31:58 worktree: report errors when unable to read locking reason Git worktrees can be locked in order to spare them from deletion, e.g. if a worktree is absent due to being located on a removable disk it is a good idea to lock it. When locking such worktrees, it is possible to give a locking reason in order to help the user later on when inspecting the status of any such locked trees. The function `git_worktree_is_locked` serves to read out the locking status. It currently does not properly report any errors when reading the reason file, and callers do not expect any negative return values either. Fix this by converting callers to expect error codes and checking the return code of `git_futils_readbuffer`.
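A sketch of a tri-state "is locked" check that separates "not locked" from "could not read the reason". The helper name and plain-`stdio` file handling are hypothetical; `lock_path` is assumed to point at the worktree's lock file. Callers would test for `< 0` before treating the result as a boolean.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the lock file exists (locked), 0 if it
 * does not, and -1 on any other error. The (possibly empty) reason is
 * copied into `reason` when locked.
 */
static int worktree_is_locked(char *reason, size_t reason_len, const char *lock_path)
{
    FILE *f;
    int failed;

    if (reason_len > 0)
        reason[0] = '\0';

    f = fopen(lock_path, "r");
    if (f == NULL)
        return errno == ENOENT ? 0 : -1;  /* absent means unlocked; else error */

    if (reason_len > 0) {
        size_t n = fread(reason, 1, reason_len - 1, f);
        reason[n] = '\0';
    }

    failed = ferror(f);  /* a read error is reported, not ignored */
    fclose(f);
    return failed ? -1 : 1;
}
```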
Patrick Steinhardt 2288a713 2020-02-07T12:15:34 repository: check error codes when reading common link When checking whether a path is a valid repository path, we try to read the "commondir" link file. In the process, we neither confirm that constructing the file's path succeeded nor do we verify that reading the file succeeded, which might cause us to verify repositories on an empty or bogus path later on. Fix this by checking return values. As the function to verify repos doesn't currently support returning errors, this commit also refactors the function to return an error code, passing validity of the repo via an out parameter instead, and adjusts all existing callers.
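A sketch of the "error code plus out parameter" refactoring: the return value reports failures, while validity travels through `out`. The function name is hypothetical, and the validity test (a `HEAD` file exists under `path`) is a simplified stand-in for the real repository checks, including the "commondir" handling.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns 0 and sets *out on success; returns -1 if the check itself failed. */
static int is_valid_repository_path(bool *out, const char *path)
{
    char head[4096];
    FILE *f;
    int n;

    *out = false;

    n = snprintf(head, sizeof(head), "%s/HEAD", path);
    if (n < 0 || (size_t)n >= sizeof(head))
        return -1;  /* could not construct the path: an error, not "invalid" */

    f = fopen(head, "r");
    if (f == NULL)
        /* A missing file just means "not a repository". */
        return (errno == ENOENT || errno == ENOTDIR) ? 0 : -1;

    fclose(f);
    *out = true;
    return 0;
}
```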
Patrick Steinhardt b169cd52 2020-02-07T12:13:42 pack-objects: check return code of `git_zstream_set_input` While `git_zstream_set_input` cannot fail right now, it might change in the future if we ever decide to have it check its parameters more rigorously. Let's thus check whether its return code signals an error.
Patrick Steinhardt 90450d88 2020-02-07T12:10:12 indexer: check return code of `git_hash_ctx_init` Initialization of the hashing context may fail on some systems, most notably on Win32 via the legacy hashing context. As such, we need to always check the error code of `git_hash_ctx_init`, which is not done when creating a new indexer. Fix the issue by adding checks.
Patrick Steinhardt 6eebfc06 2020-02-07T11:57:48 push: check error code returned by `git_revwalk_hide` When queueing objects we want to push, we call `git_revwalk_hide` to hide all objects already known to the remote from our revwalk. We do not check its return value though, as the original intent was to ignore the case where the pushed OID is not a known committish. As `git_revwalk_hide` can fail for other reasons like running out of memory, we should still check its return value. Fix the issue by checking the function's return value, ignoring errors hinting that it's not a committish. As `git_revwalk__push_commit` currently clobbers these error codes, we need to adjust it as well in order to make them available downstream.
Patrick Steinhardt 31a577d0 2020-02-07T11:55:23 notes: check error code returned by `git_iterator_advance` When calling `git_note_next`, we end up calling `git_iterator_advance` but ignore its error code. The intent is that we do not want to return an error if it returns `GIT_ITEROVER`, as we want to return that value on the next invocation of `git_note_next`. We should still check for any other error codes returned by `git_iterator_advance` to catch unexpected internal errors. Fix this by checking the function's return value, ignoring `GIT_ITEROVER`.
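A sketch of the pattern: propagate real errors from advancing, but defer the end-of-iteration code so it is returned on the next call. `ITER_OVER` is an arbitrary stand-in for `GIT_ITEROVER`, and the iterator type is hypothetical, with a `fail_at` field to simulate an internal error.

```c
#define ITER_OVER (-100)  /* arbitrary stand-in for GIT_ITEROVER */

typedef struct {
    int pos;
    int len;
    int fail_at;  /* position at which advancing fails, or -1 */
} sketch_iter;

static int sketch_iter_advance(sketch_iter *it)
{
    if (it->pos == it->fail_at)
        return -1;              /* simulated internal error */
    if (++it->pos >= it->len)
        return ITER_OVER;       /* reached the end */
    return 0;
}

static int sketch_note_next(int *out, sketch_iter *it)
{
    int error;

    if (it->pos >= it->len)
        return ITER_OVER;       /* end is reported on the call after the last item */

    *out = it->pos;

    /* The current item is valid even if advancing hit the end, so only
     * errors other than ITER_OVER are returned here. */
    if ((error = sketch_iter_advance(it)) < 0 && error != ITER_OVER)
        return error;

    return 0;
}
```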
Patrick Steinhardt 46228d86 2020-02-06T11:10:27 transports: http: fix custom headers not being applied In commit b9c5b15a7 (http: use the new httpclient, 2019-12-22), the HTTP code got refactored to extract a generic HTTP client that operates independently of the Git protocol. Part of the refactoring was the creation of a new `git_http_request` struct that encapsulates the generation of requests. Our Git-specific HTTP transport was converted to use that in `generate_request`, but during the process we forgot to set up custom headers for the `git_http_request` and as a result we do not send out these headers anymore. Fix the issue by correctly setting up the request's custom headers and add a test to verify we correctly send them.
Patrick Steinhardt f0f1cd1d 2020-02-07T10:51:17 sha1_lookup: inline its only function into "pack.c" The file "sha1_lookup.c" contains only a single function, `sha1_position`, which is used solely by the packfile implementation. As the function is comparatively small, move it into "pack.c" to enable the compiler to optimize it better and to reduce symbol visibility.
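Once such a helper lives in the only file that uses it, it can be declared `static`, which keeps it out of the exported symbols and lets the compiler inline it. The sketch below is a hypothetical stand-in: a plain binary search over a sorted table of 20-byte object IDs, returning the index on a hit or `-(insertion point) - 1` on a miss (a common convention, assumed here rather than taken from libgit2).

```c
#include <stddef.h>
#include <string.h>

#define OID_RAWSZ 20

/* `static`: visible only inside this translation unit. */
static int oid_position(const unsigned char *table, size_t nr, const unsigned char *oid)
{
    size_t lo = 0, hi = nr;

    while (lo < hi) {
        size_t mi = lo + (hi - lo) / 2;  /* avoids overflow of lo + hi */
        int cmp = memcmp(table + mi * OID_RAWSZ, oid, OID_RAWSZ);

        if (cmp == 0)
            return (int)mi;
        if (cmp < 0)
            lo = mi + 1;
        else
            hi = mi;
    }
    return -(int)lo - 1;  /* not found: encode where it would be inserted */
}
```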
Patrick Steinhardt 93a9044f 2020-01-31T08:49:34 fetchhead: strip credentials from remote URL If fetching from an anonymous remote via its URL, then the URL gets written into the FETCH_HEAD reference. This is mainly done to give valuable context to some commands, like git-merge(1), which will put the URL into the generated MERGE_MSG. As a result, what gets written into FETCH_HEAD may become public in some cases. This is especially important considering that URLs may contain credentials, e.g. when cloning 'https://foo:bar@example.com/repo' we persist the complete URL into FETCH_HEAD and put it without any kind of sanitization into the MERGE_MSG. This is obviously bad, as your login data leaks as soon as you run git-push(1). When writing the URL into FETCH_HEAD, upstream git does strip credentials first. Let's do the same by trying to parse the remote URL as a "real" URL, removing any credentials and then re-formatting the URL. In case this fails, e.g. when it's a file path or not a valid URL, we just fall back to using the URL as-is without any sanitization. Add tests to verify our behaviour.
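A simplified sketch of the stripping step. Unlike the commit, which parses the URL and re-formats it, this stand-in just scans the string: it drops any `user[:password]@` between `scheme://` and the end of the authority, and copies anything without `://` (such as a file path) unchanged. The function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 0 on success, -1 if `out` is too small. */
static int strip_url_credentials(char *out, size_t out_len, const char *url)
{
    const char *scheme_end = strstr(url, "://");
    const char *host, *p, *userinfo_end = NULL;
    size_t auth_len;
    int n;

    if (scheme_end == NULL) {
        n = snprintf(out, out_len, "%s", url);  /* not a URL: use as-is */
    } else {
        host = scheme_end + 3;
        auth_len = strcspn(host, "/?#");        /* authority ends here */

        for (p = host; p < host + auth_len; p++)
            if (*p == '@')
                userinfo_end = p;               /* last '@' in the authority */

        if (userinfo_end == NULL)
            n = snprintf(out, out_len, "%s", url);
        else                                    /* "scheme://" + rest after '@' */
            n = snprintf(out, out_len, "%.*s%s", (int)(host - url), url, userinfo_end + 1);
    }

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For example, `https://foo:bar@example.com/repo` becomes `https://example.com/repo`, while an `@` in the path (as in `https://example.com/a@b`) is left alone.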
Patrick Steinhardt aa4cd778 2020-01-30T10:40:44 Merge pull request #5336 from libgit2/ethomson/credtype cred: change enum to git_credential_t and GIT_CREDENTIAL_*