Log

Author Commit Date CI Message
Patrick Steinhardt 7574564e 2019-07-18T13:40:34 sha1: win32: fix compilation due to unknown type In commit bbf034ab9 (hash: move `git_hash_prov` into Win32 backend, 2019-02-22), the `git_hash_prov`'s structure name has been removed in favour of its typedef'ed name. But as we have no CI that compiles with the WinHTTPS hashing backend right now, it wasn't noticed that the implementation that uses this struct wasn't changed correctly. Fix the struct type to make it compile again.
Patrick Steinhardt b7c247b3 2019-07-18T13:37:02 cmake: include SHA1 headers into our source files When selecting the SHA1 backend, we only include the respective C implementation of the selected backend. But since commit bd48bf3fb (hash: introduce source files to break include circles, 2019-06-14), we have introduced separate headers and compilation units for all hashes. So by not including the headers, we may not honor them to compute whether a file needs to be recompiled and they also will not be displayed in IDEs. Add the header files to fix this problem.
Patrick Steinhardt 368b9795 2019-07-18T11:27:21 Merge pull request #5168 from tiennou/clar/fix-data-suite-count clar: correctly account for "data" suites when counting
Edward Thomson 51124a5b 2019-07-17T17:33:34 Merge pull request #5170 from bk2204/packbuilder-efficient-realloc Allocate memory more efficiently when packing objects
brian m. carlson c4df926b 2019-07-16T21:54:10 pack-objects: allocate memory more efficiently The packbuilder code allocates memory in chunks. When it needs to allocate, it tries to add 1024 to the number of objects and multiply by 3/2. However, it actually multiplies by 1 instead, since it performs an integral division in the expression "3 / 2" and only then multiplies by the increased number of objects. The current behavior causes the code to waste massive amounts of time copying memory when it reallocates, causing inserting all non-blob objects in the Linux repository into a new pack to take some indeterminate time greater than 5 minutes instead of 52 seconds. Correct this error by first dividing by two, and only then multiplying by 3. We still check for overflow for the multiplication, which is the only part that can overflow. This appears to be the only place in the code base which has this problem.
Etienne Samson 4cd8dfaa 2019-07-16T20:20:55 clar: correctly account for "data" suites when counting Failing to do that makes clar miss the last of the suites, as all duplicated "data" would have not been accounted for.
Patrick Steinhardt f92d495d 2019-07-12T10:48:14 Merge pull request #5131 from pks-t/pks/fileops-mkdir-in-root fileops: fix creation of directory in filesystem root
Patrick Steinhardt dacac9e1 2019-07-12T08:30:07 Merge pull request #5160 from pks-t/pks/win32-fuzzers win32: fix fuzzers and have CI build them
Patrick Steinhardt 5ae22a63 2019-06-21T08:13:31 fileops: fix creation of directory in filesystem root In commit 45f24e787 (git_repository_init: stop traversing at windows root, 2019-04-12), we have fixed `git_futils_mkdir` to correctly handle the case where we create a directory in Windows-style filesystem roots like "C:\repo". The problem here is an off-by-one: previously, to that commit, we've been checking wether the parent directory's length is equal to the root directory's length incremented by one. When we call the function with "/example", then the parent directory's length ("/") is 1, but the root directory offset is 0 as the path is directly rooted without a drive prefix. This resulted in `1 == 0 + 1`, which was true. With the change, we've stopped incrementing the root directory length, and thus now compare `1 <= 0`, which is false. The previous way of doing it was kind of finicky any non-obvious, which is also why the error was introduced. So instead of just re-adding the increment, let's explicitly add a condition that aborts finding the parent if the current parent path is "/". Making this change causes Azure Pipelines to fail the testcase repo::init::nonexistent_paths on Unix-based systems. This is because we have just fixed creating directories in the filesystem root, which previously didn't work. As Docker-based tests are running as root user, we are thus able to create the non-existing path and will now succeed to create the repository that was expected to actually fail. Let's split this up into three different tests: - A test to verify that we do not create repos in a non-existing parent directoy if the flag `GIT_REPOSITORY_INIT_MKPATH` is not set. - A test to verify that we fail if the root directory does not exist. As there is a common root directory on Unix-based systems that always exist, we can only test for this on Windows-based systems. - A test to verify that we fail if trying to create a repository in an unwriteable parent directory. We can only test this if not running tests as root user, as CAP_DAC_OVERRIDE will cause us to ignore permissions when creating files.
Patrick Steinhardt a6ad9e8a 2019-07-11T14:03:21 Merge pull request #5134 from pks-t/pks/config-parser-separation Config parser separation
Patrick Steinhardt dbeadf8a 2019-07-11T10:56:05 config_parse: provide parser init and dispose functions Right now, all configuration file backends are expected to directly mess with the configuration parser's internals in order to set it up. Let's avoid doing that by implementing both a `git_config_parser_init` and `git_config_parser_dispose` function to clearly define the interface between configuration backends and the parser. Ideally, we would make the `git_config_parser` structure definition private to its implementation. But as that would require an additional memory allocation that was not required before we just live with it being visible to others.
Patrick Steinhardt 32157526 2019-07-11T11:10:02 config_file: refactor error handling in `config_write` Error handling in `config_write` is rather convoluted and does not match our current code style. Refactor it to make it easier to understand.
Patrick Steinhardt 820fa1a3 2019-07-11T11:04:33 config_file: internalize `git_config_file` struct With the previous commits, we have finally separated the config parsing logic from the specific configuration file backend. Due to that, we can now move the `git_config_file` structure into the config file backend's implementation so that no other code may accidentally start using it again. Furthermore, we rename the structure to `diskfile` to make it obvious that it is internal, only, and to unify it with naming scheme of the other diskfile structures.
Patrick Steinhardt 6e6da75f 2019-07-11T11:00:05 config_parse: remove use of `git_config_file` The config parser code needs to keep track of the current parsed file's name so that we are able to provide proper error messages to the user. Right now, we do that by storing a `git_config_file` in the parser structure, but as that is a specific backend and the parser aims to be generic, it is a layering violation. Switch over to use a simple string to fix that.
Patrick Steinhardt 76749dfb 2019-06-21T12:33:31 config_parse: rename `data` parameter to `payload` for clarity By convention, parameters that get passed to callbacks are usually named `payload` in our codebase. Rename the `data` parameters in the configuration parser callbacks to `payload` to avoid confusion.
Patrick Steinhardt 54d350e0 2019-06-21T12:53:43 config_file: embed file in diskfile parse data The config file code needs to keep track of the actual `git_config_file` structure, as it not only contains the path of the current configuration file, but it also keeps tracks of all includes of that file. Right now, we keep track of that structure via the `git_config_parser`, but as that's supposed to be a backend generic implementation of configuration parsing it's a layering violation to have it in there. Switch over the config file backend to use its own config file structure that's embedded in the backend parse data. This allows us to switch over the generic config parser to avoid using the `git_config_file` structure.
Patrick Steinhardt ba9725a2 2019-07-11T10:48:49 Merge pull request #5132 from pks-t/pks/config-stat-cache config_file: implement stat cache to avoid repeated rehashing
Patrick Steinhardt 2ba7020f 2019-06-27T09:23:59 config_file: avoid re-reading files on write When we rewrite the configuration file due to any of its values being modified, we call `config_refresh` to update the in-memory representation of our config file backend. This is needlessly wasteful though, as `config_refresh` will always open the on-disk representation to reads the file contents while we already know the complete file contents at this point in time as we have just written it to disk. Implement a new function `config_refresh_from_buffer` that will refresh the backend's config entries from a buffer instead of from the config file itself. Note that this will thus _not_ update the backend's timestamp, which will cause us to re-read the buffer when performing a read operation on it. But this is still an improvement as we now lazily re-read the contents, and most importantly we will avoid constantly re-reading the contents if we perform multiple write operations. The following strace demonstrates this if we're re-writing a key multiple times. It uses our config example with `config_set` changed to update the file 10 times with different keys: $ strace lg2 config x.x z |& grep '^open.*config' open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 And now with the optimization of `config_refresh_from_buffer`: $ strace lg2 config x.x z |& grep '^open.*config' open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4 As can be seen, this is quite a lot of `open` calls less.
Patrick Steinhardt a0dc3027 2019-06-27T08:54:51 config_file: split out function that sets config entries Updating a config file backend's config entries is a bit more involved, as it requires clearing of the old config entries as well as handling locking correctly. As we will need this functionality in a future patch to refresh config entries from a buffer, let's extract this into its own function `config_set_entries`.
Patrick Steinhardt 985f5cdf 2019-06-27T08:41:16 config_file: split out function that reads entries from a buffer The `config_read` function currently performs both reading the on-disk config file as well as parsing the retrieved buffer contents. To optimize how we refresh our config entries from an in-memory buffer, we need to be able to directly parse buffers, though, without involving any on-disk files at all. Extract a new function `config_read_buffer` that sets up the parsing logic and then parses config entries from a buffer, only. Have `config_read` use it to avoid duplicated logic.
Patrick Steinhardt 3e1c137a 2019-06-27T08:24:21 config_file: move refresh into `write` function We are quite lazy in how we refresh our config file backend when updating any of its keys: instead of just updating our in-memory representation of the keys, we just discard the old set of keys and then re-read the config file contents from disk. This refresh currently happens separately at every callsite of `config_write`, but it is clear that we _always_ want to refresh if we have written the config file to disk. If we didn't, then we'd run around with an outdated config file backend that does not represent what we have on disk. By moving the refresh into `config_write`, we are also able to optimize the case where the config file is currently locked. Before, we would've tried to re-read the file even if we have only updated its cached contents without touching the on-disk file. Thus we'd have unnecessarily stat'd the file, even though we know that it shouldn't have been modified in the meantime due to its lock.
Patrick Steinhardt d7f58eab 2019-06-21T11:55:21 config_file: implement stat cache to avoid repeated rehashing To decide whether a config file has changed, we always hash its complete contents. This is unnecessarily expensive, as well-behaved filesystems will always update stat information for files which have changed. So before computing the hash, we should first check whether the stat info has actually changed for either the configuration file or any of its includes. This avoids having to re-read the configuration file and its includes every time when we check whether it's been modified. Tracing the for-each-ref example previous to this commit, one can see that we repeatedly re-open both the repo configuration as well as the global configuration: $ strace lg2 for-each-ref |& grep config access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) access("/home/pks/.config/git/config", F_OK) = 0 access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 access("/tmp/repo/.git/config", F_OK) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05290) = -1 ENOENT (No such file or directory) access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 access("/home/pks/.config/git/config", F_OK) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c051f0) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 With the change, we only do stats for those files and open them a single time, only: $ strace lg2 for-each-ref |& grep config access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) access("/home/pks/.config/git/config", F_OK) = 0 access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 access("/tmp/repo/.git/config", F_OK) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffe70540d20) = -1 ENOENT (No such file or directory) access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 access("/home/pks/.config/git/config", F_OK) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540ca0) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540c80) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 The following benchmark has been performed with and without the stat cache in a best-of-ten run: ``` int lg2_repro(git_repository *repo, int argc, char **argv) { git_config *cfg; int32_t dummy; int i; UNUSED(argc); UNUSED(argv); check_lg2(git_repository_config(&cfg, repo), "Could not obtain config", NULL); for (i = 1; i < 100000; ++i) git_config_get_int32(&dummy, cfg, "foo.bar"); git_config_free(cfg); return 0; } ``` Without stat cache: $ time lg2 repro real 0m1.528s user 0m0.568s sys 0m0.944s With stat cache: $ time lg2 repro real 0m0.526s user 0m0.268s sys 0m0.258s This benchmark shows a nearly three-fold performance improvement. This change requires that we check our configuration stress tests as we're now in fact becoming more racy. If somebody is writing a configuration file at nearly the same time (there is a window of 100ns on Windows-based systems), then it might be that we realize that this file has actually changed and thus may not re-read it. This will only happen if either an external process is rewriting the configuration file or if the same process has multiple `git_config` structures pointing to the same time, where one of both is being used to write and the other one is used to read values.
Patrick Steinhardt d0868646 2019-06-21T11:43:09 config: use `git_config_file` in favor of `struct config_file`
Patrick Steinhardt 8ee3d39a 2019-06-27T09:18:19 examples: implement config example Implement a new example that resembles git-config(1). Right now, this example can both read and set configuration keys, only.
Patrick Steinhardt df54c7fb 2019-06-27T07:34:43 cmake: report whether we are using sub-second stat information Depending on the platform and on build options, we may or may not build libgit2 with support for nanoseconds when using `stat` calls. It's currently unclear though whether sub-second stat information is used at all. Add feature info for this to tell at configure time whether it's being used or not.
Patrick Steinhardt eb27fb9b 2019-07-05T11:59:17 ci: build fuzzers on Powershell based build jobs In order to guarantee that our fuzzers build just fine on the Windows platform, let's enable building fuzzers on all Powershell-based builds.
Patrick Steinhardt 3c966fb4 2019-06-28T10:53:03 fuzzers: clean up header includes There's multiple headers included in our fuzzers that aren't required at all. Furthermore, some of them are not available on Win32, causing builds to fail. Remove them to fix this.
Patrick Steinhardt 9d43d45b 2019-06-28T12:10:51 fuzzers: use `git_buf_printf` instead of `snprintf` The `snprintf` function does not exist on Win32, it only has `_snprintf_s` available. Let's just avoid any cross-platform hassle and use our own `git_buf` functionality instead.
Patrick Steinhardt a6b2fffd 2019-06-28T11:04:21 fuzzers: use POSIX emulation layer to unlink files Use `p_unlink` instead of `unlink` to remove the generated packfiles in our packfile fuzzer. Like this, we do not have to worry about using proper includes that are known on all platforms, especially Win32.
Patrick Steinhardt 69055813 2019-06-28T10:50:01 fuzzers: make printf formatters cross-platform compatible The `printf` formatters in our standalone fuzzing driver are currently using the "%m" specifier, which is a GNU extension that prints the error message for the error code in `errno`. As we're using libgit2 functions in both cases anyway, let's just use `git_error_last` instead to make this valid on all platforms.
Patrick Steinhardt 48d56328 2019-06-28T10:47:37 fuzzers: implement `mkdtemp` alternative for Win32 The `mkdtemp` function is not available on Windows, so our download_refs fuzzer will fail to compile on Windows. Provide an alternative implementation to fix it.
Patrick Steinhardt 398412cc 2019-07-05T11:56:16 Merge pull request #5143 from libgit2/ethomson/warnings ci: build with ENABLE_WERROR on Windows
Patrick Steinhardt a3afda9f 2019-06-28T11:50:32 tests: trace: fix parameter type of aux callback The function `git_win32__stack__set_aux_cb` expects the second parameter to be a function callback of type `git_win32__stack__aux_cb_lookup`, which expects a `size_t` parameter. In our test suite trace::windows::stacktrace, we declare the callback with `unsigned int` as parameter, though, causing a compiler warning. Correct the parameter type to silence the warning.
Patrick Steinhardt 2f14c4fc 2019-06-28T14:39:20 w32_stack: convert buffer length param to `size_t` In both `git_win32__stack_format` and `git_win32__stack`, we handle buffer lengths via an integer variable. As we only ever pass buffer sizes to it, this should be a `size_t` though to avoid loss of precision. As we also use it to compare with other `size_t` variables, this also silences signed/unsigned comparison warnings.
Patrick Steinhardt 77d7e5eb 2019-06-27T15:29:36 clar: use `size_t` to keep track of current line number We use the `__LINE__` macro in several places throughout clar to allow easier traceability when e.g. a test fails. While `__LINE__` is of type `size_t`, the clar functions all accept an integer and thus may loose precision. While unlikely that any file in our codebase will exceed a linecount of `INT_MAX`, let's convert it anyway to silence any compiler warnings.
Patrick Steinhardt 2dea4736 2019-06-27T15:27:29 examples: avoid warning when iterating over index entries When iterating over index entries, we store the indices in an unsigned int. As the index entrycount is a `size_t` though, this may be a loss of precision which a compiler might rightfully complain about. Use `size_t` instead to fix any warnings.
Patrick Steinhardt abf24a30 2019-06-27T15:25:17 examples: avoid conversion warnings when calculating progress When computing the progress, we perform some arithmetics that are implicitly converting from `size_t` to `int`. In one case we're calclulating a percentage, so we know that it should always be in the range of [0,100] and thus we're fine. In the other case we convert from bytes to kilobytes -- this should be stored in a `size_t` to avoid loss of precision, even though it probably won't matter due to limited download rates.
Patrick Steinhardt e7bb1fe8 2019-06-27T15:14:08 examples: avoid passing signed integer to `memchr` The memchr(3P) function expects a `size_t` as its last parameter, but we do pass it an object size, which is of signed type `git_off_t`. As we can be sure that the result will be non-negative, let's just cast the parameter to a `size_t`.
Patrick Steinhardt 976eed80 2019-06-27T15:12:11 examples: cast away constness for reallocating head arrays When reallocating commit arrays in `opts_add_commit` and `opts_add_refish`, respectively, we simply pass the const pointer to `xrealloc`. As `xrealloc` expects a non-const pointer, though, this will generate a warning with some compilers. Cast away the constness to silence compilers.
Patrick Steinhardt 5c87b5a8 2019-07-04T12:19:07 Merge pull request #5152 from csware/attr-system-attr-file Fix Regression: attr: Correctly load system attr file (on Windows)
Patrick Steinhardt c87abeca 2019-07-04T11:45:02 tests: attr: add tests for system-level attributes There were no tests that verified that system-level gitattributes files get handled correctly. In fact, we have recently introduced a regression that caused us to abort if there was a system-level gitattributes file available. Add two tests that verify that we're able to handle system-level gitattributes files. The test attr::repo::sysdir_with_session would've failed without the fix to the described regression.
Patrick Steinhardt 1bbec26d 2019-07-04T11:41:21 attr_file: completely initialize attribute sessions The function `git_attr_session__init` is currently only initializing setting up the attribute's session key by incrementing the repo-global key by one. Most notably, all other members of the `git_attr_session` struct are not getting initialized at all. So if one is to allocate a session on the stack and then calls `git_attr_session__init`, the session will still not be fully initialized. We have fared just fine with that until now as all users of the function have allocated the session structure as part of bigger structs with `calloc`, and thus its contents have been zero-initialized implicitly already. Fix this by explicitly zeroing out the session to enable allocation of sessions on the stack.
Sven Strickroth 18a6d9f3 2019-06-29T16:19:08 attr: Don't fail in attr_setup if there exists a system attributes file Regression introduced in commit 5452e49fce21f726bec19519da7f012e3f19e736 on PR #4967. Signed-off-by: Sven Strickroth <email@cs-ware.de>
Patrick Steinhardt c4c1500a 2019-06-27T14:18:19 Merge pull request #5145 from pks-t/pks/hash-algo-uninit-return hash: fix missing error return on production builds
Patrick Steinhardt 7fd3f32b 2019-06-27T13:54:55 hash: fix missing error return on production builds When no hash algorithm has been initialized in a given hash context, then we will simply `assert` and not return a value at all. This works just fine in debug builds, but on non-debug builds the assert will be converted to a no-op and thus we do not have a proper return value. Fix this by returning an error code in addition to the asserts.
Patrick Steinhardt 73427b94 2019-06-27T13:23:17 Merge pull request #5142 from scottfurry/StaticChkFixExamples Resolve static check warnings in example code
Scott Furry 2ba7dbbe 2019-06-24T14:55:15 Resolve static check warnings in example code Using cppcheck on libgit2 sources indicated two warnings in example code. merge.c was reported as having a memory leak. Fix applied was to `free()` memory pointed to by `parents`. init.c was reported as having a null pointer dereference on variable arg. Function 'usage' was being called with a null variable. Changed supplied parameter to empty string.
Patrick Steinhardt e9102def 2019-06-27T11:38:04 Merge pull request #4438 from pks-t/pks/hash-algorithm Multiple hash algorithms
Patrick Steinhardt b6625a3b 2019-06-27T10:12:16 Merge pull request #5128 from tiennou/fix/docs More documentation
Patrick Steinhardt 3d22394a 2019-06-27T10:11:23 Merge pull request #4967 from tiennou/fix/4671 Incomplete commondir support
Etienne Samson 33448b45 2019-06-19T19:46:12 docs: More of it
Etienne Samson 501c51b2 2019-06-26T14:49:50 repo: commondir resolution can sometimes fallback to the repodir For example, https://git-scm.com/docs/gitrepository-layout says: info Additional information about the repository is recorded in this directory. This directory is ignored if $GIT_COMMON_DIR is set and "$GIT_COMMON_DIR/info" will be used instead. So when looking for `info/attributes`, we need to check the commondir first, or fallback to "our" `info/attributes`.
Etienne Samson 9f723c97 2019-06-26T14:49:37 docs: fixups
Etienne Samson b883d370 2019-06-26T14:49:30 ignore: fix a missing commondir causing failures As with the preceding commit, the ignore code tries to load code from info/exclude, and we fail to ignore a non-existent file here.
Patrick Steinhardt 82c7a9bc 2019-06-26T14:49:24 attr: fix attribute lookup if repo has no common directory If creating a repository without a common directory (e.g. by using `git_repository_new`), then `git_repository_item_path` will return `GIT_ENOTFOUND` for every file that's usually located in this directory. While we do not care for this case when looking up the "info/attributes" file, we fail to properly ignore these errors when setting up or collecting attributes files. Thus, the gitattributes lookup is broken and will only ever return `GIT_ENOTFOUND`. Fix this issue by properly ignoring `GIT_ENOTFOUND` returned by `git_repository_item_path`.
Patrick Steinhardt 5452e49f 2019-06-26T14:49:17 attr: refactor setup to match current coding style The code in the `attr_setup` function is not really matching our current coding style. Besides alignment issues, it's also hard to see what functions calls depend on one another because they're split up over multiple conditional statements. Fix these issues by grouping together dependent function calls and adjusting the alignment.
Edward Thomson b6b2d9d7 2019-06-25T15:05:23 examples: ssize_t is signed, not unsigned
Edward Thomson cd67a903 2019-06-25T14:55:51 examples: cast away const-ness
Edward Thomson 1118dd9a 2019-06-25T14:50:12 examples: don't lose `const`
Edward Thomson ede458b4 2019-06-25T14:48:10 example: use `git_off_t` for the object size
Edward Thomson f48cf5b3 2019-06-25T14:46:31 w32_stack: treat a len as an size_t
Edward Thomson 6b224054 2019-06-25T14:41:36 ci: build with ENABLE_WERROR on Windows Build with -Werror's equivalent (/WX) on MSVC
Edward Thomson c5448501 2019-06-25T12:55:10 Merge pull request #5078 from libgit2/ethomson/warnings Remove warnings
Edward Thomson a064920e 2019-06-24T23:27:47 Merge pull request #5140 from libgit2/ethomson/flaky_ci Re-run flaky tests
Edward Thomson c7b4ce55 2019-06-23T20:22:29 ci: add flaky test re-execution on Windows Our online tests are occasionally flaky since they hit real network endpoints. Re-run them up to 5 times if they fail, to allow us to avoid having to fail the whole build.
Edward Thomson 6d8a34ad 2019-06-23T19:57:05 ci: add flaky test re-execution on Unix Our online tests are occasionally flaky since they hit real network endpoints. Re-run them up to 5 times if they fail, to allow us to avoid having to fail the whole build.
Patrick Steinhardt b7187ed7 2019-02-22T14:38:31 hash: add ability to distinguish algorithms Create an enum that allows us to distinguish between different hashing algorithms. This enum is embedded into each `git_hash_ctx` and will instruct the code to which hashing function the particular request shall be dispatched. As we do not yet have multiple hashing algorithms, we simply initialize the hash algorithm to always be SHA1. At a later point, we will have to extend the `git_hash_init_ctx` function to get as parameter which algorithm shall be used.
Patrick Steinhardt 8832172e 2019-02-22T14:32:40 hash: move SHA1 implementations to its own hashing context Create a separate `git_hash_sha1_ctx` structure that is specific to the SHA1 implementation and move all SHA1 functions over to use that one instead of the generic `git_hash_ctx`. The `git_hash_ctx` for now simply has a union containing this single SHA1 implementation, only, without any mechanism to distinguish between different algortihms.
Patrick Steinhardt d46d3b53 2019-04-05T10:59:46 hash: split into generic and SHA1-specific interface As a preparatory step to allow multiple hashing APIs to exist at the same time, split the hashing functions into one layer for generic hashing and one layer for SHA1-specific hashing. Right now, this is simply an additional indirection layer that doesn't yet serve any purpose. In the future, the generic API will be extended to allow for choosing which hash to use, though, by simply passing an enum to the hash context initialization function. This is necessary as a first step to be ready for Git's move to SHA256.
Patrick Steinhardt fda20622 2019-06-14T14:22:19 hash: move SHA1 implementations into 'sha1/' folder As we will include additional hash algorithms in the future due to upstream git discussing a move away from SHA1, we should accomodate for that and prepare for the move. As a first step, move all SHA1 implementations into a common subdirectory. Also, create a SHA1-specific header file that lives inside the hash folder. This header will contain the SHA1-specific header includes, function declarations and the SHA1 context structure.
Edward Thomson e61b92e0 2019-06-24T15:22:39 clang: disable documentation-deprecated-sync Add the `-Wno-documentation-deprecated-sync` switch when compiling with clang, since our documentation adds `deprecated` markers, but we do not add the deprecation attribute in the code itself. (ie, the code is out of sync with the docs). In fact, we do not _want_ to mark these items as deprecated in the code, at least not yet, as we are not quite ready to bother our end-users with this since they're not going away.
Edward Thomson 9f3441cc 2019-06-15T22:37:11 zlib: remove unused functions
Edward Thomson c512d58f 2019-06-15T22:26:23 win32: cast WinAPI to void * before casting GetProcAddress is prototyped to return a `FARPROC`, which is meant to be a generic function pointer. It's literally `int (FAR WINAPI * FARPROC)()` which gcc complains if you attempt to cast to a `void (*)(GIT_SRWLOCK *)`. Cast to a `void *` before casting to avoid warnings about the arguments.
Edward Thomson 38e6b287 2019-06-15T22:25:10 mingw: use `struct stat`
Edward Thomson 759ec7d4 2019-06-15T22:01:00 win32: cast GetProcAddress to void * before casting GetProcAddress is prototyped to return a `FARPROC`, which is meant to be a generic function pointer. It's literally `int (FAR WINAPI * FARPROC)()` which gcc complains if you attempt to cast to a `void (*)(GIT_SRWLOCK *)`. Cast to a `void *` before casting to avoid warnings about the arguments.
Edward Thomson 3cd123e9 2019-06-15T21:56:53 win32: define DWORD_MAX if it's not defined MinGW does not define DWORD_MAX. Specify it when it's not defined.
Edward Thomson d93b0aa0 2019-06-15T21:47:40 win32: decorate unused parameters
Edward Thomson 54a60ced 2019-06-15T21:45:26 mingw: disable format specification warnings MinGW uses gcc, which expects POSIX formatting for printf, but uses the Windows C library, which uses its own format specifiers. Therefore, it gets confused about format specifiers. Disable warnings for format specifiers.
Edward Thomson e2aba8ba 2019-06-15T20:45:22 wildmatch: explicitly cast to int
Edward Thomson 3dd1942b 2019-06-15T20:43:13 win32: don't re-define RtlCaptureStackBackTrace RtlCaptureStackBackTrace is well-defined in Windows, no need to redefine it.
Edward Thomson cc9e47c9 2019-06-15T18:51:40 win32: support upgrading warnings to errors (/WX) For MSVC, support warnings as errors by providing the /WX compiler flags. (/WX is the moral equivalent of -Werror.) Disable warnings as errors ass part of xdiff, since it contains warnings. But as a component of git itself, we want to avoid skew and keep our implementation as similar as possible to theirs. We'll work with upstream to fix these issues, but in the meantime, simply let those continue to warn.
Edward Thomson f6530438 2019-05-25T16:44:59 win32: stop inlining file_attribute_to_stat Move `git_win32__file_attribute_to_stat` to a regular function instead of an inlined function. This helps avoid header ordering issues and declarations.
Patrick Steinhardt bd48bf3f 2019-06-14T14:21:32 hash: introduce source files to break include circles The hash source files have circular include dependencies right now, which shows by our broken generic hash implementation. The "hash.h" header declares two functions and the `git_hash_ctx` typedef before actually including the hash backend header and can only declare the remaining hash functions after the include due to possibly static function declarations inside of the implementation includes. Let's break this cycle and help maintainability by creating a real implementation file for each of the hash implementations. Instead of relying on the exact include order, we now especially avoid the use of `GIT_INLINE` for function declarations.
Patrick Steinhardt bbf034ab 2019-02-22T13:43:16 hash: move `git_hash_prov` into Win32 backend The structure `git_hash_prov` is only ever used by the Win32 SHA1 backend. As such, it doesn't make much sense to expose it via the generic "hash.h" header, as it is an implementation detail of the Win32 backend only. Move the typedef of `git_hash_prov` into "hash/sha1/win32.h" to fix this.
Edward Thomson b11eb08f 2019-05-21T14:39:55 config parse: safely cast to int
Edward Thomson 6b349ecc 2019-05-21T14:36:57 odb loose: only read at most INT_MAX
Edward Thomson 8c925ef8 2019-05-21T14:30:28 smart protocol: validate progress message length Ensure that the server has not sent us overly-large sideband messages (ensure that they are no more than `INT_MAX` bytes), then cast to `int`.
Edward Thomson 7afe788c 2019-05-21T14:27:46 smart transport: use size_t for sizes
Edward Thomson db7f1d9b 2019-05-21T14:21:58 local transport: cast message size to int explicitly Our progress information messages are short (and bounded by their format string), cast the length to int for callers.
Edward Thomson 8048ba70 2019-05-21T14:18:40 winhttp: safely cast length to DWORD
Edward Thomson db6b8f7d 2019-05-21T14:15:58 strtol: cast error message length to int
Edward Thomson d103f008 2019-05-21T13:44:47 pool: use `size_t` for sizes
Edward Thomson c4a64b1b 2019-05-21T13:27:39 tree-cache: safely cast to uint32_t
Edward Thomson 2375be48 2019-05-21T12:57:28 tree: return `size_t` for treebuilder entrycount We keep the treebuilder entrycount as a `size_t` - return that instead of downcasting to an `unsigned int`. Callers who were storing this value in an `unsigned int` will continue to downcast themselves, so there should be no behavior change for callers.
Edward Thomson ad6f2153 2019-05-21T12:50:46 utf8: use size_t for length of buffer The `git__utf8_charlen` now takes `size_t` as the buffer length, since it contains the full length of the buffer at the current position. It now returns `-1` in all cases where utf8 codepoints are invalid, since callers only care about a valid length of a sequence of codepoints, or if the current position is not valid utf8.
Edward Thomson 5d5b76df 2019-05-21T12:35:19 worktree: use size_t for sizes
Edward Thomson f7597410 2019-05-21T10:57:30 netops: safely cast to int Only read at most INT_MAX from the underlying stream, so that we can accurately return the number of bytes read. Since callers are not guaranteed to get as many bytes as requested (due to availability of input), this is safe and callers should call in a loop until EOF.
Edward Thomson cfd44d6a 2019-05-20T07:57:46 trailer: use size_t for sizes
Edward Thomson fc3a94ba 2019-05-20T07:13:42 repository: use size_t for length
Edward Thomson b4a173b5 2019-05-20T07:12:36 rebase: use size_t for path length