src/config_parse.h


Log

Author Commit Date CI Message
Patrick Steinhardt e54343a4 2019-06-29T09:17:32 fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.
Patrick Steinhardt dbeadf8a 2019-07-11T10:56:05 config_parse: provide parser init and dispose functions Right now, all configuration file backends are expected to directly mess with the configuration parser's internals in order to set it up. Let's avoid doing that by implementing both a `git_config_parser_init` and `git_config_parser_dispose` function to clearly define the interface between configuration backends and the parser. Ideally, we would make the `git_config_parser` structure definition private to its implementation. But as that would require an additional memory allocation that was not required before we just live with it being visible to others.
Patrick Steinhardt 820fa1a3 2019-07-11T11:04:33 config_file: internalize `git_config_file` struct With the previous commits, we have finally separated the config parsing logic from the specific configuration file backend. Due to that, we can now move the `git_config_file` structure into the config file backend's implementation so that no other code may accidentally start using it again. Furthermore, we rename the structure to `diskfile` to make it obvious that it is internal, only, and to unify it with naming scheme of the other diskfile structures.
Patrick Steinhardt 6e6da75f 2019-07-11T11:00:05 config_parse: remove use of `git_config_file` The config parser code needs to keep track of the current parsed file's name so that we are able to provide proper error messages to the user. Right now, we do that by storing a `git_config_file` in the parser structure, but as that is a specific backend and the parser aims to be generic, it is a layering violation. Switch over to use a simple string to fix that.
Patrick Steinhardt 76749dfb 2019-06-21T12:33:31 config_parse: rename `data` parameter to `payload` for clarity By convention, parameters that get passed to callbacks are usually named `payload` in our codebase. Rename the `data` parameters in the configuration parser callbacks to `payload` to avoid confusion.
Patrick Steinhardt d7f58eab 2019-06-21T11:55:21 config_file: implement stat cache to avoid repeated rehashing To decide whether a config file has changed, we always hash its complete contents. This is unnecessarily expensive, as well-behaved filesystems will always update stat information for files which have changed. So before computing the hash, we should first check whether the stat info has actually changed for either the configuration file or any of its includes. This avoids having to re-read the configuration file and its includes every time when we check whether it's been modified. Tracing the for-each-ref example previous to this commit, one can see that we repeatedly re-open both the repo configuration as well as the global configuration: $ strace lg2 for-each-ref |& grep config access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) access("/home/pks/.config/git/config", F_OK) = 0 access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 access("/tmp/repo/.git/config", F_OK) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05290) = -1 ENOENT (No such file or directory) access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 access("/home/pks/.config/git/config", F_OK) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c051f0) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 With the change, we only do stats for those files and open them a single time, only: $ strace lg2 for-each-ref |& grep config access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) access("/home/pks/.config/git/config", F_OK) = 0 access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 access("/tmp/repo/.git/config", F_OK) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/home/pks/.gitconfig", 0x7ffe70540d20) = -1 ENOENT (No such file or directory) access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 access("/home/pks/.config/git/config", F_OK) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540ca0) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540c80) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0 stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory) stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory) stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0 The following benchmark has been performed with and without the stat cache in a best-of-ten run: ``` int lg2_repro(git_repository *repo, int argc, char **argv) { git_config *cfg; int32_t dummy; int i; UNUSED(argc); UNUSED(argv); check_lg2(git_repository_config(&cfg, repo), "Could not obtain config", NULL); for (i = 1; i < 100000; ++i) git_config_get_int32(&dummy, cfg, "foo.bar"); git_config_free(cfg); return 0; } ``` Without stat cache: $ time lg2 repro real 0m1.528s user 0m0.568s sys 0m0.944s With stat cache: $ time lg2 repro real 0m0.526s user 0m0.268s sys 0m0.258s This benchmark shows a nearly three-fold performance improvement. This change requires that we check our configuration stress tests as we're now in fact becoming more racy. If somebody is writing a configuration file at nearly the same time (there is a window of 100ns on Windows-based systems), then it might be that we realize that this file has actually changed and thus may not re-read it. This will only happen if either an external process is rewriting the configuration file or if the same process has multiple `git_config` structures pointing to the same time, where one of both is being used to write and the other one is used to read values.
Patrick Steinhardt d0868646 2019-06-21T11:43:09 config: use `git_config_file` in favor of `struct config_file`
Patrick Steinhardt b9affa32 2018-08-10T19:23:00 config_parse: avoid unused static declared values The variables `git_config_escaped` and `git_config_escapes` are both defined as static const character pointers in "config_parse.h". In case where "config_parse.h" is included but those two variables are not being used, the compiler will thus complain about defined but unused variables. Fix this by declaring them as external and moving the actual initialization to the C file. Note that it is not possible to simply make this a #define, as we are indexing into those arrays.
Patrick Steinhardt e51e29e8 2017-11-12T13:59:47 config_parse: have `git_config_parse` own entry value and name The function `git_config_parse` uses several callbacks to pass data along to the caller as it parses the file. One design shortcoming here is that strings passed to those callbacks are expected to be freed by them, which is really confusing. Fix the issue by changing memory ownership here. Instead of expecting the `on_variable` callbacks to free memory for `git_config_parse`, just do it inside of `git_config_parse`. While this obviously requires a bit more memory allocation churn due to having to copy both name and value at some places, this shouldn't be too much of a burden.
Edward Thomson abb04caa 2018-02-01T15:55:48 consistent header guards use consistent names for the #include / #define header guard pattern.
Patrick Steinhardt 9e66590b 2017-07-21T13:01:43 config_parse: use common parser interface As the config parser is now cleanly separated from the config file code, we can easily refactor the code and make use of the common parser module. This removes quite a lot of duplicated functionality previously used for handling the actual parser state and replaces it with the generic interface provided by the parser context.
Patrick Steinhardt 1953c68b 2017-11-11T17:12:31 config_file: split out module to parse config files The configuration file code grew quite big and intermingles both actual configuration logic as well as the parsing logic of the configuration syntax. This makes it hard to refactor the parsing logic on its own and convert it to make use of our new parsing context module. Refactor the code and split it up into two parts. The config file code will only handle actual handling of configuration files, includes and writing new files. The newly created config parser module is then only responsible for parsing the actual contents of a configuration file, leaving everything else to callbacks provided to its provided function `git_config_parse`.