src/config_parse.c


Log

Author Commit Date CI Message
Edward Thomson f0e693b1 2021-09-07T17:53:49 str: introduce `git_str` for internal, `git_buf` is external libgit2 has two distinct requirements that were previously solved by `git_buf`. We require: 1. A general purpose string class that provides a number of utility APIs for manipulating data (eg, concatenating, truncating, etc). 2. A structure that we can use to return strings to callers that they can take ownership of. By using a single class (`git_buf`) for both of these purposes, we have confused the API to the point that refactorings are difficult and reasoning about correctness is also difficult. Move the utility class `git_buf` to be called `git_str`: this represents its general purpose, as an internal string buffer class. The name also is an homage to Junio Hamano ("gitstr"). The public API remains `git_buf`, and has a much smaller footprint. It is generally only used as an "out" param with strict requirements that follow the documentation. (Exceptions exist for some legacy APIs to avoid breaking callers unnecessarily.) Utility functions exist to convert a user-specified `git_buf` to a `git_str` so that we can call internal functions, then converting it back again.
Basile Henry 574c590f 2021-09-09T21:53:45 Fix multiline strip_comments logic The strip_comments function uses the count of quotes to know if a comment char (';' or '#') is the start of a comment or part of the multiline as a string. Unfortunately converting the count of quotes from previous lines to a boolean meant that it would only work as expected in some cases (0 quotes or an odd number of quotes).
Edward Thomson 14f6950b 2021-05-10T23:14:17 buf: bom enum is in the buf namespace Instead of a `git_bom_t` that a `git_buf` function returns, let's keep it `git_buf_bom_t`.
Edward Thomson d525e063 2021-05-10T23:04:59 buf: remove internal `git_buf_text` namespace The `git_buf_text` namespace is unnecessary and strange. Remove it, just keep the functions prefixed with `git_buf`.
Edward Thomson e7604da8 2020-04-05T14:51:56 config: use GIT_ASSERT
Sven Strickroth 6ac18625 2020-09-09T17:51:38 Fix config file parsing with multi line values containing quoted parts Signed-off-by: Sven Strickroth <email@cs-ware.de>
Patrick Steinhardt dbeadf8a 2019-07-11T10:56:05 config_parse: provide parser init and dispose functions Right now, all configuration file backends are expected to directly mess with the configuration parser's internals in order to set it up. Let's avoid doing that by implementing both a `git_config_parser_init` and `git_config_parser_dispose` function to clearly define the interface between configuration backends and the parser. Ideally, we would make the `git_config_parser` structure definition private to its implementation. But as that would require an additional memory allocation that was not required before we just live with it being visible to others.
Patrick Steinhardt 6e6da75f 2019-07-11T11:00:05 config_parse: remove use of `git_config_file` The config parser code needs to keep track of the current parsed file's name so that we are able to provide proper error messages to the user. Right now, we do that by storing a `git_config_file` in the parser structure, but as that is a specific backend and the parser aims to be generic, it is a layering violation. Switch over to use a simple string to fix that.
Patrick Steinhardt 76749dfb 2019-06-21T12:33:31 config_parse: rename `data` parameter to `payload` for clarity By convention, parameters that get passed to callbacks are usually named `payload` in our codebase. Rename the `data` parameters in the configuration parser callbacks to `payload` to avoid confusion.
Edward Thomson b11eb08f 2019-05-21T14:39:55 config parse: safely cast to int
Edward Thomson 355b02a1 2019-05-22T11:48:28 config: rename subsection header parser func The `parse_section_header_ext` name suggests that it as an extended function for parsing the section header. It is not. Rename it to `parse_subsection_header` to better reflect its true mission.
Edward Thomson 23c5699e 2019-05-16T09:37:25 config: validate quoted section value When we reach a whitespace after a section name, we assume that what will follow will be a quoted subsection name. Pass the current position of the line being parsed to the subsection parser, so that it can validate that subsequent characters are additional whitespace or a single quote. Previously we would begin parsing after the section name, looking for the first quotation mark. This allows invalid characters to embed themselves between the end of the section name and the first quotation mark, eg `[section foo "subsection"]`, which is illegal.
Edward Thomson b83bd037 2019-05-16T08:57:10 config: don't write invalid column When we don't specify a particular column, don't write it in the error message. (column "0" is unhelpful.)
Edward Thomson 42dd38dd 2019-05-16T08:55:40 config: lowercase error messages Update the configuration parsing error messages to be lower-cased for consistency with the rest of the library.
Edward Thomson f673e232 2018-12-27T13:47:34 git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
Carlos Martín Nieto 1d718fa5 2018-09-28T11:55:34 config: variables might appear on the same line as a section header While rare and a machine would typically not generate such a configuration file, it is nevertheless valid to write [foo "bar"] baz = true and we need to deal with that instead of assuming everything is on its own line.
Patrick Steinhardt 2be39cef 2018-08-10T19:38:57 config: introduce new read-only in-memory backend Now that we have abstracted away how to store and retrieve config entries, it became trivial to implement a new in-memory backend by making use of this. And thus we do so. This commit implements a new read-only in-memory backend that can parse a chunk of memory into a `git_config_backend` structure.
Patrick Steinhardt b9affa32 2018-08-10T19:23:00 config_parse: avoid unused static declared values The variables `git_config_escaped` and `git_config_escapes` are both defined as static const character pointers in "config_parse.h". In case where "config_parse.h" is included but those two variables are not being used, the compiler will thus complain about defined but unused variables. Fix this by declaring them as external and moving the actual initialization to the C file. Note that it is not possible to simply make this a #define, as we are indexing into those arrays.
Patrick Steinhardt bc63e1ef 2018-09-03T10:49:46 config_parse: refactor error handling when parsing multiline variables The current error handling for the multiline variable parser is a bit fragile, as each error condition has its own code to clear memory. Instead, unify error handling as far as possible to avoid this repetitive code. While at it, make use of `GITERR_CHECK_ALLOC` to correctly handle OOM situations and verify that the buffer we print into does not run out of memory either.
Nelson Elhage 38b85255 2018-09-01T03:50:26 config: Fix a leak parsing multi-line config entries
Nelson Elhage a03113e8 2018-08-25T17:04:39 config: convert unbounded recursion into a loop
Nelson Elhage b8a67eda 2018-07-22T23:47:12 Fix a double-free in config parsing
Patrick Steinhardt 83b5f161 2017-11-12T14:09:24 config_parse: always sanitize out-parameters in `parse_variable` The `parse_variable` function has two out parameters `var_name` and `var_value`. Currently, those are not being sanitized to `NULL`. when. any error happens inside of the `parse_variable` function. Fix that. While at it, the coding style is improved to match our usual coding practices more closely.
Patrick Steinhardt e51e29e8 2017-11-12T13:59:47 config_parse: have `git_config_parse` own entry value and name The function `git_config_parse` uses several callbacks to pass data along to the caller as it parses the file. One design shortcoming here is that strings passed to those callbacks are expected to be freed by them, which is really confusing. Fix the issue by changing memory ownership here. Instead of expecting the `on_variable` callbacks to free memory for `git_config_parse`, just do it inside of `git_config_parse`. While this obviously requires a bit more memory allocation churn due to having to copy both name and value at some places, this shouldn't be too much of a burden.
Patrick Steinhardt ecf4f33a 2018-02-08T11:14:48 Convert usage of `git_buf_free` to new `git_buf_dispose`
Patrick Steinhardt ba4faf6e 2018-02-08T17:15:33 buf_text: remove `offset` parameter of BOM detection function The function to detect a BOM takes an offset where it shall look for a BOM. No caller uses that, and searching for the BOM in the middle of a buffer seems to be very unlikely, as a BOM should only ever exist at file start. Remove the parameter, as it has already caused confusion due to its weirdness.
Patrick Steinhardt 2eea5f1c 2018-02-08T10:27:31 config_parse: fix reading files with BOM The function `skip_bom` is being used to detect and skip BOM marks previously to parsing a configuration file. To do so, it simply uses `git_buf_text_detect_bom`. But since the refactoring to use the parser interface in commit 9e66590bd (config_parse: use common parser interface, 2017-07-21), the BOM detection was actually broken. The issue stems from a misunderstanding of `git_buf_text_detect_bom`. It was assumed that its third parameter limits the length of the character sequence that is to be analyzed, while in fact it was an offset at which we want to detect the BOM. Fix the parameter to be `0` instead of the buffer length, as we always want to check the beginning of the configuration file.
Patrick Steinhardt 848153f3 2018-02-08T10:02:29 config_parse: handle empty lines with CRLF Currently, the configuration parser will fail reading empty lines with just an CRLF-style line ending. Special-case the '\r' character in order to handle it the same as Unix-style line endings. Add tests to spot this regression in the future.
Patrick Steinhardt 5340ca77 2018-02-08T09:31:51 config_parse: add comment to clarify logic getting next character Upon each line, the configuration parser tries to get either the first non-whitespace character or the first whitespace character, in case there is no non-whitespace character. The logic handling this looks rather odd and doesn't immediately convey this meaning, so add a comment to clarify what happens.
Patrick Steinhardt 9e66590b 2017-07-21T13:01:43 config_parse: use common parser interface As the config parser is now cleanly separated from the config file code, we can easily refactor the code and make use of the common parser module. This removes quite a lot of duplicated functionality previously used for handling the actual parser state and replaces it with the generic interface provided by the parser context.
Patrick Steinhardt 1953c68b 2017-11-11T17:12:31 config_file: split out module to parse config files The configuration file code grew quite big and intermingles both actual configuration logic as well as the parsing logic of the configuration syntax. This makes it hard to refactor the parsing logic on its own and convert it to make use of our new parsing context module. Refactor the code and split it up into two parts. The config file code will only handle actual handling of configuration files, includes and writing new files. The newly created config parser module is then only responsible for parsing the actual contents of a configuration file, leaving everything else to callbacks provided to its provided function `git_config_parse`.