src/xkbcomp/scanner.c

Branch


Log

Author Commit Date CI Message
Pierre Le Marre 82e9293e 2023-10-30T15:28:10 xkbcomp: early detection of invalid encoding
Pierre Le Marre f937c308 2023-10-29T07:31:34 xkbcomp: skip heading UTF-8 encoded BOM (U+FEFF) Leading BOM is legal and is used as a signature — an indication that an otherwise unmarked text file is in UTF-8. See: https://www.unicode.org/faq/utf_bom.html#bom5 for further details.
Pierre Le Marre 9d15c6a7 2023-09-26T17:05:14 Show invalid escape sequences It is easier to debug when the message actually displays the offending escape sequence.
Pierre Le Marre ca7aa69c 2023-09-26T17:05:05 Disallow producing NULL character with escape sequences NULL usually terminates the strings; allowing to produce it via escape sequences may lead to undefined behaviour. - Make NULL escape sequences (e.g. `\0` and `\x0`) invalid. - Add corresponding test. - Introduce the new message: XKB_WARNING_INVALID_ESCAPE_SEQUENCE.
Pierre Le Marre c0065c95 2023-09-21T20:06:27 Messages: merge macros with and without message code Previously we had two types of macros for logging: with and without message code. They were intended to be merged afterwards. The idea is to use a special code – `XKB_LOG_MESSAGE_NO_ID = 0` – that should *not* be displayed. But we would like to avoid checking this special code at run time. This is achieved using macro tricks; they are detailed in the code (see: `PREPEND_MESSAGE_ID`). Now it is also easier to spot the remaining undocumented log entries: just search `XKB_LOG_MESSAGE_NO_ID`.
Pierre Le Marre ef81d04e 2023-09-18T18:17:34 Structured log messages with a message registry Currently there is little structure in the log messages, making difficult to use them for the following use cases: - A user looking for help about a log message: the user probably uses a search engine, thus the results will depend on the proper indexing of our documentation and the various forums. It relies only on the wording of the message, which may change with time. - A user wants to filter the logs resulting of the use of one of the components of xkbcommon. A typical example would be testing xkeyboard-config against libxkbcommon. It requires the use of a pattern (simple words detection or regex). The issue is that the pattern may become silently out-of-sync with xkbcommon. A common practice (e.g. in compilers) is to assign unique error codes to reference theses messages, along with an error index for documentation. Thus this commit implements the following features: - Create a message registry (message-registry.yaml) that defines the log messages produced by xkbcommon. This is a simple YAML file that provides, for each message: - A unique numeric code as a short identifier. It is used in the output message and thus can be easily be filtered to spot errors or searched in the internet. It must not change: if the semantics of message changes, it is better to introduce a new message for clarity. - A unique text identifier, meant for two uses: 1. Generate constants dealing with log information in our code base. 2. Generate human-friendly names for the documentation. - A type: currently warning or error. Used to prefix the constants (see hereinabove) and for basic classification in documentation. - A short description, used as concise and mandatory documentation. - An optionnal detailed description. - Optional examples, intended to help the user to fix issues themself. - Version of xkbcommon it was added. For old entries this often unknown, so they will default to 1.0.0. - Version of xkbcommon it was removed (optional) No entry should ever be deleted from this index, even if the message is not used anymore: it ensures we have unique identifiers along the history of xkbcommon, and that users can refer to the documentation even for older versions. - Add the script update-message-registry.py to generate the following files: - messages.h: message code enumeration for the messages currently used in the code base. Currently a private API. - message.registry.md: the error index documentation page. - Modify the logging functions to use structured messages. This is a work in progress.
Ran Benita 0b3d9092 2022-03-14T16:44:13 scanner: prefix functions with `scanner_` to avoid symbol conflicts Particularly `eof()` in mingw-w64. Fixes: https://github.com/xkbcommon/libxkbcommon/pull/285 Reported-by: Marko Lindqvist Signed-off-by: Ran Benita <ran@unusedvar.com>
Ran Benita 521bb498 2019-12-27T22:08:57 xkbcomp: remove cast which triggers warning on gcc Will need some other way to take care of the warning on MSVC. Signed-off-by: Ran Benita <ran@unusedvar.com>
Ran Benita fbd0e643 2019-12-27T21:51:34 xkbcomp: make a couple of casts explicit to mark them as checked This acknowledges some "possible loss of data cast" warnings from MSVC. Signed-off-by: Ran Benita <ran@unusedvar.com>
Ran Benita 40aab05e 2019-12-27T13:03:20 build: include config.h manually Previously we included it with an `-include` compiler directive. But that's not portable. And it's better to be explicit anyway. Every .c file should have `include "config.h"` first thing. Signed-off-by: Ran Benita <ran@unusedvar.com>
Ran Benita 2cca0289 2015-11-19T00:44:27 src/utils: change map_file to not take const string argument map_file() uses PROT_READ, so const seems fitting; however unmap_file calls munmap/free, which do not take const, so an UNCONSTIFY is needed. To avoid the UNCONSTIFY hack, which is likely undefined behavior or some such, just remove the const. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita a3116f97 2014-10-13T18:51:12 compose/parser: fix segfault when including The keysym cache for the new scanner was not initialized. To avoid such errors also in the future, require passing the priv argument in scanner_init(), instead of initializing it separately. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 8a0acf2c 2014-10-07T23:42:08 scanner-utils: optimize one-line comments Compose files have a lot of those. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita d0c6fce2 2014-09-20T15:06:13 parser: use "atom" instead of "sval" in yylval "sval" is already used for "struct sval". Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita cb4bae71 2014-06-30T14:52:30 parser: don't shadow "str" It's a name of a function in scanner-utils.h and also of some parameters. https://bugs.freedesktop.org/show_bug.cgi?id=79898 Reported-by: Bryce Harrington <b.harrington@samsung.com> Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 28d5f770 2014-02-10T20:33:34 scanner: sort out scanner logging functions First, make the rules and xkb scanners/parsers use the same logging functions instead of rolling their own. Second, use the gcc ##__VA_ARGS extension instead of dealing with C99 stupidity. I hope all relevant compilers support it. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 68b03097 2014-02-08T17:22:14 scanner: make line and column unsigned Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita b82a0a86 2014-02-07T18:09:30 scanner: avoid strlen in keyword lookup, we know the len Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 917c7515 2014-01-12T14:37:39 context: remove mostly useless log wrappers Just use xkb_log directly. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita ba7530fa 2013-11-27T13:43:57 scanner: restore lost DIVIDE token I don't know how this could have happened. Luckily this token is completely useless. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita dcdd4e10 2013-10-14T18:59:53 Replace ctype.h functions with ascii ones ctype.h is locale-dependent, so using it in our scanners is not optimal. Let's be deterministic with our own simple functions. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita c35c388b 2013-10-08T18:35:05 scanner: remove unnecessary cast 'tok' is already an int now. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 409f27d7 2013-09-29T00:41:17 parser: don't use %locations byacc doesn't support this feature. We print the line/col of the last scanned token instead. This is slightly less in case of *parser* errors (not syntax errors), but I couldn't make it point to another line, and this are pretty cryptic anyways. So it's good enough. Also might be a bit faster, but haven't checked. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita e4c00e90 2013-09-29T00:19:32 parser: don't use enum yytokentype byacc doesn't support this, it just puts out #define's for the tokens. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 7caa1af2 2013-08-13T14:45:33 scanner: don't fail over unknown escape sequence This is too strict, and causes symbols/cz to fail parsing. Instead, just emit a warning (not shown by default): xkbcommon: WARNING: cz:75:19: unknown escape sequence in string literal https://bugs.freedesktop.org/show_bug.cgi?id=68056 Reported-By: Gatis Paeglis <gatis.paeglis@digia.com> Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita aa9c9194 2013-08-02T14:41:19 scanner: fix compiler warning src/xkbcomp/scanner.c:158:17: warning: comparison of constant -1 with expression of type 'enum yytokentype' is always true [-Wtautological-constant-out-of-range-compare] if (tok != -1) return tok; ~~~ ^ ~~ Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita e91d2653 2013-08-01T23:09:46 scanner: allow empty key name literals Some keymaps actually have this, like the quartz.xkb which is tested. We need to support these. https://bugs.freedesktop.org/show_bug.cgi?id=67654 Reported-By: Gatis Paeglis <gatis.paeglis@digia.com> Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 9e801ff7 2013-07-21T17:01:20 ctx: adapt to the len-aware atom functions xkb_atom_intern now takes a len parameter. Turns out though that almost all of our xkb_atom_intern calls are called on string literals, the length of which we know statically. So we add a macro to micro-optimize this case. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita a392d268 2012-08-12T11:40:02 Replace flex scanner with a hand-written one The scanner is very similar in structure to the one in xkbcomp/rules.c. It avoids copying and has nicer error reporting. It uses gperf to generate a hashtable for the keywords, which gives a nice speed boost (compared to the naive strcasecmp method at least). But since there's hardly a reason to regenerate it every time and require people to install gperf, the output (keywords.c) is added here as well. Here are some stats from test/rulescomp: Before: compiled 1000 keymaps in 4.052939625s ==22063== total heap usage: 101,101 allocs, 101,101 frees, 11,840,834 bytes allocated After: compiled 1000 keymaps in 3.519665434s ==26505== total heap usage: 99,945 allocs, 99,945 frees, 7,033,608 bytes allocated Signed-off-by: Ran Benita <ran234@gmail.com>