src/scanner-utils.h


Log

Author Commit Date Message
Pierre Le Marre 39b4b670 2025-06-06T18:40:29 Support including keymap components using %-expansion and absolute paths
Enable using the same `include` features in *keymap components* as in *rules* files:
- *`%`-expansion*: `%H` home directory, `%S` system root and `%E` extra.
- absolute file paths.
This is useful if one wants to override the system file with a user config (i.e. same name, but in `~/.config/xkb`), but still include the system file:
```
// File: ~/.config/xkb/symbols/de
xkb_symbols "basic" {
    include "%S/de(basic)"
    key <AB01> { [z, Z] };
    key <AD06> { [y, Y] };
};
```
Without this commit, a mere `include "de(basic)"` would result in an include loop. Refactored to use the same code for rules and keymap components.
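As an illustration of the `%`-expansion described above, here is a minimal C sketch. The function `expand_include_path`, its parameters and the directories used in the checks are assumptions for this example, not libxkbcommon's API. Unknown `%` sequences and a trailing `%` are simply rejected, and absolute paths pass through unchanged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append `s` at out[*pos], failing if it does not fit
 * (one byte is always kept free for the terminating NUL). */
static bool
append_str(char *out, size_t out_size, size_t *pos, const char *s)
{
    size_t len = strlen(s);
    if (*pos + len >= out_size)
        return false;
    memcpy(out + *pos, s, len);
    *pos += len;
    return true;
}

/* Sketch: expand %H (home), %S (system root) and %E (extra) in an include
 * path. Any other %-sequence, or a trailing '%', is rejected. */
static bool
expand_include_path(const char *in, const char *home, const char *sys_root,
                    const char *extra, char *out, size_t out_size)
{
    size_t pos = 0;
    if (out_size == 0)
        return false;
    for (const char *p = in; *p != '\0'; p++) {
        if (*p != '%') {
            char c[2] = { *p, '\0' };
            if (!append_str(out, out_size, &pos, c))
                return false;
            continue;
        }
        const char *value;
        switch (*++p) {
        case 'H': value = home; break;
        case 'S': value = sys_root; break;
        case 'E': value = extra; break;
        default: return false; /* unknown sequence or trailing '%' */
        }
        if (value == NULL || !append_str(out, out_size, &pos, value))
            return false;
    }
    out[pos] = '\0';
    return true;
}
```

Resolving `%S` directly to the system location, instead of searching the user path first, is what lets a user file include the system file of the same name without looping.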
Pierre Le Marre 1d361b8f 2025-05-12T10:01:10 scanner: Ensure proper type for string length
Pierre Le Marre 5e557040 2025-04-09T11:17:00 xkbcomp: Fix Unicode escape sequence While the previous code correctly rejected malformed sequences such as `\u{` (incomplete) or `\u{123x}`, it should try to consume as much input as possible until reaching the corresponding closing `}` within the string. Otherwise we can get leftovers, and the error message does not reference the whole malformed sequence. Also added further tests with surrogates and noncharacters.
Pierre Le Marre 3d79f459 2025-03-29T11:46:34 xkbcomp: Add Unicode code point escape sequence \u{NNNN} Unicode code point escape sequences `\u{NNNN}` are replaced with the UTF-8 encoding of their corresponding code point `U+NNNN`, if legal. Supported Unicode code points are in the range `1‥0x10ffff`. Note that we reject the `U+0000` NULL code point, as we reject it in the octal escape sequence `\0`. This is intended mainly for the upcoming feature to write keysyms as UTF-8 encoded strings. It can be used for various reasons: - avoid encoding issues; - avoid issues with font rendering (e.g. Asian scripts); - make white space or zero-width characters more readable.
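The escape handling from these two commits (reject `U+0000` and values above `0x10ffff`, consume up to the closing `}` on error) can be sketched in C as below. `parse_unicode_escape` is a hypothetical stand-alone function working on a length-bounded buffer; rejecting surrogates (`U+D800‥U+DFFF`) is an assumption of this sketch, since the log only says tests with surrogates were added.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: `s` points just after `\u{` and is NOT necessarily NUL-terminated.
 * *consumed receives the number of bytes read (including the closing '}',
 * if any). Returns the number of UTF-8 bytes written to out (1..4), or 0 on
 * error. */
static size_t
parse_unicode_escape(const char *s, size_t len, size_t *consumed, char out[4])
{
    uint32_t cp = 0;
    bool valid = true;
    size_t i = 0;
    /* Consume everything up to the closing '}', even if malformed, so that
     * the whole sequence can be reported and skipped. */
    for (; i < len && s[i] != '}'; i++) {
        const unsigned char c = (unsigned char) s[i];
        uint32_t digit;
        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else {
            valid = false; /* not a hex digit */
            continue;
        }
        if (cp <= 0x10FFFF) /* stop accumulating once out of range */
            cp = cp * 16 + digit;
    }
    if (i == len) {
        *consumed = len; /* unterminated: no closing '}' */
        return 0;
    }
    *consumed = i + 1; /* include the closing '}' */
    if (!valid || i == 0 || cp == 0 || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF)) /* assumption: reject surrogates */
        return 0;
    /* Standard UTF-8 encoding. */
    if (cp < 0x80) {
        out[0] = (char) cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (char) (0xC0 | (cp >> 6));
        out[1] = (char) (0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        out[0] = (char) (0xE0 | (cp >> 12));
        out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char) (0x80 | (cp & 0x3F));
        return 3;
    } else {
        out[0] = (char) (0xF0 | (cp >> 18));
        out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char) (0x80 | (cp & 0x3F));
        return 4;
    }
}
```

For example, `\u{20AC}` yields the three bytes `E2 82 AC` (the euro sign), while `\u{123x}` is rejected with all five bytes up to the `}` consumed.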
Pierre Le Marre 7d91a753 2025-03-29T12:24:39 xkbcomp: Enable xkbcomp-style octal escape sequences Xorg xkbcomp only parses octal sequences with `\0`, while xkbcommon does not force the `0` prefix of the numeric part. However, we only parsed up to 3 digits, which does not allow parsing e.g. `\0377`, while `\377` parses fine. Fixed by parsing up to 4 octal digits, while checking that the result fits into a byte.
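A sketch of the octal rule from this commit and from 0038c866 below: read at most 4 octal digits and reject values that do not fit in a byte, as well as `\0`. The function name is hypothetical, and rejecting an over-long value outright (rather than stopping before the offending digit) is a simplifying assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch: `s` points just after the backslash; the buffer need not be
 * NUL-terminated. Up to 4 octal digits are read. On success *out holds the
 * byte; *consumed always receives the number of digits read. */
static bool
parse_octal_escape(const char *s, size_t len, unsigned char *out,
                   size_t *consumed)
{
    unsigned value = 0; /* at most 07777 = 4095, no overflow possible */
    size_t i = 0;
    while (i < len && i < 4 && s[i] >= '0' && s[i] <= '7') {
        value = value * 8 + (unsigned) (s[i] - '0');
        i++;
    }
    *consumed = i;
    if (i == 0 || value == 0 || value > 0xFF)
        return false; /* no digits, NUL, or does not fit in a byte */
    *out = (unsigned char) value;
    return true;
}
```

With this rule both `\0377` and `\377` give `0xFF`, while `\400` is rejected instead of silently wrapping when cast to `char`.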
Pierre Le Marre d5a91fa9 2025-04-04T16:38:16 xkbcomp: Use custom parsers instead of strtol* The use of `strtol*` functions was already restricted due to their slowness and their ability to parse more than digits (e.g. signs and spaces). There is also another *big* limitation: they require a NUL-terminated string. This is incompatible with our functions that work on buffers, because we cannot guarantee this. This may lead to a memory violation if the last token is a number. We now roll our own parsers, which are more efficient and compatible with buffers.
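The key property of such a parser is that it is bounded by an explicit length, so it never reads past the end of a buffer that lacks a terminating NUL. A minimal sketch follows; the function name and the overflow policy (return 0) are assumptions, not the actual libxkbcommon code.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: parse an unsigned decimal integer from s[0..len) without
 * requiring NUL termination. Unlike strtol, it accepts neither leading
 * whitespace nor a sign. Returns the number of digits consumed, or 0 if
 * there are none or the value would overflow uint64_t. */
static size_t
parse_dec_to_uint64(const char *s, size_t len, uint64_t *out)
{
    uint64_t value = 0;
    size_t i = 0;
    for (; i < len && s[i] >= '0' && s[i] <= '9'; i++) {
        const uint64_t digit = (uint64_t) (s[i] - '0');
        if (value > (UINT64_MAX - digit) / 10)
            return 0; /* value * 10 + digit would overflow */
        value = value * 10 + digit;
    }
    if (i > 0)
        *out = value;
    return i;
}
```

Because the loop condition checks `i < len` before touching `s[i]`, a number that is the very last token of a buffer is parsed safely.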
Pierre Le Marre 70d11abd 2025-03-26T07:38:05 messages: Add file encoding and invalid syntax entries Added: - `XKB_ERROR_INVALID_FILE_ENCODING` - `XKB_ERROR_INVALID_RULES_SYNTAX` - `XKB_ERROR_INVALID_COMPOSE_SYNTAX` Changed: - `XKB_ERROR_INVALID_SYNTAX` renamed to `XKB_ERROR_INVALID_XKB_SYNTAX`.
Pierre Le Marre e1892266 2025-02-13T16:57:46 clang-tidy: Miscellaneous fixes
Ran Benita f4e95280 2025-02-02T22:29:05 xkbcomp/scanner: avoid unneeded strdup of IDENT tokens
The allocation is immediately discarded, either turned into a keysym or an atom. So use an sval slice into the input string instead of strdup'ing.

    memusage ./release/bench-compile-keymap --iter=1000 --layout us,de --variant ,neo

Before:

    Memory usage summary: heap total: 534063576, heap peak: 581022, stack peak: 18848
             total calls   total memory   failed calls
     malloc|   11240525      291897104              0
    realloc|    1447657      192307328              0  (nomove:37629, dec:0, free:0)
     calloc|     430573       49859144              0
       free|   13993903      534063576

After:

    Memory usage summary: heap total: 506839909, heap peak: 581022, stack peak: 18960
             total calls   total memory   failed calls
     malloc|    8016419      264673437              0
    realloc|    1447657      192307328              0  (nomove:37278, dec:0, free:0)
     calloc|     430573       49859144              0
       free|   10769797      506839909

Signed-off-by: Ran Benita <ran@unusedvar.com>
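The slice idea can be sketched as a pointer/length pair pointing into the input buffer. The `struct sval` layout and the helper names below are assumptions modelled on the commit message, not copied from libxkbcommon; the point is that scanning an identifier allocates nothing, and comparisons use the stored length instead of a NUL terminator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed slice type: a view into the scanner's input. It does not own its
 * memory and is not NUL-terminated. */
struct sval {
    const char *start;
    size_t len;
};

/* Sketch: scan an identifier ([A-Za-z_][A-Za-z0-9_]*) at s[*pos..len)
 * into a slice, without copying. */
static bool
scan_ident(const char *s, size_t len, size_t *pos, struct sval *out)
{
    size_t begin = *pos, i = *pos;
    while (i < len && (s[i] == '_' ||
                       (s[i] >= 'a' && s[i] <= 'z') ||
                       (s[i] >= 'A' && s[i] <= 'Z') ||
                       (i > begin && s[i] >= '0' && s[i] <= '9')))
        i++;
    if (i == begin)
        return false;
    out->start = s + begin;
    out->len = i - begin;
    *pos = i;
    return true;
}

/* Compare a slice with a NUL-terminated string without allocating. */
static bool
sval_equal(struct sval v, const char *str)
{
    return strlen(str) == v.len && memcmp(v.start, str, v.len) == 0;
}
```

Since the token is consumed right away (turned into a keysym or an atom), the slice never has to outlive the input buffer it points into.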
Ran Benita df2322d7 2025-02-05T14:41:21 Replace include guards by `#pragma once` We currently have a mix of include guards, pragma once, and some headers with neither. pragma once is not standard but is widely supported, and we already use it with no issues, so I'd say it's not a problem. Let's convert all headers to pragma once to avoid the annoying include guards. The public headers are *not* converted. Signed-off-by: Ran Benita <ran@unusedvar.com>
Ran Benita e120807b 2025-01-29T15:35:22 Update license notices to SPDX short identifiers + update LICENSE Fix #628. Signed-off-by: Ran Benita <ran@unusedvar.com>
Ran Benita 6e97f57e 2025-01-29T19:21:43 scanner: speed up token position -> location using a cache Signed-off-by: Ran Benita <ran@unusedvar.com>
Ran Benita 26807a90 2025-01-28T20:24:05 scanner: compute token line/column lazily on errors The scanner functions are hot, and the line/column location tracking is quite expensive. We only use it for errors, which don't need to be fast, because we bail if there are too many; and for warnings, which are usually not shown by default. So only keep the token start pos, and compute the line/column lazily from that. This will also allow some further improvements ahead. bench/rulescomp before: compiled 1000 keymaps in 1.669028s after: compiled 1000 keymaps in 1.550411s bench/compose: before: compiled 1000 compose tables in 2.145217s after: compiled 1000 compose tables in 2.016044s Signed-off-by: Ran Benita <ran@unusedvar.com>
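Taken together, 26807a90 and 6e97f57e above amount to: store only the token's byte offset, and derive line/column on demand by counting newlines, resuming from the last computed location when possible. A simplified sketch, with hypothetical type and function names (columns here count bytes, not characters):

```c
#include <stddef.h>

/* Hypothetical, simplified scanner state: tokens only record their start
 * offset; line/column are derived on demand, e.g. for an error message. */
struct scanner_sketch {
    const char *s;
    size_t len;
    /* Cache of the last computed location (must start as 0, 1, 1). */
    size_t cached_pos, cached_line, cached_column;
};

struct location {
    size_t line, column; /* both 1-based */
};

static struct location
token_location(struct scanner_sketch *sc, size_t pos)
{
    size_t i = 0, line = 1, column = 1;
    if (pos >= sc->cached_pos) { /* resume from the cache */
        i = sc->cached_pos;
        line = sc->cached_line;
        column = sc->cached_column;
    }
    for (; i < pos && i < sc->len; i++) {
        if (sc->s[i] == '\n') {
            line++;
            column = 1;
        } else {
            column++;
        }
    }
    sc->cached_pos = i;
    sc->cached_line = line;
    sc->cached_column = column;
    return (struct location) { line, column };
}
```

Since diagnostics tend to be reported in input order, most lookups resume from the cache; a lookup before the cached position falls back to rescanning from the start, which is acceptable on the slow error path.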
Pierre Le Marre 53b3f446 2025-01-22T17:43:53 clang-tidy: Fix headers includes
Pierre Le Marre 4ea9d431 2023-11-16T17:12:03 rules: Add support for :all qualifier Some layout options need to be applied to every group to maintain consistency (e.g. a group switcher). Currently this must be done manually for all layout indexes. This is error-prone and prevents increasing the maximum group count. This commit introduces the `:all` qualifier for KcCGST values. When a rule with this qualifier is matched, it expands the qualified value (and its optional merge mode) for every layout, e.g. `+group(toggle):all` (respectively `|group(toggle):all`) would expand to `+group(toggle):1+group(toggle):2` (respectively `|group(toggle):1|group(toggle):2`) if there are 2 layouts, etc. If there is no merge mode, it defaults to *override* `+`, e.g. `x:all` expands to `x:1+x:2+x:3` for 3 layouts. Note that only the qualified *value* is expanded, e.g. `x+y:all` expands to `x+y:1+y:2` for 2 layouts. `:all` can be used in combination with special layout indexes. Since this can lead to unexpected behaviour, a warning will be raised.
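The expansion rule described above can be sketched for a single qualified item. `expand_all_qualifier` is a hypothetical helper, not the rules parser itself: splitting a value such as `x+y:all` into items and handling special layout indexes are left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Sketch: expand one item qualified with ":all" (e.g. "+group(toggle):all"
 * or "x:all") into one copy per layout, written to out. */
static bool
expand_all_qualifier(const char *item, unsigned num_layouts,
                     char *out, size_t out_size)
{
    static const char suffix[] = ":all";
    const size_t suffix_len = sizeof(suffix) - 1;
    size_t len = strlen(item);
    if (num_layouts == 0 || len < suffix_len ||
        strcmp(item + len - suffix_len, suffix) != 0)
        return false;
    len -= suffix_len;

    /* Optional leading merge mode. Without one, copies are joined with '+'
     * (override) and the first copy gets no prefix. */
    char sep[2] = { '+', '\0' };
    bool explicit_merge = false;
    if (len > 0 && (item[0] == '+' || item[0] == '|')) {
        sep[0] = item[0];
        explicit_merge = true;
        item++;
        len--;
    }

    size_t pos = 0;
    for (unsigned i = 1; i <= num_layouts; i++) {
        const bool prefix = explicit_merge || i > 1;
        int n = snprintf(out + pos, out_size - pos, "%s%.*s:%u",
                         prefix ? sep : "", (int) len, item, i);
        if (n < 0 || (size_t) n >= out_size - pos)
            return false; /* output would be truncated */
        pos += (size_t) n;
    }
    return true;
}
```

With 2 layouts, `+group(toggle):all` becomes `+group(toggle):1+group(toggle):2`, matching the commit's example.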
Pierre Le Marre ba896935 2024-09-24T21:28:12 logging: Make scanner_warn use a message ID
Pierre Le Marre c8bd57dd 2024-09-24T21:20:41 logging: Make scanner_err use a message ID
Pierre Le Marre a2da57ab 2023-10-30T14:50:00 Compose: early detection of invalid encoding Also move the “unrecognized token” error message before skipping the line, in order to fix the token position.
Pierre Le Marre 0038c866 2023-09-26T17:05:14 Prevent overflow of octal escape sequences The octal parser accepts the range `\1..\777`. The result is cast to `char`, which silently overflows. This commit prevents overflow and treats `\400..\777` as invalid escape sequences.
Pierre Le Marre ef81d04e 2023-09-18T18:17:34 Structured log messages with a message registry
Currently there is little structure in the log messages, which makes them difficult to use for the following use cases:
- A user looking for help about a log message: the user probably uses a search engine, thus the results will depend on the proper indexing of our documentation and the various forums. It relies only on the wording of the message, which may change with time.
- A user wants to filter the logs resulting from the use of one of the components of xkbcommon. A typical example would be testing xkeyboard-config against libxkbcommon. It requires the use of a pattern (simple word detection or regex). The issue is that the pattern may silently become out of sync with xkbcommon.
A common practice (e.g. in compilers) is to assign unique error codes to reference these messages, along with an error index for documentation. Thus this commit implements the following features:
- Create a message registry (message-registry.yaml) that defines the log messages produced by xkbcommon. This is a simple YAML file that provides, for each message:
  - A unique numeric code as a short identifier. It is used in the output message and thus can easily be filtered to spot errors or searched on the internet. It must not change: if the semantics of a message change, it is better to introduce a new message for clarity.
  - A unique text identifier, meant for two uses:
    1. Generate constants dealing with log information in our code base.
    2. Generate human-friendly names for the documentation.
  - A type: currently warning or error. Used to prefix the constants (see above) and for basic classification in the documentation.
  - A short description, used as concise and mandatory documentation.
  - An optional detailed description.
  - Optional examples, intended to help users fix issues themselves.
  - The version of xkbcommon in which it was added. For old entries this is often unknown, so they default to 1.0.0.
  - The version of xkbcommon in which it was removed (optional). No entry should ever be deleted from this index, even if the message is not used anymore: it ensures we have unique identifiers along the history of xkbcommon, and that users can refer to the documentation even for older versions.
- Add the script update-message-registry.py to generate the following files:
  - messages.h: message code enumeration for the messages currently used in the code base. Currently a private API.
  - message.registry.md: the error index documentation page.
- Modify the logging functions to use structured messages. This is a work in progress.
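To illustrate the idea of stable codes, here is a sketch of what a generated header entry and a log prefix could look like. The enum names, numeric values and the `[XKB-NNN]` prefix format are assumptions for this example, not entries from the actual registry.

```c
#include <stdio.h>

/* Hypothetical excerpt of a generated header: each log message gets a
 * stable numeric code. Names and values are illustrative only. */
enum xkb_message_code_sketch {
    MSG_ERROR_EXAMPLE_SYNTAX = 100,
    MSG_WARNING_EXAMPLE_UNKNOWN = 200,
};

/* Sketch: prefix a log line with its code so that it can be filtered or
 * searched for independently of the message wording. Returns the length
 * snprintf would have written. */
static int
format_log_line(char *out, size_t size, enum xkb_message_code_sketch code,
                const char *text)
{
    return snprintf(out, size, "[XKB-%03d] %s", (int) code, text);
}
```

A filter then only needs to match the code, so rewording a message does not break it.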
Ran Benita 0b3d9092 2022-03-14T16:44:13 scanner: prefix functions with `scanner_` to avoid symbol conflicts Particularly `eof()` in mingw-w64. Fixes: https://github.com/xkbcommon/libxkbcommon/pull/285 Reported-by: Marko Lindqvist Signed-off-by: Ran Benita <ran@unusedvar.com>
Ran Benita c3ac58a9 2019-12-27T14:06:47 scanner-utils: avoid possible implicit truncating of line/column This increases the size of the struct a bit but it's not very important. Fixes these MSVC warnings: src\scanner-utils.h(112): warning C4267: '+=': conversion from 'size_t' to 'unsigned int', possible loss of data src\scanner-utils.h(147): warning C4267: '+=': conversion from 'size_t' to 'unsigned int', possible loss of data Signed-off-by: Ran Benita <ran@unusedvar.com>
Ran Benita f774f819 2014-10-18T13:23:53 Replace some strncmp's with memcmp Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita a3116f97 2014-10-13T18:51:12 compose/parser: fix segfault when including The keysym cache for the new scanner was not initialized. To avoid such errors also in the future, require passing the priv argument in scanner_init(), instead of initializing it separately. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 8a0acf2c 2014-10-07T23:42:08 scanner-utils: optimize one-line comments Compose files have a lot of those. Signed-off-by: Ran Benita <ran234@gmail.com>
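One way to make one-line comments cheap is to jump to the next newline in a single `memchr` call instead of advancing character by character. The sketch below shows that approach; it is an assumption about the technique, not necessarily what this commit does, and `skip_to_eol` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: skip a one-line comment starting at s[pos] (e.g. '#' in Compose
 * files). Returns the position of the next '\n' (left for the caller to
 * consume), or len if the comment runs to the end of the input. Assumes
 * pos <= len. */
static size_t
skip_to_eol(const char *s, size_t len, size_t pos)
{
    const char *nl = memchr(s + pos, '\n', len - pos);
    return nl ? (size_t) (nl - s) : len;
}
```

`memchr` is typically vectorized by the C library, which matters for Compose files where comment lines are frequent.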
Ran Benita 29a1a780 2014-09-12T18:40:18 scanner-utils: add priv member For when a user of the scanner wants to pass something along with it. Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 94a8e01c 2014-02-03T14:55:37 scanner-utils: add helper for appending an entire string Signed-off-by: Ran Benita <ran234@gmail.com>
Ran Benita 8eb024d5 2013-10-27T20:17:29 scanner-utils: add helper for hex string escape Like the already existing oct. Signed-off-by: Ran Benita <ran234@gmail.com>
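A hex counterpart to the octal helper might look like this. The two-digit limit and the rejection of `\x0` are assumptions made for symmetry with the octal rules elsewhere in this log, and `parse_hex_escape` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch: `s` points just after "\x"; the buffer need not be
 * NUL-terminated. Reads 1 or 2 hex digits (assumption). *consumed always
 * receives the number of digits read. */
static bool
parse_hex_escape(const char *s, size_t len, unsigned char *out,
                 size_t *consumed)
{
    unsigned value = 0;
    size_t i = 0;
    for (; i < len && i < 2; i++) {
        const char c = s[i];
        if (c >= '0' && c <= '9')
            value = value * 16 + (unsigned) (c - '0');
        else if (c >= 'a' && c <= 'f')
            value = value * 16 + (unsigned) (c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            value = value * 16 + (unsigned) (c - 'A' + 10);
        else
            break; /* first non-hex character ends the escape */
    }
    *consumed = i;
    if (i == 0 || value == 0)
        return false; /* no digits, or NUL */
    *out = (unsigned char) value;
    return true;
}
```

With at most two digits the value always fits in a byte, so no separate range check is needed here.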
Ran Benita 4ed68120 2014-10-01T19:14:36 scanner-utils: optimize str()/lit() Replace the dog-slow unneeded strncasecmp() with an inlineable memcmp(). Before: compiled 2500 keymaps in 8.348715629s After: compiled 2500 keymaps in 7.872640338s Signed-off-by: Ran Benita <ran234@gmail.com>
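This commit and f774f819 above rely on the same idea: once the remaining input length is checked, matching a literal is a plain `memcmp`, which needs no NUL scanning and no case folding. A sketch with hypothetical names (`struct mini_scanner`, `scanner_lit_sketch`, `LIT`):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal scanner: a buffer and a cursor (pos <= len). */
struct mini_scanner {
    const char *s;
    size_t len, pos;
};

/* Sketch: if the input at the cursor starts with `lit`, consume it.
 * The bounds check comes first, so memcmp never reads past the buffer. */
static inline bool
scanner_lit_sketch(struct mini_scanner *sc, const char *lit, size_t lit_len)
{
    if (sc->len - sc->pos < lit_len ||
        memcmp(sc->s + sc->pos, lit, lit_len) != 0)
        return false;
    sc->pos += lit_len;
    return true;
}

/* For string literals the length is a compile-time constant. */
#define LIT(sc, literal) \
    scanner_lit_sketch((sc), (literal), sizeof(literal) - 1)
```

A constant length gives the compiler the chance to inline and specialize the comparison, which is the kind of win the benchmark in this commit reflects.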
Ran Benita e55a0cea 2013-10-27T20:10:15 Move src/xkbcomp/scanner-utils.h to src/ As we'll use it for things unrelated to xkbcomp. Signed-off-by: Ran Benita <ran234@gmail.com>