scanner.c

Branch :

Log

Commit	Date	Message
3d79f459	2025-03-29 11:46:34	xkbcomp: Add Unicode code point escape sequence \u{NNNN} Unicode code point escape sequences `\u{NNNN}` are replaced with the UTF-8 encoding of their corresponding code point `U+NNNN`, if legal. Supported Unicode code points are in the range `1‥0x10ffff`. Note that we will reject the `U+0000` NULL code point, as we reject it in the octal escape sequence `\0`. This is intended mainly for the upcoming feature to write keysyms as UTF-8 encoded strings. It can be used for various reasons: - avoid encoding issues; - avoid issue with font rendering (e.g. Asian scripts); - make white space or zero-width characters more readable.
23bbec96	2025-03-29 12:33:53	xkbcomp: Add escape sequence \" `\"` seems like a very natural extension. However it is not supported by Xorg xkbcomp, so do not emit it when serializing.
aa8b572e	2025-03-29 12:04:26	keymap serialization: Ensure escaping relevant chars Previously we would write characters without any escaping in some cases (e.g.: names of indicators, types and groups). E.g. the string "new\nline" would be serialized as: "new line" which would raise a syntax error if parsed. Fixed by escaping any string that was not escaped after parsing (e.g. the section names are safe already).
d5a91fa9	2025-04-04 16:38:16	xkbcomp: Use custom parsers instead of strtol* The use of `strtol` functions was already restricted due to its slowness and its capacity to parse other stuff than digits (e.g. signs and spaces). There is also another big* limitation: it requires a NULL-terminated string. This is incompatible with our functions that work on buffers, because we cannot guarantee this. This may lead to a memory violation if the last token is a number. We now roll out our own parsers, which are more efficients and compatible with buffers.
500b260b	2025-03-28 09:38:58	xkbcomp: Fix parser failure on floating-point numbers Before this commit we used `strtold`, which depends on the locale. But the XKB syntax is fixed and uses a period as decimal separator. So ensure the syntax is correct without relying on `strtold` and truncate the result, as the parser does not use floating-point numbers.
70d11abd	2025-03-26 07:38:05	messages: Add file encoding and invalid syntax entries Added: - `XKB_ERROR_INVALID_FILE_ENCODING` - `XKB_ERROR_INVALID_RULES_SYNTAX` - `XKB_ERROR_INVALID_COMPOSE_SYNTAX` Changed: - `XKB_ERROR_INVALID_SYNTAX` renamed to `XKB_ERROR_INVALID_XKB_SYNTAX`.
e1892266	2025-02-13 16:57:46	clang-tidy: Miscellaneous fixes
2d111bbe	2025-02-12 13:54:51	xkbcomp: Fix possible overflow in numbers parser
558447d8	2025-02-11 17:34:27	xkbcomp: Explicit vars initialization The `Resolve*` functions do not always initialize the parameters that they can modify, so it is safer to always initialize them at the call site.
f4e95280	2025-02-02 22:29:05	xkbcomp/scanner: avoid unneeded strdup of IDENT tokens The allocation is immediately discarded, either turned into a keysym or an atom. So use an sval slice into the input string instead strdup'ing. memusage ./release/bench-compile-keymap --iter=1000 --layout us,de --variant ,neo Before: Memory usage summary: heap total: 534063576, heap peak: 581022, stack peak: 18848 total calls total memory failed calls malloc\| 11240525 291897104 0 realloc\| 1447657 192307328 0 (nomove:37629, dec:0, free:0) calloc\| 430573 49859144 0 free\| 13993903 534063576 After: Memory usage summary: heap total: 506839909, heap peak: 581022, stack peak: 18960 total calls total memory failed calls malloc\| 8016419 264673437 0 realloc\| 1447657 192307328 0 (nomove:37278, dec:0, free:0) calloc\| 430573 49859144 0 free\| 10769797 506839909 Signed-off-by: Ran Benita <ran@unusedvar.com>
7e84c845	2025-02-01 17:04:33	xkbcomp/scanner: avoid extra copies for keynames, keywords, identifiers The tokens don't have escapes so no need to use the `buf` for them. Signed-off-by: Ran Benita <ran@unusedvar.com>
e120807b	2025-01-29 15:35:22	Update license notices to SDPX short identifiers + update LICENSE Fix #628. Signed-off-by: Ran Benita <ran@unusedvar.com>
26807a90	2025-01-28 20:24:05	scanner: compute token line/column lazily on errors The scanner functions are hot, and the line/column location tracking is quite expensive. We only use it for errors, which don't need to be fast, because we bail if there are too many; and for warnings, which are usually not shown by default. So only keep the token start pos, and compute the line/column lazily from that. This will also allow some further improvements ahead. bench/rulescomp before: compiled 1000 keymaps in 1.669028s after: compiled 1000 keymaps in 1.550411s bench/compose: before: compiled 1000 compose tables in 2.145217s after: compiled 1000 compose tables in 2.016044s Signed-off-by: Ran Benita <ran@unusedvar.com>
ba896935	2024-09-24 21:28:12	logging: Make scanner_warn use a message ID
c8bd57dd	2024-09-24 21:20:41	logging: Make scanner_err use a message ID
82e9293e	2023-10-30 15:28:10	xkbcomp: early detection of invalid encoding
f937c308	2023-10-29 07:31:34	xkbcomp: skip heading UTF-8 encoded BOM (U+FEFF) Leading BOM is legal and is used as a signature — an indication that an otherwise unmarked text file is in UTF-8. See: https://www.unicode.org/faq/utf_bom.html#bom5 for further details.
9d15c6a7	2023-09-26 17:05:14	Show invalid escape sequences It is easier to debug when the message actually displays the offending escape sequence.
ca7aa69c	2023-09-26 17:05:05	Disallow producing NULL character with escape sequences NULL usually terminates the strings; allowing to produce it via escape sequences may lead to undefined behaviour. - Make NULL escape sequences (e.g. `\0` and `\x0`) invalid. - Add corresponding test. - Introduce the new message: XKB_WARNING_INVALID_ESCAPE_SEQUENCE.
c0065c95	2023-09-21 20:06:27	Messages: merge macros with and without message code Previously we had two types of macros for logging: with and without message code. They were intended to be merged afterwards. The idea is to use a special code – `XKB_LOG_MESSAGE_NO_ID = 0` – that should not be displayed. But we would like to avoid checking this special code at run time. This is achieved using macro tricks; they are detailed in the code (see: `PREPEND_MESSAGE_ID`). Now it is also easier to spot the remaining undocumented log entries: just search `XKB_LOG_MESSAGE_NO_ID`.
ef81d04e	2023-09-18 18:17:34	Structured log messages with a message registry Currently there is little structure in the log messages, making difficult to use them for the following use cases: - A user looking for help about a log message: the user probably uses a search engine, thus the results will depend on the proper indexing of our documentation and the various forums. It relies only on the wording of the message, which may change with time. - A user wants to filter the logs resulting of the use of one of the components of xkbcommon. A typical example would be testing xkeyboard-config against libxkbcommon. It requires the use of a pattern (simple words detection or regex). The issue is that the pattern may become silently out-of-sync with xkbcommon. A common practice (e.g. in compilers) is to assign unique error codes to reference theses messages, along with an error index for documentation. Thus this commit implements the following features: - Create a message registry (message-registry.yaml) that defines the log messages produced by xkbcommon. This is a simple YAML file that provides, for each message: - A unique numeric code as a short identifier. It is used in the output message and thus can be easily be filtered to spot errors or searched in the internet. It must not change: if the semantics of message changes, it is better to introduce a new message for clarity. - A unique text identifier, meant for two uses: 1. Generate constants dealing with log information in our code base. 2. Generate human-friendly names for the documentation. - A type: currently warning or error. Used to prefix the constants (see hereinabove) and for basic classification in documentation. - A short description, used as concise and mandatory documentation. - An optionnal detailed description. - Optional examples, intended to help the user to fix issues themself. - Version of xkbcommon it was added. For old entries this often unknown, so they will default to 1.0.0. - Version of xkbcommon it was removed (optional) No entry should ever be deleted from this index, even if the message is not used anymore: it ensures we have unique identifiers along the history of xkbcommon, and that users can refer to the documentation even for older versions. - Add the script update-message-registry.py to generate the following files: - messages.h: message code enumeration for the messages currently used in the code base. Currently a private API. - message.registry.md: the error index documentation page. - Modify the logging functions to use structured messages. This is a work in progress.
0b3d9092	2022-03-14 16:44:13	scanner: prefix functions with `scanner_` to avoid symbol conflicts Particularly `eof()` in mingw-w64. Fixes: https://github.com/xkbcommon/libxkbcommon/pull/285 Reported-by: Marko Lindqvist Signed-off-by: Ran Benita <ran@unusedvar.com>
521bb498	2019-12-27 22:08:57	xkbcomp: remove cast which triggers warning on gcc Will need some other way to take care of the warning on MSVC. Signed-off-by: Ran Benita <ran@unusedvar.com>
fbd0e643	2019-12-27 21:51:34	xkbcomp: make a couple of casts explicit to mark them as checked This acknowledges some "possible loss of data cast" warnings from MSVC. Signed-off-by: Ran Benita <ran@unusedvar.com>
40aab05e	2019-12-27 13:03:20	build: include config.h manually Previously we included it with an `-include` compiler directive. But that's not portable. And it's better to be explicit anyway. Every .c file should have `include "config.h"` first thing. Signed-off-by: Ran Benita <ran@unusedvar.com>
2cca0289	2015-11-19 00:44:27	src/utils: change map_file to not take const string argument map_file() uses PROT_READ, so const seems fitting; however unmap_file calls munmap/free, which do not take const, so an UNCONSTIFY is needed. To avoid the UNCONSTIFY hack, which is likely undefined behavior or some such, just remove the const. Signed-off-by: Ran Benita <ran234@gmail.com>
a3116f97	2014-10-13 18:51:12	compose/parser: fix segfault when including The keysym cache for the new scanner was not initialized. To avoid such errors also in the future, require passing the priv argument in scanner_init(), instead of initializing it separately. Signed-off-by: Ran Benita <ran234@gmail.com>
8a0acf2c	2014-10-07 23:42:08	scanner-utils: optimize one-line comments Compose files have a lot of those. Signed-off-by: Ran Benita <ran234@gmail.com>
d0c6fce2	2014-09-20 15:06:13	parser: use "atom" instead of "sval" in yylval "sval" is already used for "struct sval". Signed-off-by: Ran Benita <ran234@gmail.com>
cb4bae71	2014-06-30 14:52:30	parser: don't shadow "str" It's a name of a function in scanner-utils.h and also of some parameters. https://bugs.freedesktop.org/show_bug.cgi?id=79898 Reported-by: Bryce Harrington <b.harrington@samsung.com> Signed-off-by: Ran Benita <ran234@gmail.com>
28d5f770	2014-02-10 20:33:34	scanner: sort out scanner logging functions First, make the rules and xkb scanners/parsers use the same logging functions instead of rolling their own. Second, use the gcc ##__VA_ARGS extension instead of dealing with C99 stupidity. I hope all relevant compilers support it. Signed-off-by: Ran Benita <ran234@gmail.com>
68b03097	2014-02-08 17:22:14	scanner: make line and column unsigned Signed-off-by: Ran Benita <ran234@gmail.com>
b82a0a86	2014-02-07 18:09:30	scanner: avoid strlen in keyword lookup, we know the len Signed-off-by: Ran Benita <ran234@gmail.com>
917c7515	2014-01-12 14:37:39	context: remove mostly useless log wrappers Just use xkb_log directly. Signed-off-by: Ran Benita <ran234@gmail.com>
ba7530fa	2013-11-27 13:43:57	scanner: restore lost DIVIDE token I don't know how this could have happened. Luckily this token is completely useless. Signed-off-by: Ran Benita <ran234@gmail.com>
dcdd4e10	2013-10-14 18:59:53	Replace ctype.h functions with ascii ones ctype.h is locale-dependent, so using it in our scanners is not optimal. Let's be deterministic with our own simple functions. Signed-off-by: Ran Benita <ran234@gmail.com>
c35c388b	2013-10-08 18:35:05	scanner: remove unnecessary cast 'tok' is already an int now. Signed-off-by: Ran Benita <ran234@gmail.com>
409f27d7	2013-09-29 00:41:17	parser: don't use %locations byacc doesn't support this feature. We print the line/col of the last scanned token instead. This is slightly less in case of parser errors (not syntax errors), but I couldn't make it point to another line, and this are pretty cryptic anyways. So it's good enough. Also might be a bit faster, but haven't checked. Signed-off-by: Ran Benita <ran234@gmail.com>
e4c00e90	2013-09-29 00:19:32	parser: don't use enum yytokentype byacc doesn't support this, it just puts out #define's for the tokens. Signed-off-by: Ran Benita <ran234@gmail.com>
7caa1af2	2013-08-13 14:45:33	scanner: don't fail over unknown escape sequence This is too strict, and causes symbols/cz to fail parsing. Instead, just emit a warning (not shown by default): xkbcommon: WARNING: cz:75:19: unknown escape sequence in string literal https://bugs.freedesktop.org/show_bug.cgi?id=68056 Reported-By: Gatis Paeglis <gatis.paeglis@digia.com> Signed-off-by: Ran Benita <ran234@gmail.com>
aa9c9194	2013-08-02 14:41:19	scanner: fix compiler warning src/xkbcomp/scanner.c:158:17: warning: comparison of constant -1 with expression of type 'enum yytokentype' is always true [-Wtautological-constant-out-of-range-compare] if (tok != -1) return tok; ~~~ ^ ~~ Signed-off-by: Ran Benita <ran234@gmail.com>
e91d2653	2013-08-01 23:09:46	scanner: allow empty key name literals Some keymaps actually have this, like the quartz.xkb which is tested. We need to support these. https://bugs.freedesktop.org/show_bug.cgi?id=67654 Reported-By: Gatis Paeglis <gatis.paeglis@digia.com> Signed-off-by: Ran Benita <ran234@gmail.com>
9e801ff7	2013-07-21 17:01:20	ctx: adapt to the len-aware atom functions xkb_atom_intern now takes a len parameter. Turns out though that almost all of our xkb_atom_intern calls are called on string literals, the length of which we know statically. So we add a macro to micro-optimize this case. Signed-off-by: Ran Benita <ran234@gmail.com>
a392d268	2012-08-12 11:40:02	Replace flex scanner with a hand-written one The scanner is very similar in structure to the one in xkbcomp/rules.c. It avoids copying and has nicer error reporting. It uses gperf to generate a hashtable for the keywords, which gives a nice speed boost (compared to the naive strcasecmp method at least). But since there's hardly a reason to regenerate it every time and require people to install gperf, the output (keywords.c) is added here as well. Here are some stats from test/rulescomp: Before: compiled 1000 keymaps in 4.052939625s ==22063== total heap usage: 101,101 allocs, 101,101 frees, 11,840,834 bytes allocated After: compiled 1000 keymaps in 3.519665434s ==26505== total heap usage: 99,945 allocs, 99,945 frees, 7,033,608 bytes allocated Signed-off-by: Ran Benita <ran234@gmail.com>

kc3-lang/libxkbcommon/src/xkbcomp/scanner.c

Log

kc3-lang/libxkbcommon /src/xkbcomp/scanner.c