kmx git

Commit	Date	Message
ac9cd053	2025-06-11T19:00:47	test: Check extended layout indexes
7f39be25	2025-06-10T15:46:45	test: Use explicit keymap output format for test_compile_output()
0f89ad97	2025-06-09T19:26:13	dump: Always use numeric group indexes The upcoming raise of the maximum groups count will require using numeric group indexes instead of the syntax `GroupN` if groups > 8. Let’s not bother with handling two cases (group count ≤ 8 or > 8) and always serialize group indexes as numeric values.
f3386743	2025-06-09T16:44:54	test: Use explicit keymap format in test_compile_output()
2acf5eca	2025-06-09T16:26:56	test: Use explicit keymap format in test_compile_buffer()
79e95509	2025-06-09T11:07:36	test: Use explicit keymap format in test_compile_rules()
39b4b670	2025-06-06T18:40:29	Support including keymap components using %-expansion and absolute path Enable using the same `include` features as rules files in keymap components: - `%`-expansion: `%H` home directory, `%S` system root and `%E` extra. - absolute file paths. This is useful if one wants to overwrite the system file with a user config (i.e. same name, but in `~/.config/xkb`), but still include the system file: ``` // File: ~/.config/xkb/symbols/de xkb_symbols "basic" { include "%S/de(basic)" key <AB01> { [z, Z] }; key <AD06> { [y, Y] }; } ``` Without this commit, using a mere `include "de(basic)"` would result in an include loop. Refactored by using the same code for rules and keymap components.
9b4fd82b	2025-05-13T11:46:46	test: Skip checked arithmetic if not available
fb9fec18	2025-05-10T10:18:38	xkbcomp: Checked arithmetic Use a polyfill for C23 checked arithmetic. This is a bit paranoid, as we expect the user to use only 32 bit integers, so the signed 64 bit integer we use to store the result should be more than enough. Use jtckdint v1.0: - repository: https://github.com/jart/jtckdint - commit: 339450d13d8636f05dcb71ba36efddb226db481e - removed all C++-specific code
22d27277	2025-05-10T10:12:31	actions: Reject arguments if they are not expected `NoAction`, `VoidAction` and `TerminateServer` do not accept arguments.
d239a3f0	2025-05-11T11:42:20	actions: Improve unsupported legacy X11 actions handling - Display a warning - Document drawbacks of degrading to `NoAction()`
b4c89600	2025-05-09T15:15:10	actions: Add VoidAction(), mirroring NoSymbol/VoidSymbol. Added `VoidAction()` action to match the keysym pair `NoSymbol` / `VoidSymbol`. It enables overriding a previous action and breaks latches. This is a libxkbcommon extension. When serializing it will be converted to `LockControls(controls=none,affect=neither)` for backward compatibility. We cannot serialize it to `NoAction()`, as it would be dropped in e.g. the context of multiple actions.
c2d3694b	2025-05-06T07:01:01	xkbcomp: Do not discard extra bits in vmod masks Since we accept numeric values for the vmod mask in the keymap, we may have extra bits set that encode no real/virtual modifier. Keep them unchanged for consistency. E.g. the following keymap: xkb_keymap { xkb_keycodes { <a> = 38; }; xkb_symbols { virtual_modifiers X = 0xf0000000; key <a> { [ SetMods(mods = 0x00001100) ] }; }; }; would compile to: xkb_keymap { xkb_keycodes { <a> = 38; }; xkb_symbols { virtual_modifiers X = 0xf0000000; // Internal state key <a> { [ SetMods(mods = 0xf0001000) ] }; // Serialization key <a> { [ SetMods(mods = 0x00001100) ] }; }; };
9b0b8c68	2025-04-15T19:53:28	xkbcomp: Stricter handling of default map include Before this commit, including a default map, i.e. without an explicit section name (e.g. `include "au"` vs `include "au(basic)"`) would match the first section of the first matching file in the XKB include paths, even if this section is not an explicit default map (i.e. tagged with `default`) but an implicit default map (i.e. the first map of the file, i.e. a weak match). It makes user configuration risky: say a user wants to create a custom version `au(custom)` of the `au` layout: - `~/.config/xkb/symbols/au`: custom layout in section “custom”. - `/usr/share/X11/xkb/symbols/au`: system layout, with default section “basic”. In this setup any layout that imports the default map from `au` would in fact import the implicit default map `au(custom)` instead of the explicit default map `au(basic)`. This incorrect behavior may thus break setups with multiple layouts. This is especially true for symbols files such as: `pc`, `us` or `latin`. Fixed by trying harder to find the exact default map, defaulting to the old behavior (weak match) only if no explicit default map (exact match) has been found in the XKB include paths.
66f71890	2025-03-31T08:01:29	symbols: Enable writing keysyms list as UTF-8 strings Each Unicode code point of the string will be translated to its respective keysym, if possible. An empty string denotes `NoSymbol`. When such conversion is not possible, this will raise a syntax error. This introduces the following syntax: ```c // Empty string = `NoSymbol` key <1> {[""]}; // NoSymbol // Single code point = single keysym key <2> {["é"]}; // eacute // String = translate each code point to its respective keysym key <3> {["sßξك🎺"]}; // {s, ssharp, Greek_xi, Arabic_kaf, U1F3BA} // Mix string and keysyms key <4> {[{"ξ", Greek_kappa, "β"}]}; // { Greek_xi, Greek_kappa, Greek_beta} ``` It can also be used wherever a keysym is required, e.g. in `interpret` and `modifier_map` statements. In these cases a single keysym is expected, so the string should contain exactly one Unicode code point.
ead3ce77	2025-03-28T21:44:27	scanner: Enable LRM and RLM marks for BiDi text Enable displaying bidirectional text in XKB files using: - U+200E LEFT-TO-RIGHT MARK - U+200F RIGHT-TO-LEFT MARK We now parse these marks as white space. As such, they are dropped; note that a later serialization may not display correctly without the marks, although it will parse. References: - https://www.w3.org/International/articles/inline-bidi-markup/uba-basics - https://www.w3.org/International/questions/qa-bidi-unicode-controls - https://www.unicode.org/reports/tr31/#Whitespace - https://www.unicode.org/reports/tr55/
bc3e464b	2025-04-09T12:35:05	keysyms: Fix Unicode handling - `xkb_utf32_to_keysym`: Allow [Unicode noncharacters]. There is no requirement to drop them and this would be the only function of our API doing so. From the Unicode Standard 16.0, section 23.7 “Noncharacters”: > Applications are free to use any of these noncharacter code points > internally. They have no standard interpretation when exchanged > outside the context of internal use. However, they are not illegal > in interchange, nor does their presence cause Unicode text to be > ill-formed. > If a noncharacter is received in open interchange, an application is > not required to interpret it in any way. It is good practice, > however, to recognize it as a noncharacter and to take appropriate > action, such as replacing it with `U+FFFD` REPLACEMENT CHARACTER, > to indicate the problem in the text. The key part is: > an application is not required to interpret it in any way Since we handle the reverse conversion with `xkb_keysym_to_utf32` just fine, I do not see a good motivation to keep this asymmetry. This is the only function with a special case for these code points. - `xkb_keysym_from_name`: - Unicode format `UNNNN`: allow control characters C0 and C1 and use `xkb_utf32_to_keysym` for the conversion when `NNNN < 0x100`, for backward compatibility. - Numeric hexadecimal format `0xNNNN`: unchanged. Contrary to the Unicode format, it does not normalize any keysym values in order to enable roundtrip with `xkb_keysym_get_name`. Also added tests to ensure various properties and consistency. Note about surrogates: they are valid code points but invalid Unicode scalar values, i.e. they cannot be encoded in any Unicode encoding form (UTF-8, UTF-16, UTF-32). So their corresponding Unicode keysyms are valid, but: - cannot be used as input of `xkb_keysym_to_utf32` nor `xkb_keysym_to_utf8` - cannot result as output of `xkb_utf32_to_keysym`. Otherwise they are valid e.g. in the Unicode keysym notation.
[Unicode noncharacters]: https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Noncharacters
5e557040	2025-04-09T11:17:00	xkbcomp: Fix Unicode escape sequence While the previous code correctly rejected malformed sequences such as `\u{` (incomplete) or `\u{123x}`, it should try to consume as much input as possible until reaching the corresponding closing `}` within the string. Else we can get leftovers and the error message does not reference the whole malformed sequence. Also added further tests with surrogates and noncharacters.
36442baa	2025-04-03T15:01:46	xkbcomp: Support multiple actions in interpret Before this commit we supported multiple actions per level, but not in interpret statements. Let’s fix this asymmetry, so we can equivalently assign all action sets either implicitly or explicitly.
3d79f459	2025-03-29T11:46:34	xkbcomp: Add Unicode code point escape sequence \u{NNNN} Unicode code point escape sequences `\u{NNNN}` are replaced with the UTF-8 encoding of their corresponding code point `U+NNNN`, if legal. Supported Unicode code points are in the range `1‥0x10ffff`. Note that we will reject the `U+0000` NULL code point, as we reject it in the octal escape sequence `\0`. This is intended mainly for the upcoming feature to write keysyms as UTF-8 encoded strings. It can be used for various reasons: - avoid encoding issues; - avoid issues with font rendering (e.g. Asian scripts); - make white space or zero-width characters more readable.
7d91a753	2025-03-29T12:24:39	xkbcomp: Enable xkbcomp-style octal escape sequences Xorg xkbcomp only parses octal sequences with `\0`, while xkbcommon does not force the `0` prefix of the numeric part. However, we only parsed up to 3 digits, which does not allow parsing e.g. `\0377` while `\377` parses fine. Fixed by parsing up to 4 octal digits, while checking the result fits into a byte.
aa8b572e	2025-03-29T12:04:26	keymap serialization: Ensure escaping relevant chars Previously we would write characters without any escaping in some cases (e.g.: names of indicators, types and groups). E.g. the string "new\nline" would be serialized as: "new line" which would raise a syntax error if parsed. Fixed by escaping any string that was not escaped after parsing (e.g. the section names are safe already).
d5a91fa9	2025-04-04T16:38:16	xkbcomp: Use custom parsers instead of strtol* The use of `strtol*` functions was already restricted due to their slowness and their capacity to parse things other than digits (e.g. signs and spaces). There is also another big limitation: they require a NUL-terminated string. This is incompatible with our functions that work on buffers, because we cannot guarantee this. This may lead to a memory violation if the last token is a number. We now roll our own parsers, which are more efficient and compatible with buffers.
44480f7c	2025-04-01T08:28:02	xkbcomp: Enable lists of keysyms and actions {} and {a} Motivations: - Follow the principle of least astonishment; - Ensure consistency; - Enhance the use of custom defaults; - Facilitate the tests. There is some ambiguity because we use `{}` to denote both an empty list of keysyms and an empty list of actions. But as soon as we get a keysym or an action, we know whether it is a `MultiKeySymList` or a `MultiActionList`. So we just count the `{}` at the beginning using `NoSymbolOrActionList`, then replace it by the relevant count of `NoSymbol` or `NoAction()` once the ambiguity is solved. If not, this is a list of empties of some type: we drop those empties and delegate the type resolution using `ExprEmptyList()`.
e09cbe66	2025-04-02T10:46:06	symbols: Fix handling of empty keys Before this commit, the following symbols: ```c xkb_symbols { virtual_modifiers M1, M2; key <A> {}; key <B> { [] }; key.vmods = M1; key <C> {}; key <D> { vmods = M2 }; }; ``` would be equivalent to: ```c xkb_symbols { virtual_modifiers M1,M2; key <B> { [ NoSymbol ] }; }; ``` `<B>` entry could be skipped but is harmless. However, `<C>` and `<D>` are missing, which would lead to the mapping resolution of `M1` and `M2` failing. After this commit, it is equivalent to: ```c virtual_modifiers M1,M2; key <C> { vmods = M1 }; key <D> { vmods = M2 }; ``` Empty keys are skipped entirely, but any explicit field: - is taken into account: previously they would be skipped if there were no group; - forces the key to be printed at serialization.
2e0245f8	2025-04-02T10:45:44	xkbcomp: Enable more empty lists - Empty `interpret` - Empty key `type` - Empty `indicator` Motivations: - Follow the principle of least astonishment; - Ensure consistency; - Enhance the use of custom defaults; - Facilitate the tests.
6881fb32	2025-04-01T08:28:02	xkbcomp: Drop trailing NoSymbol and NoAction() This brings us closer to what `xkbcomp` outputs. One should use the explicit `VoidSymbol` instead of `NoSymbol`, in order to avoid dropping empty levels. This may affect keys that rely on an implicit key type. Example: - Input: ```c key <> { [a, A, NoSymbol] }; ``` - Compilation with xkbcommon < 1.9.0: ```c key <> { type= "FOUR_LEVEL_SEMIALPHABETIC", [a, A, NoSymbol, NoSymbol] }; ``` - Compilation with xkbcommon ≥ 1.9.0: ```c key <> { type= "ALPHABETIC", [a, A] }; ```
fbacdd98	2025-03-31T07:58:04	test: Refactor test_multi_keysyms_actions - Use fewer macros - Add golden tests to check the compilation result
b254cc2e	2025-03-30T12:27:15	test: Remove empty components boilerplate
3150bca8	2025-03-30T09:54:02	xkbcomp: Make all components optional We already accept empty components, such as: `xkb_compat {};`. Let’s accept missing components as well, so that we can reduce the boilerplate in our tests. Note that we will still explicitly serialize empty components for compatibility with previous xkbcommon versions and Xorg xkbcomp.
500b260b	2025-03-28T09:38:58	xkbcomp: Fix parser failure on floating-point numbers Before this commit we used `strtold`, which depends on the locale. But the XKB syntax is fixed and uses a period as decimal separator. So ensure the syntax is correct without relying on `strtold` and truncate the result, as the parser does not use floating-point numbers.
e5401b07	2025-03-26T16:02:58	symbols: Improve Modmap parsing Parse, don’t validate: ensure at parsing that `modifier_map` definitions use a list of keys and keysyms. This enables removing the redundant `ExprResolveKeySym` and having keysym parsing handled exclusively in `parser.y`.
e8561909	2025-03-18T14:34:10	xkbcomp: Fix keycodes bounds - Refactor to check conflicts first for the key names and then for the keycodes. This seems more useful for the user and enables further memory optimizations. - Do not allocate until we are sure to add the keycode. The bounds are only updated afterwards, so the call to `FindKeyByName` should be more efficient. - Fixed keycodes bounds not shrunk correctly when an existing keycode is overridden. - Do not prepare keyname strings for logging if we are not going to use them.
befa0cdd	2025-02-12T15:38:58	test: Check integers syntax
4cef822a	2025-02-12T07:44:34	test: Check masks syntax
e120807b	2025-01-29T15:35:22	Update license notices to SPDX short identifiers + update LICENSE Fix #628. Signed-off-by: Ran Benita <ran@unusedvar.com>
502e9e5b	2025-01-29T12:19:10	xkbcomp: Add stricter bounds for keycodes and levels Our current implementation uses continuous arrays indexed by keycodes and levels. This is simple and good enough for realistic keymaps. However, they are allowed to have big values that will lead to either memory exhaustion or a waste of memory (sparse arrays). Added the much stricter upper bounds `0xfff` for keycodes[^1] and 2048 for levels[^2], which should still be plenty enough and provides stronger memory security. [^1]: Current max keycode is 0x2ff in Linux. [^2]: Should be big enough to satisfy automatically generated keymaps.
88a3d3c2	2025-01-23T16:03:51	tests: Refactor buffercomp Move tests into proper functions and log tests names.
b1e1aae6	2025-01-23T15:20:44	xkbcomp: Fix memory leak when extra content after keymap It triggers with e.g.: ``` xkb_keymap { xkb_keycodes { }; }; }; // erroneous ```
709027ec	2025-01-23T09:12:15	symbols: Fix inconsistent error handling Currently the following keymap triggers a critical error (invalid `vmods`) only for the second key statement, while it should handle both equally. ``` xkb_keymap { xkb_keycodes { <> = 9; }; xkb_types { }; xkb_compat { }; xkb_symbols { key <> { vmods = [], repeats = false }; key <> { repeats = false, vmods = [] }; }; }; ``` Fixed by parsing the whole symbols body and failing if any error was found.
ec2915fe	2025-01-22T17:18:21	symbols: Fix a possible null pointer dereference Introduce a new Expression type, `EXPR_EMPTY_LIST`, to avoid the ambiguity between action and keysym empty lists. Two alternatives were rejected to keep the semantics clear: - Using `EXPR_KEYSYM_LIST`: because we would end up accepting an empty keysym list while processing actions. - Using NULL: conveys no info and is hazardous.
7036e46c	2025-01-13T15:20:47	symbols: Add tests for key merge modes (keysyms/actions) This commit adds tests for merging various key configurations: - With/without keysyms/actions - Single/multiple keysyms/actions per level We test all the merge modes for including a map (global) as well as directly on the keys (local): - default (global: include, local: implicit) - augment - override - replace The tests data are generated with: - A Python script `scripts/update-merge-modes-tests.py` for keycodes and symbols data. Use `--debug` for extra comments to help debugging. The script can optionally generate C headers for alternative key sequence tests, that were used before implementing golden tests. The latter tests are not used anymore (duplicate with golden tests) but their generator is kept for now, as they can still be useful for debugging or writing similar tests. - The `merge-modes` test generates its own keymap files for golden tests, using: `build/test-merge-modes update`. It can also replace them with the obtained output rather than the expected one using `build/test-merge-modes update-obtained`, which is very useful for debugging.
71d64df3	2024-10-08T18:45:18	symbols: Add tests for multiple actions per level
31c6d866	2024-10-08T18:39:00	symbols: Min. 2 keysyms in level list Do not allow `{ a }` when a single `a` suffices.
929a485f	2024-10-08T12:52:53	symbols: Fix too liberal parsing of keysyms lists Currently we are too liberal when parsing symbols lists: e.g. `[{a,{b}}]` is parsed as `[{a,b}]` but it should be rejected.
e325e65e	2024-02-20T08:13:37	Add test_unit to all tests Currently it only ensures we do not buffer `stdout`.
00e3058e	2023-11-06T21:53:51	Prevent recursive includes of keymap components - Add check for recursive includes of keymap components. It relies on limiting the include depth. The threshold is currently set to 15, which seems reasonable with plenty of margin for keymaps in the wild. - Add corresponding new log message `recursive-include`. - Add tests for recursive includes.
82e9293e	2023-10-30T15:28:10	xkbcomp: early detection of invalid encoding
f937c308	2023-10-29T07:31:34	xkbcomp: skip heading UTF-8 encoded BOM (U+FEFF) Leading BOM is legal and is used as a signature — an indication that an otherwise unmarked text file is in UTF-8. See: https://www.unicode.org/faq/utf_bom.html#bom5 for further details.
b06aedb8	2023-05-02T14:15:55	scanner: allow for a zero terminated string as keymap As the documentation for xkb_keymap_new_from_buffer() states, the "input string does not have to be zero-terminated". The actual implementation however failed with "unrecognized token/syntax error" when it encountered a null byte. Fix this by allowing a null byte at the last position of the buffer. Anything else is likely a client error anyway. Fixes #307
40aab05e	2019-12-27T13:03:20	build: include config.h manually Previously we included it with an `-include` compiler directive. But that's not portable. And it's better to be explicit anyway. Every .c file should have `include "config.h"` first thing. Signed-off-by: Ran Benita <ran@unusedvar.com>
36f55c49	2013-03-11T12:53:39	keymap: add xkb_keymap_new_from_buffer() The current API doesn't allow the caller to create keymaps from mmap()'ed files. The problem is, xkb_keymap_new_from_string() requires a terminating 0 byte. However, there is no way to guarantee that when using mmap() so a user currently has to copy the whole file just to get the terminating zero byte (assuming they cannot use xkb_keymap_new_from_file()). This adds a new entry xkb_keymap_new_from_buffer() which takes a memory location and the buffer size in bytes. Internally, we depend on yy_scan_{string,byte}() helpers. According to flex documentation these already copy the input string because they are wrappers around yy_scan_buffer(). yy_scan_buffer() on the other hand has some insane requirements. The buffer must be writeable and the last two bytes must be ASCII-NUL. But the buffer may contain other 0 bytes just fine. Because we don't want these constraints in our public API, xkb_keymap_new_from_buffer() needs to create a copy of the input memory. But it then calls yy_scan_buffer() directly. Hence, we have the same number of buffer-copies as with *_from_string() but without the terminating 0 requirement. The explicit yy_scan_buffer() call is preferred over yy_scan_byte() so the buffer-copy operation is not hidden somewhere in flex. Maybe some day we no longer depend on flex and can have a zero-copy API. A user could mmap() a file and it would get parsed right from this buffer. But until then, we shouldn't expose this limitation in the API but instead provide an API that some day can work with zero-copy. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> [ran: rebased on top of my branch] Conflicts: Makefile.am src/xkbcomp/xkbcomp.c
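
Commits d5a91fa9 and 36f55c49 both turn on the same constraint: the input buffer is not guaranteed to end with a NUL byte, so number parsing must be bounded by an explicit length. The following is a minimal sketch of such a parser, not the libxkbcommon implementation. The name `parse_dec_u32` and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): parse an unsigned decimal
 * number from buf[0..len). The buffer need not be NUL-terminated.
 * Unlike strtol(), signs and leading white space are not accepted.
 *
 * Returns the number of bytes consumed, or 0 if there is no leading
 * digit or if the value would overflow uint32_t. *out is written only
 * on success.
 */
size_t
parse_dec_u32(const char *buf, size_t len, uint32_t *out)
{
    assert(buf != NULL || len == 0);

    uint32_t value = 0;
    size_t i = 0;
    /* Never read past buf[len - 1]: the length check comes first. */
    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
        const uint32_t digit = (uint32_t) (buf[i] - '0');
        /* Reject before value * 10 + digit can wrap around. */
        if (value > (UINT32_MAX - digit) / 10)
            return 0;
        value = value * 10 + digit;
        i++;
    }
    if (i > 0)
        *out = value;
    return i;
}
```

A caller scanning a keymap buffer would pass the remaining byte count as `len`, so a number appearing as the very last token never causes a read beyond the buffer.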

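
The octal escape fix in commit 7d91a753 (read up to 4 octal digits, then check that the result fits into a byte) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the actual scanner code: the function name `parse_octal_escape` is hypothetical, it receives the text that follows the backslash, and it leaves any policy about rejecting the value 0 to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): parse the digits that follow
 * the backslash of an octal escape sequence, from s[0..len).
 *
 * Consumes at most 4 octal digits, so both "377" and "0377" are read in
 * full. Returns the number of digits consumed, or 0 if there is no
 * octal digit or if the value does not fit into a byte. *out is written
 * only on success.
 */
size_t
parse_octal_escape(const char *s, size_t len, uint8_t *out)
{
    assert(s != NULL || len == 0);

    uint32_t value = 0;
    size_t i = 0;
    while (i < len && i < 4 && s[i] >= '0' && s[i] <= '7') {
        /* Each octal digit contributes 3 bits. */
        value = (value << 3) | (uint32_t) (s[i] - '0');
        i++;
    }
    /* 4 octal digits can encode up to 07777, so check the byte range. */
    if (i == 0 || value > 0xff)
        return 0;
    *out = (uint8_t) value;
    return i;
}
```

With a 3-digit limit, `0377` would stop after `037` and leave a stray `7` in the string. Allowing a fourth digit reads the whole sequence, and the range check still rejects values such as `0400`.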
ac9cd053

2025-06-11T19:00:47

test: Check extended layout indexes

7f39be25

2025-06-10T15:46:45

test: Use explicit keymap output format for test_compile_output()

0f89ad97

2025-06-09T19:26:13

dump: Always use numeric group indexes The upcoming raise of the maximum groups count will require to use numeric group indexes instead of the syntax `GroupN` if groups > 8. Let’s not bother with handling two cases (group count ≤ 8 or > 8) and always serialize group indexes as numeric values.

f3386743

2025-06-09T16:44:54

test: Use explicit keymap format in test_compile_output()

2acf5eca

2025-06-09T16:26:56

test: Use explicit keymap format in test_compile_buffer()

79e95509

2025-06-09T11:07:36

test: Use explicit keymap format in test_compile_rules()

39b4b670

2025-06-06T18:40:29

Support including keymap components using %-expansion and absolute path Enable to use the same `include` features than *rules* files in *keymap components*: - *`%`-expansion*: `%H` home directory, `%S` sytem root and `%E` extra. - absolute file paths. This is useful if one wants to overwrite the system file with a user config (i.e. same name, but in `~/.config/xkb`), but still include the system file: ``` // File: ~/.config/xkb/symbols/de xkb_symbols "basic" { include "%S/de(basic)" key <AB01> { [z, Z] }; key <AD06> { [y, Y] }; } ```` Without the commit, using a mere `include "de(basic)"` would result in an include loop. Refactored by using the same code for rules and keymap components.

9b4fd82b

2025-05-13T11:46:46

test: Skip checked arithmetic if not available

fb9fec18

2025-05-10T10:18:38

xkbcomp: Checked arithmetic Use a polyfill for C23 checked arithmetic. This is a bit paranoid, as we expect the user to use only 32 bit integers, so the signed 64 bit integer we use to store the result should be more than enough. Use jtckdint v1.0: - repository: https://github.com/jart/jtckdint - commit: 339450d13d8636f05dcb71ba36efddb226db481e - removed all C++-specific code

22d27277

2025-05-10T10:12:31

actions: Reject arguments if they are not expected `NoAction`, `VoidAction` and `TerminateServer` do not accept arguments.

d239a3f0

2025-05-11T11:42:20

actions: Improve unsupported legacy X11 actions handling - Display a warning - Document drawbacks of degrading to `NoAction()`

b4c89600

2025-05-09T15:15:10

actions: Add VoidAction(), mirroring NoSymbol/VoidSymbol. Added `VoidAction()` action to match the keysym pair `NoSymbol` / `VoidSymbol`. It enables overriding a previous action and breaks latches. This is a libxkbcommon extension. When serializing it will be converted to `LockControls(controls=none,affect=neither)` for backward compatibility. We cannot serialize it to `NoAction()`, as it would be dropped in e.g. the context of multiple actions.

c2d3694b

2025-05-06T07:01:01

xkbcomp: Do not discard extra bits in vmod masks Since we accept numeric values for the vmod mask in the keymap, we may have extra bits set that encode *no* real/virtual modifier. Keep them unchanged for consistency. E.g. the following keymap: xkb_keymap { xkb_keycodes { <a> = 38; }; xkb_symbols { virtual_modifiers X = 0xf0000000; key <a> { [ SetMods(mods = 0x00001100) ] }; }; }; would compile to: xkb_keymap { xkb_keycodes { <a> = 38; }; xkb_symbols { virtual_modifiers X = 0xf0000000; // Internal state key <a> { [ SetMods(mods = 0xf0001000) ] }; // Serialization key <a> { [ SetMods(mods = 0x00001100) ] }; }; };

9b0b8c68

2025-04-15T19:53:28

xkbcomp: Stricter handling of default map include Before this commit, including a *default* map, i.e. without an explicit section name (e.g. `include "au"` vs `include "au(basic)"`) would match the first section of the first matching file in the XKB include paths, even if this section is not an *explicit* default map (i.e. tagged with `default`) but an *implicit* default map (i.e. the first map of the file, i.e. a weak match). It makes user configuration risky: say a user wants to create a custom version `au(custom)` of the `au` layout: - `./config/xkb/symbols/au`: custom layout in section “custom”. - `/usr/share/X11/xkb/symbols/au`: system layout, with *default* section “basic”. In this setup *any* layout that imports the default map from `au` would in fact import the *implicit* default map `au(custom)` instead of the *explicit* default map `au(basic)`. This incorrect behavior may thus break setups with multiple layouts. This is especially true for symbols files such as: `pc`, `us` or `latin`. Fixed by trying harder to found the exact default map, defaulting to the old behavior (weak match) only if no *explicit* default map (exact match) has been found in the XKB include paths.

66f71890

2025-03-31T08:01:29

symbols: Enable writing keysyms list as UTF-8 strings Each Unicode code point of the string will be translated to their respective keysym, if possible. An empty string denotes `NoSymbol`. When such conversion is not possible, this will raise a syntax error. This introduces the following syntax: ```c // Empty string = `NoSymbol` key <1> {[""]}; // NoSymbol // Single code point = single keysym key <2> {["é"]}; // eacute // String = translate each code point to their respective keysym key <3> {["sßξك🎺"]}; // {s, ssharp, Greek_xi, Arabic_kaf, U1F3BA} // Mix string and keysyms key <4> {[{"ξ", Greek_kappa, "β"}]}; // { Greek_xi, Greek_kappa, Greek_beta} ``` It can also be used wherever a keysym is required, e.g. in `interpret` and `modifier_map` statements. In these cases a single keysym is expected, so the string should contain *exactly one* Unicode code point.

ead3ce77

2025-03-28T21:44:27

scanner: Enable LRM and RLM marks for BiDi text Enable displaying bidirectional text in XKB files using: - U+200E LEFT-TO-RIGHT MARK - U+200F RIGHT-TO-LEFT MARK We now parse these marks as white space. As such, they are dropped; note that a later serialization may not display correctly without the marks, although it will parse. References: - https://www.w3.org/International/articles/inline-bidi-markup/uba-basics - https://www.w3.org/International/questions/qa-bidi-unicode-controls - https://www.unicode.org/reports/tr31/#Whitespace - https://www.unicode.org/reports/tr55/

bc3e464b

2025-04-09T12:35:05

keysyms: Fix Unicode handling - `xkb_utf32_to_keysym`: Allow [Unicode noncharacters]. There is no requirement to drop them and this would be the only function of our API doing so. From the Unicode Standard 16.0, section 23.7 “Noncharacters”: > Applications are free to use any of these noncharacter code points > internally. They have no standard interpretation when exchanged > outside the context of internal use. However, they are not illegal > in interchange, nor does their presence cause Unicode text to be > ill-formed. > If a noncharacter is received in open interchange, an application is > not required to interpret it in any way. It is good practice, > however, to recognize it as a noncharacter and to take appropriate > action, such as replacing it with `U+FFFD` REPLACEMENT CHARACTER, > to indicate the problem in the text. The key part is: > an application is not required to interpret it in any way Since we handle the reverse conversion with `xkb_keysym_to_utf32` just fine, I do not see a good motivation to keep this asymmetry. This is the only function with a special case for these code points. - `xkb_keysym_from_name`: - Unicode format `UNNNN`: allow control characters C0 and C1 and use `xkb_utf32_to_keysym` for the conversion when `NNNN < 0x100`, for backward compatibility. - Numeric hexadecimal format `0xNNNN`: *unchanged*. Contrary to the Unicode format, it does not normalize any keysym values in order to enable roundtrip with `xkb_keysym_get_name`. Also added tests to ensure various properties and consistency. Note about *surrogates*: they are valid valid *code points* but invalid Unicode *scalar values*, i.e. they cannot be encoded in any Unicode encoding form (UTF-8, UTF-16, UTF-32). So their corresponding Unicode keysyms are valid, but: - cannot be used as input of `xkb_keysym_to_utf32` nor `xkb_keysym_to_utf8` - cannot result as output of `xkb_utf32_to_keysym`. Otherwise they are valid e.g. in the Unicode keysym notation. 
[Unicode noncharacters]: https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Noncharacters

5e557040

2025-04-09T11:17:00

xkbcomp: Fix Unicode escape sequence While the previous code correctly rejected malformed sequences such as `\u{` (incomplete) or `\u{123x}`, it should try to consume as much input as possible until reaching the corresponding closing `}` within the string. Else we can get leftovers and the error message does not reference the whole malformed sequence. Also added further tests with surrogates and noncharacters.

36442baa

2025-04-03T15:01:46

xkbcomp: Support multiple actions in interpret Before this commit we supported multiple actions per level, but not in *interpret* statements. Let’s fix this asymmetry, so we can equivalently assign all actions sets either implicitly or explicitly.

3d79f459

2025-03-29T11:46:34

xkbcomp: Add Unicode code point escape sequence \u{NNNN} Unicode code point escape sequences `\u{NNNN}` are replaced with the UTF-8 encoding of their corresponding code point `U+NNNN`, if legal. Supported Unicode code points are in the range `1‥0x10ffff`. Note that we will reject the `U+0000` NULL code point, as we reject it in the octal escape sequence `\0`. This is intended mainly for the upcoming feature to write keysyms as UTF-8 encoded strings. It can be used for various reasons: - avoid encoding issues; - avoid issue with font rendering (e.g. Asian scripts); - make white space or zero-width characters more readable.

7d91a753

2025-03-29T12:24:39

xkbcomp: Enable xkbcomp-style octal escape sequences Xorg xkbcomp only parses octal sequences with `\0`, while xkbcommon does not force the `0` prefix of the numeric part. However, we only parsed up to to 3 digits, which does not allow to parse e.g. `\0377` while `\377` parses fine. Fixed by parsing up to 4 octal digits, while checking the result fits into a byte.

aa8b572e

2025-03-29T12:04:26

keymap serialization: Ensure escaping relevant chars Previously we would write characters without any escaping in some cases (e.g.: names of indicators, types and groups). E.g. the string "new\nline" would be serialized as: "new line" which would raise a syntax error if parsed. Fixed by escaping any string that was not escaped after parsing (e.g. the section names are safe already).
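
A minimal sketch of such escaping is shown below. The function name `escape_string` and the exact set of escapes (`\n`, `\t`, `\"`, `\\`, and 3-digit octal for other control characters) are assumptions for illustration; the serializer may use a different set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write an escaped, NUL-terminated copy of in to out.
 * Returns false if out_size is too small. */
static bool escape_string(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    if (out_size == 0)
        return false;
    for (; *in != '\0'; in++) {
        const unsigned char c = (unsigned char)*in;
        char esc[5]; /* longest escape: backslash + 3 octal digits + NUL */
        if (c == '\n')
            strcpy(esc, "\\n");
        else if (c == '\t')
            strcpy(esc, "\\t");
        else if (c == '"' || c == '\\')
            snprintf(esc, sizeof(esc), "\\%c", c);
        else if (c < 0x20 || c == 0x7f)
            snprintf(esc, sizeof(esc), "\\%03o", (unsigned int)c);
        else
            snprintf(esc, sizeof(esc), "%c", c);
        const size_t len = strlen(esc);
        if (n + len + 1 > out_size)
            return false; /* no room for the escape plus the final NUL */
        memcpy(out + n, esc, len);
        n += len;
    }
    out[n] = '\0';
    return true;
}
```

With this sketch the string from the message, containing a real newline, is written as `new\nline` and parses back without a syntax error.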

d5a91fa9

2025-04-04T16:38:16

xkbcomp: Use custom parsers instead of strtol* The use of `strtol*` functions was already restricted due to their slowness and their capacity to parse other stuff than digits (e.g. signs and spaces). There is also another *big* limitation: they require a NULL-terminated string. This is incompatible with our functions that work on buffers, because we cannot guarantee this. This may lead to a memory violation if the last token is a number. We now roll our own parsers, which are more efficient and compatible with buffers.
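
To illustrate why a length-bounded parser avoids the problem, here is a minimal sketch of a decimal parser that never reads past the end of the buffer and accepts digits only. `parse_dec` is a hypothetical name and this is not the parser used by xkbcommon.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse an unsigned decimal number from buf[0..len).
 * No NUL terminator is required: reads are bounded by len.
 * Returns the number of digits consumed, or 0 on no digit or overflow. */
static size_t parse_dec(const char *buf, size_t len, uint64_t *out)
{
    assert(buf != NULL && out != NULL);
    uint64_t value = 0;
    size_t i = 0;
    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
        const uint64_t digit = (uint64_t)(buf[i] - '0');
        if (value > (UINT64_MAX - digit) / 10)
            return 0; /* value * 10 + digit would overflow */
        value = value * 10 + digit;
        i++;
    }
    if (i == 0)
        return 0; /* signs and spaces are not accepted, unlike strtol */
    *out = value;
    return i;
}
```

A buffer that ends right after the last digit is handled safely, since the loop stops at `len` instead of looking for a terminator.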

44480f7c

2025-04-01T08:28:02

xkbcomp: Enable lists of keysyms and actions {} and {a} Motivations: - Follow the principle of least astonishment; - Ensure consistency; - Enhance the use of custom defaults; - Facilitate the tests. There is some ambiguity because we use `{}` to denote both an empty list of keysyms and an empty list of actions. But as soon as we get a keysym or an action, we know whether it is a `MultiKeySymList` or a `MultiActionList`. So we just count the `{}` at the *beginning* using `NoSymbolOrActionList`, then replace it by the relevant count of `NoSymbol` or `NoAction()` once the ambiguity is solved. If not, this is a list of empties of *some* type: we drop those empties and delegate the type resolution using `ExprEmptyList()`.

e09cbe66

2025-04-02T10:46:06

symbols: Fix handling of empty keys Before this commit, the following symbols: ```c xkb_symbols { virtual_modifiers M1, M2; key <A> {}; key <B> { [] }; key.vmods = M1; key <C> {}; key <D> { vmods = M2 }; }; ``` would be equivalent to: ```c xkb_symbols { virtual_modifiers M1,M2; key <B> { [ NoSymbol ] }; }; ``` `<B>` entry could be skipped but is harmless. However, `<C>` and `<D>` are missing, which would lead to the mapping resolution of `M1` and `M2` failing. After this commit, it is equivalent to: ```c virtual_modifiers M1,M2; key <C> { vmods = M1 }; key <D> { vmods = M2 }; ``` Empty keys are skipped entirely, but any explicit field: - is taken into account: previously they would be skipped if there were no group; - forces the key to be printed at serialization.

2e0245f8

2025-04-02T10:45:44

xkbcomp: Enable more empty lists - Empty `interpret` - Empty key `type` - Empty `indicator` Motivations: - Follow the principle of least astonishment; - Ensure consistency; - Enhance the use of custom defaults; - Facilitate the tests.

6881fb32

2025-04-01T08:28:02

xkbcomp: Drop trailing NoSymbol and NoAction() This brings us closer to what `xkbcomp` outputs. One should use the explicit `VoidSymbol` instead of `NoSymbol`, in order to avoid dropping empty levels. This may affect keys that rely on an *implicit* key type. Example: - Input: ```c key <> { [a, A, NoSymbol] }; ``` - Compilation with xkbcommon \< 1.9.0: ```c key <> { type= "FOUR_LEVEL_SEMIALPHABETIC", [a, A, NoSymbol, NoSymbol] }; ``` - Compilation with xkbcommon ≥ 1.9.0: ```c key <> { type= "ALPHABETIC", [a, A] }; ```
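
The trimming itself can be sketched as follows. `drop_trailing_no_symbol` and `NO_SYMBOL` are hypothetical names (the value 0 matches `XKB_KEY_NoSymbol`); the sketch only computes the resulting level count, which is what drives the implicit key type in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_SYMBOL 0u /* same value as XKB_KEY_NoSymbol */

/* Return the number of levels left once trailing NoSymbol entries are dropped.
 * NoSymbol entries followed by a real keysym are kept. */
static size_t drop_trailing_no_symbol(const uint32_t *syms, size_t count)
{
    assert(syms != NULL || count == 0);
    while (count > 0 && syms[count - 1] == NO_SYMBOL)
        count--;
    return count;
}
```

For `[a, A, NoSymbol]` this leaves 2 levels, which is consistent with the key no longer being given a four-level type.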

fbacdd98

2025-03-31T07:58:04

test: Refactor test_multi_keysyms_actions - Use fewer macros - Add golden tests to check the compilation *result*

b254cc2e

2025-03-30T12:27:15

test: Remove empty components boilerplate

3150bca8

2025-03-30T09:54:02

xkbcomp: Make all components optional We already accept *empty* components, such as: `xkb_compat {};`. Let’s accept missing components as well, so that we can reduce the boilerplate in our tests. Note that we will still explicitly serialize empty components for compatibility with previous xkbcommon versions and Xorg xkbcomp.

500b260b

2025-03-28T09:38:58

xkbcomp: Fix parser failure on floating-point numbers Before this commit we used `strtold`, which depends on the locale. But the XKB syntax is fixed and uses a period as decimal separator. So ensure the syntax is correct without relying on `strtold` and truncate the result, as the parser does not use floating-point numbers.

e5401b07

2025-03-26T16:02:58

symbols: Improve Modmap parsing Parse, don’t validate: ensure *at parsing* that `modifier_map` definitions use a list of keys and keysyms. This makes it possible to remove the redundant `ExprResolveKeySym` and have keysym parsing handled exclusively in `parser.y`.

e8561909

2025-03-18T14:34:10

xkbcomp: Fix keycodes bounds - Refactor to check conflicts first for the key names and then for the keycodes. This seems more useful for the user and enables further memory optimizations. - Do not allocate until we are sure to add the keycode. The bounds are only updated afterwards, so the call to `FindKeyByName` should be more efficient. - Fixed keycodes bounds not shrunk correctly when an existing keycode is overridden. - Do not prepare keyname strings for logging if we are not going to use them.

befa0cdd

2025-02-12T15:38:58

test: Check integers syntax

4cef822a

2025-02-12T07:44:34

test: Check masks syntax

e120807b

2025-01-29T15:35:22

Update license notices to SPDX short identifiers + update LICENSE Fix #628. Signed-off-by: Ran Benita <ran@unusedvar.com>

502e9e5b

2025-01-29T12:19:10

xkbcomp: Add stricter bounds for keycodes and levels Our current implementation uses continuous arrays indexed by keycodes and levels. This is simple and good enough for realistic keymaps. However, they are allowed to have big values that will lead to either memory exhaustion or a waste of memory (sparse arrays). Added the much stricter upper bounds `0xfff` for keycodes[^1] and 2048 for levels[^2], which should still be plenty enough and provides stronger memory security. [^1]: Current max keycode is 0x2ff in Linux. [^2]: Should be big enough to satisfy automatically generated keymaps.

88a3d3c2

2025-01-23T16:03:51

tests: Refactor buffercomp Move tests into proper functions and log tests names.

b1e1aae6

2025-01-23T15:20:44

xkbcomp: Fix memory leak when extra content after keymap It triggers with e.g.: ``` xkb_keymap { xkb_keycodes { }; }; }; // erroneous ```

709027ec

2025-01-23T09:12:15

symbols: Fix inconsistent error handling Currently the following keymap triggers a critical error (invalid `vmods`) only for the second key statement, while it should handle both equally. ``` xkb_keymap { xkb_keycodes { <> = 9; }; xkb_types { }; xkb_compat { }; xkb_symbols { key <> { vmods = [], repeats = false }; key <> { repeats = false, vmods = [] }; }; }; ``` Fixed by parsing the whole symbols body and failing if any error was found.

ec2915fe

2025-01-22T17:18:21

symbols: Fix a possible null pointer dereference Introduce a new Expression type, `EXPR_EMPTY_LIST`, to avoid the ambiguity between action and keysym empty lists. Two alternatives were rejected to keep the semantics clear: - Using `EXPR_KEYSYM_LIST`: because we would end up accepting an empty keysym list while processing actions. - Using NULL: it conveys no info and is hazardous.

7036e46c

2025-01-13T15:20:47

symbols: Add tests for key merge modes (keysyms/actions) This commit adds tests for merging various key configurations: - With/without keysyms/actions - Single/multiple keysyms/actions per level We test all the merge modes for including a map (global) as well as directly on the keys (local): - default (global: include, local: implicit) - augment - override - replace The tests data are generated with: - A Python script `scripts/update-merge-modes-tests.py` for keycodes and symbols data. Use `--debug` for extra comments to help debugging. The script can optionally generate C headers for alternative key sequence tests, that were used before implementing golden tests. The latter tests are not used anymore (duplicate with golden tests) but their generator is kept for now, as they can still be useful for debugging or writing similar tests. - The `merge-modes` test generates its own keymap files for golden tests, using: `build/test-merge-modes update`. It can also replace them with the obtained output rather than the expected one using `build/test-merge-modes update-obtained`, which is very useful for debugging.

71d64df3

2024-10-08T18:45:18

symbols: Add tests for multiple actions per level

31c6d866

2024-10-08T18:39:00

symbols: Min. 2 keysyms in level list Do not allow `{ a }` when a single `a` suffices.

929a485f

2024-10-08T12:52:53

symbols: Fix too liberal parsing of keysyms lists Currently we are too liberal when parsing symbols lists: e.g. `[{a,{b}}]` is parsed as `[{a,b}]` but it should be rejected.

e325e65e

2024-02-20T08:13:37

Add test_unit to all tests Currently it only ensures we do not buffer `stdout`.

00e3058e

2023-11-06T21:53:51

Prevent recursive includes of keymap components - Add check for recursive includes of keymap components. It relies on limiting the include depth. The threshold is currently set to 15, which seems reasonable with plenty of margin for keymaps in the wild. - Add corresponding new log message `recursive-include`. - Add tests for recursive includes.
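
The depth-limit approach can be modelled with a toy include graph, as in the sketch below. Everything here is hypothetical: `resolve_includes`, the array representation (file `i` includes file `includes[i]`, or nothing if negative), and the exact comparison against the threshold are assumptions; only the limit of 15 comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INCLUDE_DEPTH 15

/* Follow the include chain starting at file, failing once the nesting
 * depth exceeds the limit. A loop is never detected directly: it simply
 * runs into the depth limit. */
static bool resolve_includes(const int *includes, size_t count,
                             int file, unsigned int depth)
{
    assert(includes != NULL);
    if (depth > MAX_INCLUDE_DEPTH)
        return false; /* would be reported as recursive-include */
    if (file < 0 || (size_t)file >= count)
        return false; /* unknown file */
    if (includes[file] < 0)
        return true; /* no further include */
    return resolve_includes(includes, count, includes[file], depth + 1);
}
```

The trade-off of this design is that a legitimate but very deep include chain is rejected as well, which is why the threshold leaves a margin over real keymaps.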

82e9293e

2023-10-30T15:28:10

xkbcomp: early detection of invalid encoding

f937c308

2023-10-29T07:31:34

xkbcomp: skip heading UTF-8 encoded BOM (U+FEFF) Leading BOM is legal and is used as a signature — an indication that an otherwise unmarked text file is in UTF-8. See: https://www.unicode.org/faq/utf_bom.html#bom5 for further details.
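
Detecting the UTF-8 encoded BOM amounts to comparing the first three bytes of the buffer with `EF BB BF`. A minimal sketch, with the hypothetical name `utf8_bom_length`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 3 if buf starts with the UTF-8 encoding of U+FEFF, else 0.
 * The caller can skip that many bytes before scanning. */
static size_t utf8_bom_length(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 3 && memcmp(buf, "\xef\xbb\xbf", 3) == 0)
        return 3;
    return 0;
}
```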

b06aedb8

2023-05-02T14:15:55

scanner: allow for a zero terminated string as keymap As the documentation for xkb_keymap_new_from_buffer() states, the "input string does not have to be zero-terminated". The actual implementation however failed with "unrecognized token/syntax error" when it encountered a null byte. Fix this by allowing a null byte at the last position of the buffer. Anything else is likely a client error anyway. Fixes #307
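
One way to picture the accepted inputs is a helper that drops a single null byte at the very end of the buffer and leaves any other null byte in place to be reported as an error later. This is only a sketch of the rule stated above; `trim_trailing_nul` is a hypothetical name and the scanner may implement the rule differently.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length to scan: len minus one if the last byte is NUL,
 * otherwise len. Embedded NUL bytes are not touched. */
static size_t trim_trailing_nul(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len > 0 && buf[len - 1] == '\0')
        return len - 1;
    return len;
}
```

A caller passing `sizeof(string)` instead of `strlen(string)` would then get the same result either way.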

40aab05e

2019-12-27T13:03:20

build: include config.h manually Previously we included it with an `-include` compiler directive. But that's not portable. And it's better to be explicit anyway. Every .c file should have `#include "config.h"` first thing. Signed-off-by: Ran Benita <ran@unusedvar.com>

36f55c49

2013-03-11T12:53:39

keymap: add xkb_keymap_new_from_buffer() The current API doesn't allow the caller to create keymaps from mmap()'ed files. The problem is, xkb_keymap_new_from_string() requires a terminating 0 byte. However, there is no way to guarantee that when using mmap() so a user currently has to copy the whole file just to get the terminating zero byte (assuming they cannot use xkb_keymap_new_from_file()). This adds a new entry xkb_keymap_new_from_buffer() which takes a memory location and the buffer size in bytes. Internally, we depend on yy_scan_{string,byte}() helpers. According to flex documentation these already copy the input string because they are wrappers around yy_scan_buffer(). yy_scan_buffer() on the other hand has some insane requirements. The buffer must be writeable and the last two bytes must be ASCII-NUL. But the buffer may contain other 0 bytes just fine. Because we don't want these constraints in our public API, xkb_keymap_new_from_buffer() needs to create a copy of the input memory. But it then calls yy_scan_buffer() directly. Hence, we have the same number of buffer-copies as with *_from_string() but without the terminating 0 requirement. The explicit yy_scan_buffer() call is preferred over yy_scan_byte() so the buffer-copy operation is not hidden somewhere in flex. Maybe some day we no longer depend on flex and can have a zero-copy API. A user could mmap() a file and it would get parsed right from this buffer. But until then, we shouldn't expose this limitation in the API but instead provide an API that some day can work with zero-copy. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> [ran: rebased on top of my branch] Conflicts: Makefile.am src/xkbcomp/xkbcomp.c

kc3-lang/libxkbcommon/test/buffercomp.c
