data


Log

Author Commit Date CI Message
Pierre Le Marre b9b4ab47 2024-12-09T09:21:01 keysyms: Add sharp S upper case mapping exception The case mapping `ssharp` ß ↔ `U1E9E` ẞ was added in 13b30f4f0dccc08dfea426d73570b913596ed602 but was broken: - For the lower case mapping it returned the keysym `0x10000df`, which is an invalid Unicode keysym. - For the upper case mapping it returned the upper Unicode code point rather than the corresponding keysym. It did accidentally enable the detection of alphabetic key type for the pair (ß, ẞ) though. However this detection was accidentally removed in 5c7c79970a2800b6248e829464676e1f09c5f43d (v1.7) with an attempt to fix the wrong keysym case mapping. Finally both the *lower* case mapping and the key type detection were fixed for good when we implemented the complete Unicode simple case mappings and corresponding tests in e83d08ddbc9851944c662c18e86d4eb0eff23e68. However, the *upper* case mapping `ssharp` → `U1E9E` remained disabled. Indeed, ẞ is a relatively recent addition to Unicode (2008) and had no official recommendation, until recently. So while the lower mapping ẞ→ß exists in Unicode, its converse upper mapping does not. Yet since 2017 the Council for German Orthography (Rat für deutsche Rechtschreibung) recommends[^1] ẞ as the capitalization of ß. Due to its stability policies, the Unicode Character Database (UCD) that we use to generate our keysym case mappings (via ICU) cannot update the simple case mapping of ß. Discussions are currently ongoing in the Unicode mailing list[^2] and CLDR[^3] about how to deal with the new recommended case mapping. However, the discussions are oriented on text-processing and compatibility mappings, while libxkbcommon is on a rather lower level. It seems that the slow adoption of ẞ is partly due to the difficulty to type it. Since ẞ is used only for ALL CAPS casing, the expectation is to type it using CapsLock. While our detection of alphabetic key types works well[^4] for the pair (ß,ẞ), the *internal capitalization* currently does not work and is fixed by this commit. Added the ß → ẞ upper mapping: - Added an exception in the generation script - Fixed tests - Added documentation of the exceptions in `xkbcommon.h` - Added/updated log entries [^1]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/ [^2]: https://corp.unicode.org/pipermail/unicode/2024-November/011162.html [^3]: https://unicode-org.atlassian.net/browse/CLDR-17624 [^4]: Except libxkbcommon 1.7, see the second paragraph.
Pierre Le Marre 420123fd 2024-11-29T07:34:07 export-keysyms: Enable all keysyms export + char names This enable to check case mappings for Unicode keysyms.
Pierre Le Marre e83d08dd 2024-02-23T17:10:15 keysyms: Fast and complete case mappings (Unicode 15.1) The current code to handle keysym case mappings is quite complex and slow. It is also incomplete, as it does not cover recent Unicode database. Finally, it does not handle title case correctly. It would be easier if we were to use only a lookup table, but a trivial implementation would lead to a huge array: the cased characters range from `U+0041` to `U+`1F189, i.e. a span of 127 304 elements. Thus we need some tricks to compress the lookup table. We based our work on the post: https://github.com/apankrat/notes/blob/3c551cb028595fd34046c5761fd12d1692576003/fast-case-conversion/README.md The compression algorithm is roughly: 1. Compute the delta between the characters and their mappings. 2. Split the delta array in chunk of a given size. 3. Rearrange the order of the chunks in order to optimize consecutive chunks overlap. 4. Create a data table with the reordered chunks and an index table that maps the original chunk index to its offset in the data table. The compression algorithm is then applied a second time to the previous index table. The complete algorithm optimizes the two chunk sizes in order to get the lowest total data size. The mappings were generated using CPython 3.12.4, PyICU 2.13, PyYaml 6.0.1 and ICU 75.1. Also: - Added explicit list of named keysyms and their case mappings. - Added benchmark for case mappings. - Rework ICU tests. Note: 13b30f4f0dccc08dfea426d73570b913596ed602 introduced a fix for sharp S `U+00DF`. With the new implementation, the *conversion* functions `xkb_keysym_to_{lower,upper}` leave it *unchanged*, while the *predicate* functions `xkb_keysym_is_{lower,upper_or_title}` produce the expected results: ```c xkb_keysym_to_upper(XKB_KEY_ssharp) == XKB_KEY_ssharp; xkb_keysym_to_lower(XKB_KEY_ssharp) == XKB_KEY_ssharp; xkb_keysym_to_lower(XKB_KEY_Ssharp) == XKB_KEY_ssharp; xkb_keysym_is_lower (XKB_KEY_ssharp) == true; xkb_keysym_is_upper_or_title(XKB_KEY_Ssharp) == true; ```
Pierre Le Marre b8b872d0 2024-02-23T17:10:15 keysyms: Fix case mappings There are several mismatches with the Unicode database. Fix the case mappings in `data/keysyms.yaml` for the following keysyms: - `Iabovedot` - `idotless` - `Greek_finalsmallsigma` - `function`
Pierre Le Marre 798b7b73 2024-02-23T17:10:15 keysyms: Export explicitly named keysyms Added a tool to export explicitly named keysyms to a yaml file. This includes the following keysyms properties: - Value - Name - Corresponding Unicode code point, if relevant - Corresponding case mapping, if relevant This will allow us to check the case mappings easier and to refactor the related functions.