|
b9b4ab47
|
2024-12-09T09:21:01
|
|
keysyms: Add sharp S upper case mapping exception
The case mapping `ssharp` ß ↔ `U1E9E` ẞ was added in
13b30f4f0dccc08dfea426d73570b913596ed602 but was broken:
- For the lower case mapping it returned the keysym `0x10000df`, which
is an invalid Unicode keysym.
- For the upper case mapping it returned the upper Unicode code point
rather than the corresponding keysym.
It did accidentally enable the detection of alphabetic key type for the
pair (ß, ẞ) though. However this detection was accidentally removed in
5c7c79970a2800b6248e829464676e1f09c5f43d (v1.7) with an attempt to fix
the wrong keysym case mapping. Finally both the *lower* case mapping
and the key type detection were fixed for good when we implemented the
complete Unicode simple case mappings and corresponding tests in
e83d08ddbc9851944c662c18e86d4eb0eff23e68.
However, the *upper* case mapping `ssharp` → `U1E9E` remained disabled.
Indeed, ẞ is a relatively recent addition to Unicode (2008) and had no
official recommendation, until recently. So while the lower mapping ẞ→ß
exists in Unicode, its converse upper mapping does not. Yet since 2017
the Council for German Orthography (Rat für deutsche Rechtschreibung)
recommends[^1] ẞ as the capitalization of ß.
Due to its stability policies, the Unicode Character Database (UCD)
that we use to generate our keysym case mappings (via ICU) cannot update
the simple case mapping of ß. Discussions are currently ongoing in the
Unicode mailing list[^2] and CLDR[^3] about how to deal with the new
recommended case mapping. However, the discussions are oriented on
text-processing and compatibility mappings, while libxkbcommon is
on a rather lower level.
It seems that the slow adoption of ẞ is partly due to the difficulty
to type it. Since ẞ is used only for ALL CAPS casing, the expectation
is to type it using CapsLock. While our detection of alphabetic key
types works well[^4] for the pair (ß,ẞ), the *internal capitalization*
currently does not work and is fixed by this commit.
Added the ß → ẞ upper mapping:
- Added an exception in the generation script
- Fixed tests
- Added documentation of the exceptions in `xkbcommon.h`
- Added/updated log entries
[^1]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/
[^2]: https://corp.unicode.org/pipermail/unicode/2024-November/011162.html
[^3]: https://unicode-org.atlassian.net/browse/CLDR-17624
[^4]: Except libxkbcommon 1.7, see the second paragraph.
|
|
420123fd
|
2024-11-29T07:34:07
|
|
export-keysyms: Enable all keysyms export + char names
This enable to check case mappings for Unicode keysyms.
|
|
e83d08dd
|
2024-02-23T17:10:15
|
|
keysyms: Fast and complete case mappings (Unicode 15.1)
The current code to handle keysym case mappings is quite complex and
slow. It is also incomplete, as it does not cover recent Unicode
database. Finally, it does not handle title case correctly.
It would be easier if we were to use only a lookup table, but a trivial
implementation would lead to a huge array: the cased characters range
from `U+0041` to `U+`1F189, i.e. a span of 127 304 elements.
Thus we need some tricks to compress the lookup table. We based our
work on the post:
https://github.com/apankrat/notes/blob/3c551cb028595fd34046c5761fd12d1692576003/fast-case-conversion/README.md
The compression algorithm is roughly:
1. Compute the delta between the characters and their mappings.
2. Split the delta array in chunk of a given size.
3. Rearrange the order of the chunks in order to optimize consecutive
chunks overlap.
4. Create a data table with the reordered chunks and an index table
that maps the original chunk index to its offset in the data table.
The compression algorithm is then applied a second time to the previous
index table.
The complete algorithm optimizes the two chunk sizes in order to get
the lowest total data size.
The mappings were generated using CPython 3.12.4, PyICU 2.13, PyYaml
6.0.1 and ICU 75.1.
Also:
- Added explicit list of named keysyms and their case mappings.
- Added benchmark for case mappings.
- Rework ICU tests.
Note: 13b30f4f0dccc08dfea426d73570b913596ed602 introduced a fix for
sharp S `U+00DF`. With the new implementation, the *conversion*
functions `xkb_keysym_to_{lower,upper}` leave it *unchanged*, while
the *predicate* functions `xkb_keysym_is_{lower,upper_or_title}`
produce the expected results:
```c
xkb_keysym_to_upper(XKB_KEY_ssharp) == XKB_KEY_ssharp;
xkb_keysym_to_lower(XKB_KEY_ssharp) == XKB_KEY_ssharp;
xkb_keysym_to_lower(XKB_KEY_Ssharp) == XKB_KEY_ssharp;
xkb_keysym_is_lower (XKB_KEY_ssharp) == true;
xkb_keysym_is_upper_or_title(XKB_KEY_Ssharp) == true;
```
|
|
b8b872d0
|
2024-02-23T17:10:15
|
|
keysyms: Fix case mappings
There are several mismatches with the Unicode database.
Fix the case mappings in `data/keysyms.yaml` for the following keysyms:
- `Iabovedot`
- `idotless`
- `Greek_finalsmallsigma`
- `function`
|
|
798b7b73
|
2024-02-23T17:10:15
|
|
keysyms: Export explicitly named keysyms
Added a tool to export explicitly named keysyms to a yaml file. This
includes the following keysyms properties:
- Value
- Name
- Corresponding Unicode code point, if relevant
- Corresponding case mapping, if relevant
This will allow us to check the case mappings easier and to refactor the
related functions.
|