Hash :
bc3e464b
Author :
Date :
2025-04-09T12:35:05
keysyms: Fix Unicode handling
- `xkb_utf32_to_keysym`: Allow [Unicode noncharacters]. There is no
requirement to drop them and this would be the only function of our
API doing so.
From the Unicode Standard 16.0, section 23.7 “Noncharacters”:
> Applications are free to use any of these noncharacter code points
> internally. They have no standard interpretation when exchanged
> outside the context of internal use. However, they are not illegal
> in interchange, nor does their presence cause Unicode text to be
> ill-formed.
> If a noncharacter is received in open interchange, an application is
> not required to interpret it in any way. It is good practice,
> however, to recognize it as a noncharacter and to take appropriate
> action, such as replacing it with `U+FFFD` REPLACEMENT CHARACTER,
> to indicate the problem in the text.
The key part is:
> an application is not required to interpret it in any way
Since the reverse conversion `xkb_keysym_to_utf32` handles these code
points just fine, I do not see a good motivation to keep this
asymmetry.
- `xkb_keysym_from_name`:
- Unicode format `UNNNN`: allow control characters C0 and C1 and use
`xkb_utf32_to_keysym` for the conversion when `NNNN < 0x100`, for
backward compatibility.
- Numeric hexadecimal format `0xNNNN`: *unchanged*. Unlike the
Unicode format, it does not normalize any keysym values, so that it
round-trips with `xkb_keysym_get_name`.
Also added tests to ensure various properties and consistency.
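For reference, the set of noncharacters discussed in the first item can be sketched as a small predicate. This is only an illustration: `is_unicode_noncharacter` is a hypothetical helper, not a function of the xkbcommon API, and it is not necessarily how the library tests for these code points.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration; not part of the xkbcommon API.
 * Unicode defines 66 noncharacters:
 *   - U+FDD0..U+FDEF (32 code points);
 *   - the last two code points of each of the 17 planes,
 *     i.e. U+nFFFE and U+nFFFF (34 code points).
 */
static bool
is_unicode_noncharacter(uint32_t cp)
{
    /* Outside the Unicode code space: not a code point at all. */
    if (cp > 0x10ffff)
        return false;
    if (cp >= 0xfdd0 && cp <= 0xfdef)
        return true;
    /* Low 16 bits are 0xFFFE or 0xFFFF. */
    return (cp & 0xfffe) == 0xfffe;
}
```

With this change, code points matching such a predicate are no longer rejected by `xkb_utf32_to_keysym`.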
Note about *surrogates*: they are valid *code points* but invalid
Unicode *scalar values*, i.e. they cannot be encoded in any Unicode
encoding form (UTF-8, UTF-16, UTF-32). So their corresponding Unicode
keysyms are valid, but:
- cannot be used as input to `xkb_keysym_to_utf32` or `xkb_keysym_to_utf8`;
- cannot be produced as output by `xkb_utf32_to_keysym`.
Otherwise they remain valid, e.g. in the Unicode keysym notation.
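The distinction between code points and scalar values can be sketched as follows. This is a simplified illustration under stated assumptions: all three helpers and the `UNICODE_KEYSYM_BASE` macro are hypothetical names, not part of the xkbcommon API, and the sketch only covers keysyms in the Unicode range (`0x01000000 + code point`), ignoring the dedicated legacy keysyms for code points below `0x100`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base of the Unicode keysym range: keysym = 0x01000000 + code point. */
#define UNICODE_KEYSYM_BASE 0x01000000u

/* Hypothetical helpers for illustration; not part of the xkbcommon API. */

/* Surrogates U+D800..U+DFFF are code points but not scalar values. */
static bool
is_surrogate(uint32_t cp)
{
    return cp >= 0xd800 && cp <= 0xdfff;
}

/* A Unicode scalar value is any code point except a surrogate. */
static bool
is_scalar_value(uint32_t cp)
{
    return cp <= 0x10ffff && !is_surrogate(cp);
}

/*
 * Extract the scalar value from a keysym in the Unicode range.
 * Returns 0 when there is none: keysym outside the range handled here,
 * or keysym encoding a surrogate. Code points below 0x100 have
 * dedicated legacy keysyms and are ignored in this sketch.
 */
static uint32_t
unicode_keysym_to_scalar(uint32_t keysym)
{
    if (keysym < UNICODE_KEYSYM_BASE + 0x100 ||
        keysym > UNICODE_KEYSYM_BASE + 0x10ffff)
        return 0;
    uint32_t cp = keysym - UNICODE_KEYSYM_BASE;
    return is_scalar_value(cp) ? cp : 0;
}
```

In this sketch a surrogate keysym such as `0x0100d800` is a well-formed value of the Unicode keysym range, yet it yields no scalar value, while the noncharacter keysym `0x0100ffff` converts to U+FFFF like any other code point.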
[Unicode noncharacters]: https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Noncharacters