rules-format.md

Branch :
Show log
Commit
Author : Pierre Le Marre
Date : 2025-06-10 17:33:24
Hash : f7a61da7
Message : doc: Update new layout count ranges
doc/rules-format.md
The rules file {#rule-file-format}
==============

The purpose of the rules file is to map between configuration values
that are easy for a user to specify and understand, and the
configuration values that the keymap compiler, `xkbcomp`, uses and
understands. The following diagram presents an overview of this
process. See the [XKB introduction] for further details on the
components.

@dotfile xkb-configuration "XKB keymap configurations"

@tableofcontents{html:2}

`libxkbcommon`’s keymap compiler `xkbcomp` uses the `xkb_component_names`
struct internally, which maps directly to [include statements] of the
appropriate [sections] \(called [KcCGST] for short):

- [key codes],
- [compatibility],
- geometry ([not supported](@ref geometry-support) by xkbcommon),
- [symbols],
- [types].

These are not really intuitive nor straightforward for the uninitiated.
Instead, the user passes in a `xkb_rule_names` struct, which consists
of the following fields (called [RMLVO] for short):

- the name of a [rules] file (in Linux this is usually “evdev”),
- a keyboard [model] \(e.g. “pc105”),
- a set of [layouts][layout] (which will end up in different
  groups, e.g. “us,fr”),
- a set of [variants][variant] (used to alter/augment the respective
  layout, e.g. “intl,dvorak”),
- a set of [options] \(used to tweak some general
  behavior of the keyboard, e.g. “ctrl:nocaps,compose:menu” to make
  the Caps Lock key act like Ctrl and the Menu key like Compose).

[KcCGST]: @ref KcCGST-intro
[RMLVO]: @ref RMLVO-intro
[MLVO]: @ref RMLVO-intro
[XKB introduction]: @ref xkb-intro
[include statements]: @ref xkb-include
[sections]: @ref keymap-section-def
[key codes]: @ref the-xkb_keycodes-section
[compatibility]: @ref the-xkb_compat-section
[symbols]: @ref the-xkb_symbols-section
[types]: @ref the-xkb_types-section
[rules]: @ref config-rules-def
[model]: @ref config-model-def
[layout]: @ref config-layout-def
[variant]: @ref config-variant-def
[option]: @ref config-options-def
[options]: @ref config-options-def

# Format of the file

## Rules and rule sets {#rule-def}

@anchor rule-set-def
The file consists of **rule sets**, each consisting of **rules** (one
per line), which match the [MLVO] values on the left hand side, and,
if the values match to the values the user passed in, results in the
values on the right hand side being [added][value update] to the
resulting [KcCGST]. See @ref rmlvo-resolution for further details.

[rule set]: @ref rule-set-def
[rule sets]: @ref rule-set-def
[rule]: @ref rule-def

```c
// This is a comment

// The following line is a rule header.
// It starts with ‘!’ and introduces a rules set.
// It indicates that the rules map MLVO options to KcCGST symbols.
! option       = symbols
  // The following lines are rules that add symbols of the RHS when the
  // LHS matches an option.
  ctrl:nocaps  = +ctrl(nocaps)
  compose:menu = +compose(menu)

// One may use multiple MLVO components on the LHS
! layout    option          = symbols
  be        caps:digits_row = +capslock(digits_row)
  fr        caps:digits_row = +capslock(digits_row)
```

## Groups {#rules-group-def}

Since some values are related and repeated often, it is possible
to *group* them together and refer to them by a **group name** in the
rules.

[group]: @ref rules-group-def

```c
// Let’s rewrite the previous rules set using groups.
// Groups starts with ‘$’.

// Define a group for countries with AZERTY layouts
! $azerty = be fr

// The following rule will match option `caps:digits_row` only for
// layouts in the $azerty group, i.e. `fr` and `be`.
! layout    option          = symbols
 $azerty    caps:digits_row = +capslock(digits_row)
```

## Wild cards {#rules-wildcard-def}

Along with matching values by simple string equality and for
membership in a [group] defined previously, rules may also contain
**wild card** values with the following behavior:

<dl>
<dt>* @anchor rules-wildcard-legacy-def</dt>
<dd>

Legacy wild card:
- For `model` and `options`: *always* match.
- For `layout` and `variant`: match any *non-empty* value.

This wild card usually appears near the end of a rule set to set *default* values.

@note Prefer using the wild cards @ref rules-wildcard-some-def "\<some\>" or
@ref rules-wildcard-any-def "\<any\>" for their simpler semantics, as it does not
depend on the context.
</dd>
<dt>\<none\> @anchor rules-wildcard-none-def</dt>
<dd>

Match *empty* values

@since 1.9.0
</dd>
<dt>\<some\> @anchor rules-wildcard-some-def</dt>
<dd>

Match *non-empty* value

@since 1.9.0
</dd>
<dt>\<any\> @anchor rules-wildcard-any-def</dt>
<dd>

Match *any* (optionally empty) value. Its behavior does not depend on the
context, contrary to the legacy wild card @ref rules-wildcard-legacy-def "*".

This wild card usually appears near the end of a rule set to set *default* values.

@since 1.9.0
</dd>
</dl>

```c
! layout = keycodes
  // The following two lines only match exactly their respective groups.
 $azerty = +aliases(azerty)
 $qwertz = +aliases(qwertz)
  // This line will match layouts that are neither in $azerty nor in
  // $qwertz groups.
  *      = +aliases(qwerty)
```

# Grammar

It is advised to look at a file like `rules/evdev` along with
this grammar.

@note Comments, whitespace, etc. are not shown.

```bnf
File         ::= { "!" (Include | Group | RuleSet) }

Include      ::= "include" <ident>

Group        ::= GroupName "=" { GroupElement } "\n"
GroupName    ::= "$"<ident>
GroupElement ::= <ident>

RuleSet      ::= Mapping { Rule }

Mapping      ::= { Mlvo } "=" { Kccgst } "\n"
Mlvo         ::= "model" | "option" | ("layout" | "variant") [ Index ]
Index        ::= "[" ({ NumericIndex } | { SpecialIndex }) "]"
NumericIndex ::= 1..XKB_MAX_GROUPS
SpecialIndex ::= "single" | "first" | "later" | "any"
Kccgst       ::= "keycodes" | "symbols" | "types" | "compat" | "geometry"

Rule         ::= { MlvoValue } "=" { KccgstValue } "\n"
MlvoValue    ::= "*" | "<none>" | "<some>" | "<any>" | GroupName | <ident>
KccgstValue  ::= <ident> [ { Qualifier } ]
Qualifier    ::= ":" ({ NumericIndex } | "all")
```

<!--
[WARNING]: Doxygen parsing is a mess. \% does not work as expected
in Markdown code quotes, e.g. `\%H` gives `\H`. But using <code> tags
or %%H seems to do the job though.
-->
@note
- Include processes the rules in the file path specified in the `ident`,
  in order. **%-expansion** is performed, as follows: @anchor rules-include-expansion
  <dl>
    <dt>`%%`</dt>
    <dd>A literal %.</dd>
    <dt><code>\%H</code></dt>
    <dd>The value of the `$HOME` environment variable.</dd>
    <dt><code>\%E</code></dt>
    <dd>
        The extra lookup path for system-wide XKB data (usually
        `/etc/xkb/rules`).
    </dd>
    <dt><code>\%S</code></dt>
    <dd>
        The system-installed rules directory (usually
        `/usr/share/X11/xkb/rules`).
    </dd>
  </dl>
  **Note:** This feature is supported by libxkbcommon but not by the legacy X11
  tools.

- @anchor rules-extended-layout-indexes
  (Since version `1.8.0`)
  The following *extended layout indexes* can be used to avoid repetition and
  clarify the semantics:

  <dl>
    <dt>`single`</dt>
    <dd>
        Matches a single layout; `layout[single]` is the same as without
        explicit index: `layout`.
    </dd>
    <dt>`first`</dt>
    <dd>
        Matches the first layout/variant, no matter how many layouts are in
        the RMLVO configuration. Acts as both `layout` and `layout[1]`.
    </dd>
    <dt>`later`</dt>
    <dd>
        Matches all but the first layout. This is an index *range*.
        Acts as `layout[2]` .. `layout[4]`.
    </dd>
    <dt>any</dt>
    <dd>
        Matches layout at any position. This is an index *range*.
        Acts as `layout`, `layout[1]` .. `layout[4]`.
    </dd>
  </dl>

  When using a layout index *range* (`later`, `any`), the @ref rules-i-expansion "%i expansion"
  can be used in the `KccgstValue` to refer to the index of the matched layout.

- The order of values in a `Rule` must be the same as the `Mapping` it
  follows. The mapping line determines the meaning of the values in
  the rules which follow in the `RuleSet`.

- If a `Rule` is matched, **%-expansion** is performed on the
  `KccgstValue`, as follows:

  <dl>
    <dt><code>\%m</code>, <code>\%l</code>, <code>\%v</code></dt>
    <dd>
        The [model], [layout] or [variant], if *only one* was given
        (e.g. <code>\%l</code> for “us,il” is invalid).
    </dd>
    <dt>
        <code>\%l[1]</code>, <code>\%l[2]</code>, …,
        <code>\%v[1]</code>, <code>\%v[2]</code>, …
    </dt>
    <dd>
        [Layout][layout] or [variant] for the specified layout `Index`,
        if *more than one* was given, e.g.: <code>\%l[1]</code> is
        invalid for “us” but expands to “us” for “us,de”.
    </dd>
    <dt>
        `%+m`,
        `%+l`, `%+l[1]`, `%+l[2]`, …,
        `%+v`, `%+v[1]`, `%+v[2]`, …
    </dt>
    <dd>
        As above, but prefixed with ‘+’. Similarly, ‘|’, ‘^’, ‘-’, ‘_’ may be
        used instead of ‘+’. See the [merge mode] documentation for the
        special meaning of ‘+’, ‘|’ and ‘^’.
    </dd>
    <dt>
        `%(m)`,
        `%(l)`, `%(l[1])`, `%(l[2])`, …,
        `%(v)`, `%(v[1])`, `%(v[2])`, …
    </dt>
    <dd>
        As above, but prefixed by ‘(’ and suffixed by ‘)’.
    </dd>
    <dt>
        @anchor rules-i-expansion
        `:%%i`,
        `%%l[%%i]`,
        `%(l[%%i])`,
        etc.
    </dt>
    <dd>
        (Since version `1.8.0`)
        In case the mapping uses an @ref rules-extended-layout-indexes "extended layout index",
        `%%i` corresponds to the index of the matched layout.
    </dd>
  </dl>

  In case the expansion is *invalid*, as described above, it is *skipped*
  (the rest of the string is still processed); this includes the prefix
  and suffix. This is why one should use e.g. <code>%(v[1])</code>
  instead of <code>(\%v[1])</code>. See @ref rules-symbols-example for
  an illustration.

- @anchor rules-all-qualifier
  (Since version `1.8.0`) If a `Rule` is matched, the `:all` *qualifier* in the
  `KccgstValue` applies the qualified value (and its optional merge mode) to all
  layouts. If there is no merge mode, it defaults to *override* `+`.

  <table>
    <caption>Examples of `:all` qualified use</caption>
    <tr>
        <th>`KccgstValue`</th>
        <th>Layouts count</th>
        <th>Final `KccgstValue`</th>
    </tr>
    <tr>
        <td rowspan="2">`x:all`</td>
        <td>1</td>
        <td>`x:1`</td>
    </tr>
    <tr>
        <td>2</td>
        <td>`x:1+x:2`</td>
    </tr>
    <tr>
        <td rowspan="2">`+x:all`</td>
        <td>1</td>
        <td>`+x:1`</td>
    </tr>
    <tr>
        <td>3</td>
        <td>`+x:1+x:2+x:3`</td>
    </tr>
    <tr>
        <td rowspan="2">`|x:all`</td>
        <td>1</td>
        <td>`|x:1`</td>
    </tr>
    <tr>
        <td>4</td>
        <td>`|x:1|x:2|x:3|x:4`</td>
    </tr>
    <tr>
        <td rowspan="2">`x|y:all`</td>
        <td>1</td>
        <td>`x|y:1`</td>
    </tr>
    <tr>
        <td>3</td>
        <td>`x|y:1|y:2|y:3`</td>
    </tr>
    <tr>
        <td>`x:all+y|z:all`</td>
        <td>2</td>
        <td>`x:1+x:2+y|z:1|z:2`</td>
    </tr>
  </table>

# RMLVO resolution process {#rmlvo-resolution}

## Process

First of all, the rules *file* is extracted from the provided
[<em>R</em>MLVO][RMLVO] configuration (usually `evdev`). Then its path
is resolved and the file is parsed to get the [rule sets].

Then *each rule set* is checked against the provided [MLVO] configuration,
following their *order* in the rules file.

@important @anchor irrelevant-options-order Contrary to layouts and variants,
the *options order* in a [MLVO] configuration (e.g. via `xkbcli`) is irrelevant
for its resolution: only the order of the rules matters. See
“@ref rules-options-example ""” for an illustration.

If a [rule] matches in a @ref rule-set-def "rule set", then:

<!--
Using HTML list tags due to Doxygen Markdown limitation with tables
inside lists.
-->
<ol>
  <li>
    @anchor rules-kccgst-value-update
    The *KcCGST* value of the rule is used to update the [KcCGST]
    configuration, using the following instructions. Note that `foo`
    and `bar` are placeholders; ‘+’ specifies the *override* [merge mode]
    and can be replaced by ‘|’ or ‘^’ to specify respectively the *augment*
    or *replace* merge mode instead.

    | Rule value        | Old KcCGST value | New KcCGST value      |
    | ----------------- | ---------------- | --------------------- |
    | `bar`             |                  | `bar`                 |
    | `bar`             | `foo`            | `foo` (*skip* `bar`)  |
    | `bar`             | `+foo`           | `bar+foo` (*prepend*) |
    | `+bar`            |                  | `+bar`                |
    | `+bar`            | `foo`            | `foo+bar`             |
    | `+bar`            | `+foo`           | `+foo+bar`            |
  </li>
  <li>
    The rest of the set will be *skipped*, except if the set matches
    against [options]. Indeed, those may contain *multiple* legitimate
    rules, so they are processed entirely. See @ref rules-options-example
    for an illustration.
  </li>
</ol>

[value update]: @ref rules-kccgst-value-update
[merge mode]: @ref merge-mode-def

## Examples

### Example: key codes

Using the following example:

```c
! $jollamodels = jollasbj
! $azerty = be fr
! $qwertz = al ch cz de hr hu ro si sk

! model       = keycodes
 $jollamodels = evdev+jolla(jolla)
  olpc        = evdev+olpc(olpc)
  *           = evdev

! layout      = keycodes
 $azerty      = +aliases(azerty)
 $qwertz      = +aliases(qwertz)
  *           = +aliases(qwerty)
```

we would have the following resolutions of <em>[key codes]</em>:

| Model      | Layout   | Keycodes                             |
| ---------- | :------: | :----------------------------------- |
| `jollasbj` | `us`     | `evdev+jolla(jolla)+aliases(qwerty)` |
| `olpc`     | `be`     | `evdev+olpc(olpc)+aliases(azerty)`   |
| `pc`       | `al`     | `evdev+aliases(qwertz)`              |

### Example: layouts, variants and symbols {#rules-symbols-example}

Using the following example:

```c
! layout    = symbols
  *         = pc+%l%(v)
// The following would not work: syntax for *multiple* layouts
// in a rule set for *single* layout.
//*         = pc+%l[1]%(v[1])

! layout[1] = symbols
  *         = pc+%l[1]%(v[1])
// The following would not work: syntax for *single* layout
// in a rule set for *multiple* layouts.
//*         = pc+%l%(v)

! layout[2] = symbols
  *         = +%l[2]%(v[2]):2

! layout[3] = symbols
  *         = +%l[3]%(v[3]):3
```

we would have the following resolutions of <em>[symbols]</em>:

| Layout     | Variant      | Symbols                       | Rules sets used |
| ---------- | ------------ | ----------------------------- | --------------- |
| `us`       |              | `pc+us`                       | #1              |
| `us`       | `intl`       | `pc+us(intl)`                 | #1              |
| `us,es`    |              | `pc+us+es:2`                  | #2, #3          |
| `us,es,fr` | `intl,,bepo` | `pc+us(intl)+es:2+fr(bepo):3` | #2, #3, #4      |

Since version `1.8.0`, the previous code can be replaced with simply:

```c
! layout[first] = symbols
  *             = pc+%l[%i]%(v[%i])

! layout[later] = symbols
  *             = +%l[%i]%(v[%i]):%i
```

### Example: layout, option and symbols {#rules-options-example}

Using the following example:

```c
! $azerty = be fr

! layout = symbols
  *      = pc+%l%(v)

! layout[1] = symbols
  *         = pc+%l[1]%(v[1])

! layout[2] = symbols
  *         = +%l[2]%(v[2])
// Repeat the previous rules set with indexes 3 and 4

! layout     option          = symbols
 $azerty     caps:digits_row = +capslock(digits_row)
  *          misc:typo       = +typo(base)
  *          lv3:ralt_alt    = +level3(ralt_alt)

! layout[1]  option          = symbols
 $azerty     caps:digits_row = +capslock(digits_row):1
  *          misc:typo       = +typo(base):1
  *          lv3:ralt_alt    = +level3(ralt_alt):1
// Repeat the previous rules set for indexes 2 to 4
```

we would have the following resolutions of <em>[symbols]</em>:

| Layout  | Option                                   | Symbols                                                     |
| ------- | ---------------------------------------- | ----------------------------------------------------------- |
| `be`    | `caps:digits_row`                        | `pc+be+capslock(digits_row)`                                |
| `gb`    | `caps:digits_row`                        | `pc+gb`                                                     |
| `fr`    | `misc:typo`                              | `pc+fr+typo(base)`                                          |
| `fr`    | `misc:typo,caps:digits_row`              | `pc+fr+capslock(digits_row)+typo(base)`                     |
| `fr`    | `lv3:ralt_alt,caps:digits_row,misc:typo` | `pc+fr+capslock(digits_row)+typo(base)+level3(ralt_alt)`    |
| `fr,gb` | `caps:digits_row,misc:typo`              | `pc+fr+gb:2+capslock(digits_row)+typo(base):1+typo(base):2` |

Note that the configuration with `gb` [layout] has no match for the
[option] `caps:digits_row` and that the order of the [options] in the
[RMLVO] configuration has no influence on the resulting [symbols], as it
depends solely on their order in the rules.

Since version `1.8.0`, the previous code can be replaced with simply:

```c
! $azerty = be fr

! layout[first] = symbols
  *             = pc+%l[%i]%(v[%i])

! layout[later] = symbols
  *             = +%l[%i]%(v[%i])

! layout[any]  option          = symbols
 $azerty       caps:digits_row = +capslock(digits_row):%i

! option       = symbols
  misc:typo    = +typo(base):all
  lv3:ralt_alt = +level3(ralt_alt):all

// The previous is equivalent to:
! layout[any]  option       = symbols
  *            misc:typo    = +typo(base):%i
  *            lv3:ralt_alt = +level3(ralt_alt):%i
```
kc3-lang/libxkbcommon/doc/rules-format.md

Commit

kc3-lang/libxkbcommon /doc/rules-format.md