Edit

kc3-lang/libxkbcommon/doc/rules-format.md

Branch :

  • Show log

    Commit

  • Author : Pierre Le Marre
    Date : 2025-03-08 17:57:56
    Hash : 04220319
    Message : doc: Warn about the options order Options are always applied depending of their order in the relevant rules files, independently of their order in the relevant RMLVO config. Since this is counter-intuitive (e.g. when using `xkbcli`), let’s highlight this in the documentation.

  • doc/rules-format.md
  • The rules file {#rule-file-format}
    ==============
    
    The purpose of the rules file is to map between configuration values
    that are easy for a user to specify and understand, and the
    configuration values that the keymap compiler, `xkbcomp`, uses and
    understands. The following diagram presents an overview of this
    process. See the [XKB introduction] for further details on the
    components.
    
    @dotfile xkb-configuration "XKB keymap configurations"
    
    @tableofcontents{html:2}
    
    `libxkbcommon`’s keymap compiler `xkbcomp` uses the `xkb_component_names`
    struct internally, which maps directly to [include statements] of the
    appropriate [sections] \(called [KcCGST] for short):
    
    - [key codes],
    - [compatibility],
    - geometry ([not supported](@ref geometry-support) by xkbcommon),
    - [symbols],
    - [types].
    
    These are not really intuitive nor straightforward for the uninitiated.
    Instead, the user passes in a `xkb_rule_names` struct, which consists
    of the following fields (called [RMLVO] for short):
    
    - the name of a [rules] file (in Linux this is usually “evdev”),
    - a keyboard [model] \(e.g. “pc105”),
    - a set of [layouts][layout] (which will end up in different
      groups, e.g. “us,fr”),
    - a set of [variants][variant] (used to alter/augment the respective
      layout, e.g. “intl,dvorak”),
    - a set of [options] \(used to tweak some general
      behavior of the keyboard, e.g. “ctrl:nocaps,compose:menu” to make
      the Caps Lock key act like Ctrl and the Menu key like Compose).
    
    [KcCGST]: @ref KcCGST-intro
    [RMLVO]: @ref RMLVO-intro
    [MLVO]: @ref RMLVO-intro
    [XKB introduction]: @ref xkb-intro
    [include statements]: @ref xkb-include
    [sections]: @ref keymap-section-def
    [key codes]: @ref the-xkb_keycodes-section
    [compatibility]: @ref the-xkb_compat-section
    [symbols]: @ref the-xkb_symbols-section
    [types]: @ref the-xkb_types-section
    [rules]: @ref config-rules-def
    [model]: @ref config-model-def
    [layout]: @ref config-layout-def
    [variant]: @ref config-variant-def
    [option]: @ref config-options-def
    [options]: @ref config-options-def
    
    Format of the file
    ------------------
    @anchor rule-set-def
    @anchor rule-def
    The file consists of **rule sets**, each consisting of **rules** (one
    per line), which match the [MLVO] values on the left hand side, and,
    if the values match to the values the user passed in, results in the
    values on the right hand side being [added][value update] to the
    resulting [KcCGST]. See @ref rmlvo-resolution for further details.
    
    [rule set]: @ref rule-set-def
    [rule sets]: @ref rule-set-def
    [rule]: @ref rule-def
    
    ```c
    // This is a comment
    
    // The following line is a rule header.
    // It starts with ‘!’ and introduces a rules set.
    // It indicates that the rules map MLVO options to KcCGST symbols.
    ! option       = symbols
      // The following lines are rules that add symbols of the RHS when the
      // LHS matches an option.
      ctrl:nocaps  = +ctrl(nocaps)
      compose:menu = +compose(menu)
    
    // One may use multiple MLVO components on the LHS
    ! layout    option          = symbols
      be        caps:digits_row = +capslock(digits_row)
      fr        caps:digits_row = +capslock(digits_row)
    ```
    
    @anchor rules-group-def
    Since some values are related and repeated often, it is possible
    to *group* them together and refer to them by a **group name** in the
    rules.
    
    [group]: @ref rules-group-def
    
    ```c
    // Let’s rewrite the previous rules set using groups.
    // Groups starts with ‘$’.
    
    // Define a group for countries with AZERTY layouts
    ! $azerty = be fr
    
    // The following rule will match option `caps:digits_row` only for
    // layouts in the $azerty group, i.e. `fr` and `be`.
    ! layout    option          = symbols
     $azerty    caps:digits_row = +capslock(digits_row)
    ```
    
    @anchor rules-wildcard-def
    Along with matching values by simple string equality and for
    membership in a [group] defined previously, rules may also contain
    **wildcard** values “\*” with the following behavior:
    - For `model` and `options`: *always* match.
    - For `layout` and `variant`: match any *non-empty* value.
    These usually appear near the end of a rule set to set *default* values.
    
    ```c
    ! layout = keycodes
      // The following two lines only match exactly their respective groups.
     $azerty = +aliases(azerty)
     $qwertz = +aliases(qwertz)
      // This line will match layouts that are neither in $azerty nor in
      // $qwertz groups.
      *      = +aliases(qwerty)
    ```
    
    Grammar
    -------
    It is advised to look at a file like `rules/evdev` along with
    this grammar.
    
    @note Comments, whitespace, etc. are not shown.
    
    ```bnf
    File         ::= { "!" (Include | Group | RuleSet) }
    
    Include      ::= "include" <ident>
    
    Group        ::= GroupName "=" { GroupElement } "\n"
    GroupName    ::= "$"<ident>
    GroupElement ::= <ident>
    
    RuleSet      ::= Mapping { Rule }
    
    Mapping      ::= { Mlvo } "=" { Kccgst } "\n"
    Mlvo         ::= "model" | "option" | ("layout" | "variant") [ Index ]
    Index        ::= "[" ({ NumericIndex } | { SpecialIndex }) "]"
    NumericIndex ::= 1..XKB_MAX_GROUPS
    SpecialIndex ::= "single" | "first" | "later" | "any"
    Kccgst       ::= "keycodes" | "symbols" | "types" | "compat" | "geometry"
    
    Rule         ::= { MlvoValue } "=" { KccgstValue } "\n"
    MlvoValue    ::= "*" | GroupName | <ident>
    KccgstValue  ::= <ident> [ { Qualifier } ]
    Qualifier    ::= ":" ({ NumericIndex } | "all")
    ```
    
    <!--
    [WARNING]: Doxygen parsing is a mess. \% does not work as expected
    in Markdown code quotes, e.g. `\%H` gives `\H`. But using <code> tags
    or %%H seems to do the job though.
    -->
    @note
    - Include processes the rules in the file path specified in the `ident`,
      in order. **%-expansion** is performed, as follows:
      <dl>
        <dt>`%%`</dt>
        <dd>A literal %.</dd>
        <dt><code>\%H</code></dt>
        <dd>The value of the `$HOME` environment variable.</dd>
        <dt><code>\%E</code></dt>
        <dd>
            The extra lookup path for system-wide XKB data (usually
            `/etc/xkb/rules`).
        </dd>
        <dt><code>\%S</code></dt>
        <dd>
            The system-installed rules directory (usually
            `/usr/share/X11/xkb/rules`).
        </dd>
      </dl>
      **Note:** This feature is supported by libxkbcommon but not by the legacy X11
      tools.
    
    - @anchor rules-special-indexes
      (Since version `1.8.0`)
      The following *special indexes* can be used to avoid repetition and clarify
      the semantics:
    
      <dl>
        <dt>`single`</dt>
        <dd>
            Matches a single layout; `layout[single]` is the same as without
            explicit index: `layout`.
        </dd>
        <dt>`first`</dt>
        <dd>
            Matches the first layout/variant, no matter how many layouts are in
            the RMLVO configuration. Acts as both `layout` and `layout[1]`.
        </dd>
        <dt>`later`</dt>
        <dd>
            Matches all but the first layout. This is an index *range*.
            Acts as `layout[2]` .. `layout[4]`.
        </dd>
        <dt>any</dt>
        <dd>
            Matches layout at any position. This is an index *range*.
            Acts as `layout`, `layout[1]` .. `layout[4]`.
        </dd>
      </dl>
    
      When using a layout index *range* (`later`, `any`), the @ref rules-i-expansion "%i expansion"
      can be used in the `KccgstValue` to refer to the index of the matched layout.
    
    - The order of values in a `Rule` must be the same as the `Mapping` it
      follows. The mapping line determines the meaning of the values in
      the rules which follow in the `RuleSet`.
    
    - If a `Rule` is matched, **%-expansion** is performed on the
      `KccgstValue`, as follows:
    
      <dl>
        <dt><code>\%m</code>, <code>\%l</code>, <code>\%v</code></dt>
        <dd>
            The [model], [layout] or [variant], if *only one* was given
            (e.g. <code>\%l</code> for “us,il” is invalid).
        </dd>
        <dt>
            <code>\%l[1]</code>, <code>\%l[2]</code>, …,
            <code>\%v[1]</code>, <code>\%v[2]</code>, …
        </dt>
        <dd>
            [Layout][layout] or [variant] for the specified layout `Index`,
            if *more than one* was given, e.g.: <code>\%l[1]</code> is
            invalid for “us” but expands to “us” for “us,de”.
        </dd>
        <dt>
            `%+m`,
            `%+l`, `%+l[1]`, `%+l[2]`, …,
            `%+v`, `%+v[1]`, `%+v[2]`, …
        </dt>
        <dd>
            As above, but prefixed with ‘+’. Similarly, ‘|’, ‘-’, ‘_’ may be
            used instead of ‘+’. See the [merge mode] documentation for the
            special meaning of ‘+’ and ‘|’.
        </dd>
        <dt>
            `%(m)`,
            `%(l)`, `%(l[1])`, `%(l[2])`, …,
            `%(v)`, `%(v[1])`, `%(v[2])`, …
        </dt>
        <dd>
            As above, but prefixed by ‘(’ and suffixed by ‘)’.
        </dd>
        <dt>
            @anchor rules-i-expansion
            `:%%i`,
            `%%l[%%i]`,
            `%(l[%%i])`,
            etc.
        </dt>
        <dd>
            (Since version `1.8.0`)
            In case the mapping uses a @ref rules-special-indexes "special index",
            `%%i` corresponds to the index of the matched layout.
        </dd>
      </dl>
    
      In case the expansion is *invalid*, as described above, it is *skipped*
      (the rest of the string is still processed); this includes the prefix
      and suffix. This is why one should use e.g. <code>%(v[1])</code>
      instead of <code>(\%v[1])</code>. See @ref rules-symbols-example for
      an illustration.
    
    - @anchor rules-all-qualifier
      (Since version `1.8.0`) If a `Rule` is matched, the `:all` *qualifier* in the
      `KccgstValue` applies the qualified value (and its optional merge mode) to all
      layouts. If there is no merge mode, it defaults to *override* `+`.
    
      <table>
        <caption>Examples of `:all` qualified use</caption>
        <tr>
            <th>`KccgstValue`</th>
            <th>Layouts count</th>
            <th>Final `KccgstValue`</th>
        </tr>
        <tr>
            <td rowspan="2">`x:all`</td>
            <td>1</td>
            <td>`x:1`</td>
        </tr>
        <tr>
            <td>2</td>
            <td>`x:1+x:2`</td>
        </tr>
        <tr>
            <td rowspan="2">`+x:all`</td>
            <td>1</td>
            <td>`+x:1`</td>
        </tr>
        <tr>
            <td>3</td>
            <td>`+x:1+x:2+x:3`</td>
        </tr>
        <tr>
            <td rowspan="2">`|x:all`</td>
            <td>1</td>
            <td>`|x:1`</td>
        </tr>
        <tr>
            <td>4</td>
            <td>`|x:1|x:2|x:3|x:4`</td>
        </tr>
        <tr>
            <td rowspan="2">`x|y:all`</td>
            <td>1</td>
            <td>`x|y:1`</td>
        </tr>
        <tr>
            <td>3</td>
            <td>`x|y:1|y:2|y:3`</td>
        </tr>
        <tr>
            <td>`x:all+y|z:all`</td>
            <td>2</td>
            <td>`x:1+x:2+y|z:1|z:2`</td>
        </tr>
      </table>
    
    RMLVO resolution process {#rmlvo-resolution}
    ------------------------
    
    First of all, the rules *file* is extracted from the provided
    [<em>R</em>MLVO][RMLVO] configuration (usually `evdev`). Then its path
    is resolved and the file is parsed to get the [rule sets].
    
    Then *each rule set* is checked against the provided [MLVO] configuration,
    following their *order* in the rules file.
    
    @important @anchor irrelevant-options-order Contrary to layouts and variants,
    the *options order* in a [MLVO] configuration (e.g. via `xkbcli`) is irrelevant
    for its resolution: only the order of the rules matters. See
    “@ref rules-options-example ""” for an illustration.
    
    If a [rule] matches in a @ref rule-set-def "rule set", then:
    
    <!--
    Using HTML list tags due to Doxygen Markdown limitation with tables
    inside lists.
    -->
    <ol>
      <li>
        @anchor rules-kccgst-value-update
        The *KcCGST* value of the rule is used to update the [KcCGST]
        configuration, using the following instructions. Note that `foo`
        and `bar` are placeholders; ‘+’ specifies the *override* [merge mode]
        and can be replaced by ‘|’ to specify the *augment* merge mode instead.
    
        | Rule value        | Old KcCGST value | New KcCGST value      |
        | ----------------- | ---------------- | --------------------- |
        | `bar`             |                  | `bar`                 |
        | `bar`             | `foo`            | `foo` (*skip* `bar`)  |
        | `bar`             | `+foo`           | `bar+foo` (*prepend*) |
        | `+bar`            |                  | `+bar`                |
        | `+bar`            | `foo`            | `foo+bar`             |
        | `+bar`            | `+foo`           | `+foo+bar`            |
      </li>
      <li>
        The rest of the set will be *skipped*, except if the set matches
        against [options]. Indeed, those may contain *multiple* legitimate
        rules, so they are processed entirely. See @ref rules-options-example
        for an illustration.
      </li>
    </ol>
    
    [value update]: @ref rules-kccgst-value-update
    [merge mode]: @ref merge-mode-def
    
    ### Example: key codes
    
    Using the following example:
    
    ```c
    ! $jollamodels = jollasbj
    ! $azerty = be fr
    ! $qwertz = al ch cz de hr hu ro si sk
    
    ! model       = keycodes
     $jollamodels = evdev+jolla(jolla)
      olpc        = evdev+olpc(olpc)
      *           = evdev
    
    ! layout      = keycodes
     $azerty      = +aliases(azerty)
     $qwertz      = +aliases(qwertz)
      *           = +aliases(qwerty)
    ```
    
    we would have the following resolutions of <em>[key codes]</em>:
    
    | Model      | Layout   | Keycodes                             |
    | ---------- | :------: | :----------------------------------- |
    | `jollasbj` | `us`     | `evdev+jolla(jolla)+aliases(qwerty)` |
    | `olpc`     | `be`     | `evdev+olpc(olpc)+aliases(azerty)`   |
    | `pc`       | `al`     | `evdev+aliases(qwertz)`              |
    
    ### Example: layouts, variants and symbols {#rules-symbols-example}
    
    Using the following example:
    
    ```c
    ! layout    = symbols
      *         = pc+%l%(v)
    // The following would not work: syntax for *multiple* layouts
    // in a rule set for *single* layout.
    //*         = pc+%l[1]%(v[1])
    
    ! layout[1] = symbols
      *         = pc+%l[1]%(v[1])
    // The following would not work: syntax for *single* layout
    // in a rule set for *multiple* layouts.
    //*         = pc+%l%(v)
    
    ! layout[2] = symbols
      *         = +%l[2]%(v[2]):2
    
    ! layout[3] = symbols
      *         = +%l[3]%(v[3]):3
    ```
    
    we would have the following resolutions of <em>[symbols]</em>:
    
    | Layout     | Variant      | Symbols                       | Rules sets used |
    | ---------- | ------------ | ----------------------------- | --------------- |
    | `us`       |              | `pc+us`                       | #1              |
    | `us`       | `intl`       | `pc+us(intl)`                 | #1              |
    | `us,es`    |              | `pc+us+es:2`                  | #2, #3          |
    | `us,es,fr` | `intl,,bepo` | `pc+us(intl)+es:2+fr(bepo):3` | #2, #3, #4      |
    
    Since version `1.8.0`, the previous code can be replaced with simply:
    
    ```c
    ! layout[first] = symbols
      *             = pc+%l[%i]%(v[%i])
    
    ! layout[later] = symbols
      *             = +%l[%i]%(v[%i]):%i
    ```
    
    ### Example: layout, option and symbols {#rules-options-example}
    
    Using the following example:
    
    ```c
    ! $azerty = be fr
    
    ! layout = symbols
      *      = pc+%l%(v)
    
    ! layout[1] = symbols
      *         = pc+%l[1]%(v[1])
    
    ! layout[2] = symbols
      *         = +%l[2]%(v[2])
    // Repeat the previous rules set with indexes 3 and 4
    
    ! layout     option          = symbols
     $azerty     caps:digits_row = +capslock(digits_row)
      *          misc:typo       = +typo(base)
      *          lv3:ralt_alt    = +level3(ralt_alt)
    
    ! layout[1]  option          = symbols
     $azerty     caps:digits_row = +capslock(digits_row):1
      *          misc:typo       = +typo(base):1
      *          lv3:ralt_alt    = +level3(ralt_alt):1
    // Repeat the previous rules set for indexes 2 to 4
    ```
    
    we would have the following resolutions of <em>[symbols]</em>:
    
    | Layout  | Option                                   | Symbols                                                     |
    | ------- | ---------------------------------------- | ----------------------------------------------------------- |
    | `be`    | `caps:digits_row`                        | `pc+be+capslock(digits_row)`                                |
    | `gb`    | `caps:digits_row`                        | `pc+gb`                                                     |
    | `fr`    | `misc:typo`                              | `pc+fr+typo(base)`                                          |
    | `fr`    | `misc:typo,caps:digits_row`              | `pc+fr+capslock(digits_row)+typo(base)`                     |
    | `fr`    | `lv3:ralt_alt,caps:digits_row,misc:typo` | `pc+fr+capslock(digits_row)+typo(base)+level3(ralt_alt)`    |
    | `fr,gb` | `caps:digits_row,misc:typo`              | `pc+fr+gb:2+capslock(digits_row)+typo(base):1+typo(base):2` |
    
    Note that the configuration with `gb` [layout] has no match for the
    [option] `caps:digits_row` and that the order of the [options] in the
    [RMLVO] configuration has no influence on the resulting [symbols], as it
    depends solely on their order in the rules.
    
    Since version `1.8.0`, the previous code can be replaced with simply:
    
    ```c
    ! $azerty = be fr
    
    ! layout[first] = symbols
      *             = pc+%l[%i]%(v[%i])
    
    ! layout[later] = symbols
      *             = +%l[%i]%(v[%i])
    
    ! layout[any]  option          = symbols
     $azerty       caps:digits_row = +capslock(digits_row):%i
    
    ! option       = symbols
      misc:typo    = +typo(base):all
      lv3:ralt_alt = +level3(ralt_alt):all
    
    // The previous is equivalent to:
    ! layout[any]  option       = symbols
      *            misc:typo    = +typo(base):%i
      *            lv3:ralt_alt = +level3(ralt_alt):%i
    ```