  • Author : Pierre Le Marre
    Date : 2025-06-10 17:33:24
    Hash : f7a61da7
    Message : doc: Update new layout count ranges

    The rules file {#rule-file-format}
    ==============
    
    The purpose of the rules file is to map between configuration values
    that are easy for a user to specify and understand, and the
    configuration values that the keymap compiler, `xkbcomp`, uses and
    understands. The following diagram presents an overview of this
    process. See the [XKB introduction] for further details on the
    components.
    
    @dotfile xkb-configuration "XKB keymap configurations"
    
    @tableofcontents{html:2}
    
    `libxkbcommon`’s keymap compiler `xkbcomp` uses the `xkb_component_names`
    struct internally, which maps directly to [include statements] of the
    appropriate [sections] \(called [KcCGST] for short):
    
    - [key codes],
    - [compatibility],
    - geometry ([not supported](@ref geometry-support) by xkbcommon),
    - [symbols],
    - [types].
    
    These are neither intuitive nor straightforward for the uninitiated.
    Instead, the user passes in an `xkb_rule_names` struct, which consists
    of the following fields (called [RMLVO] for short):
    
    - the name of a [rules] file (on Linux this is usually “evdev”),
    - a keyboard [model] \(e.g. “pc105”),
    - a set of [layouts][layout] (which will end up in different
      groups, e.g. “us,fr”),
    - a set of [variants][variant] (used to alter/augment the respective
      layout, e.g. “intl,dvorak”),
    - a set of [options] \(used to tweak some general
      behavior of the keyboard, e.g. “ctrl:nocaps,compose:menu” to make
      the Caps Lock key act like Ctrl and the Menu key like Compose).
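
    The following minimal C sketch shows how the [layout] and [variant] lists pair up. It splits the comma-separated strings by position: the *n*-th variant belongs to the *n*-th layout, and an empty or missing entry means “no variant”. It is not part of the libxkbcommon API, and the helper names `layout_count` and `nth_field` are hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper: copy the n-th (0-based) comma-separated field of
     * `list` into `out`. Returns 0 if the field does not exist or does not
     * fit. An existing but empty field (as in "intl,,bepo") yields "". */
    int nth_field(const char *list, size_t n, char *out, size_t out_size)
    {
        assert(out != NULL && out_size > 0);
        out[0] = '\0';
        if (list == NULL)
            return 0;
        const char *start = list;
        for (size_t i = 0; i < n; i++) {
            const char *comma = strchr(start, ',');
            if (comma == NULL)
                return 0;               /* fewer than n + 1 fields */
            start = comma + 1;
        }
        const char *end = strchr(start, ',');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len >= out_size)
            return 0;
        memcpy(out, start, len);
        out[len] = '\0';
        return 1;
    }

    /* Hypothetical helper: number of layouts, i.e. of comma-separated fields. */
    size_t layout_count(const char *layouts)
    {
        if (layouts == NULL || layouts[0] == '\0')
            return 0;
        size_t count = 1;
        for (const char *p = layouts; *p != '\0'; p++)
            if (*p == ',')
                count++;
        return count;
    }
    ```

    For example, with `"us,fr"` and `"intl,dvorak"`, field 1 of each list gives the pair `fr`/`dvorak`, i.e. the second layout.
    
    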
    
    [KcCGST]: @ref KcCGST-intro
    [RMLVO]: @ref RMLVO-intro
    [MLVO]: @ref RMLVO-intro
    [XKB introduction]: @ref xkb-intro
    [include statements]: @ref xkb-include
    [sections]: @ref keymap-section-def
    [key codes]: @ref the-xkb_keycodes-section
    [compatibility]: @ref the-xkb_compat-section
    [symbols]: @ref the-xkb_symbols-section
    [types]: @ref the-xkb_types-section
    [rules]: @ref config-rules-def
    [model]: @ref config-model-def
    [layout]: @ref config-layout-def
    [variant]: @ref config-variant-def
    [option]: @ref config-options-def
    [options]: @ref config-options-def
    
    # Format of the file
    
    ## Rules and rule sets {#rule-def}
    
    @anchor rule-set-def
    The file consists of **rule sets**, each consisting of **rules** (one
    per line). Each rule matches [MLVO] values on its left-hand side. If
    they match the values the user passed in, the values on its right-hand
    side are [added][value update] to the resulting [KcCGST]. See
    @ref rmlvo-resolution for further details.
    
    [rule set]: @ref rule-set-def
    [rule sets]: @ref rule-set-def
    [rule]: @ref rule-def
    
    ```c
    // This is a comment
    
    // The following line is a rule header.
    // It starts with ‘!’ and introduces a rule set.
    // It indicates that the rules map MLVO options to KcCGST symbols.
    ! option       = symbols
      // The following lines are rules that add symbols of the RHS when the
      // LHS matches an option.
      ctrl:nocaps  = +ctrl(nocaps)
      compose:menu = +compose(menu)
    
    // One may use multiple MLVO components on the LHS
    ! layout    option          = symbols
      be        caps:digits_row = +capslock(digits_row)
      fr        caps:digits_row = +capslock(digits_row)
    ```
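
    The sketch below splits one comment-free line into its left-hand-side values and its right-hand side. That is enough to handle both rule set headers (starting with ‘!’) and the rules that follow them. It is a simplified, hypothetical parser (`struct rule_line`, `parse_rule_line`), not the one used by libxkbcommon. In particular, it rejects `include` lines, which contain no ‘=’.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <string.h>

    #define MAX_FIELDS 8

    /* Hypothetical representation of one line of a rules file. */
    struct rule_line {
        int is_header;              /* 1 if the line starts with '!' */
        int lhs_count;              /* number of values left of '=' */
        char lhs[MAX_FIELDS][64];   /* e.g. "layout", "option" or "be", "caps:digits_row" */
        char rhs[128];              /* e.g. "symbols" or "+capslock(digits_row)" */
    };

    /* Parse a line whose comments were already stripped.
     * Returns 0 if the line has no '=' or a field is too long. */
    int parse_rule_line(const char *line, struct rule_line *out)
    {
        assert(line != NULL && out != NULL);
        memset(out, 0, sizeof(*out));
        const char *p = line;
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '!') {
            out->is_header = 1;
            p++;
        }
        /* Left-hand side: whitespace-separated values up to '='. */
        for (;;) {
            while (isspace((unsigned char)*p))
                p++;
            if (*p == '\0')
                return 0;
            if (*p == '=')
                break;
            const char *start = p;
            while (*p != '\0' && *p != '=' && !isspace((unsigned char)*p))
                p++;
            size_t len = (size_t)(p - start);
            if (out->lhs_count == MAX_FIELDS || len >= sizeof(out->lhs[0]))
                return 0;
            memcpy(out->lhs[out->lhs_count], start, len);
            out->lhs[out->lhs_count][len] = '\0';
            out->lhs_count++;
        }
        /* Right-hand side: the rest of the line, trimmed. */
        p++;
        while (isspace((unsigned char)*p))
            p++;
        size_t len = strlen(p);
        while (len > 0 && isspace((unsigned char)p[len - 1]))
            len--;
        if (len >= sizeof(out->rhs))
            return 0;
        memcpy(out->rhs, p, len);
        out->rhs[len] = '\0';
        return 1;
    }
    ```
    
    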
    
    ## Groups {#rules-group-def}
    
    Since some values are related and repeated often, it is possible
    to *group* them together and refer to them by a **group name** in the
    rules.
    
    [group]: @ref rules-group-def
    
    ```c
    // Let’s rewrite the previous rule set using groups.
    // Group names start with ‘$’.
    
    // Define a group for countries with AZERTY layouts
    ! $azerty = be fr
    
    // The following rule will match option `caps:digits_row` only for
    // layouts in the $azerty group, i.e. `fr` and `be`.
    ! layout    option          = symbols
     $azerty    caps:digits_row = +capslock(digits_row)
    ```
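
    Group membership can be sketched as a lookup of the group name followed by a string comparison against each member. The `struct rules_group` type and the `matches_group` function are hypothetical. Treating an unknown group as “no match” is an assumption of this sketch.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical: a group as parsed from a line such as "! $azerty = be fr". */
    struct rules_group {
        const char *name;               /* including the leading '$' */
        const char *const *members;
        size_t count;
    };

    /* Return 1 if `value` is a member of the group referenced by `ref`. */
    int matches_group(const struct rules_group *groups, size_t ngroups,
                      const char *ref, const char *value)
    {
        assert(ref != NULL && value != NULL);
        for (size_t g = 0; g < ngroups; g++) {
            if (strcmp(groups[g].name, ref) != 0)
                continue;
            for (size_t i = 0; i < groups[g].count; i++)
                if (strcmp(groups[g].members[i], value) == 0)
                    return 1;
            return 0;                   /* group found, value not a member */
        }
        return 0;                       /* unknown group: no match (assumption) */
    }
    ```
    
    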
    
    ## Wild cards {#rules-wildcard-def}
    
    Along with matching values by simple string equality and for
    membership in a [group] defined previously, rules may also contain
    **wild card** values with the following behavior:
    
    <dl>
    <dt>* @anchor rules-wildcard-legacy-def</dt>
    <dd>
    
    Legacy wild card:
    - For `model` and `options`: *always* match.
    - For `layout` and `variant`: match any *non-empty* value.
    
    This wild card usually appears near the end of a rule set to set *default* values.
    
    @note Prefer using the wild cards @ref rules-wildcard-some-def "\<some\>" or
    @ref rules-wildcard-any-def "\<any\>" for their simpler semantics, as they do
    not depend on the context.
    </dd>
    <dt>\<none\> @anchor rules-wildcard-none-def</dt>
    <dd>
    
    Match *empty* values.
    
    @since 1.9.0
    </dd>
    <dt>\<some\> @anchor rules-wildcard-some-def</dt>
    <dd>
    
    Match *non-empty* values.
    
    @since 1.9.0
    </dd>
    <dt>\<any\> @anchor rules-wildcard-any-def</dt>
    <dd>
    
    Match *any* (optionally empty) value. Its behavior does not depend on the
    context, contrary to the legacy wild card @ref rules-wildcard-legacy-def "*".
    
    This wild card usually appears near the end of a rule set to set *default* values.
    
    @since 1.9.0
    </dd>
    </dl>
    
    ```c
    ! layout = keycodes
      // The following two lines only match exactly their respective groups.
     $azerty = +aliases(azerty)
     $qwertz = +aliases(qwertz)
      // This line will match layouts that are neither in $azerty nor in
      // $qwertz groups.
      *      = +aliases(qwerty)
    ```
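
    The matching behavior of each wild card can be summarized by the following sketch. `enum mlvo_kind` and `match_mlvo_value` are hypothetical names. An empty string stands for “no value given”, and group references (`$…`) are deliberately left out of this sketch.

    ```c
    #include <assert.h>
    #include <string.h>

    enum mlvo_kind { MLVO_MODEL, MLVO_LAYOUT, MLVO_VARIANT, MLVO_OPTION };

    /* Return 1 if the rule value `pattern` matches the user value `value`
     * ("" or NULL when the user gave nothing). Groups are not handled. */
    int match_mlvo_value(enum mlvo_kind kind, const char *pattern, const char *value)
    {
        assert(pattern != NULL);
        int empty = (value == NULL || value[0] == '\0');
        if (strcmp(pattern, "*") == 0) {
            /* Legacy wild card: its meaning depends on the component. */
            if (kind == MLVO_MODEL || kind == MLVO_OPTION)
                return 1;               /* always match */
            return !empty;              /* layout, variant: non-empty only */
        }
        if (strcmp(pattern, "<none>") == 0)
            return empty;
        if (strcmp(pattern, "<some>") == 0)
            return !empty;
        if (strcmp(pattern, "<any>") == 0)
            return 1;
        if (pattern[0] == '$')
            return 0;                   /* groups: out of scope here */
        return !empty && strcmp(pattern, value) == 0;  /* plain equality */
    }
    ```
    
    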
    
    # Grammar
    
    It is advised to look at a file like `rules/evdev` along with
    this grammar.
    
    @note Comments, whitespace, etc. are not shown.
    
    ```bnf
    File         ::= { "!" (Include | Group | RuleSet) }
    
    Include      ::= "include" <ident>
    
    Group        ::= GroupName "=" { GroupElement } "\n"
    GroupName    ::= "$"<ident>
    GroupElement ::= <ident>
    
    RuleSet      ::= Mapping { Rule }
    
    Mapping      ::= { Mlvo } "=" { Kccgst } "\n"
    Mlvo         ::= "model" | "option" | ("layout" | "variant") [ Index ]
    Index        ::= "[" ({ NumericIndex } | { SpecialIndex }) "]"
    NumericIndex ::= 1..XKB_MAX_GROUPS
    SpecialIndex ::= "single" | "first" | "later" | "any"
    Kccgst       ::= "keycodes" | "symbols" | "types" | "compat" | "geometry"
    
    Rule         ::= { MlvoValue } "=" { KccgstValue } "\n"
    MlvoValue    ::= "*" | "<none>" | "<some>" | "<any>" | GroupName | <ident>
    KccgstValue  ::= <ident> [ { Qualifier } ]
    Qualifier    ::= ":" ({ NumericIndex } | "all")
    ```
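
    The sketch below makes the `Index` part of a `Mlvo` header concrete. It maps the text between the brackets (or the absence of an index) to the range of layout indexes it matches for a given number of layouts. The special indexes follow the @ref rules-extended-layout-indexes "extended layout indexes" described below. Numeric indexes apply only when *more than one* layout is given; that rule is taken from @ref rules-symbols-example. The names are hypothetical, and the `XKB_MAX_GROUPS` upper bound is not checked.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* 1-based inclusive range of layout indexes; empty when first > last. */
    struct index_range {
        unsigned first, last;
    };

    /* `index` is the text between brackets ("2", "later", ...) or NULL when
     * the mapping has no index. Returns 0 if the index is not recognized. */
    int index_range_for(const char *index, unsigned layout_count,
                        struct index_range *out)
    {
        assert(out != NULL);
        out->first = 1;
        out->last = 0;                              /* empty by default */
        if (index == NULL || strcmp(index, "single") == 0) {
            if (layout_count == 1)
                out->last = 1;                      /* only a single layout */
        } else if (strcmp(index, "first") == 0) {
            if (layout_count >= 1)
                out->last = 1;                      /* whatever the count */
        } else if (strcmp(index, "later") == 0) {
            out->first = 2;
            out->last = layout_count;               /* 2 .. count */
        } else if (strcmp(index, "any") == 0) {
            out->last = layout_count;               /* 1 .. count */
        } else {
            char *end;
            unsigned long n = strtoul(index, &end, 10);
            if (*end != '\0' || n == 0)
                return 0;
            if (layout_count > 1 && n <= layout_count)  /* multiple layouts only */
                out->first = out->last = (unsigned)n;
        }
        return 1;
    }
    ```
    
    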
    
    <!--
    [WARNING]: Doxygen parsing is a mess. \% does not work as expected
    in Markdown code quotes, e.g. `\%H` gives `\H`. But using <code> tags
    or %%H seems to do the job though.
    -->
    @note
    - Include processes the rules in the file path specified in the `ident`,
      in order. **%-expansion** is performed, as follows: @anchor rules-include-expansion
      <dl>
        <dt>`%%`</dt>
        <dd>A literal %.</dd>
        <dt><code>\%H</code></dt>
        <dd>The value of the `$HOME` environment variable.</dd>
        <dt><code>\%E</code></dt>
        <dd>
            The extra lookup path for system-wide XKB data (usually
            `/etc/xkb/rules`).
        </dd>
        <dt><code>\%S</code></dt>
        <dd>
            The system-installed rules directory (usually
            `/usr/share/X11/xkb/rules`).
        </dd>
      </dl>
      **Note:** This feature is supported by libxkbcommon but not by the legacy X11
      tools.
    
    - @anchor rules-extended-layout-indexes
      (Since version `1.8.0`)
      The following *extended layout indexes* can be used to avoid repetition and
      clarify the semantics:
    
      <dl>
        <dt>`single`</dt>
        <dd>
            Matches a single layout; `layout[single]` is the same as without
            explicit index: `layout`.
        </dd>
        <dt>`first`</dt>
        <dd>
            Matches the first layout/variant, no matter how many layouts are in
            the RMLVO configuration. Acts as both `layout` and `layout[1]`.
        </dd>
        <dt>`later`</dt>
        <dd>
            Matches all but the first layout. This is an index *range*.
            Acts as `layout[2]` .. `layout[4]`.
        </dd>
        <dt>`any`</dt>
        <dd>
            Matches layout at any position. This is an index *range*.
            Acts as `layout`, `layout[1]` .. `layout[4]`.
        </dd>
      </dl>
    
      When using a layout index *range* (`later`, `any`), the @ref rules-i-expansion "%i expansion"
      can be used in the `KccgstValue` to refer to the index of the matched layout.
    
    - The order of values in a `Rule` must be the same as the `Mapping` it
      follows. The mapping line determines the meaning of the values in
      the rules which follow in the `RuleSet`.
    
    - If a `Rule` is matched, **%-expansion** is performed on the
      `KccgstValue`, as follows:
    
      <dl>
        <dt><code>\%m</code>, <code>\%l</code>, <code>\%v</code></dt>
        <dd>
            The [model], [layout] or [variant], if *only one* was given
            (e.g. <code>\%l</code> for “us,il” is invalid).
        </dd>
        <dt>
            <code>\%l[1]</code>, <code>\%l[2]</code>, …,
            <code>\%v[1]</code>, <code>\%v[2]</code>, …
        </dt>
        <dd>
            [Layout][layout] or [variant] for the specified layout `Index`,
            if *more than one* was given, e.g.: <code>\%l[1]</code> is
            invalid for “us” but expands to “us” for “us,de”.
        </dd>
        <dt>
            `%+m`,
            `%+l`, `%+l[1]`, `%+l[2]`, …,
            `%+v`, `%+v[1]`, `%+v[2]`, …
        </dt>
        <dd>
            As above, but prefixed with ‘+’. Similarly, ‘|’, ‘^’, ‘-’, ‘_’ may be
            used instead of ‘+’. See the [merge mode] documentation for the
            special meaning of ‘+’, ‘|’ and ‘^’.
        </dd>
        <dt>
            `%(m)`,
            `%(l)`, `%(l[1])`, `%(l[2])`, …,
            `%(v)`, `%(v[1])`, `%(v[2])`, …
        </dt>
        <dd>
            As above, but prefixed by ‘(’ and suffixed by ‘)’.
        </dd>
        <dt>
            @anchor rules-i-expansion
            `:%%i`,
            `%%l[%%i]`,
            `%(l[%%i])`,
            etc.
        </dt>
        <dd>
            (Since version `1.8.0`)
            In case the mapping uses an @ref rules-extended-layout-indexes "extended layout index",
            `%%i` corresponds to the index of the matched layout.
        </dd>
      </dl>
    
      If an expansion is *invalid*, as described above, it is *skipped*
      together with its prefix and suffix, while the rest of the string is
      still processed. This is why one should use e.g. <code>%(v[1])</code>
      instead of <code>(\%v[1])</code>. See @ref rules-symbols-example for
      an illustration.
    
    - @anchor rules-all-qualifier
      (Since version `1.8.0`) If a `Rule` is matched, the `:all` *qualifier* in the
      `KccgstValue` applies the qualified value (and its optional merge mode) to all
      layouts. If there is no merge mode, it defaults to *override* `+`.
    
      <table>
        <caption>Examples of `:all` qualified use</caption>
        <tr>
            <th>`KccgstValue`</th>
            <th>Layouts count</th>
            <th>Final `KccgstValue`</th>
        </tr>
        <tr>
            <td rowspan="2">`x:all`</td>
            <td>1</td>
            <td>`x:1`</td>
        </tr>
        <tr>
            <td>2</td>
            <td>`x:1+x:2`</td>
        </tr>
        <tr>
            <td rowspan="2">`+x:all`</td>
            <td>1</td>
            <td>`+x:1`</td>
        </tr>
        <tr>
            <td>3</td>
            <td>`+x:1+x:2+x:3`</td>
        </tr>
        <tr>
            <td rowspan="2">`|x:all`</td>
            <td>1</td>
            <td>`|x:1`</td>
        </tr>
        <tr>
            <td>4</td>
            <td>`|x:1|x:2|x:3|x:4`</td>
        </tr>
        <tr>
            <td rowspan="2">`x|y:all`</td>
            <td>1</td>
            <td>`x|y:1`</td>
        </tr>
        <tr>
            <td>3</td>
            <td>`x|y:1|y:2|y:3`</td>
        </tr>
        <tr>
            <td>`x:all+y|z:all`</td>
            <td>2</td>
            <td>`x:1+x:2+y|z:1|z:2`</td>
        </tr>
      </table>
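
    The include **%-expansion** from the first item above can be sketched as follows. The directories substituted for <code>\%E</code> and <code>\%S</code> are passed in by the caller, because their actual values depend on the installation. <code>\%H</code> is read from the environment. `expand_include_path` is a hypothetical name, and rejecting unknown %-sequences is an assumption of this sketch.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Expand %%, %H, %E and %S in an include path. `extra_dir` and
     * `system_dir` stand for the extra and system rules directories.
     * Returns 0 on overflow, unset $HOME or unknown %-sequence. */
    int expand_include_path(const char *path, const char *extra_dir,
                            const char *system_dir, char *out, size_t size)
    {
        size_t n = 0;
        assert(path != NULL && out != NULL);
        if (size == 0)
            return 0;
        for (const char *p = path; *p != '\0'; p++) {
            char literal[2] = { *p, '\0' };
            const char *insert = literal;   /* plain character by default */
            if (*p == '%') {
                p++;
                switch (*p) {
                case '%': insert = "%"; break;
                case 'H': insert = getenv("HOME"); break;
                case 'E': insert = extra_dir; break;
                case 'S': insert = system_dir; break;
                default: return 0;          /* unknown or trailing '%' */
                }
                if (insert == NULL)
                    return 0;
            }
            size_t len = strlen(insert);
            if (n + len >= size)
                return 0;
            memcpy(out + n, insert, len);
            n += len;
        }
        out[n] = '\0';
        return 1;
    }
    ```
    
    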
    
    # RMLVO resolution process {#rmlvo-resolution}
    
    ## Process
    
    First of all, the rules *file* is extracted from the provided
    [<em>R</em>MLVO][RMLVO] configuration (usually `evdev`). Then its path
    is resolved and the file is parsed to get the [rule sets].
    
    Then *each rule set* is checked against the provided [MLVO] configuration,
    in the order in which the rule sets appear in the rules file.
    
    @important @anchor irrelevant-options-order Contrary to layouts and variants,
    the *options order* in an [MLVO] configuration (e.g. via `xkbcli`) is irrelevant
    for its resolution: only the order of the rules matters. See
    “@ref rules-options-example ""” for an illustration.
    
    If a [rule] matches in a @ref rule-set-def "rule set", then:
    
    <!--
    Using HTML list tags due to Doxygen Markdown limitation with tables
    inside lists.
    -->
    <ol>
      <li>
        @anchor rules-kccgst-value-update
        The *KcCGST* value of the rule is used to update the [KcCGST]
        configuration, using the following instructions. Note that `foo`
        and `bar` are placeholders; ‘+’ specifies the *override* [merge mode]
        and can be replaced by ‘|’ or ‘^’ to specify the *augment* or
        *replace* merge mode, respectively.
    
        | Rule value        | Old KcCGST value | New KcCGST value      |
        | ----------------- | ---------------- | --------------------- |
        | `bar`             |                  | `bar`                 |
        | `bar`             | `foo`            | `foo` (*skip* `bar`)  |
        | `bar`             | `+foo`           | `bar+foo` (*prepend*) |
        | `+bar`            |                  | `+bar`                |
        | `+bar`            | `foo`            | `foo+bar`             |
        | `+bar`            | `+foo`           | `+foo+bar`            |
      </li>
      <li>
        The rest of the set is *skipped*, unless the set matches against
        [options]: an option rule set may contain *multiple* legitimate rules,
        so it is processed entirely. See @ref rules-options-example
        for an illustration.
      </li>
    </ol>
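
    The update rules in the table above can be sketched as a small C function. `update_kccgst_value` is a hypothetical name. The sketch detects a merge mode (‘+’, ‘|’ or ‘^’) by looking only at the first character of each value.

    ```c
    #include <assert.h>
    #include <string.h>

    static int is_merge_mode(char c)
    {
        return c == '+' || c == '|' || c == '^';
    }

    /* Update the accumulated KcCGST value `value` (a buffer of `size` bytes)
     * with the value `rule` of a matching rule. Returns 0 on overflow. */
    int update_kccgst_value(char *value, size_t size, const char *rule)
    {
        assert(value != NULL && rule != NULL);
        size_t old_len = strlen(value), rule_len = strlen(rule);
        if (rule_len == 0)
            return 1;
        if (old_len != 0 && !is_merge_mode(rule[0])) {
            if (!is_merge_mode(value[0]))
                return 1;                /* "foo" then "bar": skip "bar" */
            if (old_len + rule_len >= size)
                return 0;
            /* "+foo" then "bar": prepend, giving "bar+foo" */
            memmove(value + rule_len, value, old_len + 1);
            memcpy(value, rule, rule_len);
            return 1;
        }
        /* Empty old value, or rule value with a merge mode: append. */
        if (old_len + rule_len >= size)
            return 0;
        memcpy(value + old_len, rule, rule_len + 1);
        return 1;
    }
    ```
    
    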
    
    [value update]: @ref rules-kccgst-value-update
    [merge mode]: @ref merge-mode-def
    
    ## Examples
    
    ### Example: key codes
    
    Using the following example:
    
    ```c
    ! $jollamodels = jollasbj
    ! $azerty = be fr
    ! $qwertz = al ch cz de hr hu ro si sk
    
    ! model       = keycodes
     $jollamodels = evdev+jolla(jolla)
      olpc        = evdev+olpc(olpc)
      *           = evdev
    
    ! layout      = keycodes
     $azerty      = +aliases(azerty)
     $qwertz      = +aliases(qwertz)
      *           = +aliases(qwerty)
    ```
    
    we would have the following resolutions of <em>[key codes]</em>:
    
    | Model      | Layout   | Keycodes                             |
    | ---------- | :------: | :----------------------------------- |
    | `jollasbj` | `us`     | `evdev+jolla(jolla)+aliases(qwerty)` |
    | `olpc`     | `be`     | `evdev+olpc(olpc)+aliases(azerty)`   |
    | `pc`       | `al`     | `evdev+aliases(qwertz)`              |
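
    The following self-contained sketch reproduces the resolution above. It hard-codes the groups and the two rule sets of the example and applies the “first match wins” behavior of non-option rule sets. All names are hypothetical. Simply concatenating the two results is only correct here because the model value is applied to an empty value and every layout value starts with ‘+’, i.e. the *append* rows of the update table.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    struct rule { const char *pattern; const char *value; };

    /* Groups of the example, NULL-terminated. */
    static const char *const jollamodels[] = { "jollasbj", NULL };
    static const char *const azerty[] = { "be", "fr", NULL };
    static const char *const qwertz[] = { "al", "ch", "cz", "de", "hr",
                                          "hu", "ro", "si", "sk", NULL };

    static const char *const *lookup_group(const char *name)
    {
        if (strcmp(name, "$jollamodels") == 0) return jollamodels;
        if (strcmp(name, "$azerty") == 0) return azerty;
        if (strcmp(name, "$qwertz") == 0) return qwertz;
        return NULL;
    }

    /* `star_matches_empty` is 1 for the model (legacy '*' always matches)
     * and 0 for the layout ('*' matches non-empty values only). */
    static int rule_matches(const char *pattern, const char *value,
                            int star_matches_empty)
    {
        if (strcmp(pattern, "*") == 0)
            return star_matches_empty || value[0] != '\0';
        if (pattern[0] == '$') {
            for (const char *const *m = lookup_group(pattern); m && *m; m++)
                if (strcmp(*m, value) == 0)
                    return 1;
            return 0;
        }
        return strcmp(pattern, value) == 0;
    }

    /* First match wins: the rest of the rule set is skipped. */
    static const char *first_match(const struct rule *rules, size_t n,
                                   const char *value, int star_matches_empty)
    {
        for (size_t i = 0; i < n; i++)
            if (rule_matches(rules[i].pattern, value, star_matches_empty))
                return rules[i].value;
        return NULL;
    }

    static const struct rule model_rules[] = {
        { "$jollamodels", "evdev+jolla(jolla)" },
        { "olpc", "evdev+olpc(olpc)" },
        { "*", "evdev" },
    };
    static const struct rule layout_rules[] = {
        { "$azerty", "+aliases(azerty)" },
        { "$qwertz", "+aliases(qwertz)" },
        { "*", "+aliases(qwerty)" },
    };

    int resolve_keycodes(const char *model, const char *layout,
                         char *out, size_t size)
    {
        assert(out != NULL);
        const char *m = first_match(model_rules,
            sizeof model_rules / sizeof model_rules[0], model, 1);
        const char *l = first_match(layout_rules,
            sizeof layout_rules / sizeof layout_rules[0], layout, 0);
        int w = snprintf(out, size, "%s%s", m ? m : "", l ? l : "");
        return w >= 0 && (size_t)w < size;
    }
    ```
    
    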
    
    ### Example: layouts, variants and symbols {#rules-symbols-example}
    
    Using the following example:
    
    ```c
    ! layout    = symbols
      *         = pc+%l%(v)
    // The following would not work: syntax for *multiple* layouts
    // in a rule set for *single* layout.
    //*         = pc+%l[1]%(v[1])
    
    ! layout[1] = symbols
      *         = pc+%l[1]%(v[1])
    // The following would not work: syntax for *single* layout
    // in a rule set for *multiple* layouts.
    //*         = pc+%l%(v)
    
    ! layout[2] = symbols
      *         = +%l[2]%(v[2]):2
    
    ! layout[3] = symbols
      *         = +%l[3]%(v[3]):3
    ```
    
    we would have the following resolutions of <em>[symbols]</em>:
    
    | Layout     | Variant      | Symbols                       | Rule sets used  |
    | ---------- | ------------ | ----------------------------- | --------------- |
    | `us`       |              | `pc+us`                       | #1              |
    | `us`       | `intl`       | `pc+us(intl)`                 | #1              |
    | `us,es`    |              | `pc+us+es:2`                  | #2, #3          |
    | `us,es,fr` | `intl,,bepo` | `pc+us(intl)+es:2+fr(bepo):3` | #2, #3, #4      |
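
    The **%-expansion** used by these rules can be sketched as follows, including the *skipping* of invalid expansions together with their prefix and suffix. `expand_kccgst` is a hypothetical name. The sketch handles <code>\%m</code>, <code>\%l</code> and <code>\%v</code>, their indexed forms, the ‘+’, ‘|’, ‘^’, ‘-’, ‘_’ prefixes and the parenthesized form. It treats an empty value (e.g. no variant) as invalid, which is consistent with the table above.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Append `len` bytes of `s` to `out`; return 0 if it does not fit. */
    static int append(char *out, size_t size, size_t *n, const char *s, size_t len)
    {
        if (*n + len >= size)
            return 0;
        memcpy(out + *n, s, len);
        *n += len;
        out[*n] = '\0';
        return 1;
    }

    /* Expand %m, %l, %v, %l[N] and %v[N], optionally prefixed by one of
     * + | ^ - _ or written as %(...). Invalid or empty expansions are
     * skipped together with their prefix and suffix. */
    int expand_kccgst(const char *tmpl, const char *model,
                      const char *const *layouts, const char *const *variants,
                      size_t count, char *out, size_t size)
    {
        size_t n = 0;
        assert(tmpl != NULL && out != NULL);
        if (size == 0)
            return 0;
        out[0] = '\0';
        for (const char *p = tmpl; *p != '\0';) {
            if (*p != '%') {                 /* plain character: copy it */
                if (!append(out, size, &n, p, 1))
                    return 0;
                p++;
                continue;
            }
            p++;                             /* skip '%' */
            char prefix = 0, suffix = 0;
            if (*p == '(') {
                prefix = '(';
                suffix = ')';
                p++;
            } else if (*p != '\0' && strchr("+|^-_", *p) != NULL) {
                prefix = *p++;
            }
            char kind = *p;
            if (kind != 'm' && kind != 'l' && kind != 'v')
                return 0;                    /* malformed template */
            p++;
            size_t index = 0;                /* 0 means "no explicit index" */
            if (*p == '[') {
                char *end;
                index = (size_t)strtoul(p + 1, &end, 10);
                if (*end != ']' || index == 0)
                    return 0;
                p = end + 1;
            }
            if (suffix != 0) {
                if (*p != suffix)
                    return 0;
                p++;
            }
            const char *value = NULL;
            if (kind == 'm')
                value = (index == 0) ? model : NULL;
            else if (index == 0 && count == 1)   /* %l, %v: single layout only */
                value = (kind == 'l') ? layouts[0] : variants[0];
            else if (index > 0 && count > 1 && index <= count)
                value = (kind == 'l') ? layouts[index - 1] : variants[index - 1];
            if (value == NULL || value[0] == '\0')
                continue;                    /* invalid expansion: skip it */
            if ((prefix != 0 && !append(out, size, &n, &prefix, 1)) ||
                !append(out, size, &n, value, strlen(value)) ||
                (suffix != 0 && !append(out, size, &n, &suffix, 1)))
                return 0;
        }
        return 1;
    }
    ```

    Expanding each matching rule set of the example in turn and appending the results gives the *Symbols* column. For instance, for `us,es` the rule sets #2 and #3 expand to `pc+us` and `+es:2`.
    
    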
    
    Since version `1.8.0`, the previous code can simply be replaced with:
    
    ```c
    ! layout[first] = symbols
      *             = pc+%l[%i]%(v[%i])
    
    ! layout[later] = symbols
      *             = +%l[%i]%(v[%i]):%i
    ```
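
    One simplified way to read `%%i` is as the index of each layout in the matched range. The sketch below models this as a purely textual substitution: a hypothetical `substitute_index` function produces the rule value for one index. This is a mental model only, not necessarily how libxkbcommon implements it.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Replace every "%i" in `tmpl` by the decimal index `i`; all other
     * characters, including other %-sequences, are copied unchanged. */
    int substitute_index(const char *tmpl, unsigned i, char *out, size_t size)
    {
        char digits[16];
        int dlen = snprintf(digits, sizeof digits, "%u", i);
        size_t n = 0;
        assert(tmpl != NULL && out != NULL && dlen > 0);
        if (size == 0)
            return 0;
        for (const char *p = tmpl; *p != '\0'; p++) {
            const char *s = p;
            size_t len = 1;
            if (p[0] == '%' && p[1] == 'i') {
                s = digits;
                len = (size_t)dlen;
                p++;                         /* also consume the 'i' */
            }
            if (n + len >= size)
                return 0;
            memcpy(out + n, s, len);
            n += len;
        }
        out[n] = '\0';
        return 1;
    }
    ```

    For three layouts, `later` covers indexes 2 and 3. Substituting them into `+%l[%%i]%(v[%%i]):%%i` yields the values of the explicit `layout[2]` and `layout[3]` rule sets above.
    
    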
    
    ### Example: layout, option and symbols {#rules-options-example}
    
    Using the following example:
    
    ```c
    ! $azerty = be fr
    
    ! layout = symbols
      *      = pc+%l%(v)
    
    ! layout[1] = symbols
      *         = pc+%l[1]%(v[1])
    
    ! layout[2] = symbols
      *         = +%l[2]%(v[2]):2
    // Repeat the previous rule set with indexes 3 and 4
    
    ! layout     option          = symbols
     $azerty     caps:digits_row = +capslock(digits_row)
      *          misc:typo       = +typo(base)
      *          lv3:ralt_alt    = +level3(ralt_alt)
    
    ! layout[1]  option          = symbols
     $azerty     caps:digits_row = +capslock(digits_row):1
      *          misc:typo       = +typo(base):1
      *          lv3:ralt_alt    = +level3(ralt_alt):1
    // Repeat the previous rule set for indexes 2 to 4
    ```
    
    we would have the following resolutions of <em>[symbols]</em>:
    
    | Layout  | Option                                   | Symbols                                                     |
    | ------- | ---------------------------------------- | ----------------------------------------------------------- |
    | `be`    | `caps:digits_row`                        | `pc+be+capslock(digits_row)`                                |
    | `gb`    | `caps:digits_row`                        | `pc+gb`                                                     |
    | `fr`    | `misc:typo`                              | `pc+fr+typo(base)`                                          |
    | `fr`    | `misc:typo,caps:digits_row`              | `pc+fr+capslock(digits_row)+typo(base)`                     |
    | `fr`    | `lv3:ralt_alt,caps:digits_row,misc:typo` | `pc+fr+capslock(digits_row)+typo(base)+level3(ralt_alt)`    |
    | `fr,gb` | `caps:digits_row,misc:typo`              | `pc+fr+gb:2+capslock(digits_row):1+typo(base):1+typo(base):2` |
    
    Note that the configuration with the `gb` [layout] has no match for the
    [option] `caps:digits_row`. Note also that the order of the [options] in
    the [RMLVO] configuration has no influence on the resulting [symbols]:
    only the order of the rules matters.
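
    The following sketch reproduces the single-layout rows of the table above. It hard-codes the `! layout option = symbols` rule set of the example; all names are hypothetical. Unlike other rule sets, it applies *every* matching rule, iterating over the rules rather than over the user’s options, which is why the option order has no effect.

    ```c
    #include <assert.h>
    #include <string.h>

    /* The "! layout option = symbols" rule set of the example. */
    struct option_rule { const char *layout, *option, *value; };

    static const struct option_rule option_rules[] = {
        { "$azerty", "caps:digits_row", "+capslock(digits_row)" },
        { "*",       "misc:typo",       "+typo(base)" },
        { "*",       "lv3:ralt_alt",    "+level3(ralt_alt)" },
    };

    static int layout_matches(const char *pattern, const char *layout)
    {
        if (strcmp(pattern, "*") == 0)      /* legacy wild card: non-empty */
            return layout[0] != '\0';
        if (strcmp(pattern, "$azerty") == 0)
            return strcmp(layout, "be") == 0 || strcmp(layout, "fr") == 0;
        return strcmp(pattern, layout) == 0;
    }

    /* 1 if `option` is one of the comma-separated `options`. */
    static int has_option(const char *options, const char *option)
    {
        size_t len = strlen(option);
        for (const char *p = options; *p != '\0';) {
            const char *comma = strchr(p, ',');
            size_t field = comma ? (size_t)(comma - p) : strlen(p);
            if (field == len && strncmp(p, option, len) == 0)
                return 1;
            if (comma == NULL)
                break;
            p = comma + 1;
        }
        return 0;
    }

    /* Every matching rule is applied, in the order of the rules. Each value
     * starts with '+', so it is appended to `symbols`. */
    int apply_option_rules(const char *layout, const char *options,
                           char *symbols, size_t size)
    {
        assert(layout != NULL && options != NULL && symbols != NULL);
        for (size_t i = 0; i < sizeof option_rules / sizeof option_rules[0]; i++) {
            const struct option_rule *r = &option_rules[i];
            if (!layout_matches(r->layout, layout) || !has_option(options, r->option))
                continue;
            size_t n = strlen(symbols), len = strlen(r->value);
            if (n + len >= size)
                return 0;
            memcpy(symbols + n, r->value, len + 1);
        }
        return 1;
    }
    ```
    
    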
    
    Since version `1.8.0`, the previous code can simply be replaced with:
    
    ```c
    ! $azerty = be fr
    
    ! layout[first] = symbols
      *             = pc+%l[%i]%(v[%i])
    
    ! layout[later] = symbols
      *             = +%l[%i]%(v[%i]):%i
    
    ! layout[any]  option          = symbols
     $azerty       caps:digits_row = +capslock(digits_row):%i
    
    ! option       = symbols
      misc:typo    = +typo(base):all
      lv3:ralt_alt = +level3(ralt_alt):all
    
    // The previous is equivalent to:
    ! layout[any]  option       = symbols
      *            misc:typo    = +typo(base):%i
      *            lv3:ralt_alt = +level3(ralt_alt):%i
    ```
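
    The `:all` qualifier used above (see @ref rules-all-qualifier) can be sketched as follows. The function expands every `:all`-qualified item of a `KccgstValue` for a given number of layouts and reproduces the examples table of that section. `expand_all_qualifier` is a hypothetical name. Items are assumed to be separated only by the merge modes ‘+’, ‘|’ and ‘^’.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    static int is_merge_mode(char c)
    {
        return c == '+' || c == '|' || c == '^';
    }

    /* Expand every item of `value` qualified with ":all" into one item per
     * layout (":1" to ":count"). Each copy uses the item's merge mode; an
     * item without merge mode gets none on its first copy and the override
     * mode '+' on the following ones. Other items are copied unchanged. */
    int expand_all_qualifier(const char *value, unsigned count,
                             char *out, size_t size)
    {
        size_t n = 0;
        assert(value != NULL && out != NULL);
        if (size == 0)
            return 0;
        out[0] = '\0';
        for (const char *p = value; *p != '\0';) {
            /* An item: optional merge mode, then text up to the next one. */
            const char *start = p;
            char mode = is_merge_mode(*p) ? *p++ : 0;
            const char *name = p;
            while (*p != '\0' && !is_merge_mode(*p))
                p++;
            size_t len = (size_t)(p - name);
            if (len >= 4 && memcmp(name + len - 4, ":all", 4) == 0) {
                for (unsigned i = 1; i <= count; i++) {
                    char sep[2] = { mode != 0 ? mode : (i > 1 ? '+' : '\0'), '\0' };
                    int w = snprintf(out + n, size - n, "%s%.*s:%u",
                                     sep, (int)(len - 4), name, i);
                    if (w < 0 || (size_t)w >= size - n)
                        return 0;
                    n += (size_t)w;
                }
            } else {                          /* copy the item unchanged */
                size_t item = (size_t)(p - start);
                if (n + item >= size)
                    return 0;
                memcpy(out + n, start, item);
                n += item;
                out[n] = '\0';
            }
        }
        return 1;
    }
    ```
    
    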