<!doctype html> 

<html lang="en-us">

<head>
<meta charset="utf-8">
<title>Unicode NamesList Format</title>
<link rel="stylesheet" type="text/css" href="https://www.unicode.org/reports/reports-v2.css">
<style>
a.headernav {
    font-size: 90%;
}
a.headernav:link {
    color: white;
}
a.headernav:visited {
    color: white;
}
a.headernav:active {
    color: white;
}
a.headernav:hover {
    color: #B0B0B0;
}
.pageheader {
    margin-top: 0; 
    padding: 0 .5em 0 0;
    display: flex; 
    flex-direction: row;
    flex-wrap: nowrap;
    justify-content: flex-start;
    background-color: #5555FF;
    color: white;
    font-family: arial, geneva, sans-serif;  
    font-weight:bold;
    align-items: center;
    }
.pageicon {
    padding : 2px 4px 0 2px;
    }
.pagelogo {
    height: 33px; width: 34px;
    border: 0;
    padding-bottom: 0px;
    margin-bottom:-2px;
    }
.pagetitle {
    font-size: 115%; 
    flex-grow: 4;
    padding-left: 1em;
    }
.headernav { padding-top: 0px;
    font-weight: bold;
    font-size: 100%;
    color: white;  font-family: arial, geneva, sans-serif;
    text-align:right;
    }
.graybar {
    width: 100%;padding:0; 
    font-size:50%;
    background-color: #EEEEFE;
    }
.pagecontents {
    padding-left: 3.25em; 
    padding-right: 3.25em;
    padding-bottom: 1.75em;
    padding-top: 1em;
}
.pagebottom img
{
   padding-top: 2px;
   width:216px; 
   height:50px;
   border: 0;
}
.pagebottom
{
   margin: auto;
   text-align:center;
}
</style>
</head>

<body>

  <div class="pageheader">
      <div class="pageicon"><a href="https://www.unicode.org/"><img class="pagelogo"
      src="https://www.unicode.org/webscripts/logo60s2.gif" 
      alt="[Unicode]" ></a></div>

      <div class="pagetitle"><a class="headernav" 
      href="https://www.unicode.org/ucd/">Unicode Character Database</a></div>

  </div>
  <div class="graybar">&nbsp;</div>

<div class="body">
  <h1>Unicode® NamesList File Format</h1>   
  <table class="simple">
    <tbody>
      <tr>
        <td>Revision</td>
        <td>17.0.0</td>
      </tr>
      <tr>
        <td>Editors</td>
        <td>Asmus Freytag, Ken Whistler</td>
      </tr>
      <tr>
        <td>Date</td>
        <td>2025-08-05</td>
      </tr>
      <tr>
        <td>This Version</td>
        <td >
		<a href="https://www.unicode.org/Public/17.0.0/ucd/NamesList.html">
		https://www.unicode.org/Public/17.0.0/ucd/NamesList.html</a></td>
      </tr>
      <tr>
        <td>Previous Version</td>
        <td>
		<a href="https://www.unicode.org/Public/16.0.0/ucd/NamesList.html">
		https://www.unicode.org/Public/16.0.0/ucd/NamesList.html</a></td>
      </tr>
      <tr>
        <td>Latest Version</td>
        <td><a href="https://www.unicode.org/Public/UCD/latest/ucd/NamesList.html">https://www.unicode.org/Public/UCD/latest/ucd/NamesList.html</a></td>
      </tr>
    </tbody>
  </table>
  <p>&nbsp;</p>
  <h3><i>Summary</i></h3>
  <blockquote>
    <p>This file describes the format and contents of NamesList.txt.</p>
  </blockquote>
  <h3><i>Status</i></h3>
  <blockquote>
    <p><i>This file and the files described herein are part of the <a href="https://www.unicode.org/ucd/">Unicode 
    Character Database</a> (UCD). The Unicode <a href="https://www.unicode.org/terms_of_use.html"> 
    Terms of Use</a> apply.</i></p>
  </blockquote>
  <hr style="width:50%">

<h2 id="Introduction">1.0 <a href="#Introduction">Introduction</a></h2>

<p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain
text file used to drive the layout of the character code charts in the Unicode
Standard. The information in this file is a combination of several fields from
the UnicodeData.txt and Blocks.txt files, together with additional annotations
for many characters.</p>
<p>This document describes the syntax rules for the file 
format, but also gives brief information on how each construct is rendered
when laid out for the code charts. Some of the syntax elements are used only in
preparation of the drafts of the code charts and are not present in the final,
released form of the NamesList.txt file.</p>

<p>Over time, the syntax has been extended by adding new features. The syntax for formal aliases and index tabs was introduced with Unicode
5.0. The syntax for marginal sidebar comments is utilized extensively in
draft versions of the NamesList.txt file. The support for UTF-8 encoded files and the syntax for the UTF-8 charset 
declaration in a comment at the head of the file were introduced after Unicode 
6.1.0 was published, as was the syntax for the specification of variation sequences and alternate glyphs and their respective summaries. The repertoire restriction
in comments and aliases in the names list format was loosened from the earlier
limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0, and dropped entirely as of Unicode 16.0.0.</p>

<p>The same input file can be used for the preparation of drafts and final editions for ISO/IEC 
  10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some 
  information in the name list file that is not needed (and in fact removed
  during parsing) for the Unicode code charts.</p>

<p>With access to the layout program (<a href="https://www.unicode.org/unibook/">Unibook</a>), it is a simple matter to 
create name lists for formatting working drafts or other documents containing 
proposed characters.</p>  
  <p>The content of the NamesList.txt file is optimized for code chart creation. 
  Some information that can be inferred by the reader from context has been 
  suppressed to make the code charts more readable. See the chapter on Code 
	Charts in the <a href="https://www.unicode.org/versions/latest">Unicode 
	Standard</a>.</p> 

<h3 id="Overview">1.1 <a href="#Overview">NamesList File Overview</a></h3>

<p>The NamesList files are plain text files which in their simplest form look 
like this:</p>

<p>@@&lt;tab&gt;0020&lt;tab&gt;BASIC LATIN&lt;tab&gt;007F<br>
; this is a file comment (ignored)<br>
0020&lt;tab&gt;SPACE<br>
0021&lt;tab&gt;EXCLAMATION MARK<br>
0022&lt;tab&gt;QUOTATION MARK<br>
. . . <br>
007F&lt;tab&gt;DELETE</p>

<p>The semicolon (as first character), @ and &lt;tab&gt; characters are used
by the file syntax and must be provided as shown. Hexadecimal digits must be 
in UPPERCASE. A double @@ introduces a block header, with the title and the 
starting and ending code points of the block provided as shown.</p>

<p>For a minimal name list, only the NAME_LINE and BLOCKHEADER and 
their constituent syntax elements are needed.</p>
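
<p>As an illustration only, the minimal form above could be read with a few lines of Python. This is a sketch, not the logic used by Unibook: the function name <code>parse_minimal</code> and both regular expressions are assumptions made for this example, and every line that is not a BLOCKHEADER or NAME_LINE is simply skipped.</p>

```python
import re

# TAB is one or more tab characters; CHAR is 4 to 6 uppercase hex digits.
BLOCKHEADER = re.compile(r"^@@\t+([0-9A-F]{4,6})\t+([^\t]+)\t+([0-9A-F]{4,6})$")
NAME_LINE = re.compile(r"^([0-9A-F]{4,6})\t+(.+)$")


def parse_minimal(text):
    """Return (blocks, names) for a minimal name list.

    blocks is a list of (start, name, end) tuples in file order;
    names maps each code point to the text that follows it.
    """
    blocks = []
    names = {}
    for line in text.splitlines():
        if line.startswith(";"):
            continue  # FILE_COMMENT: ignored
        m = BLOCKHEADER.match(line)
        if m:
            blocks.append((int(m.group(1), 16), m.group(2), int(m.group(3), 16)))
            continue
        m = NAME_LINE.match(line)
        if m:
            names[int(m.group(1), 16)] = m.group(2)
    return blocks, names


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
)
blocks, names = parse_minimal(sample)
print(blocks)
print(names[0x21])
```

<p>The sketch keeps any trailing COMMENT as part of the name and does not validate NAME syntax; a stricter reader would apply the primitives defined in Section 2.2.</p>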

<p>The full syntax with all the options is provided in the following sections.</p>

<h2 id="FileStructure">2.0 <a href="#FileStructure">NamesList File Structure</a></h2>

<p>This section defines the overall file structure.</p>

<pre><strong>NAMELIST:     FILE_COMMENT* TITLE_PAGE* EXTENDED_BLOCK*</strong>

<strong>TITLE_PAGE:   TITLE 
		| TITLE_PAGE SUBTITLE 
		| TITLE_PAGE SUBHEADER 
		| TITLE_PAGE IGNORED_LINE 
		| TITLE_PAGE EMPTY_LINE
		| TITLE_PAGE NOTICE_LINE
		| TITLE_PAGE COMMENT_LINE
		| TITLE_PAGE PAGEBREAK 
		| TITLE_PAGE FILE_COMMENT 


EXTENDED_BLOCK:	BLOCK 
		| BLOCK SUMMARY


BLOCK:	BLOCKHEADER 
		| BLOCKHEADER INDEX_TAB
		| BLOCK CHAR_ENTRY 
		| BLOCK SUBHEADER 
		| BLOCK NOTICE_LINE 
		| BLOCK EMPTY_LINE 
		| BLOCK IGNORED_LINE
		| BLOCK SIDEBAR_LINE
		| BLOCK PAGEBREAK
		| BLOCK FILE_COMMENT
		| BLOCK CROSS_REF


CHAR_ENTRY:   NAME_LINE | RESERVED_LINE
		| CHAR_ENTRY ALIAS_LINE
		| CHAR_ENTRY FORMALALIAS_LINE
		| CHAR_ENTRY COMMENT_LINE
		| CHAR_ENTRY CROSS_REF
		| CHAR_ENTRY DECOMPOSITION
		| CHAR_ENTRY COMPAT_MAPPING
		| CHAR_ENTRY IGNORED_LINE
		| CHAR_ENTRY EMPTY_LINE
		| CHAR_ENTRY NOTICE_LINE
		| CHAR_ENTRY FILE_COMMENT 
		| CHAR_ENTRY VARIATION_LINE</strong>
</pre>

<p>In other words:</p>
<p> 
  Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p>
<p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE, 
    EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.</p>
<ul>
  <li>CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, VARIATION_LINE, ALIAS_LINE and FORMALALIAS_LINE lines 
    occurring before the first block header are treated as if they were 
    COMMENT_LINEs.</li>
</ul>
<p>Directly following either a NAME_LINE or a RESERVED_LINE an uninterrupted 
    sequence of the following lines may occur (in any order and repeated as often
    as needed): ALIAS_LINE, COMMENT_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE, 
    EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.</p>
<ul>
  <li>The conventional order of elements in a char entry (NAME_LINE, 
    FORMALALIAS_LINE, ALIAS_LINE, COMMENT_LINE or NOTICE_LINE, CROSS_REFs, VARIATION_LINE, and optionally 
    ending in either DECOMPOSITION or COMPAT_MAPPING) is not enforced by the layout program 
    (<a href="https://www.unicode.org/unibook/">Unibook</a>). </li>
</ul>
<p>Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and 
    FILE_COMMENT, none of these lines may 
    occur in any other place.</p>
<ul>
  <li>A NOTICE_LINE or CROSS_REF displays differently depending on whether it follows a header or title 
    or is part of a CHAR_ENTRY</li>
  </ul>
<p>A PAGEBREAK may appear anywhere, except the middle of a CHAR_ENTRY. 
    A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may 
    appear after any block header.</p>
<p>If the first line of a file is a file comment, it may contain a UTF-8 
    charset declaration (see below). Alternatively, or in addition, a BOM may be 
    present at the very beginning of the file, forcing the encoding to be 
    interpreted as UTF-16 (little-endian only) or UTF-8. When
    declared as UTF-8, the names list format will support use of any Unicode characters in
    STRING and LABEL elements. Otherwise,
    the supported repertoire is limited to Latin-1, and attempted use of characters outside
    the Latin-1 range will result in data corruption.</p>
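
<p>The encoding rules in the preceding paragraph can be sketched in Python as follows. The function name <code>detect_encoding</code> is hypothetical, and the returned strings are Python codec names chosen for this example; the sketch assumes the whole file is available as bytes and makes no claim about how Unibook performs the detection.</p>

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Guess a Python codec name from a BOM or a first-line charset declaration."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 BOM while decoding
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"  # this codec consumes the BOM and follows its byte order
    # Isolate the first line: it ends at the first CR or LF.
    first = data.split(b"\n", 1)[0].split(b"\r", 1)[0]
    # A FILE_COMMENT on the first line may declare UTF-8 in any letter case.
    if first.startswith(b";") and b"utf-8" in first.lower():
        return "utf-8"
    return "latin-1"


print(detect_encoding(b"; charset=UTF-8\n0020\tSPACE\n"))
print(detect_encoding(b"0020\tSPACE\n"))
```

<p>A caller could then decode with <code>data.decode(detect_encoding(data))</code>. Falling back to Latin-1 mirrors the default described above for files without a declaration.</p>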
<p>The NamesList file format does not support styled text; each line or other element
    will usually be displayed in a specific font selected for it. To allow CHAR elements
    that normally use chart glyphs to better coexist with running text in LABEL and STRING
    elements, a user defined limit can be set, below which the normal selection of (chart) glyphs 
    for the CHAR element is overridden in favor of equivalent glyphs from a font selected for better
    readability in running text. Any running text outside that range will use standard chart
    glyphs, which may result in a ransom note effect. For production of the Unicode Standard 
    Version 16.0.0 and later the limit is set to U+1EFF.</p>
<p>Several of these elements, while part of the formal definition of the 
  file format, do not occur in final published versions of 
   NamesList.txt in the <a href="https://www.unicode.org/Public/UCD/latest/">UCD</a>.</p>

<h4>Blocks followed by Summaries</h4>
<p>A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:</p>
<pre><strong>
SUMMARY:   ALTGLYPH_SUMMARY
		| VARIATION_SUMMARY
		| ALTGLYPH_SUMMARY VARIATION_SUMMARY
		| MIXED_SUMMARY

ALTGLYPH_SUMMARY:   ALTGLYPH_SUBHEADER
		| ALTGLYPH_SUMMARY SUMMARY_LINE

VARIATION_SUMMARY:   VARIATION_SUBHEADER
		| VARIATION_SUMMARY SUMMARY_LINE

MIXED_SUMMARY:   MIXED_SUBHEADER
		| MIXED_SUMMARY SUMMARY_LINE

SUMMARY_LINE:   SUBHEADER
		| NOTICE_LINE
		| FILE_COMMENT
		| EMPTY_LINE</strong>
</pre>

<p>When formatted for display, each summary will recap the information presented in the VARIATION_LINE elements
of the preceding block, grouped by alternate glyph variants and standardized variation sequences, and
preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately
follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision to provide subheaders that are
interspersed between items in the summary.</p>

<p>These syntax constructs are entirely optional. If the ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER is 
omitted from the names list, but the preceding block nevertheless contains VARIATION_LINE elements 
as described below, Unibook will automatically generate any required summaries using a default format for the headers.</p>

<p>Thus, the main purpose of providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements is to 
supply specific contents for these summary titles and to allow additional 
information to be added via SUBHEADER and NOTICE elements. The final published version of the Unicode names list 
is machine generated and will always explicitly provide any summary subheaders.</p>

<h3 id="FileElements">2.1 <a href="#FileElements">NamesList File Elements</a></h3>

<p>This section provides the details of the syntax for the individual elements.</p>

<pre><strong>ELEMENT		SYNTAX</strong>	// How rendered

<strong>NAME_LINE:	CHAR TAB NAME LF</strong>
			// The CHAR and the corresponding image are echoed, 
			// followed by the name as given in NAME

<strong>		| CHAR TAB &quot;&lt;&quot; LCNAME &quot;&gt;&quot; LF</strong>
			// Control and noncharacters use this form of
			// lowercase, bracketed pseudo character name

<strong>		| CHAR TAB NAME SP COMMENT LF</strong>
			// Names may have a comment, which is stripped off
			// unless the file is parsed for an ISO style list
                        
<strong>		| CHAR TAB &quot;&lt;&quot; LCNAME &quot;&gt;&quot; SP COMMENT LF</strong>
			// Control and noncharacters may also have comments

<strong>RESERVED_LINE:	CHAR TAB &quot;&lt;reserved&gt;&quot; LF</strong>
			// The CHAR is echoed followed by an icon for the
			// reserved character and a fixed string e.g. &quot;&lt;reserved&gt;&quot;

<strong>COMMENT_LINE:	TAB &quot;*&quot; SP EXPAND_LINE</strong>
			// * is replaced by BULLET, output line as comment

<strong>		| TAB EXPAND_LINE</strong>       
			// Output line as comment

<strong>ALIAS_LINE:	TAB &quot;=&quot; SP LINE</strong>      
			// Replace = by itself, output line as alias

<strong>FORMALALIAS_LINE:
		TAB &quot;%&quot; SP NAME LF</strong>
			// Replace % by U+203B, output line as formal alias

<strong>CROSS_REF:	TAB &quot;x&quot; SP CHAR SP LCNAME LF
		| TAB &quot;x&quot; SP CHAR SP &quot;&lt;&quot; LCNAME &quot;&gt;&quot; LF</strong>
			// x is replaced by a right arrow

<strong>		| TAB &quot;x&quot; SP &quot;(&quot; LCNAME SP &quot;-&quot; SP CHAR &quot;)&quot; LF
		| TAB &quot;x&quot; SP &quot;(&quot; &quot;&lt;&quot; LCNAME &quot;&gt;&quot; SP &quot;-&quot; SP CHAR &quot;)&quot; LF</strong>
			// x is replaced by a right arrow;
			// (second type as used for control and noncharacters)

			// In the forms with parentheses the &quot;(&quot;,&quot;-&quot; and &quot;)&quot; are removed
			// and the order of CHAR and LCNAME is reversed;
			// i.e. all inputs result in the same order of output

<strong>		| TAB &quot;x&quot; SP CHAR LF</strong>
			// x is replaced by a right arrow
			// (this type is the only one without LCNAME 
			// and is used for ideographs)

<strong>VARIATION_LINE:	TAB &quot;~&quot; SP CHAR VARSEL SP LABEL LF   
		| TAB &quot;~&quot; SP CHAR VARSEL SP LABEL &quot;(&quot; LCTAG &quot;)&quot; LF</strong>
			// output standardized variation sequence or simply the char code in case of alternate
			// glyphs, followed by the alternate glyph or variation glyph and the label and context

<strong>FILE_COMMENT:	&quot;;&quot;  LINE</strong>

<strong>EMPTY_LINE:	LF</strong>       
			// Empty and ignored lines as well as 
			// file comments are ignored

<strong>IGNORED_LINE:	TAB &quot;;&quot; LINE</strong>
			// Ignore LINE

<strong>SIDEBAR_LINE: 	&quot;;;&quot; LINE</strong>
			// Output LINE as marginal note

<strong>DECOMPOSITION:	TAB &quot;:&quot; SP EXPAND_LINE
		| TAB &quot;:&quot; SP &quot;&lt;&quot; TAG &quot;&gt;&quot; SP EXPAND_LINE</strong>
			// Replace ':' by EQUIV, expand line into decomposition
			// The &lt;tag&gt; gives optional information,
			// e.g., about composition exclusion.
			// by convention the tag has initial lowercase

<strong>COMPAT_MAPPING:	TAB &quot;#&quot; SP EXPAND_LINE
		| TAB &quot;#&quot; SP &quot;&lt;&quot; TAG &quot;&gt;&quot; SP EXPAND_LINE</strong>
			// Replace '#' by APPROX, output line as mapping
			// The &lt;tag&gt; is the optional compatibility decomposition tag.
			// by convention the tag has initial lowercase

<strong>NOTICE_LINE:	&quot;@+&quot; TAB LINE</strong>       
			// Output LINE as notice

<strong>		| &quot;@+&quot; TAB &quot;*&quot; SP LINE</strong>   
			// Output LINE as notice
			// &quot;*&quot; expands to a bullet character
			// Notices following a character code apply to the
			// character and are indented. Notices not following
			// a character code apply to the page/block/column 
			// and are italicized, but not indented

<strong>TITLE:		&quot;@@@&quot; TAB LINE</strong>
			// Output LINE as text
			// Title is used in page headers

<strong>SUBTITLE:	&quot;@@@+&quot; TAB LINE</strong>
			// Output LINE as subtitle

<strong>SUBHEADER:	&quot;@&quot; TAB LINE</strong>
			// Output LINE as column header

<strong>VARIATION_SUBHEADER:</strong>	<strong>&quot;@~&quot; TAB LINE</strong>		
			// Output LINE as column header (summary subheader)
		<strong>| &quot;@~&quot; LF</strong>
			// Output a default standard variation sequences summary subheader
		<strong>| &quot;@~&quot; TAB &quot;!&quot; LF</strong>
			// Suppress output of a default standard variation sequences summary subheader
			// and disable display of summary
		<strong>| &quot;@~&quot; TAB &quot;!&quot; VARSEL_LIST LF</strong>
		<strong>| &quot;@~&quot; TAB &quot;!&quot; VARSEL_LIST LINE</strong>
			// Output a standard summary subheader, using default or LINE respectively
			// Suppress any std variation sequences using selectors from the list

<strong>ALTGLYPH_SUBHEADER:</strong>	<strong>&quot;@@~&quot; TAB LINE</strong>	
			// Output LINE as column header (summary subheader)
		<strong>| &quot;@@~&quot; LF</strong>
			// Output a default alternate glyph summary subheader
		<strong>| &quot;@@~&quot; TAB &quot;!&quot; LF</strong>
			// Suppress output of a default alternate glyph summary subheader
			// and disable display of summary

<strong>MIXED_SUBHEADER:	</strong><strong>&quot;@@@~&quot; TAB LINE</strong>
			// Output LINE as column header (summary subheader)
		<strong>| &quot;@@@~&quot; LF</strong>
			// Output a default combined variation and alternate glyph summary subheader
		<strong>| &quot;@@@~&quot; TAB &quot;!&quot; LF</strong>
			// Suppress output of a default combined variation and alternate glyph summary subheader
			// and disable display of summary
		<strong>| &quot;@@@~&quot; TAB &quot;!&quot; VARSEL_LIST LF</strong>
		<strong>| &quot;@@@~&quot; TAB &quot;!&quot; VARSEL_LIST LINE</strong>
			// Output a combined summary subheader, using default or LINE respectively
			// Suppress any std variation sequences using selectors from the list

<strong>BLOCKHEADER:	&quot;@@&quot; TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF</strong>
			// Cause a page break and optional
			// blank page, then output one or more charts
			// followed by the list of character names.
			// Use BLOCKSTART and BLOCKEND to define
			// what characters belong to a block.
			// Use BLOCKNAME in page and table headers

<strong>BLOCKNAME:	LABEL
		| LABEL SP &quot;(&quot; LABEL &quot;)&quot;</strong>   
			// If an alternate label is present it replaces
			// the BLOCKNAME when an ISO-style names list is
			// laid out; it is ignored in the Unicode charts

<strong>BLOCKSTART:	CHAR</strong>	// First character position in block
<strong>BLOCKEND:	CHAR</strong>	// Last character position in block
<strong>PAGEBREAK:	&quot;@@&quot;</strong>	// Insert a (column) break
<strong>INDEX_TAB:	&quot;@@+&quot;</strong>	// Start a new index tab at latest BLOCKSTART

<strong>EXPAND_LINE:	{ESC_CHAR | CHAR | STRING | ESC +}+ LF</strong>
			// Instances of CHAR (see Notes) are replaced by
			// CHAR NBSP x NBSP where x is the single Unicode
			// character corresponding to CHAR.
			// If character is combining, it is replaced with
			// CHAR NBSP &lt;circ&gt; x NBSP where &lt;circ&gt; is the
			// dotted circle
</pre>
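
<p>The CROSS_REF forms listed above all carry the same information in different orders. The following Python sketch shows one way a reader might normalize them to a single (code point, name) pair. The names <code>parse_cross_ref</code>, <code>PAREN_FORM</code> and <code>PLAIN_FORM</code> are assumptions for this example, and the name part is captured loosely rather than validated as an LCNAME.</p>

```python
import re

CHAR = r"[0-9A-F]{4,6}"
# "x (name - CHAR)" form; the name may be a bracketed pseudo name.
PAREN_FORM = re.compile(rf"^\t+x \((.+) - ({CHAR})\)$")
# "x CHAR name" form, or the bare "x CHAR" form used for ideographs.
PLAIN_FORM = re.compile(rf"^\t+x ({CHAR})(?: (.+))?$")


def parse_cross_ref(line):
    """Return (code_point, name_or_None) for a CROSS_REF line, else None."""
    m = PAREN_FORM.match(line)
    if m:
        # Parenthesized input lists the name first; swap to CHAR-first order.
        return int(m.group(2), 16), m.group(1)
    m = PLAIN_FORM.match(line)
    if m:
        return int(m.group(1), 16), m.group(2)
    return None


print(parse_cross_ref("\tx (latin small letter a - 0061)"))
print(parse_cross_ref("\tx 0061 latin small letter a"))
print(parse_cross_ref("\tx 4E00"))
```

<p>Both named inputs above yield the same pair, matching the statement that all inputs result in the same order of output.</p>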


	<b>Notes:</b><ul>
	<li>Blocks must be aligned on a 16-code point boundary and contain an integer 
		multiple of 16-code point columns. The exception to that rule is for blocks of
		ideographs, <i>etc.</i>, for which no names are listed in the file. The BLOCKEND for such blocks 
	        must correspond to the last assigned character, and not the actual end of the block.</li>
	<li>Blocks must be non-overlapping and in ascending order. NAME_LINEs 
		must be in ascending order and follow the block header for the block to 
		which they belong. </li>
	<li>Reserved entries are optional, and will normally be supplied automatically. They are 
		required whenever followed by ALIAS_LINE, COMMENT_LINE, NOTICE_LINE or CROSS_REF.
	</li>
    <li>An empty alternative glyph summary subheader expression will result in the default header &quot;Selected Alternative Glyphs&quot;</li>
    <li>An empty standard variation subheader expression will result in the default header &quot;Standardized Variation Sequences&quot;</li>
    <li>A VARSEL_LIST may only contain code points for standard variation selectors (including script specific ones)</li>
    <li>When displaying a VARIATION_LINE for alternate glyphs, the &quot;ALTn&quot; selector is not displayed. </li>
    <li>If a glyph is unavailable for the variant glyph in a VARIATION_LINE it is replaced by the glyph for U+2591 LIGHT SHADE.</li>
    <li>Because a LINE or an EXPAND_LINE can itself start with a special character followed 
        by a SP or LF, an &quot;unmarked&quot; COMMENT_LINE should match the input in lower priority than line 
        types that require a special character or have a more restrictive set of characters than EXPAND_LINE. 
        Similarly, a SUBHEADER containing TAB &quot;!&quot; LF should match with a higher priority than one
        where the TAB is followed by a LINE.</li>
	</ul>
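
<p>The block constraints in the first two notes can be checked mechanically. The sketch below is one possible validation, with the hypothetical name <code>check_blocks</code>; it takes (BLOCKSTART, BLOCKEND) pairs as integers in file order. It does not implement the exception for blocks of ideographs, so a caller would need to skip or special-case those.</p>

```python
def check_blocks(blocks):
    """Return a list of problem descriptions for (start, end) pairs in file order."""
    problems = []
    previous_end = -1
    for start, end in blocks:
        if start % 16 != 0:
            problems.append(f"{start:04X}: start is not on a 16-code point boundary")
        if (end + 1) % 16 != 0:
            problems.append(f"{end:04X}: end does not complete a 16-code point column")
        if start <= previous_end:
            problems.append(f"{start:04X}: overlaps or precedes the previous block")
        previous_end = max(previous_end, end)
    return problems


print(check_blocks([(0x0000, 0x007F), (0x0080, 0x00FF)]))
print(check_blocks([(0x0000, 0x007F), (0x0070, 0x00FF)]))
```

<p>An empty list means the pairs satisfy the alignment, ordering and non-overlap rules as stated in the notes.</p>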


<h3 id="FilePrimitives">2.2 <a href="#FilePrimitives">NamesList File Primitives</a></h3>

<p>The following are the primitives and terminals for the NamesList syntax. "Limit" is a user-defined value; see discussion of the implications of Limit in the notes below.</p>

<pre><strong>LINE</strong>:		<strong>STRING LF
COMMENT:	&quot;(&quot; LABEL &quot;)&quot;
		| &quot;(&quot; LABEL &quot;)&quot; SP &quot;*&quot;
		| &quot;*&quot;</strong>

<strong>NAME</strong>:	  	&lt;sequence of uppercase ASCII letters, digits, space and hyphen&gt; 
<strong>LCNAME</strong>:		&lt;sequence of lowercase ASCII letters, digits, space and hyphen&gt; <strong> (&quot;-&quot; CHAR)?</strong>

<strong>TAG</strong>:		&lt;sequence of ASCII letters&gt;
<strong>LCTAG</strong>:		&lt;sequence of lowercase ASCII letters&gt;
<strong>STRING</strong>:	  	&lt;sequence of characters, except controls&gt; 
<strong>LABEL</strong>:	  	&lt;sequence of characters, except controls, &quot;(&quot; or &quot;)&quot;&gt; 
<strong>VARSEL</strong>:		<strong>CHAR
		| &quot;ALT&quot; ( &quot;1&quot;|&quot;2&quot;|&quot;3&quot;|&quot;4&quot;|&quot;5&quot;|&quot;6&quot;|&quot;7&quot;|&quot;8&quot;|&quot;9&quot; )</strong>
<strong>VARSEL_LIST</strong>:	<strong>&quot;{&quot; CHAR_LIST &quot;}&quot;</strong>
<strong>CHAR_LIST</strong>:	<strong>CHAR
		| CHAR_LIST SP CHAR</strong>
<strong>CHAR</strong>:		<strong>X X X X</strong>
		<strong>| X X X X X </strong>
		<strong>| X X X X X X </strong>
<strong>X</strong>:	  	<strong>&quot;0&quot;|&quot;1&quot;|&quot;2&quot;|&quot;3&quot;|&quot;4&quot;|&quot;5&quot;|&quot;6&quot;|&quot;7&quot;|&quot;8&quot;|&quot;9&quot;|&quot;A&quot;|&quot;B&quot;|&quot;C&quot;|&quot;D&quot;|&quot;E&quot;|&quot;F&quot;</strong> 
<strong>ESC_CHAR</strong>:	<strong>ESC CHAR</strong>
<strong>ESC</strong>:	        <strong>&quot;\&quot;</strong>
			// Special semantics of backslash (\) are supported
			// only in EXPAND_LINE.
<strong>TAB</strong>:	  	&lt;sequence of one or more ASCII tab characters 0x09&gt;    
<strong>SP</strong>:	  	&lt;ASCII 20&gt;
<strong>LF</strong>:	  	&lt;any sequence of a single ASCII 0A or 0D, or both&gt;
</pre>

<p><b>Notes:</b></p>
<ul>
       <li>Multiple or leading spaces, multiple or leading hyphens, as well as 
       word-initial digits in NAMEs or LCNAMEs are illegal.</li>
	<li>The French version of the names list uses French rules, which allow 
		apostrophe and accented letters in character names.</li>
       <li>When names containing code points are lowercased to make them LCNAMEs, 
         the code point values remain uppercase. Such code points by convention 
         follow a hyphen and are the last element in the name.</li>
    <li>Special limited lookbehind logic prevents a four-digit number for a standard, such
    as ISO 9999, from being misinterpreted as ISO CHAR. Currently recognized are
    &quot;ISO&quot;, &quot;DIN&quot;, &quot;IEC&quot; and &quot;S X&quot; as well as &quot;S C&quot; for the JIS X and JIS C series of
    standards. (In addition &quot;EEE&quot; and &quot;S X&quot; are recognized for use with IEEE and KSC X standards. For the GB series of standards, &quot; GB&quot; is defined to prevent conversion to CHAR, but has no effect at the start of a line). For other standards, or for four-digit years in a comment, use a
    NOTICE_LINE instead, which prevents expansion, or use &quot;\&quot; to escape the digits.</li>
       <li>Single and double straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
    Smart apostrophes are supported, but nested quotes are not.
       Single quotes can only be applied around a single word.</li>
       <li>A CHAR inside ' or &quot; is expanded, but only its glyph image is printed; the
    code value is not echoed.</li>
       <li>Inside an EXPAND_LINE, backslash is treated as an escape character that
       removes the special meaning of any literal character and also prevents
       the following digit sequence from being expanded. A backslash character in
       isolation is never displayed. A sequence of two backslash characters results
       in display of a single backslash, but has no effect on the interpretation
       of following characters.</li>
       <li>The hyphen in a character range CHAR-CHAR is replaced by an EN DASH on 
       output.</li>
       <li>The NamesList.txt file is encoded in UTF-8 if the <i>first line</i> is a 
       FILE_COMMENT containing the declaration &quot;UTF-8&quot; or any casemap variation 
       thereof. Otherwise the file is encoded in Latin-1 (older versions). Beyond 
       detecting the charset declaration (typically: &quot;; charset=utf-8&quot;) the 
       remainder of that comment is ignored. 
       When declared as UTF-8, the NamesList format will support any Unicode character
       in STRING or LABEL elements, but see further implications below.</li>
       <li>In a STRING or LABEL element, a Unicode character outside the range 
        U+0020..Limit is displayed with a glyph matching 
        the chart font, and not with the font that is otherwise defined for that element. 
        The Limit value is user defined.
        For production of the Unicode Standard, Version 16.0.0 and later, the Limit
        value is set to U+1EFF.
        All code points less than the Limit value can be mapped onto a font selected for best
        results in running text. However, any CHAR elements contained in an EXPAND_LINE 
        are exempt from this and are always displayed with a glyph matching the chart font.
        The net effect is a workaround for the fact that the NamesList format does 
        not support style runs within any element that encompasses a single unit of flowed text.</li>
       <li>When drafting STRING or LABEL elements, one should note that text containing
        characters outside the range U+0020..Limit may result in a ransom note effect,
    as the regular text font and the chart fonts would alternate. This is best avoided.</li>
       <li>The code chart layout program
       (<a href="https://www.unicode.org/unibook/">Unibook</a>)
       can accept files in several other formats. These include little-endian UTF-16,
       prefixed with a BOM, or UTF-8 prefixed with the UTF-8 BOM.</li>
       <li>While the format allows multiple &lt;tab&gt; characters, by convention the 
       actual number of tabs is always one or two, chosen to provide the best 
       layout of the plain text file.</li>
       <li>Earlier published versions of the NamesList.txt file may contain trailing or otherwise extraneous 
       spaces or tab characters; while these are errors in the files, they are not 
       being corrected, to retain stability of the published versions. Anyone 
       writing a parser for older versions of this file may need to be prepared to 
       handle such exceptions.</li>
   <li>Lines are terminated by \r, \n, \r\n or \n\r. Repeated terminators imply empty lines, e.g. \r\r\n is treated as 2 lines, as is \r\n\r\n.</li>
       <li>The final LF in the file must be present.</li>
</ul>
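
<p>The line termination rule in the notes above can be illustrated with a short Python sketch. The name <code>split_lines</code> and the regular expression are assumptions for this example: two-character terminators are tried before single ones, so that \r\n and \n\r each count as one terminator.</p>

```python
import re

# Try the two-character terminators first, then the single-character ones.
LF = re.compile(r"\r\n|\n\r|\r|\n")


def split_lines(text):
    """Split text into lines using the LF rule; empty lines are kept."""
    lines = LF.split(text)
    # The final LF must be present, so the piece after it is empty; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


print(split_lines("0020\tSPACE\r\r\n0021\tEXCLAMATION MARK\n"))
print(split_lines("0020\tSPACE\r\n\r\n0021\tEXCLAMATION MARK\n"))
```

<p>In both calls the result contains an empty string between the two entries, reflecting that each input holds one empty line between them.</p>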
  <h2 id="Modifications"><a href="#Modifications">Modifications</a></h2>

    <p><b>Version 17.0.0</b></p>
        <ul>
            <li>Reissued for Unicode 17.0.0</li>
        </ul>

    <p><b>Version 16.0.0</b></p>
        <ul>
            <li>Reissued for Unicode 16.0.0</li>
            <li>Reflect the wider range of possible values for the user defined Limit.</li>
            <li>Added an explanation of the effect of the Limit value.</li>
        </ul>

    <p><b>Version 15.1.0</b></p>
        <ul>
        <li>Reissued for Unicode 15.1.0.</li>
        <li>Adjusted NAMELIST definition to account for positions of FILE_COMMENT.</li>
        <li>Added a note to the bullets in Section 2.1 to clarify priority of matching for
        some line types.</li>
        <li>In Section 2.2, added a note clarifying the font handling for characters
        outside the range U+0000..U+02FF occurring in NAME or LABEL elements.</li>
        <li>Also in Section 2.2, updated the bullet about lookbehind logic
            for identifying digit sequences that are part of identifiers for various standards, 
            to include the detection of IEEE, KSC X, and GB standards.</li>
        <li>Added missing quotation marks around * in second expansion for
        NOTICE_LINE.</li>
        <li>Corrected and clarified the BNF statement of nameslist syntax.</li>
        <li>Some literals had not been quoted, some productions were missing the trailing LF</li>
        <li>The LF and LCNAME productions were clarified</li>
        <li>Updated to HTML5</li>
    	</ul>
    <p><b>Version 15.0.0</b></p>
        <ul>
        <li>Reissued for Unicode 15.0.0.</li>
    	</ul>
    <p><b>Version 14.0.0</b></p>
        <ul>
        <li>Reissued for Unicode 14.0.0.</li>
        <li>Corrected character name LIGHT SCREEN to LIGHT SHADE.</li>
    	</ul>
    <p><b>Version 13.0.0</b></p>
        <ul>
        <li>Reissued for Unicode 13.0.0.</li>
        <li>Added a second expansion for DECOMPOSITION, for possible future
        	use to designate specific subtypes of canonical decompositions
        	in the names list output.</li>
    	</ul>
    <p><b>Version 12.1.0</b></p>
        <ul>
        <li>Reissued for Unicode 12.1.0.</li>
    	</ul>
    <p><b>Version 12.0.0</b></p>
        <ul>
        <li>Reissued for Unicode 12.0.0.</li>
        <li>Added definition of TAG (allowing uppercase letters), distinct from LCTAG.</li>
        <li>Corrected definition of VARIATION_LINE to use LCTAG instead of LCNAME.</li>
        <li>Corrected definition of COMPAT_MAPPING to use TAG instead of LCTAG.</li>
        <li>Corrected the documentation regarding which elements allow use of characters
        	in the range U+0020..U+02FF.</li>
        </ul>
    <p><b>Version 11.0.0</b></p>
        <ul>
        <li>Reissued for Unicode 11.0.0.</li>
        <li>Loosened the limitation on repertoire allowed in LINE and LABEL
        	elements to include characters outside Latin-1, in the range
        	U+0100..U+02FF.</li>
        </ul>
  <p><b>Version 10.0.0</b></p>
        <ul>
        <li>Reissued for Unicode 10.0.0.</li>
        </ul>
  <p><b>Version 9.0.0</b></p>
        <ul>
        <li>Reissued for Unicode 9.0.0.</li>
        </ul>
  <p><b>Version 8.0.0</b></p>
        <ul>
        <li>Reissued for Unicode 8.0.0.</li>
        <li>Added MIXED_SUBHEADER, VARSEL_LIST, and CHAR_LIST to the syntax.</li>
        <li>Tweaked BNF and notes for variation summaries.</li>
        </ul>
  <p><b>Version 7.0.0</b></p>
        <ul>
        <li>Reissued for Unicode 7.0.0.</li>
        </ul>
  <p><b>Version 6.3.0</b></p>
        <ul>
        <li>Reissued for Unicode 6.3.0.</li>
        </ul>
  <p><b>Version 6.2.0</b></p>
        <ul>
  <li>Edited the variation syntax definitions, description and corresponding notes for wording.</li>
  <li>Minor tweaks to the layout of BNF syntax, mostly adding tabs and | characters as needed.</li>
  <li>Fixed some typographical errors and minor inconsistencies.</li>
  <li>Added syntax for elements required by variation sequence and alternate glyph summaries.</li>
  <li>Edited and reformatted some notes for readability.</li>
  <li>Documented the permitted presence of CROSS_REF outside character entries within blocks. 
  Such CROSS_REFs have been present in published names lists, but that information was missing in 
  the syntax description. For an example see the Currency Symbols block in the code charts.</li>
  <li>Added description of UTF-8 charset declaration and file encoding.</li>
        </ul>
  <p><b>Version 6.1.0</b></p>
        <ul>
		<li>Removed constraint that LCTAG consist only of lowercase letters,
                because of the existence of the &quot;noBreak&quot; tag.</li>
        </ul>
  <p><b>Version 6.0.0</b></p>
        <ul>
		<li>Added definitions for ESC_CHAR and ESC primitives.</li>
		<li>Clarified interpretation of backslash escapes in EXPAND_LINE.</li>
        </ul>
  <p><b>Version 5.2.0</b></p>
	<ul>
		<li>Better aligned the rules section with the actual published files and 
		behavior of existing parsers. This included fixing some obvious typos 
		and clarifying some notes as well as the following changes, which are 
		listed individually.</li>
		<li>Replaced instances of &lt;tab&gt; by TAB throughout.</li>
		<li>NAME_LINE for special names may have trailing COMMENTs including COMMENTs 
		consisting entirely of &quot;*&quot;.</li>
		<li>In CROSS_REF added the form without LCNAME, fixed the literal to the 
		correct lowercase &quot;x&quot; and noted that LCNAME may have &quot;&lt;&quot; and &quot;&gt;&quot; around 
		it in the data. Also added missing LF in the rules.</li>
		<li>Removed a redundant rule for BLOCKHEADER.</li>
		<li>Changed FORMALALIAS_LINE from LINE to NAME to match actual restriction 
		on contents.</li>
		<li>Extended the documentation of lookahead logic for CHAR.</li>
		<li>Accounted for FILE_COMMENT in overall file structure.</li>
	</ul>
	<p><b>Version 5.1.0</b></p>
	<ul>
		<li>Noted that comments in NAME_LINEs must be preceded by SP.</li>
		<li>Provided additional information on allowable characters in names.</li>
		<li>Added SIDEBAR_LINE.</li>
		<li>Noted that CROSS_REF must contain a SP and CHAR, and that 
		COMPAT_MAPPING must contain a SP and may contain a &lt;tag&gt;.</li>
		<li>Noted that LCNAME may contain uppercase characters under 
		exceptional circumstances.</li>
		<li>Relaxed the restriction on lines starting with #, :, %, x and = on 
		the TITLE_PAGE. These are now treated as comments.</li>
	</ul>
	<p><b>Version 5.0.0</b></p>
	<ul>
		<li>Added FORMALALIAS_LINE and INDEX_TAB to syntax.</li>
		<li>Fixed the list of lines that may appear before a BLOCKHEADER by 
		adding NOTICE_LINE.</li>
		<li>Minor fixes to the wording of several syntax definitions.</li>
	</ul>
	<p><b>Version 4.0.0</b></p>
	<ul>
		<li>Fixed syntax to better reflect restrictions on characters 
  in character and block names.</li>
		<li>Better documented the treatment of comments in block names, plus 
  French name rules.</li>
	</ul>
  <p><b>Version 3.2.0</b></p>
	<ul>
		<li>Fixed several broken links, added a left margin, and 
  changed version numbering.</li>
	</ul>
  <p><b>Version 3.1.0 (2)</b></p>
	<ul>
		<li>Use of 4-6 digit hex notation is now supported.</li>
	</ul>
</div>

  <div class="pagebottom">
      <hr style="width:50%">
      <a href="https://www.unicode.org/copyright.html">
      <img src="https://www.unicode.org/img/hb_notice.gif" 
      alt="Access to Copyright and terms of use" ></a>
  </div>

</body>

</html>