Edit

kc3-lang/harfbuzz/docs/usermanual-what-is-harfbuzz.xml

Branch :

  • Show log

    Commit

  • Author : David Corbett
    Date : 2022-06-25 13:32:04
    Hash : 78c5ae39
    Message : [indic] Remove remnants of Sinhala

  • docs/usermanual-what-is-harfbuzz.xml
  • <?xml version="1.0"?>
    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
                   "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
      <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
      <!ENTITY version SYSTEM "version.xml">
    ]>
    <chapter id="what-is-harfbuzz">
      <title>What is HarfBuzz?</title>
      <para>
        HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you
        give HarfBuzz a font and a string containing a sequence of Unicode
        codepoints, HarfBuzz selects and positions the corresponding
        glyphs from the font, applying all of the necessary layout rules
        and font features. HarfBuzz then returns the string to you in the
        form that is correctly arranged for the language and writing
        system. 
      </para>
      <para>
        HarfBuzz can properly shape all of the world's major writing
        systems. It runs on all major operating systems and software
        platforms and it supports the major font formats in use
        today.
      </para>
      <section id="what-is-text-shaping">
        <title>What is text shaping?</title>
        <para>
          Text shaping is the process of translating a string of character
          codes (such as Unicode codepoints) into a properly arranged
          sequence of glyphs that can be rendered onto a screen or into
          final output form for inclusion in a document.
        </para>
        <para>
          The shaping process is dependent on the input string, the active
          font, the script (or writing system) that the string is in, and
          the language that the string is in.
        </para>
        <para>
          Modern software systems generally only deal with strings in the
          Unicode encoding scheme (although legacy systems and documents may
          involve other encodings).
        </para>
        <para>
          There are several font formats that a program might
          encounter, each of which has a set of standard text-shaping
          rules.
        </para>
        <para>The dominant format is <ulink
          url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>. The
        OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for
        various scripts from around the world. These shaping models depend on
        the font incorporating certain features as
        <emphasis>lookups</emphasis> in its <literal>GSUB</literal> 
        and <literal>GPOS</literal> tables.
        </para>
        <para>
          Alternatively, OpenType fonts can include shaping features for
          the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model.
        </para>
        <para>
          TrueType fonts can also include OpenType shaping
          features. Alternatively, TrueType fonts can also include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple
          Advanced Typography</ulink> (AAT) tables to implement shaping
          support. AAT fonts are generally only found on macOS and iOS systems.
        </para>
        <para>
          Text strings will usually be tagged with a script and language
          tag that provide the context needed to perform text shaping
          correctly.  The necessary <ulink
          url="https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags">script</ulink> 
          and <ulink
          url="https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags">language</ulink>
          tags are defined by OpenType.
        </para>
      </section>
      
      <section id="why-do-i-need-a-shaping-engine">
        <title>Why do I need a shaping engine?</title>
        <para>
          Text shaping is an integral part of preparing text for
          display. Before a Unicode sequence can be rendered, the
          codepoints in the sequence must be mapped to the corresponding
          glyphs provided in the font, and those glyphs must be positioned
          correctly relative to each other. For many of the scripts
          supported in Unicode, these steps involve script-specific layout
          rules, including complex joining, reordering, and positioning
          behavior. Implementing these rules is the job of the shaping engine.
        </para>
        <para>
          Text shaping is a fairly low-level operation. HarfBuzz is
          used directly by text-handling libraries like <ulink
          url="https://www.pango.org/">Pango</ulink>, as well as by the layout
          engines in Firefox, LibreOffice, and Chromium. Unless you are
          <emphasis>writing</emphasis> one of these layout engines
          yourself, you will probably not need to use HarfBuzz: normally,
          a layout engine, toolkit, or other library will turn text into
          glyphs for you.
        </para>
        <para>
          However, if you <emphasis>are</emphasis> writing a layout engine
          or graphics library yourself, then you will need to perform text
          shaping, and this is where HarfBuzz can help you.
        </para>
        <para>
          Here are some specific scenarios where a text-shaping engine
          like HarfBuzz helps you:
        </para>
        <itemizedlist>
          <listitem>
            <para>
              OpenType fonts contain a set of glyphs (that is, shapes
    	  to represent the letters, numbers, punctuation marks, and
    	  all other symbols), which are indexed by a <literal>glyph ID</literal>.
    	</para>
    	<para>
              A particular glyph ID within the font does not necessarily
    	  correlate to a predictable Unicode codepoint. For instance,
    	  some fonts have the letter &quot;a&quot; as glyph ID 1, but
    	  many others do not. In order to retrieve the right glyph
    	  from the font to display &quot;a&quot;, you need to consult
    	  the table inside the font (the <literal>cmap</literal>
    	  table) that maps Unicode codepoints to glyph IDs. In other
    	  words, <emphasis>text shaping turns codepoints into glyph
    	  IDs</emphasis>.
            </para>
          </listitem>
          <listitem>
            <para>
              Many OpenType fonts contain ligatures: combinations of
              characters that are rendered as a single unit. For instance,
    	  it is common for the &quot;f, i&quot; letter
    	  sequence to appear in print as the single ligature glyph
    	  &quot;fi&quot;.
    	</para>
    	<para>
    	  Whether you should render an &quot;f, i&quot; sequence
    	  as <literal>fi</literal> or as &quot;fi&quot; does not
              depend on the input text. Instead, it depends on the whether
    	  or not the font includes an &quot;fi&quot; glyph and on the
    	  level of ligature application you wish to perform. The font
    	  and the amount of ligature application used are under your
    	  control. In other words, <emphasis>text shaping involves
    	  querying the font's ligature tables and determining what
    	  substitutions should be made</emphasis>. 
            </para>
          </listitem>
          <listitem>
            <para>
              While ligatures like &quot;fi&quot; are optional typographic
              refinements, some languages <emphasis>require</emphasis> certain
              substitutions to be made in order to display text correctly.
            </para>
    	<para>
    	  For example, in Tamil, when the letter &quot;TTA&quot; (ட)
    	  letter is followed by the vowel sign &quot;U&quot; (ு), the pair
    	  must be replaced by the single glyph &quot;டு&quot;. The
    	  sequence of Unicode characters &quot;ட,ு&quot; needs to be
    	  substituted with a single &quot;டு&quot; glyph from the
    	  font.
    	</para>
    	<para>
    	  But &quot;டு&quot; does not have a Unicode codepoint. To
    	  find this glyph, you need to consult the table inside 
    	  the font (the <literal>GSUB</literal> table) that contains
    	  substitution information. In other words, <emphasis>text shaping 
    	  chooses the correct glyph for a sequence of characters
    	  provided</emphasis>.
            </para>
          </listitem>
          <listitem>
            <para>
              Similarly, each Arabic character has four different variants
    	  corresponding to the different positions it might appear in
    	  within a sequence. Inside a font, there will be separate
    	  glyphs for the initial, medial, final, and isolated forms of
    	  each letter, each at a different glyph ID.
    	</para>
    	<para>
    	  Unicode only assigns one codepoint per character, so a
    	  Unicode string will not tell you which glyph variant to use
    	  for each character. To decide, you need to analyze the whole
    	  string and determine the appropriate glyph for each character
    	  based on its position. In other words, <emphasis>text
    	  shaping chooses the correct form of the letter by its
    	  position and returns the correct glyph from the font</emphasis>.
            </para>
          </listitem>
          <listitem>
            <para>
              Other languages involve marks and accents that need to be
              rendered in specific positions relative a base character. For
              instance, the Moldovan language includes the Cyrillic letter
              &quot;zhe&quot; (ж) with a breve accent, like so: &quot;ӂ&quot;.
    	</para>
    	<para>
    	  Some fonts will provide this character as a single
    	  zhe-with-breve glyph, but other fonts will not and, instead,
    	  will expect the rendering engine to form the character by 
              superimposing the separate &quot;ж&quot; and &quot;˘&quot;
    	  glyphs.
    	</para>
    	<para>
    	  But exactly where you should draw the breve depends on the
    	  height and width of the preceding zhe glyph. To find the
    	  right position, you need to consult the table inside
    	  the font (the <literal>GPOS</literal> table) that contains
    	  positioning information.
              In other words, <emphasis>text shaping tells you whether you
    	  have a precomposed glyph within your font or if you need to
    	  compose a glyph yourself out of combining marks&mdash;and,
    	  if so, where to position those marks.</emphasis>
            </para>
          </listitem>
        </itemizedlist>
        <para>
          If tasks like these are something that you need to do, then you
          need a text shaping engine. You could use Uniscribe if you are
          writing Windows software; you could use CoreText on macOS; or
          you could use HarfBuzz.
        </para>
        <note>
          <para>
    	In the rest of this manual, the text will assume that the reader
    	is that implementor of a text-layout engine.
          </para>
        </note>
      </section>
      
    
      <section id="what-does-harfbuzz-do">
        <title>What does HarfBuzz do?</title>
        <para>
          HarfBuzz provides text shaping through a cross-platform
          C API that accepts sequences of Unicode codepoints as input. Currently,
          the following OpenType shaping models are supported:
        </para>
        <itemizedlist>
          <listitem>
    	<para>
    	  Indic (covering Devanagari, Bengali, Gujarati,
    	  Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu)
    	</para>
          </listitem>
          <listitem>
    	<para>
    	  Arabic (covering Arabic, N'Ko, Syriac, and Mongolian)
    	</para>
          </listitem>
          <listitem>
    	<para>
    	  Thai and Lao
    	</para>
          </listitem>
          <listitem>
    	<para>
    	  Khmer
    	</para>
          </listitem>
          <listitem>
    	<para>
    	  Myanmar
    	</para>
          </listitem>
          
          <listitem>
    	<para>
    	  Tibetan
    	</para>
          </listitem>
          
          <listitem>
    	<para>
    	  Hangul
    	</para>
          </listitem>
          
          <listitem>
    	<para>
    	  Hebrew
    	</para>
          </listitem>      
          <listitem>
    	<para>
    	  The Universal Shaping Engine or <emphasis>USE</emphasis>
    	  (covering complex scripts not covered by the above shaping
    	  models)
    	</para>
          </listitem>      
          <listitem>
    	<para>
    	  A default shaping model for non-complex scripts
    	  (covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh,
    	  and many others)
    	</para>
          </listitem>
          <listitem>
    	<para>
    	  Emoji (including emoji modifier sequences, flag sequences,
    	  and ZWJ sequences)
    	</para>
          </listitem>
        </itemizedlist>
    
        <para>
          In addition to OpenType shaping, HarfBuzz supports the latest
          version of Graphite shaping (the "Graphite 2" model) and AAT
          shaping.
        </para>
        
        <para>
          HarfBuzz can read and understand TrueType fonts (.ttf), TrueType
          collections (.ttc), and OpenType fonts (.otf, including those
          fonts that contain TrueType-style outlines and those that
          contain PostScript CFF or CFF2 outlines).
        </para>
    
        <para>
          HarfBuzz is designed and tested to run on top of the FreeType
          font renderer. It can run on Linux, Android, Windows, macOS, and
          iOS systems.
        </para>
        
        <para>
          In addition to its core shaping functionality, HarfBuzz provides
          functions for accessing other font features, including optional
          GSUB and GPOS OpenType features, as well as
          all color-font formats (<literal>CBDT</literal>,
          <literal>sbix</literal>, <literal>COLR/CPAL</literal>, and
          <literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz
          also includes a font-subsetting feature. HarfBuzz can perform
          some low-level math-shaping operations, although it does not
          currently perform full shaping for mathematical typesetting.
        </para>
        
        <para>
          A suite of command-line utilities is also provided in the
          source-code tree, designed to help users test and debug
          HarfBuzz's features on real-world fonts and input.
        </para>
      </section>
    
      <section id="what-harfbuzz-doesnt-do">
        <title>What HarfBuzz doesn't do</title>
        <para>
          HarfBuzz will take a Unicode string, shape it, and give you the
          information required to lay it out correctly on a single
          horizontal (or vertical) line using the font provided. That is the
          extent of HarfBuzz's responsibility.
        </para>
        <para>
          It is important to note that if you are implementing a complete
          text-layout engine you may have other responsibilities that
          HarfBuzz will <emphasis>not</emphasis> help you with. For example:
        </para>
        <itemizedlist>
          <listitem>
            <para>
              HarfBuzz won't help you with bidirectionality. If you want to
              lay out text that includes a mix of Hebrew and English, you
    	  will need to ensure that each buffer provided to HarfBuzz
    	  has all of its characters in the same order and that the
    	  directionality of the buffer is set correctly. This may mean
    	  segmenting the text before it is placed into HarfBuzz buffers. In
              other words, the user will hit the keys in the following
              sequence:
            </para>
            <programlisting>
    	  A B C [space] ג ב א [space] D E F
            </programlisting>
            <para>
              but will expect to see in the output:
            </para>
            <programlisting>
    	  ABC אבג DEF
            </programlisting>
            <para>
              This reordering is called <emphasis>bidi processing</emphasis>
              (&quot;bidi&quot; is short for bidirectional), and there's an
              algorithm as an annex to the Unicode Standard which tells you how
              to process a string of mixed directionality.
              Before sending your string to HarfBuzz, you may need to apply the
              bidi algorithm to it. Libraries such as <ulink
    	  url="http://icu-project.org/">ICU</ulink> and <ulink
    	  url="http://fribidi.org/">fribidi</ulink> can do this for you.
            </para>
          </listitem>
          <listitem>
            <para>
              HarfBuzz won't help you with text that contains different font
              properties. For instance, if you have the string &quot;a
              <emphasis>huge</emphasis> breakfast&quot;, and you expect
              &quot;huge&quot; to be italic, then you will need to send three
              strings to HarfBuzz: <literal>a</literal>, in your Roman font;
              <literal>huge</literal> using your italic font; and
              <literal>breakfast</literal> using your Roman font again.
    	</para>
    	<para>
              Similarly, if you change the font, font size, script,
    	  language, or direction within your string, then you will
    	  need to shape each run independently and output them
    	  independently. HarfBuzz expects to shape a run of characters
    	  that all share the same properties.
            </para>
          </listitem>
          <listitem>
            <para>
              HarfBuzz won't help you with line breaking, hyphenation, or
              justification. As mentioned above, HarfBuzz lays out the string
              along a <emphasis>single line</emphasis> of, notionally,
              infinite length. If you want to find out where the potential
              word, sentence and line break points are in your text, you
              could use the ICU library's break iterator functions.
            </para>
            <para>
              HarfBuzz can tell you how wide a shaped piece of text is, which is
              useful input to a justification algorithm, but it knows nothing
              about paragraphs, lines or line lengths. Nor will it adjust the
              space between words to fit them proportionally into a line.
            </para>
          </listitem>
        </itemizedlist>
        <para>
          As a layout-engine implementor, HarfBuzz will help you with the
          interface between your text and your font, and that's something
          that you'll need&mdash;what you then do with the glyphs that your font
          returns is up to you. 
        </para>
      </section>
        
      <section id="why-is-it-called-harfbuzz">
        <title>Why is it called HarfBuzz?</title>
        <para>
          HarfBuzz began its life as text-shaping code within the FreeType
          project (and you will see references to the FreeType authors
          within the source code copyright declarations), but was then
          extracted out to its own project. This project is maintained by
          Behdad Esfahbod, who named it HarfBuzz. Originally, it was a
          shaping engine for OpenType fonts&mdash;&quot;HarfBuzz&quot; is
          the Persian for &quot;open type&quot;.
        </para>
      </section>
    </chapter>