    gxvalid: TrueType GX validator
    ==============================
    
    
    1. What is this
    ---------------
    
      `gxvalid' is a module to validate TrueType GX tables, a collection of
      additional tables in TrueType fonts which are used by `QuickDraw GX
      Text' and Apple Advanced Typography (AAT).  In addition, gxvalid can
      validate `kern' tables which have been extended for AAT.  Like the
      otvalid module, gxvalid uses FreeType 2's validator framework
      (ftvalid).
    
      You can link gxvalid with your program so that it validates a font
      file before your own layout engine runs; as a result, you can remove
      error-checking code from the layout engine.  It is also possible to
      use gxvalid as a stand-alone font validator; the `ftvalid' test
      program included in the ft2demos bundle calls gxvalid internally.
      A stand-alone font validator may be useful for font developers.
    
      This document covers the following topics:

      - supported TrueType GX tables
      - fundamental validation limitations
      - permissive error handling of broken GX tables
      - `kern' table issues
    
    
    2. Supported tables
    -------------------
    
      The following GX tables are currently supported.
    
        bsln
        feat
        just
        kern(*)
        lcar
        mort
        morx
        opbd
        prop
        trak
    
      The following GX tables are currently unsupported.
    
        cvar
        fdsc
        fmtx
        fvar
        gvar
        Zapf
    
      The following GX tables won't be supported.
    
        acnt(**)
        hsty(***)
    
      The following undocumented tables in TrueType fonts designed for the
      Apple platform are not handled either.
    
        addg
        CVTM
        TPNM
        umif
    
    
      *)   The `kern' validator handles both the classic and the new kern
           formats; the former is supported on both Microsoft and Apple
           platforms, while the latter is supported on Apple platforms only.
    
      **)  `acnt' tables are not supported by currently available Apple font
           tools.
    
      ***) There is one more Apple extension, `hsty', but it is for
           Newton-OS, not GX (Newton-OS is a platform by Apple, but it can
           use sfnt-housed bitmap fonts only).  Therefore, it should be
           excluded from `Apple platform' in the context of TrueType.
           gxvalid ignores it, as Apple font tools do.
    
    
      We have checked 183 fonts bundled with MacOS 9.1, MacOS 9.2, MacOS
      10.0, MacOS X 10.1, MSIE for MacOS, and AppleWorks 6.0.  In addition,
      we have checked 67 Dynalab fonts (designed for MacOS) and 189 Ricoh
      fonts (designed for Windows and MacOS dual platforms).  The numbers
      of fonts including each TrueType GX table are as follows.
    
        bsln:  76
        feat: 191
        just:  84
        kern:  59
        lcar:   4
        mort: 326
        morx:  19
        opbd:   4
        prop: 114
        trak:  16
    
      Dynalab and Ricoh fonts don't have any GX tables except `feat' and
      `mort'.
    
    
    3. Fundamental validation limitations
    -------------------------------------
    
      TrueType GX provides layout information to libraries for font
      rasterizers and text layout.  gxvalid can check whether the layout
      data in a font conforms to the TrueType GX format specified by
      Apple.  But gxvalid cannot check how a QuickDraw GX/AAT renderer
      uses the stored information.
    
      3-1. Validation of State Machine activity
      -----------------------------------------
    
        QuickDraw GX/AAT uses a `State Machine' to provide `stateful' layout
        features, and TrueType GX stores the state transition diagram of
        this `State Machine' in a `StateTable' data structure.  As it
        receives a series of glyph IDs, the State Machine starts in the
        `start of text' state, walks through various states, generates
        layout information for the renderer, and finally reaches the `end
        of text' state.
    
        gxvalid can check essential errors like:
    
          - possibility of state transitions to undefined states
          - existence of glyph  IDs that the State Machine  doesn't know how
            to handle
          - the State Machine cannot compute the layout information from
            the given diagram

        These errors can be checked in a finite number of steps, and
        without the State Machine itself, because they are `expression'
        errors of the state transition diagram.
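
        The finite checks listed above only need to inspect the tables,
        never a running machine.  The following C sketch illustrates this
        with a hypothetical, simplified in-memory StateTable
        (`StateTableSketch' and `check_state_table' are illustrative names,
        not gxvalid's actual structures or functions): every loop is bounded
        by the table sizes, which is why these errors are checkable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified in-memory form of a StateTable.  The real
 * `mort'/`morx' encodings (offsets, class lookup tables, flags) differ;
 * this only shows which checks terminate in finite steps. */
typedef struct {
    size_t          n_states;    /* rows in the state array               */
    size_t          n_classes;   /* columns in the state array            */
    const uint8_t  *state_array; /* n_states * n_classes entry indices    */
    size_t          n_entries;   /* number of entries in the entry table  */
    const uint16_t *new_state;   /* new_state[e]: target state of entry e */
    size_t          n_glyphs;    /* glyphs covered by the class table     */
    const uint8_t  *glyph_class; /* glyph_class[gid]: class of that glyph */
} StateTableSketch;

/* Returns 0 if the table passes, otherwise a code for the first problem:
 * 1 = glyph mapped to an undefined class,
 * 2 = state array cell referencing an undefined entry,
 * 3 = entry transiting to an undefined state. */
int check_state_table(const StateTableSketch *st)
{
    size_t g, s, c;

    /* Every glyph must belong to a class the state array knows about. */
    for (g = 0; g < st->n_glyphs; g++)
        if (st->glyph_class[g] >= st->n_classes)
            return 1;

    /* Every (state, class) cell must lead to a defined entry and state. */
    for (s = 0; s < st->n_states; s++) {
        for (c = 0; c < st->n_classes; c++) {
            uint8_t e = st->state_array[s * st->n_classes + c];

            if (e >= st->n_entries)
                return 2;
            if (st->new_state[e] >= st->n_states)
                return 3;
        }
    }
    return 0;
}
```

        Note that nothing here simulates the machine on input; questions
        such as `is the end of text ever reached' are outside what this kind
        of check can answer.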
    
        There is no limit on how long the State Machine walks around, so
        validating the algorithm encoded in the state transition diagram
        would require an unbounded number of steps, even if gxvalid
        contained a State Machine.  Therefore, the following errors and
        problems cannot be checked.

          - existence of states which the State Machine never transitions to
          - the possibility that the State Machine never reaches `end of
            text'
          - the possibility of stack underflow/overflow in the State Machine
            (in ligature and contextual glyph substitutions, the State
            Machine can push up to 16 glyphs onto its stack)
    
        In addition, gxvalid doesn't check `temporary glyph IDs' used in the
        chained State Machines (in `mort' and `morx' tables).  If a layout
        feature is implemented by a single State Machine, a glyph ID
        converted by the State Machine is passed to the glyph renderer, so
        it must not point to an undefined glyph.  But if a layout feature is
        implemented by chained State Machines, a component State Machine
        (if it is not the final one) is permitted to generate undefined
        glyph IDs for temporary use, because they are handled by the next
        component State Machine and not by the glyph renderer.  To validate
        such temporary glyph IDs, gxvalid would have to collect all
        undefined glyph IDs which can occur in the output of the previous
        State Machine and search for them in the `ClassTable' structure of
        the current State Machine.  It is too complex to list all possible
        glyph IDs from the StateTable, especially from a ligature
        substitution table.
    
      3-2. Validation of relationship between multiple layout features
      ----------------------------------------------------------------
    
        gxvalid does  not validate the relationship  between multiple layout
        features at all.
    
        If  multiple layout  features  are defined  in  TrueType GX  tables,
        possible  interactions,  overrides,  and  conflicts  between  layout
        features are implicitly  given in the font too.   For example, there
        are several predefined spacing control features:
    
          - Text Spacing          (Proportional/Monospace/Half-width/Normal)
          - Number Spacing        (Monospaced-numbers/Proportional-numbers)
          - Kana Spacing          (Full-width/Proportional)
          - Ideographic Spacing   (Full-width/Proportional)
          - CJK Roman Spacing     (Half-width/Proportional/Default-roman
                                   /Full-width-roman/Proportional)
    
        If all  layout features are  independently managed, we  can activate
        inconsistent  typographic rules  like  `Text Spacing=Monospace'  and
        `Ideographic Spacing=Proportional' at the same time.
    
        The combinations of layout features are managed by a 32bit integer
        (one bit for each selector setting), so we can theoretically define
        relationships between up to 32 features.  But if one feature
        setting affects another feature setting, we need typographic
        priority rules to validate the relationship.  Unfortunately, the
        TrueType GX format specification does not give such information even
        for predefined features.
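
        As we read Apple's `mort' specification, selecting a feature setting
        updates the 32bit flag word as `flags = (flags & disableFlags) |
        enableFlags'.  The C sketch below uses hypothetical bit assignments
        (`BIT_TEXT_MONOSPACE' and friends are illustrative, not taken from
        any font) to show that both a `strict' and a `lax' Monospace setting
        are structurally valid; nothing in the table says which one is
        typographically right.

```c
#include <stdint.h>

/* Hypothetical bit assignments; a real font defines its own in the
 * feature subtables of its `mort'/`morx' chain. */
#define BIT_TEXT_PROPORTIONAL  (1u << 0)
#define BIT_TEXT_MONOSPACE     (1u << 1)
#define BIT_IDEO_PROPORTIONAL  (1u << 2)

typedef struct {
    uint32_t enable_flags;   /* bits ORed in by this setting         */
    uint32_t disable_flags;  /* mask ANDed with the flags beforehand */
} FeatureSettingSketch;

/* Apply one feature setting to the current 32bit flag word. */
uint32_t apply_setting(uint32_t flags, FeatureSettingSketch s)
{
    return (flags & s.disable_flags) | s.enable_flags;
}
```

        Starting from Text Proportional plus Ideographic Proportional, a
        strict Monospace setting whose disableFlags clear both proportional
        bits leaves only the Monospace bit set, while a lax one that clears
        only Text Proportional leaves Ideographic Proportional on.  Judging
        the lax result inconsistent would require exactly the priority
        rules the specification does not give.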
    
    
    4. Permissive error handling of broken GX tables
    ------------------------------------------------
    
      When Apple's font rendering system finds an inconsistency, like a
      specification violation or an unspecified value in a TrueType GX
      table, it does not always return an error.  In most cases, the
      rendering engine silently ignores such wrong values or even whole
      tables.  In fact, MacOS is shipped with fonts including broken GX/AAT
      tables, but no harmful effects due to these `officially broken' fonts
      are observed by end-users.

      gxvalid is designed to continue the validation process as long as
      possible.  When gxvalid finds wrong values, it at least issues a
      warning and takes a fallback procedure if possible.  The fallback
      procedure depends on the validation level (FT_VALIDATE_DEFAULT,
      FT_VALIDATE_TIGHT, or FT_VALIDATE_PARANOID).
    
      We used the following three tools to investigate Apple's error handling.
    
        - FontValidator  (for MacOS 8.5 - 9.2)  resource fork font
        - ftxvalidator   (for MacOS X 10.1 -)   dfont or naked-sfnt
        - ftxdumperfuser (for MacOS X 10.1 -)   dfont or naked-sfnt
    
      All tests were done on a PowerPC-based Macintosh; at present, we
      have not checked those tools on an m68k-based Macintosh.
    
      In total, we checked 183 fonts bundled with MacOS 9.1, MacOS 9.2,
      MacOS 10.0, MacOS X 10.1, MSIE for MacOS, and AppleWorks 6.0.  These
      fonts are distributed officially, but Apple's font tools found many
      broken GX/AAT tables in them.  In the following, we list typical
      violations of the GX specification in fonts officially distributed
      with those Apple systems.
    
      4-1. broken BinSrchHeader (19/183)
      ----------------------------------
    
        `BinSrchHeader' is a header of a data array which helps m68k
        platforms access memory efficiently.  Although there are only two
        truly independent parameters (`unitSize' and `nUnits'),
        BinSrchHeader has three additional parameters which can be
        calculated from `unitSize' and `nUnits', for fast setup.  Apple font
        tools silently ignore them; gxvalid warns if it finds an
        inconsistency, but always continues validation.  The additional
        parameters are ignored regardless of their consistency.
    
          19  fonts include  such  inconsistencies; all  breaks  are in  the
          BinSrchHeader structure of the `kern' table.
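
        Apple's TrueType Reference Manual defines the redundant fields as
        searchRange = unitSize * 2^floor(log2(nUnits)), entrySelector =
        floor(log2(nUnits)), and rangeShift = unitSize * nUnits -
        searchRange.  A minimal C sketch of the consistency check (the
        struct and function names are illustrative, and treating nUnits = 0
        as `all three fields zero' is our assumption):

```c
#include <stdint.h>

/* Field layout of a BinSrchHeader (all values unsigned 16bit). */
typedef struct {
    uint16_t unit_size;      /* size of one lookup unit, in bytes       */
    uint16_t n_units;        /* number of units to search               */
    uint16_t search_range;   /* unitSize * largest power of 2 <= nUnits */
    uint16_t entry_selector; /* log2(largest power of 2 <= nUnits)      */
    uint16_t range_shift;    /* unitSize * nUnits - searchRange         */
} BinSrchHeader;

/* Returns 1 if the three redundant fields agree with unitSize and
 * nUnits, 0 otherwise.  Following the policy above, a caller would only
 * warn on 0 and never rely on the stored redundant values. */
int binsrch_header_consistent(const BinSrchHeader *h)
{
    uint32_t pow2 = 0;      /* largest power of 2 <= nUnits (0 if none) */
    uint32_t selector = 0;
    uint32_t search_range, range_shift;

    if (h->n_units > 0) {
        pow2 = 1;
        while (pow2 * 2 <= h->n_units) {
            pow2 *= 2;
            selector++;
        }
    }
    search_range = (uint32_t)h->unit_size * pow2;
    range_shift  = (uint32_t)h->unit_size * h->n_units - search_range;

    return (uint32_t)h->search_range   == search_range &&
           (uint32_t)h->entry_selector == selector     &&
           (uint32_t)h->range_shift    == range_shift;
}
```

        For example, with unitSize 6 and nUnits 10 the consistent values are
        searchRange 48, entrySelector 3, and rangeShift 12.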
    
      4-2. too-short LookupTable (5/183)
      ----------------------------------
    
        LookupTable format 0 is a simple array to get a value from a given
        GID (glyph ID); the index of this array is a GID too.  Therefore,
        the length of the array is expected to equal the number of glyphs
        defined in the `maxp' table, but some fonts have a format 0
        LookupTable which is too short to cover all GIDs.  FontValidator
        ignores this error silently, while ftxvalidator and ftxdumperfuser
        both warn and continue.  Similar problems are found in format 3
        subtables of `kern'.  gxvalid always warns, and aborts if the
        validation level is set to FT_VALIDATE_PARANOID.
    
          5 fonts include too-short kern format 0 subtables.
          1 font includes too-short kern format 3 subtable.
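
        A minimal C sketch of the length policy described above.  The enums
        are hypothetical stand-ins for FreeType's validation levels, not its
        actual definitions, and the width of each lookup value is passed in
        by the caller:

```c
#include <stddef.h>

/* Hypothetical stand-ins for FT_VALIDATE_DEFAULT/TIGHT/PARANOID. */
typedef enum { VALIDATE_DEFAULT, VALIDATE_TIGHT, VALIDATE_PARANOID } Level;
typedef enum { RESULT_OK, RESULT_WARN, RESULT_ABORT } Result;

/* Check that a format 0 LookupTable holding `bytes_available' bytes of
 * values, each `value_size' bytes wide (must be > 0), has an entry for
 * every one of the `num_glyphs' glyphs counted in `maxp'. */
Result check_lookup0_length(size_t bytes_available, size_t value_size,
                            size_t num_glyphs, Level level)
{
    if (bytes_available / value_size >= num_glyphs)
        return RESULT_OK;

    /* Too short: always warn, but abort only at the paranoid level. */
    return level == VALIDATE_PARANOID ? RESULT_ABORT : RESULT_WARN;
}
```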
    
      4-3. broken LookupTable format 2 (1/183)
      ----------------------------------------
    
        LookupTable format 2, subformat 4 covers the GID space by a
        collection of segments which are specified by `firstGlyph' and
        `lastGlyph'.  Some fonts store `firstGlyph' and `lastGlyph' in
        reverse order, so the segment specification is broken.  Apple font
        tools ignore this error silently; a broken segment is ignored as if
        it did not exist.  gxvalid warns and normalizes the segment at
        FT_VALIDATE_DEFAULT, ignores the segment at FT_VALIDATE_TIGHT, or
        aborts at FT_VALIDATE_PARANOID.
    
          1 font includes broken LookupTable format 2, in the `just' table.
    
        *) It seems  that all fonts manufactured by  ITC for AppleWorks have
           this error.
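
        The per-level handling of a reversed segment can be sketched in C
        as follows.  `LookupSegment' here is a simplified illustration (its
        field order does not claim to match the on-disk layout), and the
        level enum is a hypothetical stand-in for FreeType's validation
        levels:

```c
#include <stdint.h>

/* Hypothetical stand-ins for FT_VALIDATE_DEFAULT/TIGHT/PARANOID. */
typedef enum { VALIDATE_DEFAULT, VALIDATE_TIGHT, VALIDATE_PARANOID } Level;
typedef enum { SEG_OK, SEG_NORMALIZED, SEG_IGNORED, SEG_ABORT } SegResult;

/* Simplified segment of a format 2 LookupTable. */
typedef struct {
    uint16_t first_glyph;
    uint16_t last_glyph;
    uint16_t value;
} LookupSegment;

/* Apply the per-level policy to one segment; may modify *seg. */
SegResult check_segment(LookupSegment *seg, Level level)
{
    if (seg->first_glyph <= seg->last_glyph)
        return SEG_OK;

    switch (level) {
    case VALIDATE_DEFAULT: {
        /* Normalize: swap the bounds into ascending order. */
        uint16_t tmp = seg->first_glyph;
        seg->first_glyph = seg->last_glyph;
        seg->last_glyph  = tmp;
        return SEG_NORMALIZED;
    }
    case VALIDATE_TIGHT:
        /* Treat the segment as absent, as Apple font tools do. */
        return SEG_IGNORED;
    default:
        return SEG_ABORT;  /* VALIDATE_PARANOID */
    }
}
```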
    
      4-4. bad bracketing in glyph property (14/183)
      ----------------------------------------------
    
        GX/AAT defines a  `bracketing' property of the glyphs  in the `prop'
        table,  to control layout  features of  strings enclosed  inside and
        outside  of   brackets.   Some  fonts   give  inappropriate  bracket
        properties  to glyphs.   Apple  font tools  warn  about this  error;
        gxvalid warns too and aborts at FT_VALIDATE_PARANOID.
    
          14 fonts include wrong bracket properties.
    
    
      4-5. invalid feature number (117/183)
      -------------------------------------
    
        The GX/AAT  extension can  include 255 different  layout features,
        but    popular    layout     features    are    predefined    (see
        https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html).
        Some fonts include feature numbers which are incompatible with the
        predefined feature registry.
    
        In our survey, 140 fonts include a `feat' table.
    
        a) 67 fonts use a feature number which should not be used.
        b) 117 fonts set the wrong feature range (nSetting).  This is mostly
           found in the `mort' and `morx' tables.
    
        Apple  font tools give  no warning,  although they  cannot recognize
        what  the feature  is.   At FT_VALIDATE_DEFAULT,  gxvalid warns  but
        continues in both cases (a, b).  At FT_VALIDATE_TIGHT, gxvalid warns
        and aborts for (a), but continues for (b).  At FT_VALIDATE_PARANOID,
        gxvalid warns and aborts in both cases (a, b).
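
        The decision table for cases (a) and (b) fits in a few lines of C.
        This sketch only encodes the policy described above; the enums and
        `feature_policy' are hypothetical names:

```c
/* Hypothetical stand-ins for FT_VALIDATE_DEFAULT/TIGHT/PARANOID. */
typedef enum { VALIDATE_DEFAULT, VALIDATE_TIGHT, VALIDATE_PARANOID } Level;
typedef enum { FEAT_OK, FEAT_WARN, FEAT_ABORT } FeatResult;

/* bad_number: case (a), a feature number which should not be used.
 * bad_range:  case (b), a wrong feature range (nSetting). */
FeatResult feature_policy(int bad_number, int bad_range, Level level)
{
    if (!bad_number && !bad_range)
        return FEAT_OK;

    switch (level) {
    case VALIDATE_DEFAULT:
        return FEAT_WARN;                           /* continue in (a), (b) */
    case VALIDATE_TIGHT:
        return bad_number ? FEAT_ABORT : FEAT_WARN; /* abort only for (a)   */
    default:
        return FEAT_ABORT;                          /* paranoid: abort both */
    }
}
```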
    
      4-6. invalid prop version (10/183)
      ----------------------------------
    
        Like most TrueType GX tables, the `prop' table must start with a
        32bit version identifier: 0x00010000, 0x00020000, or 0x00030000.
        But some fonts store nonsense binary data instead.  When Apple font
        tools find them, they abort the processing immediately, and the
        data that follows is not processed.  gxvalid does the same.
    
          10 fonts include broken `prop' version.
    
        All  of these  fonts are  classic  TrueType fonts  for the  Japanese
        script, manufactured by Apple.
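
        A minimal C sketch of the version check, reading the first four
        bytes of the table as a big-endian 32bit value; a false result means
        validation of the table stops there.  `prop_version_ok' is an
        illustrative name, not a gxvalid function:

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32bit value. */
static uint32_t read_u32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 if the `prop' table starts with a known version, 0 if the
 * table is too short or the version is unknown. */
int prop_version_ok(const uint8_t *table, size_t len)
{
    uint32_t version;

    if (len < 4)
        return 0;
    version = read_u32_be(table);
    return version == 0x00010000UL || version == 0x00020000UL ||
           version == 0x00030000UL;
}
```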
    
      4-7. unknown resource name (2/183)
      ----------------------------------
    
        NOTE: THIS IS NOT A TRUETYPE GX ERROR.
    
        If a TrueType font is stored in the resource fork or in dfont
        format, the data must be tagged as `sfnt' in the resource fork index
        to invoke the TrueType font handler for the data.  But the TrueType
        font data in `Keyboard.dfont' is tagged as `kbd', and that in
        `LastResort.dfont' is tagged as `lst'.  Apple font tools can detect
        that the data is in TrueType format and successfully validate it,
        presumably because the files are known to be dfonts.  The current
        implementation of FreeType's resource fork driver cannot do that,
        thus gxvalid cannot validate them.
    
          2 fonts use an unknown tag for the TrueType font resource.
    
    5. `kern' table issues
    ----------------------
    
      In common TrueType terminology, `kern' is classified as a basic and
      platform-independent table.  But there are Apple extensions of
      `kern', and one of them requires a GX State Machine for contextual
      kerning.  Therefore, gxvalid includes a special validator for `kern'
      tables.  Unfortunately, there is no exact algorithm to check Apple's
      extensions, so gxvalid uses a heuristic algorithm to find the proper
      validation routines for all possible data formats, including the
      data format for Microsoft.  By calling classic_kern_validate()
      instead of gxv_validate(), you can specify the `kern' format
      explicitly.  However, FreeType 2 currently uses only the Microsoft
      `kern' format; other formats are ignored (and should be handled in a
      library one level above FreeType).
    
      5-1. History
      ------------
    
        The original  16bit version of `kern'  was designed by  Apple in the
        pre-GX  era, and  it was  also approved  by  Microsoft.  Afterwards,
        Apple designed a  new 32bit version of the  `kern' table.  According
        to  the documentation, the  difference between  the 16bit  and 32bit
        version is only the size of variables in the `kern' header.  In the
        following, we call the original 16bit version `classic' and the
        32bit version `new'.
    
      5-2. Versions and dialects which should be differentiated
      ---------------------------------------------------------
    
        The `kern' table  consists of a table header  and several subtables.
        The version number  which identifies a `classic' or  a `new' version
        is  explicitly   written  in  the   table  header,  but   there  are
        undocumented  differences between  Microsoft's and  Apple's formats.
        It is called a `dialect' in the following.  There are three cases
        which should be handled: the new Apple dialect, the classic Apple
        dialect, and the classic Microsoft dialect.  An analysis of the
        formats and gxvalid's auto-detection algorithm are described below.
    
        5-2-1. Version detection: classic and new kern
        ----------------------------------------------
    
          According to the Apple TrueType specification, there are only two
          differences between the classic and the new versions:
    
            - The `kern' table header starts with the version number.
              The classic version starts with 0x0000 (16bit),
              the new version starts with 0x00010000 (32bit).
    
            - In the  `kern' table header,  the number of  subtables follows
              the version number.
              In the classic version, it is stored as a 16bit value.
              In the new version, it is stored as a 32bit value.
    
          From the output of Apple's font tools (DumpKERN was tested in
          addition to the three Apple font tools listed above), there is
          another undocumented difference.  In the new version, the subtable
          header includes a 16bit variable named `tupleIndex' which does not
          exist in the classic version.
    
          The new version  can store all subtable formats (0,  1, 2, and 3),
          but the Apple TrueType specification does not mention the subtable
          formats available in the classic version.
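
          The two documented differences, plus the undocumented `tupleIndex'
          field, determine how a reader walks the table.  A C sketch of the
          header parse; the subtable header sizes assume the classic header
          is the Microsoft-documented version/length/coverage triple (6
          bytes) and the new one is length(32)/coverage/tupleIndex (8
          bytes).  All names are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { KERN_BAD, KERN_CLASSIC, KERN_NEW } KernVersion;

typedef struct {
    KernVersion version;
    uint32_t    n_tables;       /* 16bit field in classic, 32bit in new */
    size_t      header_size;    /* bytes of the table header            */
    size_t      subheader_size; /* bytes of each subtable header        */
} KernHeaderInfo;

static uint16_t u16be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t u32be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

KernHeaderInfo parse_kern_header(const uint8_t *t, size_t len)
{
    KernHeaderInfo info = { KERN_BAD, 0, 0, 0 };

    if (len >= 4 && u16be(t) == 0x0000) {
        /* classic: version(16) nTables(16);
         * subtable: version(16) length(16) coverage(16) */
        info.version        = KERN_CLASSIC;
        info.n_tables       = u16be(t + 2);
        info.header_size    = 4;
        info.subheader_size = 6;
    } else if (len >= 8 && u32be(t) == 0x00010000UL) {
        /* new: version(32) nTables(32);
         * subtable: length(32) coverage(16) tupleIndex(16) */
        info.version        = KERN_NEW;
        info.n_tables       = u32be(t + 4);
        info.header_size    = 8;
        info.subheader_size = 8;
    }
    return info;
}
```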
    
        5-2-2. Available subtable formats in classic version
        ----------------------------------------------------
    
          Although the Apple TrueType specification recommends using the
          classic version if the font is designed for both the Apple and
          Microsoft platforms, it does not document the available subtable
          formats in the classic version.
    
          According to the Microsoft TrueType specification, the only
          subtable format guaranteed to be supported on Windows and OS/2 is
          format 0.  The Microsoft TrueType specification also describes
          subtable format 2, but does not mention which platforms support
          it.  Subtable formats 1, 3, and higher are documented as reserved
          for future use.  Therefore, the classic version can store at least
          subtable formats 0 and 2.  `ttfdump.exe', a font tool provided by
          Microsoft, ignores the subtable format written in the subtable
          header and parses the table as if all subtables were in format 0.
    
          `kern'  subtable format  1  uses  a StateTable,  so  it cannot  be
          utilized without a GX  State Machine.  Therefore, it is reasonable
          to assume  that format 1 (and  3) were introduced  after Apple had
          introduced GX and moved to the new 32bit version.
    
        5-2-3. Apple and Microsoft dialects
        -----------------------------------
    
          The `kern' subtable has a 16bit `coverage' field to describe
          kerning attributes, but Apple and Microsoft interpret its bits
          differently.  For example, Apple uses bits 0-7 to identify the
          subtable format, while Microsoft uses bits 8-15.
    
          In addition, according to the output of DumpKERN and
          FontValidator, Apple's bit interpretations of coverage in the
          classic and new versions are also incompatible.  In summary, there
          are three dialects: the classic Apple dialect, the classic
          Microsoft dialect, and the new Apple dialect.  The classic
          Microsoft dialect and the new Apple dialect are documented in the
          respective vendor's TrueType font specification, but no
          documentation for the classic Apple dialect is available.
    
          For example, in the new Apple dialect, bit 15 is documented as
          `set to 1 if the kerning is vertical'.  On the other hand, in the
          classic Microsoft dialect, bit 0 is documented as `set to 1 if the
          kerning is horizontal'.  From the outputs of DumpKERN and
          FontValidator, the classic Apple dialect interprets bit 15 as `set
          to 1 when the kerning is horizontal'.  From the results of similar
          experiments, the classic Apple dialect seems to be the bit-reversed
          (`Endian reverse') form of the classic Microsoft dialect.

          In conclusion, no font tool can tell the classic Apple dialect
          from the classic Microsoft dialect automatically with certainty.
    
        5-2-4. gxvalid auto dialect detection algorithm
        -----------------------------------------------
    
          The first 16  bits of the `kern' table are  enough to identify the
          version:
    
            - if  the first  16  bits are  0x0000,  the `kern'  table is  in
              classic Apple dialect or classic Microsoft dialect
            - if the first 16 bits are  0x0001, and next 16 bits are 0x0000,
              the kern table is in new Apple dialect.
    
          If the `kern'  table is a classic one,  the 16bit `coverage' field
          is checked next.   Firstly, the coverage bits are  decoded for the
          classic Apple dialect using the following bit masks (this is based
          on DumpKERN output):
    
            0x8000: 1=horizontal, 0=vertical
            0x4000: not used
            0x2000: 1=cross-stream, 0=normal
            0x1FF0: reserved
            0x000F: subtable format
    
          If any of the reserved bits are set, or the subtable format bits
          are interpreted as format 1 or 3, we take it as `impossible in
          classic Apple dialect' and retry, using the classic Microsoft
          dialect.
    
            The most popular coverage in new Apple-dialect:         0x8000,
            The most popular coverage in classic Apple-dialect:     0x0000,
            The most popular coverage in classic Microsoft dialect: 0x0001.
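
          The detection steps above can be sketched in C as follows.  The
          classic Apple masks come from the table above; the extra check on
          the Microsoft side (bits 4-7 reserved, format in bits 8-15 limited
          to 0 or 2, per section 5-2-2) is our assumption and not
          necessarily what gxvalid does.  All names are illustrative:

```c
#include <stdint.h>

typedef enum {
    DIALECT_UNKNOWN,
    DIALECT_NEW_APPLE,
    DIALECT_CLASSIC_APPLE,
    DIALECT_CLASSIC_MS
} KernDialect;

/* first32:  first four bytes of the `kern' table, big-endian.
 * coverage: coverage field of a classic subtable (unused for `new'). */
KernDialect detect_kern_dialect(uint32_t first32, uint16_t coverage)
{
    unsigned apple_format, ms_format;

    if (first32 == 0x00010000UL)         /* 0x0001 followed by 0x0000 */
        return DIALECT_NEW_APPLE;
    if ((first32 >> 16) != 0x0000)       /* neither classic nor new   */
        return DIALECT_UNKNOWN;

    /* Try the classic Apple dialect first (masks from DumpKERN). */
    apple_format = coverage & 0x000F;
    if ((coverage & 0x1FF0) == 0 && apple_format != 1 && apple_format != 3)
        return DIALECT_CLASSIC_APPLE;

    /* Retry as classic Microsoft dialect (assumed checks, see above). */
    ms_format = coverage >> 8;
    if ((coverage & 0x00F0) == 0 && (ms_format == 0 || ms_format == 2))
        return DIALECT_CLASSIC_MS;

    return DIALECT_UNKNOWN;
}
```

          With the popular coverages listed above, 0x0000 is accepted as
          classic Apple, while 0x0001 decodes to the impossible Apple format
          1 and is therefore taken as classic Microsoft.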
    
      5-3. Tested fonts
      -----------------
    
        We checked 59 fonts bundled with MacOS and 38 fonts bundled with
        Windows, all of which include a `kern' table.
    
          - fonts bundled with MacOS
            * new Apple dialect
              format 0: 18
              format 2:  1
              format 3:  1
            * classic Apple dialect
              format 0: 14
            * classic Microsoft dialect
              format 0: 15
    
          - fonts bundled with Windows
            * classic Microsoft dialect
              format 0: 38
    
        It may look strange that classic Microsoft-dialect fonts are bundled
        with MacOS: they come from MSIE for MacOS, except for
        MarkerFelt.dfont.
    
    
      ACKNOWLEDGEMENT
      ---------------
    
      Some parts of gxvalid are derived from both the `gxlayout' module and
      the `otvalid' module.  Development of gxlayout was supported by the
      Information-technology Promotion Agency (IPA), Japan.

      The detailed analysis of undefined glyph ID utilization in `mort' and
      `morx' tables was provided by George Williams.
    
    ------------------------------------------------------------------------
    
    Copyright (C) 2004-2021 by
    suzuki toshiya, Masatake YAMATO, Red hat K.K.,
    David Turner, Robert Wilhelm, and Werner Lemberg.
    
    This  file is  part  of the  FreeType  project, and  may  only be  used,
    modified,  and  distributed under  the  terms  of  the FreeType  project
    license, LICENSE.TXT.  By continuing  to use, modify, or distribute this
    file  you indicate that  you have  read the  license and  understand and
    accept it fully.
    
    
    --- end of README ---