src/gen-indic-table.py


Log

Author Commit Date CI Message
Behdad Esfahbod 9909d11f 2022-06-28T15:59:40 [indic generator] Fix regression Fixes https://github.com/harfbuzz/harfbuzz/issues/3690
David Corbett 78c5ae39 2022-06-25T13:32:04 [indic] Remove remnants of Sinhala
David Corbett 1555b300 2022-06-24T21:02:26 Add U+25CC to lone Robatic but not after U+17D9
David Corbett 0f15cb12 2022-06-24T20:37:01 [indic-table] Fix block headers
Behdad Esfahbod 99a26bc1 2022-06-15T16:14:31 [indic-generator] Fix typo
Behdad Esfahbod 2cbb7758 2022-06-11T08:57:21 [myanmar] Fold category P into GB Fixes https://github.com/harfbuzz/harfbuzz/issues/3649 This actually now allows Asat after the Myanmar punctuation marks; something I see in Wikipedia data.
Behdad Esfahbod b350e301 2022-06-11T08:52:11 [myanmar] Remove category D completely Fixes https://github.com/harfbuzz/harfbuzz/issues/3651
Behdad Esfahbod 8533214a 2022-06-11T08:49:36 [khmer] Fold category Coeng completely into category H
Behdad Esfahbod 607a9fe7 2022-06-11T04:20:23 [indic-like] Remove category duplication
Behdad Esfahbod 02016914 2022-06-10T17:24:19 [indic-generator] Remove unnecessary Myanmar category=D overrides https://github.com/harfbuzz/harfbuzz/pull/3648#discussion_r894685106
Behdad Esfahbod 937c8780 2022-06-10T17:20:15 [indic-generator] Remove unnecessary override for Myanmar U+1039 https://github.com/harfbuzz/harfbuzz/pull/3648#discussion_r894762535
Behdad Esfahbod 9504037c 2022-06-10T17:13:16 [indic-generator] Remove three unneeded Myanmar overrides U+AA74-6 These three characters have Indic_Syllabic_Category=Consonant_Placeholder. The original evidence that prompted these overrides says they can take tone marks. They are not subjoined: Khamti Shan apparently does not use subjoined characters at all. Therefore, PLACEHOLDER is good enough and these need not be overridden to C. https://www.unicode.org/L2/L2008/08276-khamti-proposal.pdf https://github.com/harfbuzz/harfbuzz/pull/3648#discussion_r894640713
Behdad Esfahbod 02eb6606 2022-06-10T17:10:42 [indic-generator] Remove redundant PLACEHOLDER character overrides https://github.com/harfbuzz/harfbuzz/pull/3648#discussion_r894631922
Behdad Esfahbod e16669ce 2022-06-10T17:05:35 [indic-generator] Remove redundant override of U+2010 / U+2011 https://github.com/harfbuzz/harfbuzz/pull/3648#discussion_r894630596
Behdad Esfahbod bb255cd9 2022-06-10T17:03:52 [indic-generator] Remove redundant override of U+0980 https://github.com/harfbuzz/harfbuzz/pull/3648#discussion_r894627064
Behdad Esfahbod eb2f2e31 2022-06-10T16:47:59 [indic-generator] Update comment re U+104E https://github.com/harfbuzz/harfbuzz/pull/3648#pullrequestreview-1002150048
Behdad Esfahbod 165ef55e 2022-06-10T06:20:10 [indic-generator] Move INDIC_COMBINE_CATEGORIES here
Behdad Esfahbod b030dd9e 2022-06-10T06:12:13 [indic-table] Minor rename
Behdad Esfahbod 37217fc9 2022-06-09T16:43:50 [indic-generator/myanmar] Move most Myanmar category overrides to generator
Behdad Esfahbod c136227f 2022-06-09T13:36:19 [indic-generator/khmer] Move Khmer overrides to generator
Behdad Esfahbod 25793075 2022-06-09T13:11:46 [indic-generator] Move Khmer/Myanmar vowel categories to the generator
Behdad Esfahbod 10cd8ac0 2022-06-09T12:27:31 [indic-generator] Move matra category overrides to generator
Behdad Esfahbod c4e4f1d3 2022-06-09T11:56:57 [indic-generator] Move SMVD position overrides to generator
Behdad Esfahbod 2963154c 2022-06-09T11:49:02 [indic-generator] Add a couple comments
Behdad Esfahbod 91d6f45b 2022-06-09T07:34:44 [indic-generator] Move some position overrides to the generator
Behdad Esfahbod 0ec4dcb9 2022-06-09T07:33:43 [indic-generator] Ouch Not sure how this was passing tests still.
Behdad Esfahbod f0269e0f 2022-06-09T07:10:47 [indic-generator] Move Ra handling to the generator
Behdad Esfahbod 419d2146 2022-06-09T07:01:14 [indic-generator] Cap off what categories have positions This was left off of the commit moving Indic categories to the generator. It didn't fail any tests, but adding it back because it has implications possibly.
Behdad Esfahbod e1d965d5 2022-06-09T06:48:25 [indic-generator] Move position mapping to generator
Behdad Esfahbod 49075140 2022-06-09T06:33:51 [indic-generator] Move category overrides to generator
Behdad Esfahbod 58eeb3a1 2022-06-09T05:34:49 [indic-generator] Move category mapping to generator
Behdad Esfahbod 5bfb0b72 2022-06-03T02:56:41 Rename s/shape-complex/shaper/g
Behdad Esfahbod 676d1e6a 2021-01-29T19:53:39 [indic] Spell out INDIC_TABLE_ELEMENT_TYPE
Ebrahim Byagowi 6937092a 2020-07-13T21:32:15 [py] apply lgtm.com python suggestions
Ebrahim Byagowi 82c6ddb9 2020-07-03T15:09:10 [py] remove not needed imports
Ebrahim Byagowi ad87155f 2020-05-29T00:11:19 minor, use py3's open(encoding=)
Ebrahim Byagowi 7554f618 2020-05-28T22:51:29 minor, use sys.exit print shorthand
Ebrahim Byagowi 08f1d95a 2020-05-28T15:01:15 minor, move scripts manuals to __doc__
David Corbett fd748fac 2020-03-15T15:59:31 Update to Unicode 13.0.0
Ebrahim Byagowi 8d199077 2020-02-19T14:56:55 Remove python2 support from tests/utils scripts
Ebrahim Byagowi 6a390df8 2020-02-10T17:19:23 [tools] Print unicode links on gen-* tools output As Behdad's review
Evgeniy Reizner 4dc87365 2020-02-09T18:39:33 Add links to files used by python scripts. Closes #2150
Adrian Wong b6607681 2019-08-28T21:31:27 Adjustments to the generated Indic table output (#1936) * Add empty parentheses after print call * Minor: newlines. Move #pragma pop down one; #endif up one * Adjust #define ISC/IMC output * Regenerate Indic table
Behdad Esfahbod 7aad5365 2019-06-26T13:21:03 [config] Add HB_NO_OT_SHAPE / HB_NO_OT Part of https://github.com/harfbuzz/harfbuzz/issues/1652
David Corbett cb758f26 2019-03-08T09:46:48 Remove obsolete overrides from Indic/USE scripts
David Corbett 8c42f032 2019-03-08T09:46:48 Remove obsolete overrides from Indic/USE scripts
Behdad Esfahbod 8874eef8 2019-01-17T15:04:44 Add pragma GCC diagnostic ignored "-Wunused-macros"
Behdad Esfahbod c77ae408 2018-08-25T22:36:36 Rename hb-*private.hh to hb-*.hh Sorry for the noise, downstream custom builders. Please adjust.
Ebrahim Byagowi 80395f14 2018-03-29T22:00:41 Make gen-* scripts LC_ALL=C compatible (#942)
Ebrahim Byagowi 26e0cbd8 2018-03-29T21:22:47 Actual py3 compatibility making on gen-* scripts (#941)
Ebrahim Byagowi cab2c2c0 2018-03-29T12:48:47 Make more gen-* scripts py3 compatible (#940)
Behdad Esfahbod 308f4192 2018-01-03T14:22:07 [use] Fix Brahmi Number Joiner 1107F Fixes https://github.com/harfbuzz/harfbuzz/pull/660
Behdad Esfahbod 216b003c 2017-07-14T16:38:51 [use] Fix shaping of U+AA29 CHAM VOWEL SIGN AA Part of https://github.com/behdad/harfbuzz/issues/376 Also see https://github.com/roozbehp/unicode-data/issues/6 Test added, using NotoSansCham built from Noto Phase III sources.
Behdad Esfahbod 30e6e29f 2016-05-06T15:52:27 [indic/use] Move Javanese from Indic shaper to USE Fixes https://github.com/behdad/harfbuzz/issues/243 With javatext.ttf, the reordering medial Ra gets its advance width zero'ed in Uniscribe implementation, and the font adds the advance back. Our Indic shaper does not do that, but USE does. So, route Javanese through USE. That's what Microsoft does anyway. Test: U+A9A5,U+A9BA This also seems to fix the following sequence, and variations thereof: U+A99F,U+A9C0,U+A9A2,U+A9BF
Behdad Esfahbod 01a30a6a 2016-05-06T11:50:02 [indic] Remove data for scripts that don't go through this shaper
Behdad Esfahbod f718fe37 2016-05-06T11:21:12 Minor
Behdad Esfahbod 2813e304 2015-12-18T11:05:11 [indic] Update data tables to Unicode 8.0 Test stats remain unchanged, except for Malayalam, which we investigate: BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%) DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%) GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%) GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%) KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%) KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%) MALAYALAM: 1047584 out of 1048334 tests passed. 750 failed (0.0715421%) ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%) SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%) TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%) TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%) Myanmar, compared to Windows 10 mmrtext.ttf: MYANMAR: 1123865 out of 1123883 tests passed. 18 failed (0.00160159%)
Behdad Esfahbod 1aaa7d67 2015-01-17T20:16:56 [indic] Fix out-of-bounds access
Behdad Esfahbod c09a607a 2014-07-11T15:05:36 Use hb_in_range() for arabic and indic tables Though, looks like gcc was smart enough to produce the same code before...
Behdad Esfahbod d743ce78 2014-06-30T15:24:02 [indic-table] Update to Unicode 7.0 data Touch code just enough to preserve previous syllable structure and functionality as closely as possible. Many further cleanups coming later.
Behdad Esfahbod 5fa21b3a 2014-06-30T14:30:54 [indic-table] Fix category frequency counts in comments
Behdad Esfahbod 89e49469 2014-06-22T11:32:13 Add new IndicSyllabicCategory short forms for Unicode 7.0
Behdad Esfahbod dcee838e 2014-06-22T11:29:59 Minor
Behdad Esfahbod f2ad86e6 2014-06-21T15:31:10 [indic-table-gen] Minor
Behdad Esfahbod a133e606 2014-06-20T18:01:34 [indic-table] Minor
Behdad Esfahbod c2e11340 2014-06-20T17:57:03 [indic-table] Make output stable
Behdad Esfahbod 55abfbd2 2014-06-20T16:47:43 [indic-table] Minor No output change.
Behdad Esfahbod 171f970e 2014-06-20T15:25:30 [indic-table] Black-list Thai, Lao, and Tibetan We don't need Indic table for those.
Behdad Esfahbod 65ac2dae 2014-06-20T15:12:49 [indic-table] Speed up lookup
Behdad Esfahbod 64442a3f 2014-06-20T14:58:53 [indic-table] Fix compiler warning
Behdad Esfahbod 0436e1d5 2014-06-20T14:56:22 [indic-table] Make table more compact by not covering full blocks -#define indic_offset_total 4416 +#define indic_offset_total 3816 -}; /* Table occupancy: 60% */ +}; /* Table occupancy: 69% */
Behdad Esfahbod 190a2514 2014-06-20T14:41:39 [indic-table] Remove block range from data table No functional change.
Behdad Esfahbod 3a83d33e 2013-02-12T12:14:10 Add South-East Asian shaper Handles Tai Tham, Cham, and New Tai Lue for now.
Behdad Esfahbod ae4a2b93 2012-04-10T16:25:08 Generate fallback Arabic shaping table Not hooked up yet.
Behdad Esfahbod 6d4016f1 2012-03-07T15:33:14 Make src tests pass again
Behdad Esfahbod cdc8b491 2012-03-07T12:08:33 Update Indic table to Unicode 6.1 data
Behdad Esfahbod d606daa4 2011-09-20T14:34:06 Whitespace
Behdad Esfahbod c4a59de6 2011-06-28T14:03:29 [Indic] Generate a single data table instead of multiple ones
Behdad Esfahbod 81426808 2011-06-13T16:02:18 Cosmetic
Behdad Esfahbod b9ddbd55 2011-06-02T17:43:12 [Indic] Start an Indic shaper Nothing functional in there yet. So far, we're parsing IndicSyllabicCategory.txt and IndicMatraCategory.txt files from Unicode Character Database and storing them in an array to be used by the shaper. Also hooked up the shaper, but it does not do anything right now.