|
0fbbf749
|
2025-09-10T21:11:44
|
|
[ot-tags] Update IANA subtags to 2025-08-25 (#5537)
|
|
31b22016
|
2024-10-03T14:16:54
|
|
[ot-tags] Update IANA and OT language registries
|
|
c2b5b7b9
|
2024-06-01T12:48:17
|
|
[ot-tags] Update IANA and OT language registries
|
|
86942e9a
|
2024-03-08T18:12:56
|
|
[ot-tags] Let Võro fall back to Estonian
|
|
88868411
|
2024-03-08T18:11:45
|
|
[ot-tags] Remove obsolete overrides
|
|
f3727c47
|
2024-04-04T19:04:59
|
|
Recognize ot_languages2’s disambiguation priority
|
|
0692d23c
|
2024-03-07T17:30:56
|
|
Update IANA Language Subtag Registry to 2024-03-07
|
|
a7960bdf
|
2022-06-17T15:10:20
|
|
[config] Add HB_NO_LANGUAGE_LONG and enable in TINY profile
Disables 3-letter language tags and more complex ones.
Fixes https://github.com/harfbuzz/harfbuzz/issues/3664
|
|
e3e685e5
|
2022-05-18T15:05:55
|
|
[ot-tags] Fix `min_subtag_len` calculations
|
|
e24797ae
|
2022-05-18T11:10:10
|
|
[ot-tags] Follow-up to previous commit
Part of https://github.com/harfbuzz/harfbuzz/issues/3591
|
|
f5d619be
|
2022-05-18T11:04:52
|
|
[ot-tags] Further gate the slow complex case, and add more tests
Part of https://github.com/harfbuzz/harfbuzz/issues/3591
Still 'zh-trad' is the slowest case.
----------------------------------------------------------------------------------------
Benchmark                                                    Time        CPU  Iterations
----------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_trad      136 ns     136 ns     5107838
BM_hb_ot_tags_from_script_and_language/COMMON ab_abcd      115 ns     115 ns     6103104
BM_hb_ot_tags_from_script_and_language/COMMON ab_abc      25.4 ns    25.3 ns    27674482
BM_hb_ot_tags_from_script_and_language/COMMON abcdef_XY   20.2 ns    20.1 ns    34795719
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     19.4 ns    19.3 ns    36390401
BM_hb_ot_tags_from_script_and_language/COMMON cxy_CN      33.5 ns    33.4 ns    20998939
BM_hb_ot_tags_from_script_and_language/COMMON exy_CN      25.1 ns    25.0 ns    27705832
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       34.2 ns    34.1 ns    20564356
BM_hb_ot_tags_from_script_and_language/COMMON en_US       15.5 ns    15.5 ns    45032204
BM_hb_ot_tags_from_script_and_language/LATIN en_US        15.9 ns    15.8 ns    44412379
BM_hb_ot_tags_from_script_and_language/COMMON none        4.72 ns    4.71 ns   149101665
BM_hb_ot_tags_from_script_and_language/LATIN none         4.72 ns    4.70 ns   149254498
|
|
3df8017e
|
2022-05-17T17:29:39
|
|
[ot-tag] Optimize subtag_matches() more
|
|
909f00ac
|
2022-05-17T15:51:41
|
|
[ot-tags] Further speed up language bsearch()
Using an integer tag to bsearch, instead of string.
Part of: https://github.com/harfbuzz/harfbuzz/issues/3591
Before:
----------------------------------------------------------------------------------------
Benchmark                                                    Time        CPU  Iterations
----------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.11 ns    8.08 ns    87067795
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       53.6 ns    53.5 ns    13042418
BM_hb_ot_tags_from_script_and_language/COMMON en_US       24.2 ns    24.1 ns    29052731
BM_hb_ot_tags_from_script_and_language/LATIN en_US        24.4 ns    24.3 ns    28736769
BM_hb_ot_tags_from_script_and_language/COMMON none        4.43 ns    4.41 ns   160370413
BM_hb_ot_tags_from_script_and_language/LATIN none         4.35 ns    4.34 ns   160578191
After:
----------------------------------------------------------------------------------------
Benchmark                                                    Time        CPU  Iterations
----------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     7.97 ns    7.95 ns    85208363
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       41.7 ns    41.6 ns    16945817
BM_hb_ot_tags_from_script_and_language/COMMON en_US       16.1 ns    16.0 ns    43613523
BM_hb_ot_tags_from_script_and_language/LATIN en_US        16.5 ns    16.4 ns    42568107
BM_hb_ot_tags_from_script_and_language/COMMON none        4.30 ns    4.29 ns   164055469
BM_hb_ot_tags_from_script_and_language/LATIN none         4.29 ns    4.27 ns   163793591
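The idea of searching on an integer key instead of a string can be sketched in Python. This is only an illustration of the technique, not the HarfBuzz implementation: the helper names `pack_subtag`, `build_table`, and `lookup` are hypothetical, the four-byte big-endian packing is an assumption, and the three sample mappings are taken from other entries in this log.

```python
import bisect


def pack_subtag(subtag):
    """Pack up to four ASCII characters into one big-endian integer (assumed layout)."""
    data = subtag.encode("ascii")[:4].ljust(4, b"\0")
    return int.from_bytes(data, "big")


def build_table(mapping):
    """Return parallel lists of sorted integer keys and their OpenType tags."""
    rows = sorted((pack_subtag(lang), tag) for lang, tag in mapping.items())
    keys = [key for key, _ in rows]
    tags = [tag for _, tag in rows]
    return keys, tags


def lookup(keys, tags, subtag):
    """Binary-search the integer keys; return the tag, or None if absent."""
    key = pack_subtag(subtag)
    index = bisect.bisect_left(keys, key)
    if index < len(keys) and keys[index] == key:
        return tags[index]
    return None


# Sample data only; the real table is generated by gen-tag-table.py.
KEYS, TAGS = build_table({"fa": "FAR ", "prs": "DRI ", "qul": "QUH "})
```

Each comparison in the search is then a single integer comparison rather than a character-by-character string comparison.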
|
|
15be0ded
|
2022-05-17T14:57:08
|
|
[ot-tags] Optimize lang_matches()
Part of https://github.com/harfbuzz/harfbuzz/issues/3591
Before:
----------------------------------------------------------------------------------------
Benchmark                                                    Time        CPU  Iterations
----------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.67 ns    8.64 ns    80324382
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       91.2 ns    90.9 ns     7674131
BM_hb_ot_tags_from_script_and_language/COMMON en_US       41.1 ns    41.0 ns    17174093
BM_hb_ot_tags_from_script_and_language/LATIN en_US        41.3 ns    41.2 ns    17000876
BM_hb_ot_tags_from_script_and_language/COMMON none        4.56 ns    4.55 ns   153914130
BM_hb_ot_tags_from_script_and_language/LATIN none         4.53 ns    4.52 ns   153830303
After:
----------------------------------------------------------------------------------------
Benchmark                                                    Time        CPU  Iterations
----------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.24 ns    8.21 ns    84078465
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       77.5 ns    77.2 ns     9059230
BM_hb_ot_tags_from_script_and_language/COMMON en_US       38.8 ns    38.7 ns    17790692
BM_hb_ot_tags_from_script_and_language/LATIN en_US        37.6 ns    37.5 ns    18648293
BM_hb_ot_tags_from_script_and_language/COMMON none        4.50 ns    4.49 ns   155573267
BM_hb_ot_tags_from_script_and_language/LATIN none         4.49 ns    4.47 ns   156456653
|
|
dd3c858f
|
2022-05-17T14:28:28
|
|
[ot-tags] Speed up hb_ot_tags_from_language()
Part of https://github.com/harfbuzz/harfbuzz/issues/3591
"After that, bulk of the time I suppose is spent in binary-searching the
language table. I suggest we split the language table in 2-letter and
3-letter tags, to speed-up the vast majority of cases that are
2-letter."
benchmark-ot, before:
----------------------------------------------------------------------------------------
Benchmark                                                    Time        CPU  Iterations
----------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN        112 ns     111 ns     6286271
BM_hb_ot_tags_from_script_and_language/COMMON en_US       60.6 ns    60.4 ns    11671176
BM_hb_ot_tags_from_script_and_language/LATIN en_US        61.3 ns    61.1 ns    11442645
BM_hb_ot_tags_from_script_and_language/COMMON none        4.75 ns    4.74 ns   146997235
BM_hb_ot_tags_from_script_and_language/LATIN none         4.65 ns    4.64 ns   150938747
After:
----------------------------------------------------------------------------------------
Benchmark                                                    Time        CPU  Iterations
----------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       89.5 ns    89.2 ns     7747649
BM_hb_ot_tags_from_script_and_language/COMMON en_US       38.5 ns    38.4 ns    18199432
BM_hb_ot_tags_from_script_and_language/LATIN en_US        39.0 ns    38.9 ns    18049238
BM_hb_ot_tags_from_script_and_language/COMMON none        4.53 ns    4.52 ns   154895110
BM_hb_ot_tags_from_script_and_language/LATIN none         4.54 ns    4.52 ns   154762105
|
|
9baccb98
|
2022-05-17T13:34:34
|
|
[ot-tags] Speed up hb_ot_tags_from_complex_language()
Part of https://github.com/harfbuzz/harfbuzz/issues/3591
2. All the subtag_matches outside the switch match long strings (>= 6 or so).
As such, check the tag for such length before going into any of them.
benchmark-ot, before:
----------------------------------------------------------------------------------------
Benchmark                                                    Time        CPU  Iterations
----------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN        172 ns     171 ns     4083155
BM_hb_ot_tags_from_script_and_language/COMMON en_US        120 ns     119 ns     5849947
BM_hb_ot_tags_from_script_and_language/LATIN en_US         113 ns     112 ns     5840326
BM_hb_ot_tags_from_script_and_language/COMMON none        4.66 ns    4.64 ns   151396224
BM_hb_ot_tags_from_script_and_language/LATIN none         4.66 ns    4.64 ns   149019593
After:
----------------------------------------------------------------------------------------
Benchmark                                                    Time        CPU  Iterations
----------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN        112 ns     112 ns     6357763
BM_hb_ot_tags_from_script_and_language/COMMON en_US       60.5 ns    60.3 ns    11475091
BM_hb_ot_tags_from_script_and_language/LATIN en_US        54.9 ns    54.8 ns    12575690
BM_hb_ot_tags_from_script_and_language/COMMON none        4.61 ns    4.59 ns   152388450
BM_hb_ot_tags_from_script_and_language/LATIN none         4.66 ns    4.64 ns   151497600
|
|
ae9afd97
|
2021-10-03T20:09:33
|
|
Let BCP 47 tag "mo" fall back to OT tag 'ROM '
|
|
a184c5f8
|
2022-01-30T13:28:23
|
|
Don’t always inherit from macrolanguages
If an OpenType tag maps to a BCP 47 macrolanguage, that is presumably to
support the use of the macrolanguage as a vague stand-in for one of its
individual languages. For example, "ar" and "zh" are often used for
"arb" and "cmn". When the OpenType tag maps to a macrolanguage and some
but not all of its individual languages, that indicates that the
OpenType tag only corresponds to the listed individual languages (which
may be referred to using the macrolanguage subtag) but not the missing
individual languages. In particular, INUK (Nunavik Inuktitut) is mapped
to "ike" (Eastern Canadian Inuktitut) and "iu" (Inuktitut) but not to
"ikt" (Inuinnaqtun), so "ikt" should not inherit the INUK mapping from
its macrolanguage "iu".
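The inheritance rule described in this message can be sketched as a small Python predicate. This is a minimal illustration under stated assumptions, not the code in gen-tag-table.py: the function name `inherits_from_macrolanguage` and both dictionaries are hypothetical, and the data covers only the Inuktitut example given above.

```python
# Hypothetical, hand-written data covering only the example in this message.
# OpenType tag -> BCP 47 subtags listed for it.
OT_TO_BCP47 = {"INUK": {"ike", "iu"}}
# Macrolanguage -> its individual languages.
MACROLANGUAGES = {"iu": {"ike", "ikt"}}


def inherits_from_macrolanguage(ot_tag, macro, individual):
    """Return True if `individual` should inherit `ot_tag` from `macro`."""
    listed = OT_TO_BCP47.get(ot_tag, set())
    if macro not in listed:
        return False  # the tag does not map to this macrolanguage at all
    members = MACROLANGUAGES.get(macro, set())
    if individual not in members:
        return False  # not an individual language of this macrolanguage
    listed_members = listed & members
    if not listed_members:
        # No individual language is listed: the macrolanguage stands in for all.
        return True
    # Some individual languages are listed: only those correspond to the tag.
    return individual in listed_members
```

With this data, "ike" keeps the INUK mapping while "ikt" does not inherit it from "iu".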
|
|
0b1bf89c
|
2022-01-28T22:27:51
|
|
Replace “[family]” with “[collection]”
Not all language collections are language families.
|
|
0e31595e
|
2022-01-28T22:26:38
|
|
Infer tag mappings for unregistered macrolanguages
Every macrolanguage not mentioned in the OT language system tag registry
is mapped to every tag of its individual languages, if those have
registered tags.
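A rough Python sketch of this inference follows. It assumes the registries have already been parsed into plain dictionaries; the function name `infer_macrolanguage_tags` is hypothetical, and the subtags beginning with "xx" and the tags beginning with "XX" are placeholders invented for illustration, not real registry entries.

```python
def infer_macrolanguage_tags(macro_members, bcp47_to_ot):
    """Map each unregistered macrolanguage to the tags of its individual languages."""
    inferred = {}
    for macro, members in macro_members.items():
        if macro in bcp47_to_ot:
            continue  # already mentioned in the OT registry: nothing to infer
        tags = []
        for member in sorted(members):
            for tag in bcp47_to_ot.get(member, []):
                if tag not in tags:
                    tags.append(tag)
        if tags:
            inferred[macro] = tags
    return inferred


# "fa"/"prs" come from this log; the "xx*" entries are made-up placeholders.
MACRO_MEMBERS = {"fa": {"prs"}, "xxm": {"xxa", "xxb", "xxc"}}
BCP47_TO_OT = {"fa": ["FAR "], "prs": ["DRI "], "xxa": ["XXA "], "xxb": ["XXB "]}
INFERRED = infer_macrolanguage_tags(MACRO_MEMBERS, BCP47_TO_OT)
```

Here "fa" is skipped because it is registered, while the placeholder macrolanguage "xxm" receives the tags of the two members that have registered tags.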
|
|
2404617a
|
2021-12-08T21:10:22
|
|
Update language system tag registry to OT 1.9
|
|
d18915f9
|
2021-03-28T10:09:13
|
|
Reformat gen-tag-table.py
|
|
e19de65e
|
2021-03-08T13:12:47
|
|
Update hb-ot-tag-table.hh (#2890)
|
|
b2e7bb2a
|
2020-10-27T19:50:33
|
|
Don’t map BCP 47 to coincidentally similar OT tag
|
|
e1df2c52
|
2020-10-26T19:16:35
|
|
Map ISO 639 code qul to language system tag 'QUH '
|
|
17da41bd
|
2020-11-17T14:29:05
|
|
Update language system tag registry to OT 1.8.4
|
|
27170e05
|
2020-10-28T18:02:55
|
|
Fix names for language tag in gen-tag-table.py
A BCP 47 language tag with both a script subtag and a region subtag
would be printed as a human-readable name in hb-ot-tag-table.hh as if it
only had its language subtag.
|
|
dec52006
|
2020-10-10T14:49:55
|
|
Map BCP 47 tags to all macrolanguages
The general rule is that if a BCP 47 macrolanguage maps to an OpenType
language system tag, all its individual languages map to it too.
Previously, a tag like "prs" (Dari) would not map to the language system
tag ('FAR ') of its macrolanguage ("fa") because "prs" already has its
own language system tag ('DRI '). That exception has been removed: now
"prs" maps to 'DRI ' and falls back to 'FAR '.
|
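The fallback order described in the preceding message ("prs" maps to 'DRI ' and then to 'FAR ') can be sketched in Python as follows. The function name `tags_for` and the dictionary layout are assumptions made for illustration; only the Dari and Persian data comes from the message itself.

```python
def tags_for(language, own_tags, macro_of):
    """Return a language's own tags followed by its macrolanguage's tags."""
    tags = list(own_tags.get(language, []))
    macro = macro_of.get(language)
    if macro is not None:
        for tag in own_tags.get(macro, []):
            if tag not in tags:
                tags.append(tag)  # macrolanguage tags act as fallbacks
    return tags


# Data taken from the commit message; the structure itself is hypothetical.
OWN_TAGS = {"prs": ["DRI "], "fa": ["FAR "]}
MACRO_OF = {"prs": "fa"}
```

Calling `tags_for("prs", OWN_TAGS, MACRO_OF)` yields the specific tag first and the macrolanguage tag second, which is the ordering the message describes.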
|
1d53268d
|
2020-10-10T14:46:36
|
|
Fix two-way mapping of "man" and 'MNK '
|
|
ab38cf67
|
2020-10-10T14:21:20
|
|
Map hy-arevmda to 'HYE ' instead of HYE0
|
|
916c5a90
|
2020-10-10T14:15:16
|
|
Consistently emit BCP 47 subtag scope suffixes
|
|
ac3f859a
|
2020-09-09T11:49:56
|
|
Demote unregistered vendor-specific language tags
|
|
91fe20f0
|
2020-09-04T09:18:19
|
|
Disambiguate OT tags when primary tag is not first
|
|
ad87155f
|
2020-05-29T00:11:19
|
|
minor, use py3's open(encoding=)
|
|
7554f618
|
2020-05-28T22:51:29
|
|
minor, use sys.exit print shorthand
|
|
08f1d95a
|
2020-05-28T15:01:15
|
|
minor, move scripts manuals to __doc__
|
|
7a961692
|
2020-04-01T17:26:07
|
|
Update IANA Language Subtag Registry to 2020-05-12
|
|
fd748fac
|
2020-03-15T15:59:31
|
|
Update to Unicode 13.0.0
|
|
e17fd0d9
|
2020-02-23T23:58:39
|
|
[tools] More on py3 compatibility
|
|
8c652f72
|
2020-02-19T16:32:44
|
|
Minor, switch to https links where possible
|
|
bbcbcafc
|
2020-02-19T16:21:47
|
|
[tool] Minor, move input files link
|
|
8d199077
|
2020-02-19T14:56:55
|
|
Remove python2 support from tests/utils scripts
|
|
4dc87365
|
2020-02-09T18:39:33
|
|
Add links to files used by python scripts.
Closes #2150
|
|
6745a600
|
2019-04-16T17:29:34
|
|
Comment out ot_languages where fallback suffices
|
|
1ce11b44
|
2019-04-16T10:04:45
|
|
Reduce LangTag from 3 language system tags to 1
|
|
3f887747
|
2018-07-19T13:48:07
|
|
Switch on the first char of a complex language tag
This results in a tenfold speed-up for the common case of tags that are
not complex, in the sense of `hb_ot_tags_from_complex_language`.
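The effect of switching on the first character can be illustrated with a Python dispatch table. This is a simplified sketch, not the generated C++ code: the names `group_by_first_char` and `match_complex` are hypothetical, the prefix test stands in for the real subtag matching, and the two patterns are examples mentioned elsewhere in this log.

```python
from collections import defaultdict


def group_by_first_char(patterns):
    """Bucket complex-tag patterns by their first character."""
    groups = defaultdict(list)
    for pattern in patterns:
        groups[pattern[0]].append(pattern)
    return dict(groups)


def match_complex(tag, groups):
    """Check only the patterns that share the tag's first character."""
    for pattern in groups.get(tag[:1], ()):
        if tag == pattern or tag.startswith(pattern + "-"):
            return pattern
    return None


# Illustrative patterns only; the real list is generated from the registries.
GROUPS = group_by_first_char(["el-polyton", "zh-hk"])
```

A tag such as "en-us" finds no bucket for "e" patterns that match and returns after at most one comparison, which is why the common, non-complex case becomes cheap.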
|
|
a754d441
|
2018-07-16T21:14:48
|
|
Map Quechua languages to closest ones with tags
OpenType only officially maps four ISO 639 codes to Quechua languages,
but prior versions of HarfBuzz also mapped qu to 'QUZ '. Because qu is a
macrolanguage, the mapping now applies to all individual Quechua
languages. OpenType calls 'QUZ ' "Quechua", but it really corresponds to
Cusco Quechua, so the individual Quechua languages should not all
necessarily be mapped to it.
|
|
7c7cb2a9
|
2018-01-20T15:53:09
|
|
Match extlang subtags
If the second subtag of a BCP 47 tag is three letters long, it denotes
an extended language. The tag converter ignores the language subtag and
uses the extended language instead.
There are some grandfathered exceptions, which are handled earlier.
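The extlang rule can be sketched in a few lines of Python. The function name `effective_language` is hypothetical, and the sketch deliberately leaves out the grandfathered exceptions, which the message says are handled earlier.

```python
def effective_language(tag):
    """Return the subtag used for lookup: the extlang if present, else the language."""
    subtags = tag.lower().split("-")
    # A three-letter second subtag denotes an extended language.
    # Three-digit region codes such as "419" are excluded by the isalpha() check.
    if len(subtags) >= 2 and len(subtags[1]) == 3 and subtags[1].isalpha():
        return subtags[1]
    return subtags[0]
```

For example, "zh-yue-HK" is looked up as "yue", whereas "zh-HK" is still looked up as "zh" because "HK" is only two letters long.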
|
|
2f1f961c
|
2017-12-08T22:45:52
|
|
Autogenerate the BCP 47 to OpenType mappings
The new script, gen-tag-table.py, generates `ot_languages` automatically
from the [OpenType language system tag registry][ot] and the [IANA
Language Subtag Registry][bcp47] with some manual modifications. If an
OpenType tag maps to a BCP 47 macrolanguage, all the macrolanguage's
individual languages are mapped to the same OpenType tag, except for
individual languages with their own OpenType mappings. Deprecated
BCP 47 tags are canonicalized.
[ot]: https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags
[bcp47]: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
Some OpenType tags correspond to multiple ISO 639 codes. The mapping
from ISO 639 codes lists OpenType tags in priority order, such that more
specific or more likely tags appear first.
Some OpenType tags have no corresponding ISO 639 code in the registry so
their mappings use BCP 47 subtags besides the language. For example, any
BCP 47 tag with a fonipa variant subtag is mapped to 'IPPH', and 'IPPH'
is mapped back to und-fonipa.
Other OpenType tags have no corresponding ISO 639 code because it is not
clear what they are for. HarfBuzz just ignores these tags.
One such ignored tag is 'ZHP ' (Chinese Phonetic). It probably means
zh-Latn. However, it is used in Microsoft JhengHei and Microsoft YaHei
with the script tag 'hani', implying that it is not a romanization
scheme after all. It would be simple enough to add this mapping to
gen-tag-table.py once a definitive mapping is determined.
The manual modifications are mainly either obvious mappings that the
OpenType registry omits or mappings for compatibility with previous
versions of HarfBuzz. Some of the old mappings were discarded, though,
for homophonous language names. For example, OpenType maps 'KUI ' to
kxu; previous versions of HarfBuzz also mapped it to kvd, because kvd
and kxu both happen to be called "Kui".
gen-tag-table.py also generates a function to convert multi-subtag tags
like el-polyton and zh-HK to OpenType tags, replacing `ot_languages_zh`
and the hard-coded list of special cases in `hb_ot_tags_from_language`.
It also generates a function to convert OpenType tags to BCP 47,
replacing the hard-coded list of special cases in
`hb_ot_tag_to_language`.
|
|
bca7a169
|
2018-09-10T12:05:51
|
|
Update language system tag registry to OT 1.8.3
|