|
dd3c858f
|
2022-05-17T14:28:28
|
|
[ot-tags] Speed up hb_ot_tags_from_language()
Part of https://github.com/harfbuzz/harfbuzz/issues/3591
"After that, bulk of the time I suppose is spent in binary-searching the
language table. I suggest we split the language table in 2-letter and
3-letter tags, to speed-up the vast majority of cases that are
2-letter."
benchmark-ot, before:
-----------------------------------------------------------------------------------
Benchmark                                                Time       CPU  Iterations
-----------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN    112 ns    111 ns     6286271
BM_hb_ot_tags_from_script_and_language/COMMON en_US   60.6 ns   60.4 ns    11671176
BM_hb_ot_tags_from_script_and_language/LATIN en_US    61.3 ns   61.1 ns    11442645
BM_hb_ot_tags_from_script_and_language/COMMON none    4.75 ns   4.74 ns   146997235
BM_hb_ot_tags_from_script_and_language/LATIN none     4.65 ns   4.64 ns   150938747
After:
-----------------------------------------------------------------------------------
Benchmark                                                Time       CPU  Iterations
-----------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN   89.5 ns   89.2 ns     7747649
BM_hb_ot_tags_from_script_and_language/COMMON en_US   38.5 ns   38.4 ns    18199432
BM_hb_ot_tags_from_script_and_language/LATIN en_US    39.0 ns   38.9 ns    18049238
BM_hb_ot_tags_from_script_and_language/COMMON none    4.53 ns   4.52 ns   154895110
BM_hb_ot_tags_from_script_and_language/LATIN none     4.54 ns   4.52 ns   154762105
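The split described in the quoted suggestion can be sketched roughly as follows. This is a minimal illustration, not the HarfBuzz code: the struct, table contents, and function names (LangTag, two_letter, three_letter, tag_from_language) are all hypothetical, and the handful of entries are only there to make the lookup runnable. The idea is that a 2-letter code searches a much smaller sorted table than the combined one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iterator>

// Hypothetical table entry: packed language code plus an OpenType tag.
struct LangTag
{
  uint32_t lang; // letters packed big-endian, e.g. "en" -> 0x656E
  uint32_t tag;  // OpenType language-system tag
};

constexpr uint32_t make_tag (char a, char b, char c, char d)
{
  return ((uint32_t) (unsigned char) a << 24) | ((uint32_t) (unsigned char) b << 16) |
         ((uint32_t) (unsigned char) c << 8) | (uint32_t) (unsigned char) d;
}

// Pack the first `len` letters of `s` into an integer sort key.
inline uint32_t pack (const char *s, size_t len)
{
  uint32_t v = 0;
  for (size_t i = 0; i < len; i++)
    v = (v << 8) | (unsigned char) s[i];
  return v;
}

// Illustrative entries only; both tables must stay sorted by `lang`.
static const LangTag two_letter[] = {
  {0x6172u /* ar */, make_tag ('A', 'R', 'A', ' ')},
  {0x656Eu /* en */, make_tag ('E', 'N', 'G', ' ')},
  {0x6672u /* fr */, make_tag ('F', 'R', 'A', ' ')},
  {0x7A68u /* zh */, make_tag ('Z', 'H', 'S', ' ')},
};
static const LangTag three_letter[] = {
  {0x636872u /* chr */, make_tag ('C', 'H', 'R', ' ')},
  {0x686177u /* haw */, make_tag ('H', 'A', 'W', ' ')},
};

// Binary search within one table; returns nullptr when the key is absent.
static const LangTag *find_lang (const LangTag *begin, const LangTag *end, uint32_t key)
{
  const LangTag *it = std::lower_bound (begin, end, key,
					[] (const LangTag &e, uint32_t k) { return e.lang < k; });
  return (it != end && it->lang == key) ? it : nullptr;
}

// Dispatch on length first, so the common 2-letter case searches the small table.
// Returns 0 for unknown codes or codes that are not 2 or 3 letters long.
inline uint32_t tag_from_language (const char *lang)
{
  size_t len = std::strlen (lang);
  const LangTag *e = nullptr;
  if (len == 2)
    e = find_lang (std::begin (two_letter), std::end (two_letter), pack (lang, 2));
  else if (len == 3)
    e = find_lang (std::begin (three_letter), std::end (three_letter), pack (lang, 3));
  return e ? e->tag : 0;
}
```

Calling tag_from_language ("en") only ever touches the 2-letter table; a real table would carry hundreds of entries, which is where the smaller search range pays off.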
|
|
9baccb98
|
2022-05-17T13:34:34
|
|
[ot-tags] Speed up hb_ot_tags_from_complex_language()
Part of https://github.com/harfbuzz/harfbuzz/issues/3591
2. All the subtag_matches outside the switch match long strings (>= 6 or so).
As such, check the tag for such length before going into any of them.
benchmark-ot, before:
-----------------------------------------------------------------------------------
Benchmark                                                Time       CPU  Iterations
-----------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN    172 ns    171 ns     4083155
BM_hb_ot_tags_from_script_and_language/COMMON en_US    120 ns    119 ns     5849947
BM_hb_ot_tags_from_script_and_language/LATIN en_US     113 ns    112 ns     5840326
BM_hb_ot_tags_from_script_and_language/COMMON none    4.66 ns   4.64 ns   151396224
BM_hb_ot_tags_from_script_and_language/LATIN none     4.66 ns   4.64 ns   149019593
After:
-----------------------------------------------------------------------------------
Benchmark                                                Time       CPU  Iterations
-----------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN    112 ns    112 ns     6357763
BM_hb_ot_tags_from_script_and_language/COMMON en_US   60.5 ns   60.3 ns    11475091
BM_hb_ot_tags_from_script_and_language/LATIN en_US    54.9 ns   54.8 ns    12575690
BM_hb_ot_tags_from_script_and_language/COMMON none    4.61 ns   4.59 ns   152388450
BM_hb_ot_tags_from_script_and_language/LATIN none     4.66 ns   4.64 ns   151497600
|
|
a4d98b63
|
2022-05-16T17:02:40
|
|
[subset/cff1] Collect glyph-to-sid map to avoid an O(n^2) algorithm
Saves 13% on the largest benchmark:
BM_subset/subset_glyphs/SourceHanSans-Regular_subset.otf/10000 -0.1313 -0.1308 75 65 75 65
BM_subset/subset_codepoints/SourceHanSans-Regular_subset.otf/4096 -0.1009 -0.1004 54 48 54 48
BM_subset/subset_codepoints/SourceHanSans-Regular_subset.otf/10000 -0.1067 -0.1066 70 62 69 62
|
|
b87f48e9
|
2022-05-16T16:33:31
|
|
[cff1] get_sid() move bounds check into each implementation
|
|
fb413f52
|
2022-05-16T17:08:43
|
|
[subset/cff] Don't use bitfields for hot bools
The struct has room because of alignment, and these bools are hot.
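A small sketch of the trade-off, with hypothetical struct and field names (not the ones in the CFF subsetter). On common ABIs the trailing padding absorbs the two plain bools, so dropping the bitfields costs no extra space, while reads no longer need to mask a bit out and writes are no longer read-modify-write.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout with bitfield flags: compact, but each flag access
// has to extract or merge a single bit.
struct parsed_bitfield_t
{
  const void *ptr;
  uint32_t    op;
  bool        has_width : 1;
  bool        seen_moveto : 1;
};

// Same fields with plain bools: each flag is its own byte, so a hot-path
// test is a plain byte load.
struct parsed_plain_t
{
  const void *ptr;
  uint32_t    op;
  bool        has_width;
  bool        seen_moveto;
};

// Example hot-path check on the plain-bool layout.
inline bool both_set (const parsed_plain_t &p)
{
  return p.has_width && p.seen_moveto;
}
```

Whether the sizes really match depends on the ABI and the surrounding fields; sizeof is the quickest way to confirm for a given struct.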
|
|
e1e359b4
|
2022-05-16T15:53:28
|
|
[cff1] Tighten up range_list_t a bit
|
|
3fbac094
|
2022-05-16T15:41:11
|
|
[cff1] Lazy-load & sort glyph names
Improves subset benchmarks by up to 70% for small CFF1 subsets of
non-CID fonts!
BM_subset/subset_glyphs/SourceSansPro-Regular.otf/10 -0.7067 -0.7071 1 0 1 0
BM_subset/subset_glyphs/SourceSansPro-Regular.otf/64 -0.4817 -0.4824 1 0 1 0
BM_subset/subset_glyphs/SourceSansPro-Regular.otf/512 -0.1948 -0.1956 2 2 2 2
BM_subset/subset_glyphs/SourceSansPro-Regular.otf/2000 -0.0767 -0.0761 6 6 6 6
|
|
b58bfd98
|
2022-05-16T11:21:45
|
|
[font] Minor move of code to silence gcc-12 warning
See mailing list discussion.
|
|
602e0ca7
|
2022-05-16T10:14:34
|
|
[cff] Minor restructure of struct
Surprisingly this shows tiny benchmark improvement consistently.
|
|
acdab17e
|
2022-05-13T14:14:36
|
|
[cff] Cosmetic in parsed_values_t
|
|
b46c7faa
|
2022-05-13T14:02:54
|
|
[cff] Check buf_len, not buf
Ouch!
|
|
19a8db85
|
2022-05-13T18:05:05
|
|
[subset] fix potential integer overflow in gname_t::cmp.
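The commit message does not say how the comparator was changed. A common way to get this class of bug, and a generic fix, looks like the sketch below (the function name cmp_three_way is hypothetical): returning a - b overflows for operands that are far apart, whereas deriving the sign from two comparisons does no arithmetic on the operands.

```cpp
#include <cassert>
#include <climits>

// Risky pattern (shown only as a comment): `return a - b;` overflows, which
// is undefined behaviour for signed int, e.g. for a = INT_MIN and b = 1.

// Overflow-free three-way comparison: yields -1, 0 or +1.
inline int cmp_three_way (int a, int b)
{
  return (a > b) - (a < b);
}
```

The result has the sign contract expected by qsort()/bsearch()-style comparators.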
|
|
2d2f66e1
|
2022-05-13T13:53:17
|
|
[cff-common] In INDEX, return empty bytes if length is zero
Before it was possible to return non-null arrayZ.
|
|
a2f132f1
|
2022-05-13T13:49:17
|
|
[cff] Check glyph-name's length, not arrayZ
As the latter can be non-null while still zero-length.
|
|
c657c4e1
|
2022-05-10T10:00:06
|
|
[meta] fix type traits on gcc 4.9 #3526
Signed-off-by: Thomas Devoogdt <thomas.devoogdt@barco.com>
|
|
e61234c5
|
2022-05-12T13:20:10
|
|
[vector] Add tests for move constructor/assignment
|
|
7fa580bc
|
2022-05-12T13:05:32
|
|
[map] Fix map copy/move constructors to actually work
Ouch!
|
|
a09dd87c
|
2022-05-12T12:58:07
|
|
[set] Fix set copy/move constructors to actually work
Ouch!
|
|
76fc2771
|
2022-05-12T12:14:07
|
|
[vector] Remove explicit std::move
Was confusing compilers. Let them figure it out themselves.
Makes the NotoNastaliqUrdu subsetting/1000 benchmark more than twice as fast:
Benchmark                                                           Time      CPU  Time Old  Time New  CPU Old  CPU New
-----------------------------------------------------------------------------------------------------------------------
BM_subset/subset_glyphs/NotoNastaliqUrdu-Regular.ttf/1000        -0.5064  -0.5065       111        55      110       55
BM_subset/subset_codepoints/NotoNastaliqUrdu-Regular.ttf/1000    -0.5494  -0.5493       132        59      131       59
|
|
c81198b5
|
2022-05-12T11:58:37
|
|
[set] Tweak move operators a bit
Should be equivalent.
|
|
175319cd
|
2022-05-11T13:47:17
|
|
[gsubgpos] Clean up OT::ClassDefFormat2::intersected_class_glyphs 0 case
|
|
137af361
|
2022-05-11T13:39:30
|
|
[gsubgpos] Simplify OT::ClassDefFormat2::intersected_class_glyphs()
|
|
3261e05b
|
2022-05-11T13:16:31
|
|
[subset] Optimize ClassDef1::intersected_class_glyphs() for class0
|
|
c78d8ba6
|
2022-05-11T13:05:41
|
|
[subset] Allocate same size as source table for GSUB/GPOS/name
|
|
2e7f1ae4
|
2022-05-11T12:49:16
|
|
[subset] Use vector.allocated size instead of tracking buf_size
|
|
f0853796
|
2022-05-11T12:10:03
|
|
[cff-subset] Pre-alloc vector for operator decoding
|
|
aeb50b89
|
2022-05-10T18:06:53
|
|
[subset] Retain buffer across table subset operations
|
|
bff78e65
|
2022-05-10T16:33:37
|
|
[cff] Convert interpretation environment to use constructor
|
|
de053e2e
|
2022-05-10T15:38:37
|
|
[cff] Convert subr_subset_param_t to use constructor
|
|
96140db4
|
2022-05-10T15:34:33
|
|
[cff] Convert cff2_extents_param_t to use constructor
|
|
54544f2a
|
2022-05-10T15:31:49
|
|
[cff] Convert cff1_extents_param_t to use constructor
|
|
377befd0
|
2022-05-10T15:29:12
|
|
[cff] Convert get_seac_param_t to use constructor
|
|
8fd70362
|
2022-05-10T15:15:49
|
|
[cff] Use hb_ubytes_t() instead of Null(hb_ubytes_t)
|
|
9033c7f9
|
2022-05-10T14:58:53
|
|
[cff-common] Optimize INDEX::operator[]
Previous try showed slowdown in benchmarks, surprisingly.
Rewrite it keeping the function, hopefully allowing better optimization.
|
|
3aace243
|
2022-05-10T14:54:04
|
|
Revert "[cff-common] Optimize INDEX::operator[]"
This reverts commit 9edb03ac7ac4b4d0814f3fd1f20cc8d2be99e971.
|
|
b31ef081
|
2022-05-10T14:52:40
|
|
Revert "[cff] Add an unlikely()"
This reverts commit 9ba9adb7ed6d48504e97a2af117b7da1fdb28450.
This shows slowdown in benchmarks.
|
|
9ba9adb7
|
2022-05-10T14:42:50
|
|
[cff] Add an unlikely()
|
|
9edb03ac
|
2022-05-10T14:25:08
|
|
[cff-common] Optimize INDEX::operator[]
|
|
0a42410d
|
2022-05-10T12:05:19
|
|
[cff2] Change extents/shape stack to be just a number
Do the blending immediately.
Fixes https://github.com/harfbuzz/harfbuzz/issues/3559
Benchmark on AdobeVFPrototype shows 35% speedup. Now we're faster
than FreeType:
Benchmark                                               Time      CPU  Time Old  Time New  CPU Old  CPU New
-----------------------------------------------------------------------------------------------------------
BM_Font/glyph_extents/AdobeVFPrototype.otf/hb        -0.3792  -0.3792      1584       983     1581      982
BM_Font/glyph_extents/AdobeVFPrototype.otf/ft        +0.0228  +0.0224      1220      1248     1218     1245
BM_Font/glyph_extents/AdobeVFPrototype.otf/var/hb    -0.3513  -0.3518      1616      1048     1613     1046
BM_Font/glyph_extents/AdobeVFPrototype.otf/var/ft    +0.0172  +0.0169      1232      1254     1230     1251
|
|
8f9f0c49
|
2022-05-10T17:47:08
|
|
[subset] Enforce cmap12 group ordering constraints in collect_mapping.
Fixes fuzzer issue: https://oss-fuzz.com/testcase-detail/6365271012540416
|
|
1b14d2ff
|
2022-05-09T18:15:31
|
|
[cff] Fix arg-stack peek() impl
|
|
6106ef8c
|
2022-05-09T18:12:09
|
|
[cff] Tighten up arg-stack access
|
|
8c616a6e
|
2022-05-09T17:49:54
|
|
[cff] Allocate stack inline instead of using hb_vector_t
Speeds up glyph_extents and glyph_shape benchmarks for CFF by 10
to 16 percent!
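A rough sketch of the idea, assuming a fixed upper bound on stack depth: storage lives inside the object, so pushing never allocates. The type name inline_stack_t, its members, and the error-flag convention are hypothetical and not the HarfBuzz interpreter's actual stack type.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-capacity stack with inline storage (no heap allocation).
template <typename T, size_t N>
struct inline_stack_t
{
  // Push a value; on overflow, set the error flag and reject the value.
  bool push (const T &v)
  {
    if (count >= N) { error = true; return false; }
    items[count++] = v;
    return true;
  }

  // Pop the top value; on underflow, set the error flag and return T ().
  T pop ()
  {
    if (count == 0) { error = true; return T (); }
    return items[--count];
  }

  size_t size () const { return count; }
  bool in_error () const { return error; }

  private:
  T      items[N]; // inline storage; slots are written before they are read
  size_t count = 0;
  bool   error = false;
};
```

A charstring interpreter could then declare something like inline_stack_t<double, 48> as a local and keep the whole operand stack on the machine stack; the capacity here is only an example value.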
|
|
8bb1a3ce
|
2022-05-09T15:38:40
|
|
[cff-common] Write INDEX offset-size calc using hb_bit_storage()
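For illustration, the calculation can be sketched like this. bit_storage below is a hypothetical stand-in for hb_bit_storage(), assumed here to return the number of bits needed to represent a value; index_off_size is likewise a made-up name. CFF INDEX offsets start at 1, so the largest offset to encode is the data size plus one.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent v (0 for v == 0).
inline unsigned bit_storage (uint32_t v)
{
  unsigned n = 0;
  while (v) { n++; v >>= 1; }
  return n;
}

// Bytes per offset for an INDEX whose object data is data_size bytes long:
// round the bit count of the largest offset up to whole bytes.
inline unsigned index_off_size (uint32_t data_size)
{
  return (bit_storage (data_size + 1) + 7) / 8;
}
```

This replaces a chain of range comparisons (fits in 1 byte? 2 bytes? ...) with a single expression.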
|
|
2ccfe84e
|
2022-05-09T15:35:04
|
|
[cff-common] Add assert to INDEX::set_offset_at()
|
|
4bcab9e9
|
2022-05-09T15:30:42
|
|
[cff-common] Use byte_str_t() instead of Null(byte_str_t)
|
|
94f7a263
|
2022-05-09T15:29:14
|
|
[cff-common] Fix get_size() for Null object
The special-casing didn't make sense.
|
|
c9cc7d5d
|
2022-05-09T15:27:27
|
|
[cff-common] Inline once-used method in INDEX
|
|
11482a3a
|
2022-05-09T15:25:21
|
|
[cff-common] Remove unused method from INDEX
|
|
d1bb3b08
|
2022-05-09T15:23:59
|
|
[cff-common] Hide more INDEX internals
|
|
d3b21387
|
2022-05-09T15:22:55
|
|
[cff-common] Remove redundant operator implementation
|
|
a96b408d
|
2022-05-09T15:20:16
|
|
[cff-common] Hide INDEX internals
|
|
335b1d83
|
2022-05-06T13:37:11
|
|
[cff-common] No need to check max-offset in INDEX
The length_at() function makes sure out-of-range offsets
are discarded. We just need to check the last offset.
|
|
c941ece6
|
2022-05-09T16:20:22
|
|
[cff] Use using instead of typedef
|
|
64d63ceb
|
2022-05-09T16:16:07
|
|
[cff-common] Use existing types for str_buff_vec_t
|
|
e1838ec1
|
2022-05-09T16:14:13
|
|
[cff-common] Remove unused method
|
|
8aa54aac
|
2022-05-09T16:09:56
|
|
[cff] Replace byte_str_t with hb_bytes_t use
|
|
fe1d85a5
|
2022-05-09T16:04:52
|
|
[cff] Remove custom byte_str_t impl
|
|
c8a5f1e3
|
2022-05-09T15:49:47
|
|
[cff-common] Indent
|
|
be7b2905
|
2022-05-09T15:48:18
|
|
[cff-common] Remove unused INDEX::serialize() method
|
|
60390169
|
2022-05-09T15:44:09
|
|
[cff-common] Write str_buf_t::total_size() as dagger
|
|
258afb45
|
2022-05-09T15:40:55
|
|
[cff-common] Use range-based loop in str_buff_vec_t
|
|
b051f3fa
|
2022-05-05T23:27:34
|
|
[subset] Fix cpal subsetting when there are partial palette overlaps.
The existing code doesn't correctly handle the case where palettes partially overlap in the color record array. This changes the subsetting to only share entries in the color record array when palettes have the same first color index. Partially overlapping palettes will be converted to disjoint segments in the color record array.
Updates one of the color tests to use multiple palettes.
Also fixes fuzzer: https://oss-fuzz.com/testcase-detail/5568200165687296.
|
|
2884eb97
|
2022-05-06T12:54:02
|
|
[cff-common] Remove special-casing of count=0 in INDEX serialize
The generic code-path now can handle count=0.
|
|
fc7f51ae
|
2022-05-06T12:53:19
|
|
[cff-common] Reduce iterator calls
|
|
c857b8e3
|
2022-05-06T12:50:37
|
|
[cff-common] Set INDEX min_size to 2
That is what it is, for an empty INDEX.
|
|
dd71d2c1
|
2022-05-06T13:02:26
|
|
[gvar] Protect against offset underflow
|
|
9a6dabd6
|
2022-05-06T12:01:37
|
|
[gvar] Remove sanitize check for data array
We are not checking in sanitize that offset array is ascending,
so this check was bogus.
|
|
38478d10
|
2022-05-06T12:00:01
|
|
[gvar] DEFINE_SIZE_ARRAY instead of DEFINE_SIZE_MIN
|
|
90d278c9
|
2022-05-06T11:58:53
|
|
[gvar] Remove requirement that num_glyphs matches the font's
|
|
ca8a0f3e
|
2022-05-06T11:54:38
|
|
[gvar] Protect against out-of-range access
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=47281
Fixes https://oss-fuzz.com/testcase-detail/5508865908670464
|
|
f10ddb8d
|
2022-05-05T11:21:24
|
|
[cmap] Use -1 as Unicode sentinel, not U+FFFF in Format12 serialize
|
|
8a19968c
|
2022-05-05T11:17:23
|
|
[cmap] Use iterator bool operator
|
|
8bfeea48
|
2022-05-05T10:48:24
|
|
[subset] Compute set max using previous()
|
|
00cb8c62
|
2022-05-05T10:33:50
|
|
[subset] Don't go into glyf table if it's empty
|
|
4fe69bc4
|
2022-05-05T10:19:16
|
|
[subset] Use del_range in _remove_invalid_gids
|
|
2a42edcc
|
2022-05-04T17:06:18
|
|
[subset] Cosmetic; use set bulk array population instead of for loop
|
|
43938ecd
|
2022-05-04T16:59:28
|
|
[subset] Remove outdated comment
I tried something like that. It was slower because of the allocations.
|
|
50db78ba
|
2022-05-04T15:48:18
|
|
[subset] In cmap planning, remove a qsort()
|
|
052812b6
|
2022-05-04T15:38:30
|
|
Merge pull request #3561 from googlefonts/cmap_opt
[subset] Further cmap subsetting speed optimizations
|
|
7cb36e42
|
2022-05-04T21:22:26
|
|
[subset] Re-introduce size threshold in choosing unicode collection method.
Threshold is needed since the unicodes set might be an inverted set.
|
|
42c54eba
|
2022-05-04T20:21:43
|
|
[subset] Presize unicode to gid list to unicodes + glyphs size.
|
|
7c7c01d2
|
2022-05-03T22:40:56
|
|
[subset] Remove switch to alternate unicode collection at large subset sizes.
Benchmarks show that the first path is always faster even at large subset sizes:
BM_subset_codepoints/subset_roboto/10_median +0.0324 +0.0325 0 0 0 0
BM_subset_codepoints/subset_roboto/64_median +0.0253 +0.0255 0 1 0 1
BM_subset_codepoints/subset_roboto/512_median +0.0126 +0.0128 1 1 1 1
BM_subset_codepoints/subset_roboto/4000_median +0.0500 +0.0491 6 7 6 7
BM_subset_codepoints/subset_amiri/10_median +0.0338 +0.0332 1 1 1 1
BM_subset_codepoints/subset_amiri/64_median +0.0238 +0.0234 1 1 1 1
BM_subset_codepoints/subset_amiri/512_median +0.0066 +0.0063 8 8 8 8
BM_subset_codepoints/subset_amiri/4000_median -0.0011 -0.0012 13 13 13 13
BM_subset_codepoints/subset_noto_nastaliq_urdu/10_median +0.0226 +0.0226 0 0 0 0
BM_subset_codepoints/subset_noto_nastaliq_urdu/64_median +0.0047 +0.0044 20 20 20 20
BM_subset_codepoints/subset_noto_nastaliq_urdu/512_median +0.0022 +0.0021 165 166 165 166
BM_subset_codepoints/subset_noto_nastaliq_urdu/1000_median -0.0021 -0.0023 166 166 166 165
BM_subset_codepoints/subset_noto_devangari/10_median +0.0054 +0.0054 0 0 0 0
BM_subset_codepoints/subset_noto_devangari/64_median +0.0024 +0.0019 0 0 0 0
BM_subset_codepoints/subset_noto_devangari/512_median +0.0089 +0.0090 5 5 5 5
BM_subset_codepoints/subset_noto_devangari/1000_median -0.0028 -0.0019 5 5 5 5
BM_subset_codepoints/subset_mplus1p/10_median +0.0001 +0.0002 0 0 0 0
BM_subset_codepoints/subset_mplus1p/64_median +0.0073 +0.0075 1 1 1 1
BM_subset_codepoints/subset_mplus1p/512_median +0.0034 +0.0034 1 1 1 1
BM_subset_codepoints/subset_mplus1p/4096_median -0.1248 -0.1248 7 6 7 6
BM_subset_codepoints/subset_mplus1p/10000_median -0.0885 -0.0885 13 12 13 12
BM_subset_codepoints/subset_notocjk/10_median +0.0031 +0.0032 2 2 2 2
BM_subset_codepoints/subset_notocjk/64_median -0.0010 -0.0010 2 2 2 2
BM_subset_codepoints/subset_notocjk/512_median -0.0023 -0.0023 9 9 9 9
BM_subset_codepoints/subset_notocjk/4096_median -0.1725 -0.1726 28 23 28 23
BM_subset_codepoints/subset_notocjk/32768_median -0.0277 -0.0287 140 137 140 136
BM_subset_codepoints/subset_notocjk/100000_median -0.0929 -0.0926 162 147 162 147
|
|
f0c04114
|
2022-05-03T22:02:59
|
|
[subset] Embed unicode to gid list vector in subset plan.
|
|
15fa8afb
|
2022-05-02T16:46:41
|
|
Add fast-path for big-endian 32-bit byteswap
Speeds up cmap format-12 decoding by some 40% as measured by
the newly added test in perf/benchmark-font!
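A sketch of what such a fast path can look like (the function names are hypothetical, and the actual HarfBuzz change may be structured differently): on little-endian GCC/Clang targets, one unaligned 32-bit load plus a byte swap replaces four byte loads and shifts; everything else falls back to the portable version.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable: assemble the value from individual bytes.
inline uint32_t read_be32_portable (const uint8_t *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
	 ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

// Fast path: single load + byte swap where the compiler and byte order allow.
inline uint32_t read_be32_fast (const uint8_t *p)
{
#if defined(__GNUC__) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  uint32_t v;
  std::memcpy (&v, p, sizeof v); // alignment-safe load; compiles to one mov
  return __builtin_bswap32 (v);
#else
  return read_be32_portable (p);
#endif
}
```

Both functions return the same value on every platform; only the generated code differs.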
|
|
3fff2e91
|
2022-05-02T16:31:59
|
|
[perf/benchmark-font] Cosmetic
|
|
307d2d8b
|
2022-05-02T16:30:22
|
|
[cmap] Sprinkle some 'unlikely's
|
|
85ec5cbc
|
2022-05-02T22:29:43
|
|
[subset] In _populate_unicodes_to_retain populate unicodes in order.
Allows the set insert to take advantage of page lookup cache.
|
|
088133d9
|
2022-05-02T21:29:16
|
|
[subset] cache cp to new gid list in subset plan.
This avoids having to recompute the ordered list multiple times during cmap generation.
|
|
6922a256
|
2022-04-29T23:30:32
|
|
[subset] Change serialize_rangeoffset_glyid back to using iterator.
|
|
c66fd50c
|
2022-04-29T23:18:53
|
|
[subset] in cmap4 serialization save cp to gid iter to memory.
Iterator accesses are slow and it's iterated multiple times.
|
|
17b98563
|
2022-04-29T22:49:02
|
|
[subset] In cmap4 serialization reduce unnecessary calls into the iterator.
Gives ~20% speedup for large subsets.
|
|
5e241094
|
2022-04-29T22:44:43
|
|
[subset] In unicodes cache, clean up if set insert fails.
|
|
a424a92c
|
2022-04-29T22:14:03
|
|
[subset] s/void */intptr_t.
|
|
aad67f56
|
2022-04-29T22:01:06
|
|
[subset] cache results of collect_unicodes.
|
|
b4236b7d
|
2022-04-29T19:21:13
|
|
[subset] Optimize Cmap4 collect_unicodes.
Use set add_range() instead of individual add() calls.
|
|
067225a8
|
2022-04-29T13:04:36
|
|
[set] Optimize const page_for() using last_page_lookup caching
Similar to previous commit.
This speeds up SetLookup benchmark by 50%, but that's because that
lookup always hits the same page...
|
|
c283e41c
|
2022-04-29T12:45:48
|
|
[set] Optimize non-const page_for() using last_page_lookup caching
This speeds up SetOrderedInsert tests by 15 to 40 percent, and the
subset_mplus1p benchmarks by 9 to 27 percent.
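The caching idea in this commit and the one that follows it can be sketched with a toy paged bitset. The type sparse_set_t, its 64-bit pages, and the searches counter are all hypothetical; only the name last_page_lookup comes from the commit titles. Remembering the index of the last page touched lets ordered inserts skip the binary search whenever the next value lands on the same page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct sparse_set_t
{
  static constexpr unsigned PAGE_BITS = 64;

  struct page_t { uint32_t major; uint64_t bits; };

  void add (uint32_t g)
  {
    page_t &p = page_for (g / PAGE_BITS);
    p.bits |= uint64_t (1) << (g % PAGE_BITS);
  }

  bool has (uint32_t g) const
  {
    uint32_t major = g / PAGE_BITS;
    auto it = std::lower_bound (pages.begin (), pages.end (), major,
				[] (const page_t &p, uint32_t m) { return p.major < m; });
    return it != pages.end () && it->major == major &&
	   ((it->bits >> (g % PAGE_BITS)) & 1);
  }

  unsigned searches = 0; // binary searches performed by add(), for illustration

  private:
  // Find or create the page for `major`, trying the cached index first.
  page_t &page_for (uint32_t major)
  {
    if (last_page_lookup < pages.size () && pages[last_page_lookup].major == major)
      return pages[last_page_lookup]; // cache hit: no search needed

    searches++;
    auto it = std::lower_bound (pages.begin (), pages.end (), major,
				[] (const page_t &p, uint32_t m) { return p.major < m; });
    if (it == pages.end () || it->major != major)
      it = pages.insert (it, page_t {major, 0});
    last_page_lookup = (size_t) (it - pages.begin ());
    return *it;
  }

  std::vector<page_t> pages; // kept sorted by major
  size_t last_page_lookup = 0;
};
```

Adding 0..255 in order touches four pages and therefore performs only four searches instead of 256, which is the access pattern the ordered-insert benchmarks exercise.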
|
|
d8292b84
|
2022-04-27T12:38:35
|
|
[CFF] Fix parsing of empty Index
https://github.com/harfbuzz/harfbuzz/issues/3545#issuecomment-1111047941
|
|
6454cec0
|
2022-04-24T11:10:17
|
|
[USE] Classify U+10A38 as CONS_MOD_BELOW
|