Don't create hb_apply_context_t per glyph! I couldn't measure a significant performance gain from this; maybe about 5% (with one million Malayalam strings). Still, not bad. But it reminds me that optimizing this codebase without profiling first is simply not going to work. Oh well...
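
For illustration only, here is a minimal sketch of the kind of change described: building the context once per run instead of once per glyph. Everything in it is hypothetical (ApplyContext, g_context_constructions, shape_per_glyph, shape_hoisted); it is not the real hb_apply_context_t or the actual patch, just the general shape of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter, only here to show how often a context gets built.
int g_context_constructions = 0;

// Hypothetical stand-in for a per-lookup apply context.
// The real hb_apply_context_t carries different state.
struct ApplyContext {
  const std::vector<int> &table;

  explicit ApplyContext(const std::vector<int> &t) : table(t) {
    ++g_context_constructions;
  }

  // Toy "lookup": map a glyph id onto an entry of the table.
  int apply(unsigned glyph) const {
    assert(!table.empty());
    return table[static_cast<std::size_t>(glyph) % table.size()];
  }
};

// Before: a fresh context is constructed for every glyph.
std::vector<int> shape_per_glyph(const std::vector<unsigned> &glyphs,
                                 const std::vector<int> &table) {
  std::vector<int> out;
  out.reserve(glyphs.size());
  for (unsigned g : glyphs) {
    ApplyContext c(table);  // constructed once per iteration
    out.push_back(c.apply(g));
  }
  return out;
}

// After: the context is constructed once and reused for the whole run.
std::vector<int> shape_hoisted(const std::vector<unsigned> &glyphs,
                               const std::vector<int> &table) {
  std::vector<int> out;
  out.reserve(glyphs.size());
  ApplyContext c(table);  // constructed once, outside the loop
  for (unsigned g : glyphs)
    out.push_back(c.apply(g));
  return out;
}
```

Both functions return the same output; only the number of context constructions differs. Whether that difference shows up in wall-clock time is exactly what has to be profiled rather than assumed.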