[cff] Optimize byte_str_ref_t

Shrink it from 24 bytes to 16. This struct is used heavily in the
subroutine call stack. This change alone gives about a 6% speedup, and
with it the HarfBuzz (HB) AdobeVFPrototype benchmark now runs faster than
the FreeType (FT) one.
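The sketch below shows one way a pointer/length/offset reference can lose 8 bytes. It is an illustration only: both struct names, the field layout, and the trick of encoding the error state as `offset > length` instead of a separate `bool` are assumptions, not necessarily what this commit does. On a typical 64-bit target a trailing `bool` after 16 bytes of fields pads the struct to 24 bytes. Folding the flag into `offset` keeps it at 16.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical "before" layout: pointer + length + offset + separate error flag.
// On 64-bit targets the trailing bool pads the struct from 17 to 24 bytes.
struct padded_str_ref_t
{
  const unsigned char *data;
  unsigned int length;
  unsigned int offset;
  bool error;
};

// Hypothetical "after" layout: no error field; "offset > length" means error.
// Assumes length < UINT_MAX so that length + 1 cannot wrap around.
struct compact_str_ref_t
{
  const unsigned char *data = nullptr;
  unsigned int length = 0;
  unsigned int offset = 0;

  compact_str_ref_t () = default;
  compact_str_ref_t (const unsigned char *d, unsigned int len) : data (d), length (len) {}

  bool in_error () const { return offset > length; }
  void set_error () { offset = length + 1; }

  // When not in error, offset <= length, so length - offset cannot underflow.
  bool avail (unsigned int count = 1) const
  { return !in_error () && count <= length - offset; }

  // Reads one byte, or enters the error state and returns 0 past the end.
  unsigned char read_byte ()
  {
    if (!avail ()) { set_error (); return 0; }
    return data[offset++];
  }
};

// Holds on both 32-bit and 64-bit targets (12 < 16 and 16 < 24 bytes).
static_assert (sizeof (compact_str_ref_t) < sizeof (padded_str_ref_t),
               "folding the error flag into offset should shrink the struct");
```

Because a call stack stores many of these references, each one saved bytes, and more entries fit per cache line. That is a plausible, though unverified here, source of the measured speedup.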