Use a static context map with two buckets for UTF8 data. Enabled for quality >= 4, and if there are no obvious UTF8 violations detected. For each block, we gather two separate histograms, one for continuation bytes and one for ASCII or lead bytes.