Commit e5060c6e9c3daecb785af986324018ddda35ae49

Bruno Haible 2023-07-25T22:20:51

unistr/u8-*: Make Unicode decoder more Unicode Standard compliant.

Based on a remark by Paul Eggert in
<https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00120.html>.

* tests/unistr/test-u8-mbtouc.c (test_safe_function): Change expected
results for "non-shortest form" or out-of-range byte sequences. Add new
test cases of incomplete well-formed byte sequences.
* tests/unistr/test-u8-mbsnlen.c (main): Likewise.
* lib/unistr/u8-mbtouc-aux.c (u8_mbtouc_aux): Reject a first byte in the
range 0xF5..0xF7 as invalid. Distinguish incomplete from invalid byte
sequences correctly. For the former, return only the number of bytes in
the maximal well-formed subpart.
* lib/unistr/u8-mbtouc.c (u8_mbtouc): Likewise.
* lib/unistr/u8-check.c (u8_check): Reject a first byte in the range
0xF5..0xF7 as invalid.
* lib/unistr/u8-mblen.c (u8_mblen): Likewise.
* lib/unistr/u8-mbtoucr.c (u8_mbtoucr): Likewise.
* lib/unistr/u8-strmbtouc.c (u8_strmbtouc): Likewise.
* lib/unistr/u8-strmblen.c (u8_strmblen): Likewise.
* lib/unistr/u8-prev.c (u8_prev): Likewise.
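
Illustration (not part of the commit): the distinction between invalid and incomplete byte sequences described above can be sketched as follows. This is a minimal sketch, not the gnulib code. It assumes the well-formed byte ranges of the Unicode Standard (Table 3-7, "Well-Formed UTF-8 Byte Sequences"); the names classify_u8, enum u8_status and the consumed out-parameter are hypothetical and do not mirror the return conventions of u8_mbtouc or its siblings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type, for illustration only.  */
enum u8_status { U8_VALID, U8_INVALID, U8_INCOMPLETE };

/* Classify the byte sequence starting at S (N > 0 bytes available).
   *CONSUMED receives the length of the character if it is valid,
   otherwise the length of the maximal subpart, that is, the longest
   prefix that could still begin a well-formed sequence (at least 1).  */
static enum u8_status
classify_u8 (const unsigned char *s, size_t n, size_t *consumed)
{
  unsigned char c;
  unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the 2nd byte */
  size_t len;

  assert (n > 0);
  c = s[0];

  if (c < 0x80)
    {
      *consumed = 1;
      return U8_VALID;
    }
  /* 0x80..0xBF are stray continuation bytes, 0xC0..0xC1 can only start
     non-shortest forms, and 0xF5..0xFF can only encode values beyond
     U+10FFFF: none of them is a valid first byte.  */
  if (c < 0xC2 || c > 0xF4)
    {
      *consumed = 1;
      return U8_INVALID;
    }

  if (c < 0xE0)
    len = 2;
  else if (c < 0xF0)
    {
      len = 3;
      if (c == 0xE0) lo = 0xA0;  /* exclude non-shortest forms */
      if (c == 0xED) hi = 0x9F;  /* exclude surrogates U+D800..U+DFFF */
    }
  else
    {
      len = 4;
      if (c == 0xF0) lo = 0x90;  /* exclude non-shortest forms */
      if (c == 0xF4) hi = 0x8F;  /* exclude values above U+10FFFF */
    }

  for (size_t i = 1; i < len; i++)
    {
      if (i >= n)
        {
          /* Ran out of input while every byte so far was acceptable.  */
          *consumed = i;
          return U8_INCOMPLETE;
        }
      if (s[i] < lo || s[i] > hi)
        {
          /* Byte I cannot continue the sequence; the first I bytes
             form the maximal subpart.  */
          *consumed = i;
          return U8_INVALID;
        }
      lo = 0x80;
      hi = 0xBF;               /* later bytes use the plain range */
    }

  *consumed = len;
  return U8_VALID;
}
```

Under these assumptions, the two-byte input E2 82 (a truncated U+20AC) is classified as incomplete with a subpart of 2 bytes, whereas F5 80 80 80 is invalid from its first byte on, and F4 90 80 80 is invalid because its second byte would push the value beyond U+10FFFF.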