    Commit

  • Hash : 03bb9293
    Author : David Kilzer
    Date : 2021-07-07T18:23:18

    Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
    
    This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8().
    
    * encoding.c:
    (UTF16LEToUTF8):
    - Fix comment to describe what the code does.
    (UTF16BEToUTF8):
    - Fix undefined behavior.  The same fix was applied to
      UTF16LEToUTF8() in 2f9382033e.
    - Add bounds check to while() loop.  The same check was applied
      to UTF16LEToUTF8() in be803967db.
    - Do not return -2 when (in >= inend) to fix the bug.  The same
      change was applied to UTF16LEToUTF8() in 496a1cf592.
    - Inline (<< 8) statements to match UTF16LEToUTF8().
    
    Add the following tests and results:
    
      test/text-4-byte-UTF-16-BE-offset.xml
      test/text-4-byte-UTF-16-BE.xml
      test/text-4-byte-UTF-16-LE-offset.xml
      test/text-4-byte-UTF-16-LE.xml
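
The behavior described above (stop at a surrogate pair that is split across a chunk instead of returning -2) can be illustrated with a minimal sketch. This is not the code in encoding.c: the function name utf16be_to_utf8_sketch, its size_t-based signature, and its return convention are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch, NOT libxml2's UTF16BEToUTF8().
 * Converts UTF-16BE bytes in in[0..*inlen) to UTF-8 in out[0..*outlen).
 * On return, *inlen holds the input bytes consumed and *outlen the
 * output bytes written.  Returns 0 on success (also when an incomplete
 * trailing unit is left unconsumed), -1 if the output buffer is full,
 * -2 on an invalid surrogate sequence.
 */
static int utf16be_to_utf8_sketch(unsigned char *out, size_t *outlen,
                                  const unsigned char *in, size_t *inlen)
{
    size_t ip = 0, op = 0;
    size_t in_end;
    int ret = 0;

    assert(out != NULL && outlen != NULL && in != NULL && inlen != NULL);
    in_end = *inlen & ~(size_t)1;   /* leave a trailing odd byte unconsumed */

    while (ip + 2 <= in_end) {      /* bounds check on every iteration */
        /* Assemble the big-endian code unit inline with (<< 8). */
        unsigned long c = ((unsigned long)in[ip] << 8) | in[ip + 1];
        size_t used = 2;
        size_t need;

        if ((c & 0xFC00) == 0xD800) {           /* high surrogate */
            unsigned long d;

            if (ip + 4 > in_end)
                break;  /* low surrogate is in the next chunk: not an error */
            d = ((unsigned long)in[ip + 2] << 8) | in[ip + 3];
            if ((d & 0xFC00) != 0xDC00) {
                ret = -2;                       /* high not followed by low */
                break;
            }
            c = 0x10000 + (((c & 0x03FF) << 10) | (d & 0x03FF));
            used = 4;
        } else if ((c & 0xFC00) == 0xDC00) {    /* lone low surrogate */
            ret = -2;
            break;
        }

        need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
        if (op + need > *outlen) {
            ret = -1;                           /* output buffer is full */
            break;
        }

        if (need == 1) {
            out[op++] = (unsigned char)c;
        } else if (need == 2) {
            out[op++] = (unsigned char)(0xC0 | (c >> 6));
            out[op++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (need == 3) {
            out[op++] = (unsigned char)(0xE0 | (c >> 12));
            out[op++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[op++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[op++] = (unsigned char)(0xF0 | (c >> 18));
            out[op++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[op++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[op++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        ip += used;
    }

    *inlen = ip;
    *outlen = op;
    return ret;
}
```

In this sketch, a caller that feeds the input in chunks keeps the unconsumed tail (everything past the returned *inlen) and prepends it to the next chunk, so the high surrogate is retried once its low surrogate has arrived.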