    Commit

  • Hash : 69b83bb6
    Author : Nick Wellnhofer
    Date : 2025-03-10T02:18:51

    encoding: Detect truncated multi-byte sequences with ICU
    
    Unlike iconv or the internal converters, ICU consumes truncated
    multi-byte sequences at the end of an input buffer. We currently check
    for a non-empty raw input buffer to detect truncated sequences, so
    this check fails with ICU.
    
    It might be possible to inspect the pivot buffer pointers, but it seems
    cleaner to implement a `flush` flag for some encoding and I/O functions.
    After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
    detect remaining input with other converters.
    
    Also fix detection of truncated sequences for HTML, XML content and
    DTDs with iconv.
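
The `flush` idea described in the message can be illustrated with a small, self-contained sketch. This is not libxml2 code: `convert`, `utf8_seq_len`, and the `CONV_*` codes are hypothetical names invented for the example, and the "converter" only copies UTF-8 through unchanged. It shows the non-ICU case, where a truncated sequence is left unconsumed in the input and only becomes an error once the caller signals that no more input will arrive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not libxml2's). */
enum conv_result { CONV_OK = 0, CONV_TRUNCATED = -1, CONV_INVALID = -2 };

/* Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start one. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/*
 * Copy complete UTF-8 sequences from `in` to `out`.
 * On return, *inlen is the number of bytes consumed and *outlen the
 * number of bytes written. A partial sequence at the end of the input
 * is left unconsumed. If `flush` is nonzero, the caller promises that
 * no more input follows, so such a leftover is reported as
 * CONV_TRUNCATED instead of being silently kept for the next call.
 * Simplification: overlong forms and surrogates are not rejected.
 */
static enum conv_result
convert(unsigned char *out, size_t *outlen,
        const unsigned char *in, size_t *inlen, int flush)
{
    size_t i = 0, o = 0, k;
    int partial = 0;
    enum conv_result ret = CONV_OK;

    assert(out != NULL && outlen != NULL && in != NULL && inlen != NULL);

    while (i < *inlen) {
        size_t n = utf8_seq_len(in[i]);

        if (n == 0) { ret = CONV_INVALID; break; }
        if (n > *inlen - i) { partial = 1; break; }  /* sequence cut off */

        for (k = 1; k < n; k++) {
            if ((in[i + k] & 0xC0) != 0x80) { ret = CONV_INVALID; break; }
        }
        if (ret != CONV_OK) break;
        if (n > *outlen - o) break;  /* output full: not an error */

        memcpy(out + o, in + i, n);
        i += n;
        o += n;
    }

    /* Leftover partial input is only an error once we are flushing. */
    if (ret == CONV_OK && partial && flush)
        ret = CONV_TRUNCATED;

    *inlen = i;
    *outlen = o;
    return ret;
}
```

Calling `convert` on the two bytes `'a', 0xC3` with `flush` set to 0 consumes one byte and returns `CONV_OK`; the same call with `flush` set to 1 returns `CONV_TRUNCATED`. Per the commit message, an ICU-backed converter would instead consume the partial bytes, so the equivalent signal there is `U_TRUNCATED_CHAR_FOUND` after flushing rather than leftover input.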