    Commit

  • Hash : 834b8123
    Author : Nick Wellnhofer
    Date : 2023-08-08T15:21:28

    parser: Stream data when reading from memory
    
    Don't create a copy of the whole input buffer. Read the data chunk by
    chunk to save memory.
    
    Historically, it was probably envisioned to read data from memory
    without additional copying. This doesn't work reliably with the current
    design of the XML parser which requires a terminating null byte at the
    end of input buffers. This led to xmlReadMemory interfaces, which
    expect pointer and size arguments, being changed to make a
    zero-terminated copy of the input buffer. Interfaces based on
    xmlReadDoc, which actually expect a zero-terminated string and
    would make zero-copy operation work, were then simplified to rely on
    xmlReadMemory, resulting in an unnecessary copy.
    
    To avoid temporarily copying possibly gigabytes of memory, we now
    stream in-memory input just like content read from files in a
    chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of
    250 bytes). As a side effect, we also avoid another copy of the whole
    input when handling non-UTF-8 data, an improvement made possible by
    some earlier commits.
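
As an illustration of the chunk-by-chunk idea described above (not the code from this commit), a minimal C sketch might look like the following. The `MemCursor` type and the `mem_read` and `count_chunks` functions are hypothetical names invented for this example; only the 250-byte chunk size is taken from the message. The read function uses a `(context, buffer, len)` shape, which is an assumption about how such a callback could be wired up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Chunk size used by this sketch; mirrors the 250 bytes mentioned above. */
#define CHUNK 250

/* Hypothetical cursor over caller-owned memory (not a libxml2 type). */
typedef struct {
    const char *cur;    /* next unread byte */
    size_t remaining;   /* bytes left to hand out */
} MemCursor;

/*
 * Copy at most len bytes into buf and advance the cursor.
 * Returns the number of bytes copied, or 0 at end of input.
 * The source buffer is never duplicated as a whole.
 */
static int mem_read(void *context, char *buf, int len) {
    MemCursor *c = context;
    size_t n;

    assert(c != NULL && buf != NULL);
    if (len <= 0)
        return 0;
    n = (size_t) len;
    if (n > c->remaining)
        n = c->remaining;
    memcpy(buf, c->cur, n);
    c->cur += n;
    c->remaining -= n;
    return (int) n;
}

/*
 * Drain the input through a small fixed-size buffer.
 * Returns the number of non-empty reads and stores the byte total.
 */
static size_t count_chunks(const char *data, size_t size, size_t *total) {
    MemCursor c = { data, size };
    char buf[CHUNK];
    size_t calls = 0;
    int n;

    *total = 0;
    while ((n = mem_read(&c, buf, CHUNK)) > 0) {
        *total += (size_t) n;
        calls++;
    }
    return calls;
}
```

In this sketch the only extra memory in use at any time is the fixed `CHUNK`-sized buffer, regardless of how large the input is.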
    
    Interfaces expecting zero-terminated strings now make use of strnlen,
    which unfortunately isn't part of the standard C library and has only
    been mandated since POSIX 2008.
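
On platforms without strnlen, one common workaround is a small helper built on the standard `memchr`. The sketch below is only an assumption about how such a fallback could look, not something taken from this commit; `bounded_strlen` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for POSIX strnlen: returns the length of s,
 * but never examines more than maxlen bytes.
 */
static size_t bounded_strlen(const char *s, size_t maxlen) {
    const char *end;

    assert(s != NULL);
    /* memchr returns NULL if no null byte occurs in the first maxlen bytes. */
    end = memchr(s, '\0', maxlen);
    return end != NULL ? (size_t) (end - s) : maxlen;
}
```

For example, `bounded_strlen("abcdef", 4)` yields 4 because no terminator is found within the limit, while `bounded_strlen("abc", 10)` yields 3.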
    

  • README

  • libFuzzer instructions for libxml2
    ==================================
    
    Set compiler and options:
    
        export CC=clang
        export CFLAGS="-g -fsanitize=fuzzer-no-link,address,undefined \
            -fno-sanitize-recover=all \
            -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION"
    
    Build libxml2 with instrumentation:
    
        ./configure --without-python
        make
    
    Run fuzzers:
    
        make -C fuzz fuzz-xml