
    Commit

  • Hash : dcb80b92
    Author : Nick Wellnhofer
    Date : 2021-02-20T20:30:43

    Fix slow parsing of HTML with encoding errors
    
    Under certain circumstances, the HTML parser would try to guess and
    switch input encodings multiple times, leading to slow processing of
    documents with encoding errors. The repeated scanning of the input
    buffer when guessing encodings could even lead to quadratic behavior.
    
    The code in htmlCurrentChar probably assumed that if there's an encoding
    handler, it is guaranteed to produce valid UTF-8. This holds true in
    general, but if the detected encoding was "UTF-8", the UTF8ToUTF8
    encoding handler simply invoked memcpy without checking for invalid
    UTF-8. This still must be fixed, preferably by not using this handler
    at all.
    
    Also leave a note that switching encodings twice seems impossible to
    implement correctly. Add a check when handling UTF-8 encoding errors
    in htmlCurrentChar to avoid this situation, even if encoders produce
    invalid UTF-8.
    
    Found by OSS-Fuzz.
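
The commit message notes that a pass-through handler which only calls memcpy lets invalid UTF-8 reach the parser. As an illustration of the kind of check such a handler would need, here is a minimal sketch in C. The names is_valid_utf8 and checked_utf8_copy are hypothetical and not part of libxml2, and this is not the change made by the commit itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libxml2 function): returns 1 if the bytes
 * buf[0..len) are well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *buf, size_t len) {
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t n;           /* continuation bytes expected after c */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t k;

        if (c < 0x80) {     /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < n)
            return 0;       /* sequence truncated at end of buffer */

        for (k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* surrogate code points are not allowed */

        i += n + 1;
    }
    return 1;
}

/* Hypothetical checked pass-through: copies the input only after it has
 * been validated. Returns 0 on success, -1 if the output buffer is too
 * small, -2 if the input is not valid UTF-8. */
int checked_utf8_copy(unsigned char *out, size_t *outlen,
                      const unsigned char *in, size_t inlen) {
    if (*outlen < inlen)
        return -1;
    if (!is_valid_utf8(in, inlen))
        return -2;
    memcpy(out, in, inlen);
    *outlen = inlen;
    return 0;
}
```

This sketch validates a complete buffer in one pass. A real streaming handler would also have to cope with multi-byte sequences split across input chunks, which is left out here.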
    

  • README

  •                   XML toolkit from the GNOME project
    
    Full documentation is available on-line at
        http://xmlsoft.org/
    
    This code is released under the MIT Licence; see the Copyright file.
    
    To build on a Unixised setup:
       ./configure ; make ; make install
       If the ./configure file does not exist, run ./autogen.sh instead.
    To build on Windows:
       see the instructions in win32/Readme.txt
    
    To check build quality:
       on a Unixised setup:
          run make tests
       otherwise:
           There are 3 standalone tools, runtest.c, runsuite.c and testapi.c,
           which should compile as part of the build or as any application
           would. Launch them from this directory to get results: runtest
           checks the proper functioning of the main libxml2 APIs, while
           testapi does a full coverage check. Report failures to the list.
    
    To report bugs, follow the instructions at: 
      http://xmlsoft.org/bugs.html
    
    A mailing list, xml@gnome.org, is available; to subscribe:
        http://mail.gnome.org/mailman/listinfo/xml
    
    The list archive is at:
        http://mail.gnome.org/archives/xml/
    
    All technical questions asked privately will be automatically answered
    on the list and archived for public access, unless privacy is explicitly
    required and justified.
    
    Daniel Veillard
    