Log

Author Commit Date CI Message
Nick Wellnhofer 3e80560d 2021-05-07T10:51:38 Fix line numbers in error messages for mismatched tags Commit 62150ed2 introduced a small regression in the error messages for mismatched tags. This typically only affected messages after the first mismatch, but with custom SAX handlers all line numbers would be off. This also fixes line numbers in the SAX push parser which were never handled correctly.
Nick Wellnhofer 7279d236 2021-05-06T10:37:07 Fix htmlTagLookup Fix regression introduced with b25acce8. Some users like libxslt may call the HTML output functions on documents with uppercase tag names, so we must keep case-insensitive string comparison. Fixes #248.
PaulHiggs 33468d7e 2021-05-03T16:09:44 update for xsd:language type check Fixes #242.
Nick Wellnhofer babe7503 2021-05-01T16:53:33 Propagate error in xmlParseElementChildrenContentDeclPriv Check return value of recursive calls to xmlParseElementChildrenContentDeclPriv and return immediately in case of errors. Otherwise, struct xmlElementContent could contain unexpected null pointers, leading to a null deref when post-validating documents which aren't well-formed and parsed in recovery mode. Fixes #243.
Nick Wellnhofer 5465a8e5 2021-04-25T21:19:59 Update INSTALL.libxml2 Fixes #238.
Nick Wellnhofer 1098c30a 2021-04-22T19:26:28 Fix use-after-free with `xmllint --xinclude --dropdtd` The --dropdtd option can leave dangling pointers in entity reference nodes. Make sure to skip these nodes when processing XIncludes. This also avoids scanning entity declarations and even modifying them inadvertently during XInclude processing. Move from a block list to an allow list approach to avoid descending into other node types that can't contain elements. Fixes #237.
Nick Wellnhofer 72b3c067 2021-04-22T19:24:50 Fix dangling pointer with `xmllint --dropdtd` Reset doc->intSubset when dropping the DTD.
Joel Hockey bf227135 2020-08-16T17:19:35 Validate UTF8 in xmlEncodeEntities Code is currently assuming UTF-8 without validating. Truncated UTF-8 input can cause out-of-bounds array access. Adds further checks to partial fix in 50f06b3e. Fixes #178
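The bounds check that bf227135 describes can be illustrated with a minimal sketch. This is not the code from the commit: `utf8SeqLen` is a hypothetical helper, and it only checks the lead byte, the number of bytes still available, and the continuation bytes. It does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not libxml2 code.
 * Returns the length (1-4) of the UTF-8 sequence starting at s, or 0 if
 * the sequence is malformed or runs past the avail bytes that remain.
 */
size_t utf8SeqLen(const unsigned char *s, size_t avail) {
    size_t len, i;

    assert(s != NULL);
    if (avail == 0)
        return 0;

    if (s[0] < 0x80)
        len = 1;
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;
    else
        return 0;               /* stray continuation or invalid lead byte */

    /* Truncated input: never read beyond the end of the buffer. */
    if (len > avail)
        return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
    }
    return len;
}
```

A caller would advance by the returned length and treat 0 as an encoding error instead of indexing further into the buffer.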
Nick Wellnhofer 1358d157 2021-04-21T13:23:27 Fix use-after-free with `xmllint --html --push` Call htmlCtxtUseOptions to make sure that names aren't stored in dictionaries. Note that this issue only affects xmllint using the HTML push parser. Fixes #230.
Nick Wellnhofer fb08d9fe 2021-03-20T22:02:26 Fix include order in c14n.h - Include xmlversion.h before testing feature flags. - Include libxml headers before extern "C". Fixes #226.
Christopher Degawa d3a02679 2021-03-15T13:44:34 CMake: Only add postfixes if MSVC Currently, it catches mingw-w64 in there as well, but mingw-w64 follows Linux-like naming with no weird postfixes. Signed-off-by: Christopher Degawa <ccom@randomderp.com>
Nick Wellnhofer 868e49cf 2021-03-16T10:36:04 Allow FP division by zero in xmlXPathInit
Nick Wellnhofer d25460da 2021-03-13T19:12:00 Fix XPath NaN/Inf for older GCC versions The DBL_MAX approach could lead to errors caused by excess precision. Switch back to the division-by-zero approach with a work-around for MSVC and use the extern globals instead of macro expressions.
Nick Wellnhofer e20c9c14 2021-03-13T18:41:47 Fix xmlGetNodePath with invalid node types Make xmlGetNodePath return NULL instead of invalid XPath when hitting unsupported node types like DTD content. Reported here: https://mail.gnome.org/archives/xml/2021-January/msg00012.html Original report: https://bugs.php.net/bug.php?id=80680
Nick Wellnhofer c3fd8c42 2021-03-13T17:19:32 Fix exponential behavior with recursive entities Fix another case where only recursion depth was limited, but entities would still be expanded over and over again. The test case discovered by fuzzing only affected parsing in recovery mode with XML_PARSE_RECOVER. Found by OSS-Fuzz.
Nick Wellnhofer 683de7ef 2021-03-04T19:06:04 Fix duplicate xmlStrEqual calls in htmlParseEndTag
Nick Wellnhofer 8095365b 2021-03-04T18:46:11 Speed up htmlCheckAutoClose Switch to binary search.
Nick Wellnhofer b25acce8 2021-03-04T17:44:45 Speed up htmlTagLookup Switch to binary search. This is the first time bsearch is used in the libxml2 code base. But it's a standard library function since C89 and should be portable.
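A rough sketch of a `bsearch`-based tag lookup follows. The table contents, the `TagDesc` type, and the function names are hypothetical and much simpler than the real libxml2 element table. The comparison is ASCII case-insensitive, which is the property the later fix 7279d236 says must be preserved.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical element descriptor; the real table carries more fields. */
typedef struct {
    const char *name;
    int isInline;
} TagDesc;

/* Must stay sorted by lowercase name, otherwise bsearch gives wrong results. */
static const TagDesc tagTable[] = {
    { "a", 1 }, { "body", 0 }, { "div", 0 }, { "p", 0 }, { "span", 1 }
};

/* ASCII case-insensitive comparison, so "DIV" and "div" compare equal. */
static int cmpNoCase(const char *a, const char *b) {
    for (;; a++, b++) {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);
        if (ca != cb)
            return ca - cb;
        if (ca == 0)
            return 0;
    }
}

/* bsearch callback: key is a tag name, member is a table entry. */
static int cmpTag(const void *key, const void *member) {
    return cmpNoCase((const char *) key, ((const TagDesc *) member)->name);
}

/* Returns the matching table entry, or NULL for unknown tags. */
const TagDesc *lookupTag(const char *name) {
    assert(name != NULL);
    return (const TagDesc *) bsearch(name, tagTable,
                                     sizeof(tagTable) / sizeof(tagTable[0]),
                                     sizeof(tagTable[0]), cmpTag);
}
```

With this sketch, `lookupTag("DIV")` and `lookupTag("div")` return the same entry, and an unknown name such as `"table"` returns NULL.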
Nick Wellnhofer ad101bb5 2021-03-02T13:32:53 Clarify xmlNewDocProp documentation
Nick Wellnhofer a6e6498f 2021-03-02T13:09:06 Stop checking attributes for UTF-8 validity I can't see a reason to check attribute content for UTF-8 validity. Other parts of the API like xmlNewText have always assumed valid UTF-8 as extra checks only slow down processing. Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not freeing the old encoding would cause a memory leak. Note that this was last changed in 2008 with commit 6f8611fd which removed unnecessary encoding/decoding steps. Setting attributes should be even faster now. Found by OSS-Fuzz.
Nick Wellnhofer 8446d459 2021-03-01T20:56:40 Reduce some fuzzer timeouts OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for several hundred hours without hitting the 20s timeout. It seems that most timeouts resulting from accidentally quadratic behavior in the HTML parser have been fixed. Start to gradually reduce the timeout to find new performance issues.
Nick Wellnhofer 688b41a0 2021-03-01T14:17:42 Fix quadratic behavior when looking up xml:* attributes Add a special case for the predefined XML namespace when looking up DTD attribute defaults in xmlGetPropNodeInternal to avoid calling xmlGetNsList. This fixes quadratic behavior in - xmlNodeGetBase - xmlNodeGetLang - xmlNodeGetSpacePreserve Found by OSS-Fuzz.
Nick Wellnhofer ce2fbaa8 2021-02-22T22:01:57 Only run a few CI tests unless scheduled Only run the following tests by default - gcc - clang:asan - cmake:mingw:w64-x86_64:shared - cmake:msvc:v141:x64:shared
Nick Wellnhofer 85c817a2 2021-02-22T21:28:21 Improve fuzzer stability - Add more calls to xmlInitializeCatalog. - Call xmlResetLastError after fuzzing each input.
Nick Wellnhofer f9ccb3b8 2021-02-22T21:26:13 Check for feature flags in fuzzer tests
Markus Rickert 88c657d6 2021-02-22T21:11:00 Use CMake PROJECT_VERSION
Nick Wellnhofer 7a90bdfa 2021-02-22T17:58:06 Another attempt at improving fuzzer stability xmlInitializeCatalog is not called from xmlInitParser.
Nick Wellnhofer 0fb3ae58 2021-02-22T17:31:05 Revert "Improve HTML fuzzer stability" This reverts commit de1b51eddcc17fd7ed1bbcc6d5d7d529407dfbe2.
Nick Wellnhofer 0987001c 2021-02-22T12:29:56 Add charset names to fuzzing dictionaries
Nick Wellnhofer de1b51ed 2021-02-22T12:25:29 Improve HTML fuzzer stability Call htmlInitAutoClose during fuzzer initialization to fix stability issue. Leave a note concerning problems with this function.
Markus Rickert 09320f05 2021-02-21T14:26:40 Add CI for MSVC x86
Nick Wellnhofer dcb80b92 2021-02-20T20:30:43 Fix slow parsing of HTML with encoding errors Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior. The code htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all. Also leave a note that switching encodings twice seems impossible to implement correctly. Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8. Found by OSS-Fuzz.
hhb 02bee4c4 2021-02-02T22:27:52 Add a flag to not output anything when xmllint succeeded
Simon Josefsson 4defa2c2 2021-02-12T09:39:38 Fix warnings in libxml.m4 with autoconf 2.70+. Closes #219.
Nick Wellnhofer cbe1212d 2021-02-09T17:07:21 Fix null deref introduced with previous commit Found by OSS-Fuzz.
Nick Wellnhofer 01411e7c 2021-02-08T20:58:32 Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> but the XML spec clearly states that this is illegal: "If the entities lt or amp are declared, they MUST be declared as internal entities whose replacement text is a character reference to the respective character (less-than sign or ampersand) being escaped; the double escaping is REQUIRED for these entities so that references to them produce a well-formed result." Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
SVGAnimate 07920b43 2021-01-26T05:42:48 Add the copy of type from original xmlDoc in xmlCopyDoc() A bug related to PHP DOMDocument: https://bugs.php.net/bug.php?id=80665 When copying or cloning an HTML document, xmlDoc->type changes from XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
Markus Rickert 2065d340 2021-02-05T23:40:18 Add CI for CMake on MSVC
Mike Dalessio afad3721 2021-01-31T09:53:56 parser.c: shrink the input buffer when appropriate Fixes GNOME/libxml2#200 Also see discussions at: - GNOME/libxml2#192 - https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e - https://github.com/sparklemotion/nokogiri/issues/2132
Nick Wellnhofer ec808a44 2021-02-07T13:57:49 Speed up HTML fuzzer htmlDocDumpMemory uses the "HTML" encoding if no other encoding was specified in the source HTML. This encoding can be extremely slow because of an inefficiency in htmlEntityValueLookup. Stop encoding the output for now.
Nick Wellnhofer e6495e47 2021-02-07T13:38:01 Remove unused encoding parameter of HTML output functions The encoding string is unused. Encodings are set by way of the output buffer.
Nick Wellnhofer 954696e7 2021-02-07T13:23:09 Fix infinite loop in HTML parser introduced with recent commits Check for XML_PARSER_EOF to avoid an infinite loop introduced with recent changes to the HTML push parser. Found by OSS-Fuzz.
Nick Wellnhofer acb35667 2021-02-03T13:48:40 Fix quadratic runtime when parsing CDATA sections Use optimized concatenation for CDATA sections in addition to normal text. This also affects HTML script content. Found by OSS-Fuzz.
Markus Rickert f93ca3e1 2021-01-15T17:53:27 Update minimum required CMake version
Markus Rickert 00487289 2020-12-31T16:34:25 Add variables for configured options to CMake config files
Markus Rickert 95519737 2020-12-31T13:41:19 Check if variables exist when defining targets
Markus Rickert c26e4525 2020-12-31T13:18:14 Check if target exists when reading target properties
Markus Rickert ec119875 2020-12-30T14:40:43 Add xmlcatalog target and definition to config files
Markus Rickert 2377a312 2020-12-30T14:40:04 Remove include directories for link-only dependencies
Markus Rickert 26835480 2020-12-30T14:28:24 Fix ICU build in CMake
Markus Rickert 296ab61e 2020-11-19T22:06:36 Configure pkgconfig, xml2-config, and xml2Conf.sh file
Nick Wellnhofer 79301d3d 2020-12-18T12:50:21 Fix timeout when handling recursive entities Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.
Nick Wellnhofer 45da175c 2020-12-18T12:14:52 Fix memory leak in xmlParseElementMixedContentDecl Free parsed content if malloc fails to avoid a memory leak. Found with libFuzzer.
Nick Wellnhofer 1d73f07d 2020-12-18T00:55:00 Fix null deref in xmlStringGetNodeList Check for malloc failure to avoid null deref. Found with libFuzzer.
Nick Wellnhofer e2b975c3 2020-12-18T00:50:34 Handle malloc failures in fuzzing code Avoid misdiagnosis in OOM situations.
Mike Dalessio a67b63d1 2020-10-11T14:15:37 use new htmlParseLookupCommentEnd to find comment ends Note that the caret in error messages generated during comment parsing may have moved by one byte. See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
Mike Dalessio 29f5d20e 2020-08-03T17:36:05 htmlParseComment: treat `--!>` as if it closed the comment See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
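As an illustration of the behavior described in 29f5d20e, the sketch below scans for a comment terminator and accepts both `-->` and the incorrectly closed `--!>`. `findCommentEnd` is a hypothetical name. The actual parser uses htmlParseLookupCommentEnd, whose details are not shown in this log.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not libxml2 code.
 * s points at the text following "<!--". Returns a pointer just past the
 * comment terminator ("-->" or "--!>"), or NULL if the comment never ends.
 */
const char *findCommentEnd(const char *s) {
    const char *p;

    assert(s != NULL);
    for (p = strstr(s, "--"); p != NULL; p = strstr(p + 1, "--")) {
        /* p[2] is at worst the terminating NUL, so these reads are safe. */
        if (p[2] == '>')
            return p + 3;
        if (p[2] == '!' && p[3] == '>')
            return p + 4;
    }
    return NULL;
}
```

Restarting the search at `p + 1` rather than `p + 2` matters for input like `--->`, where the terminator begins at the second dash.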
Mike Dalessio e28d9347 2020-08-04T14:53:19 add test coverage for incorrectly-closed comments this establishes the baseline behavior so that subsequent commits which modify this behavior are clear about what's being changed.
Nick Wellnhofer 9086988f 2020-12-16T15:41:52 Enforce maximum length of fuzz input Remove the libfuzzer max_len option which doesn't apply to other fuzzing engines. Enforce the maximum length directly in the fuzz targets. For the xml target, lower the maximum when expanding entities to avoid timeout and OOM errors.
Nick Wellnhofer 1fe38530 2020-12-16T15:27:13 Remove temporary members from struct _xmlXPathContext These values are hardcoded now and the struct members, while public, were recently introduced and never part of an official release.
Nick Wellnhofer 8ca3a59b 2020-12-15T20:14:28 Fix integer overflow in xmlSchemaGetParticleTotalRangeMin The function is only used once and its return value is only checked for zero. Disable the function like its Max counterpart and add an implementation for the special case. Found by OSS-Fuzz.
Xiaoming Ni 649d02ea 2020-12-07T20:19:53 encoding: fix memleak in xmlRegisterCharEncodingHandler() The return type of xmlRegisterCharEncodingHandler() is void, so the caller cannot determine whether the call succeeded. When nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, "handler" is not added to the "handlers" array. As a result, the memory of "handler" can no longer be managed or released, which is a memory leak. Add "xmlFree(handler)" on the failure branch of xmlRegisterCharEncodingHandler() to fix the leak. Reported-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
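The ownership problem in 649d02ea can be sketched as follows. All names and the table size are hypothetical. Unlike the real xmlRegisterCharEncodingHandler(), this sketch returns a status code so that the failure branch is observable.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_HANDLERS 2          /* tiny hypothetical limit for the sketch */

typedef struct {
    const char *name;
} Handler;

static Handler *handlers[MAX_HANDLERS];
static int nbHandlers = 0;

/* Allocates a handler; returns NULL if malloc fails. */
Handler *newHandler(const char *name) {
    Handler *h = (Handler *) malloc(sizeof(*h));
    if (h != NULL)
        h->name = name;
    return h;
}

/*
 * Takes ownership of handler in every case. Returns 0 on success and -1
 * when the table is full. On failure the handler is freed here, because
 * a caller that gets no feedback cannot know it should free it.
 */
int registerHandler(Handler *handler) {
    assert(handler != NULL);
    if (nbHandlers >= MAX_HANDLERS) {
        free(handler);          /* the fix: release on the failure branch */
        return -1;
    }
    handlers[nbHandlers++] = handler;
    return 0;
}
```

The design choice is that the registry owns the handler from the moment of the call, so both branches have to account for the allocation.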
Xiaoming Ni cb7a572b 2020-12-07T20:17:34 xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val" The xmlSchemaGetFacetValueAsULong() API is an external API. The validity of external input parameters must be strictly verified. Before accessing "facet->val->value", we need to check whether "facet->val" is a null pointer. Signed-off-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Markus Rickert 84b76d99 2020-12-06T17:26:23 Update CMake config files
Markus Rickert d0ccb3a6 2020-12-06T17:25:52 Add xmlcatalog and xmllint to CMake export
Nick Wellnhofer acdc2ff3 2020-06-04T23:02:08 Simplify xmlexports.h All the compiler switches essentially set the same macros. The only exception was MSVC which omitted the "extern" keyword for exported variables. This in turn broke clang-cl. This commit rewrites and simplifies the whole header. Closes #163.
Nick Wellnhofer a218ff0e 2020-12-06T17:26:36 Fix null pointer deref in xmlXPtrRangeInsideFunction Found by OSS-Fuzz.
Nick Wellnhofer 94c2e415 2020-12-06T16:38:00 Fix quadratic runtime in HTML push parser with null bytes Null bytes in the input stream do not necessarily signal an EOF condition. Check the stream pointers for EOF to avoid quadratic rescanning of input data. Note that the CUR_CHAR macro used in functions like htmlParseCharData calls htmlCurrentChar which translates null bytes. Found by OSS-Fuzz.
Markus Rickert 1c4f9a6d 2020-11-25T18:01:51 Require dependencies based on enabled CMake options
Michael Matz faea2fa9 2020-11-21T01:21:56 Avoid quadratic checking of identity-constraints key/unique/keyref schema attributes currently use quadratic loops to check their various constraints (that keys are unique and that keyrefs refer to existing keys). That becomes extremely slow if there are many elements with keys. This happens in the wild with e.g. the OVAL XML descriptions of security patches. You need the openscap schemata, and then an example xml file: % zypper in openscap-utils % wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml % time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 16m59,857s user 16m55,787s sys 0m1,060s This patch makes libxml use a hash table to avoid the quadratic behaviour. The existing hash table only accepts strings as keys, so we're mostly reusing the canonical representation of key values to derive such strings (with the caveat given in a comment). The alternative would be to rework the hash table code to accept either numbers or free functions as hash workers, but the code is fast enough as is. With the patch we have this then: % time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 0m3,531s user 0m3,427s sys 0m0,103s So, a ~300x speedup. This patch survives 'make check' and 'make tests'.
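The idea behind faea2fa9, replacing pairwise comparison with a hash lookup on string keys, can be sketched with a small fixed-size string set. This is not the libxml2 hash table: `KeySet`, `keySetAdd`, the hash function, and the capacity are all assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SET_SIZE 1024           /* power of two; fixed to keep this short */

/* Hypothetical open-addressing set of borrowed string pointers. */
typedef struct {
    const char *slots[SET_SIZE];
    size_t count;
} KeySet;

static unsigned long hashStr(const char *s) {
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/*
 * Returns 1 if key was added, 0 if an equal key is already present
 * (a uniqueness violation), and -1 if the set is too full.
 */
int keySetAdd(KeySet *set, const char *key) {
    size_t i;

    assert(set != NULL && key != NULL);
    if (set->count >= SET_SIZE / 2)
        return -1;              /* keep load factor low so probing ends */

    i = hashStr(key) & (SET_SIZE - 1);
    while (set->slots[i] != NULL) {
        if (strcmp(set->slots[i], key) == 0)
            return 0;
        i = (i + 1) & (SET_SIZE - 1);   /* linear probing */
    }
    set->slots[i] = key;
    set->count++;
    return 1;
}
```

Checking n keys this way takes roughly n lookups on average instead of about n²/2 string comparisons, which matches the kind of speedup the commit message reports.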
Markus Rickert 8272db53 2020-11-28T22:54:40 Use NAMELINK_COMPONENT in CMake install
Markus Rickert 5c7bdbc9 2020-11-25T18:41:14 Add CMake files to EXTRA_DIST
Markus Rickert 7a62870a 2020-11-19T22:06:23 Add missing compile definition for static builds to CMake
Markus Rickert e028d293 2020-11-19T17:58:46 Add CI for CMake on Linux and MinGW
Frederik Seiffert b516ed18 2020-11-12T12:53:43 Fix building with ICU 68. ICU 68 no longer defines the TRUE macro. Closes #204.
Victor Stinner ac5e9991 2020-11-10T15:42:36 Convert python/libxml.c to PY_SSIZE_T_CLEAN Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string length (int len) explicitly to Py_ssize_t when passing a string to a function call using PyObject_CallMethod() with the "s#" format.
Victor Stinner f42a0524 2020-11-09T18:19:31 Build the Python extension with PY_SSIZE_T_CLEAN The Python extension module now uses Py_ssize_t rather than int for string lengths. This change makes the extension compatible with Python 3.10. Fixes #203.
Nick Wellnhofer 0ace6c4d 2020-11-19T17:35:11 Add CI test for Python 3
Elliott Hughes 7c06d99e 2020-10-27T11:29:20 Fix xmlURIEscape memory leaks. Found by running the fuzz/uri.c fuzzer under asan (internal Android bug 171610679). Always free `ret` when exiting on failure. I've moved the definition of NULLCHK down past where ret is always initialized to make it clear that this is safe. This patch also fixes the indentation of two of the NULLCHK call sites to make it more obvious that NULLCHK isn't `if`-like.
Nick Wellnhofer 31c6ce3b 2020-11-09T17:55:44 Avoid call stack overflow with XML reader and recursive XIncludes Don't process XIncludes in the result of another inclusion to avoid infinite recursion resulting in a call stack overflow. This is something the XInclude engine shouldn't allow but correct handling of intra-document includes would require major changes. Found by OSS-Fuzz.
Nick Wellnhofer 7d6837ba 2020-10-25T20:21:43 Fix caret in regexp character group Apply Per Hedeland's patch from https://bugzilla.gnome.org/show_bug.cgi?id=779751 Fixes #188.
Nick Wellnhofer 8a85263f 2020-10-25T20:08:16 Add fuzzing dictionaries to EXTRA_DIST Also add static seed corpus for the URI fuzzer.
Nick Wellnhofer 1bde1040 2020-10-25T20:02:23 Add 'fuzz' subdirectory to DIST_SUBDIRS Fixes #191.
Mike Dalessio c0c26ff2 2020-10-11T16:33:07 parser.c: xmlParseCharData peek behavior fixed wrt newlines Previously, xmlParseCharData and xmlParseComment would consider 0xA to be unhandleable when seen as the first byte of an input chunk, and fall back to xmlParseCharDataComplex and xmlParseCommentComplex, which have different memory and performance characteristics. Fixes GNOME/libxml2#192
Nick Wellnhofer b46016b8 2020-10-17T18:03:09 Allow port numbers up to INT_MAX Also return an error on overflow.
Nick Wellnhofer 46837d47 2020-10-03T01:13:35 Fix memory leaks in XPointer string-range function Found by OSS-Fuzz.
Nick Wellnhofer 0b3c64d9 2020-09-29T18:08:37 Handle dumps of corrupted documents more gracefully Check parent pointers for NULL after the non-recursive rewrite of the serialization code. This avoids segfaults with corrupted documents which can apparently be seen with lxml, see issue #187.
Nick Wellnhofer 847a3a11 2020-09-28T12:28:29 Fix use-after-free when XIncluding text from Reader The XML Reader can free text nodes coming from the XInclude engine before parsing has finished. Cache a copy of the text string, not the included node to avoid use after free. Found by OSS-Fuzz.
yanjinjq 7929f057 2020-08-30T10:34:01 Fix SEGV in xmlSAXParseFileWithData Fixes #181.
Nick Wellnhofer e6ec58ec 2020-09-21T12:49:36 Fix null deref in XPointer expression error path Make sure that the filter functions introduced with commit c2f4da1a return node-sets without NULL pointers also in the error case. Found by OSS-Fuzz.
Nick Wellnhofer 4e9cc18b 2020-09-21T11:00:23 Fix variable name in win32/configure.js Fix copy/paste error from previous commit.
Nick Wellnhofer 5614c078 2020-09-21T10:55:45 Fix version parsing in win32/configure.js Adjust to configure.ac changes. Should fix #185.
Nick Wellnhofer 8b88503a 2020-09-18T19:15:27 Don't call xmlXPathInit directly Call xmlInitParser which uses a lock to avoid race conditions. Fixes #184.
Nick Wellnhofer b215c270 2020-09-13T12:19:48 Fix cleanup of attributes in XML reader xml:id creates ID attributes even in documents without a DTD, so the check in xmlTextReaderFreeProp must be changed to avoid use after free. Found by OSS-Fuzz.
Nick Wellnhofer f0fd1b67 2020-08-26T00:16:38 Limit size of free lists in XML reader when fuzzing Keeping objects on a free list can hide memory errors. Only allow a single node on free lists used by the XML reader when fuzzing. This should hide fewer errors while still exercising the free list logic.
Nick Wellnhofer ba589adc 2020-08-25T23:50:39 Fix double free in XML reader with XIncludes An XInclude with empty fallback could lead to a double free in xmlTextReaderRead. Found by OSS-Fuzz.
Nick Wellnhofer 6f1470a5 2020-08-25T18:50:45 Hardcode maximum XPath recursion depth Always limit nested function calls to 5000. This avoids call stack overflows with deeply nested expressions. The expression parser produces about 10 nested function calls when parsing a subexpression in parentheses, so the effective nesting limit is about 500 which should be more than enough. Use a lower limit when fuzzing to account for increased memory usage when using sanitizers.
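The depth limit from 6f1470a5 can be illustrated with a toy recursive-descent parser. The grammar, the `Parser` struct, and the function names are hypothetical, and the limit is set to 8 so the effect is easy to see. The commit itself uses 5000.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8     /* demo value; the commit hardcodes 5000 */

typedef struct {
    const char *cur;    /* current input position */
    int depth;          /* number of active nested parseExpr calls */
} Parser;

/* Parses  expr := '(' expr ')' | 'x'.  Returns 0 on success, -1 on error. */
int parseExpr(Parser *p) {
    int ret = -1;

    /* Refuse to recurse further instead of overflowing the call stack. */
    if (p->depth >= MAX_DEPTH)
        return -1;
    p->depth++;

    if (*p->cur == 'x') {
        p->cur++;
        ret = 0;
    } else if (*p->cur == '(') {
        p->cur++;
        if (parseExpr(p) == 0 && *p->cur == ')') {
            p->cur++;
            ret = 0;
        }
    }

    p->depth--;
    return ret;
}

/* Parses a whole string; trailing input counts as an error. */
int parseString(const char *s) {
    Parser p;

    assert(s != NULL);
    p.cur = s;
    p.depth = 0;
    return (parseExpr(&p) == 0 && *p.cur == '\0') ? 0 : -1;
}
```

In this sketch each pair of parentheses costs one call, so seven levels of nesting parse and eight do not. In the XPath parser a parenthesized subexpression costs about ten calls, which is why a limit of 5000 allows roughly 500 levels.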
Nick Wellnhofer 8c3ef083 2020-08-24T23:17:34 Pass URL of main entity in XML fuzzer
Nick Wellnhofer 0d5f3710 2020-08-24T16:28:54 Consolidate seed corpus generation Implement file handling in C to speed up corpus generation.
Nick Wellnhofer 0d9da029 2020-08-24T03:16:25 Test fuzz targets with dummy driver Run fuzz targets with files in seed corpus during test.