Log

Author Commit Date CI Message
Nick Wellnhofer 954696e7 2021-02-07T13:23:09 Fix infinite loop in HTML parser introduced with recent commits Check for XML_PARSER_EOF to avoid an infinite loop introduced with recent changes to the HTML push parser. Found by OSS-Fuzz.
Nick Wellnhofer acb35667 2021-02-03T13:48:40 Fix quadratic runtime when parsing CDATA sections Use optimized concatenation for CDATA sections in addition to normal text. This also affects HTML script content. Found by OSS-Fuzz.
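The "optimized concatenation" mentioned in acb35667 can be illustrated with a growable buffer that caches its length and doubles its capacity, so appending many small chunks costs amortized linear time instead of rescanning the accumulated text on every append. This is a minimal sketch only: `TextBuf`, `textbuf_append` and `textbuf_free` are hypothetical names and do not reflect libxml2's actual node or buffer structures.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer (not a libxml2 type). */
typedef struct {
    char  *data;  /* NUL-terminated content, NULL while empty */
    size_t len;   /* bytes used, excluding the terminator */
    size_t cap;   /* bytes allocated */
} TextBuf;

/* Append n bytes of chunk. Returns 0 on success, -1 on overflow or OOM. */
int textbuf_append(TextBuf *buf, const char *chunk, size_t n) {
    if (n >= (size_t)-1 - buf->len)
        return -1;                      /* len + n + 1 would overflow */
    size_t need = buf->len + n + 1;
    if (need > buf->cap) {
        size_t cap = buf->cap ? buf->cap : 16;
        while (cap < need) {
            if (cap > (size_t)-1 / 2) { cap = need; break; }
            cap *= 2;                   /* geometric growth: amortized O(1) */
        }
        char *p = realloc(buf->data, cap);
        if (p == NULL)
            return -1;                  /* old buffer is still valid */
        buf->data = p;
        buf->cap = cap;
    }
    /* The cached length avoids a strlen() over everything stored so far. */
    memcpy(buf->data + buf->len, chunk, n);
    buf->len += n;
    buf->data[buf->len] = '\0';
    return 0;
}

void textbuf_free(TextBuf *buf) {
    free(buf->data);
    buf->data = NULL;
    buf->len = buf->cap = 0;
}
```

A zero-initialized `TextBuf` is a valid empty buffer; call `textbuf_free` when done.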
Markus Rickert f93ca3e1 2021-01-15T17:53:27 Update minimum required CMake version
Markus Rickert 00487289 2020-12-31T16:34:25 Add variables for configured options to CMake config files
Markus Rickert 2377a312 2020-12-30T14:40:04 Remove include directories for link-only dependencies
Markus Rickert 26835480 2020-12-30T14:28:24 Fix ICU build in CMake
Markus Rickert 95519737 2020-12-31T13:41:19 Check if variables exist when defining targets
Markus Rickert 296ab61e 2020-11-19T22:06:36 Configure pkgconfig, xml2-config, and xml2Conf.sh file
Markus Rickert c26e4525 2020-12-31T13:18:14 Check if target exists when reading target properties
Markus Rickert ec119875 2020-12-30T14:40:43 Add xmlcatalog target and definition to config files
Nick Wellnhofer 79301d3d 2020-12-18T12:50:21 Fix timeout when handling recursive entities Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.
Nick Wellnhofer 45da175c 2020-12-18T12:14:52 Fix memory leak in xmlParseElementMixedContentDecl Free parsed content if malloc fails to avoid a memory leak. Found with libFuzzer.
Nick Wellnhofer 1d73f07d 2020-12-18T00:55:00 Fix null deref in xmlStringGetNodeList Check for malloc failure to avoid null deref. Found with libFuzzer.
Nick Wellnhofer e2b975c3 2020-12-18T00:50:34 Handle malloc failures in fuzzing code Avoid misdiagnosis in OOM situations.
Mike Dalessio 29f5d20e 2020-08-03T17:36:05 htmlParseComment: treat `--!>` as if it closed the comment See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
Mike Dalessio e28d9347 2020-08-04T14:53:19 add test coverage for incorrectly-closed comments this establishes the baseline behavior so that subsequent commits which modify this behavior are clear about what's being changed.
Nick Wellnhofer 9086988f 2020-12-16T15:41:52 Enforce maximum length of fuzz input Remove the libfuzzer max_len option which doesn't apply to other fuzzing engines. Enforce the maximum length directly in the fuzz targets. For the xml target, lower the maximum when expanding entities to avoid timeout and OOM errors.
Mike Dalessio a67b63d1 2020-10-11T14:15:37 use new htmlParseLookupCommentEnd to find comment ends Note that the caret in error messages generated during comment parsing may have moved by one byte. See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
Nick Wellnhofer 1fe38530 2020-12-16T15:27:13 Remove temporary members from struct _xmlXPathContext These values are hardcoded now and the struct members, while public, were recently introduced and never part of an official release.
Nick Wellnhofer 8ca3a59b 2020-12-15T20:14:28 Fix integer overflow in xmlSchemaGetParticleTotalRangeMin The function is only used once and its return value is only checked for zero. Disable the function like its Max counterpart and add an implementation for the special case. Found by OSS-Fuzz.
Xiaoming Ni 649d02ea 2020-12-07T20:19:53 encoding: fix memleak in xmlRegisterCharEncodingHandler() The return type of xmlRegisterCharEncodingHandler() is void. The invoker cannot determine whether xmlRegisterCharEncodingHandler() is executed successfully. When nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the "handler" is not added to the array "handlers". As a result, the memory of "handler" cannot be managed and released: memory leakage. So add "xmlfree(handler)" to fix memory leakage on the failure branch of xmlRegisterCharEncodingHandler(). Reported-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Xiaoming Ni cb7a572b 2020-12-07T20:17:34 xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val" The xmlSchemaGetFacetValueAsUlong() API is an external API. The validity of external input parameters must be strictly verified. Before accessing "facet->val->value", we need to check whether "facet->val" is a null pointer. Signed-off-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Markus Rickert 84b76d99 2020-12-06T17:26:23 Update CMake config files
Markus Rickert d0ccb3a6 2020-12-06T17:25:52 Add xmlcatalog and xmllint to CMake export
Nick Wellnhofer acdc2ff3 2020-06-04T23:02:08 Simplify xmlexports.h All the compiler switches essentially set the same macros. The only exception was MSVC which omitted the "extern" keyword for exported variables. This in turn broke clang-cl. This commit rewrites and simplifies the whole header. Closes #163.
Nick Wellnhofer a218ff0e 2020-12-06T17:26:36 Fix null pointer deref in xmlXPtrRangeInsideFunction Found by OSS-Fuzz.
Nick Wellnhofer 94c2e415 2020-12-06T16:38:00 Fix quadratic runtime in HTML push parser with null bytes Null bytes in the input stream do not necessarily signal an EOF condition. Check the stream pointers for EOF to avoid quadratic rescanning of input data. Note that the CUR_CHAR macro used in functions like htmlParseCharData calls htmlCurrentChar which translates null bytes. Found by OSS-Fuzz.
Markus Rickert 1c4f9a6d 2020-11-25T18:01:51 Require dependencies based on enabled CMake options
Michael Matz faea2fa9 2020-11-21T01:21:56 Avoid quadratic checking of identity-constraints key/unique/keyref schema attributes currently use quadratic loops to check their various constraints (that keys are unique and that keyrefs refer to existing keys). That becomes extremely slow if there are many elements with keys. This happens in the wild with e.g. the OVAL XML descriptions of security patches. You need the openscap schemata, and then an example xml file: % zypper in openscap-utils % wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml % time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 16m59,857s user 16m55,787s sys 0m1,060s This patch makes libxml use a hash table to avoid the quadratic behaviour. The existing hash table only accepts strings as keys, so we're mostly reusing the canonical representation of key values to derive such strings (with the caveat given in a comment). The alternative would be to rework the hash table code to accept either numbers or free functions as hash workers, but the code is fast enough as is. With the patch we have this then: % time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 0m3,531s user 0m3,427s sys 0m0,103s So, a ~300x speedup. This patch survives 'make check' and 'make tests'.
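The approach described in faea2fa9, replacing pairwise comparison with a hash lookup on a string form of each key, can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `KeySet`, `keyset_add` and `keyset_clear` are hypothetical names, the bucket count is arbitrary, and the real code uses libxml2's existing hash table with canonical key representations.

```c
#include <stdlib.h>
#include <string.h>

#define KEYSET_BUCKETS 1024  /* arbitrary size chosen for the sketch */

/* Hypothetical string set with separate chaining (not libxml2's xmlHash). */
typedef struct KeyEntry {
    char *key;
    struct KeyEntry *next;
} KeyEntry;

typedef struct {
    KeyEntry *buckets[KEYSET_BUCKETS];
} KeySet;

static unsigned long keyset_hash(const char *s) {
    unsigned long h = 5381;                 /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns 1 if key was inserted, 0 if it was already present
 * (a uniqueness violation), -1 if memory allocation failed. */
int keyset_add(KeySet *set, const char *key) {
    unsigned long idx = keyset_hash(key) % KEYSET_BUCKETS;
    for (KeyEntry *e = set->buckets[idx]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return 0;
    KeyEntry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    size_t n = strlen(key) + 1;
    e->key = malloc(n);
    if (e->key == NULL) {
        free(e);                            /* avoid leaking the entry */
        return -1;
    }
    memcpy(e->key, key, n);
    e->next = set->buckets[idx];
    set->buckets[idx] = e;
    return 1;
}

/* Frees all entries and leaves the set empty and reusable. */
void keyset_clear(KeySet *set) {
    for (size_t i = 0; i < KEYSET_BUCKETS; i++) {
        KeyEntry *e = set->buckets[i];
        while (e != NULL) {
            KeyEntry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
        set->buckets[i] = NULL;
    }
}
```

Checking n keys this way takes roughly n expected-constant-time lookups rather than n*(n-1)/2 comparisons; a keyref check would use the same lookup without the insert.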
Markus Rickert 8272db53 2020-11-28T22:54:40 Use NAMELINK_COMPONENT in CMake install
Markus Rickert 5c7bdbc9 2020-11-25T18:41:14 Add CMake files to EXTRA_DIST
Markus Rickert 7a62870a 2020-11-19T22:06:23 Add missing compile definition for static builds to CMake
Markus Rickert e028d293 2020-11-19T17:58:46 Add CI for CMake on Linux and MinGW
Frederik Seiffert b516ed18 2020-11-12T12:53:43 Fix building with ICU 68. ICU 68 no longer defines the TRUE macro. Closes #204.
Victor Stinner ac5e9991 2020-11-10T15:42:36 Convert python/libxml.c to PY_SSIZE_T_CLEAN Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string length (int len) explicitly to Py_ssize_t when passing a string to a function call using PyObject_CallMethod() with the "s#" format.
Victor Stinner f42a0524 2020-11-09T18:19:31 Build the Python extension with PY_SSIZE_T_CLEAN The Python extension module now uses Py_ssize_t rather than int for string lengths. This change makes the extension compatible with Python 3.10. Fixes #203.
Nick Wellnhofer 0ace6c4d 2020-11-19T17:35:11 Add CI test for Python 3
Elliott Hughes 7c06d99e 2020-10-27T11:29:20 Fix xmlURIEscape memory leaks. Found by running the fuzz/uri.c fuzzer under asan (internal Android bug 171610679). Always free `ret` when exiting on failure. I've moved the definition of NULLCHK down past where ret is always initialized to make it clear that this is safe. This patch also fixes the indentation of two of the NULLCHK call sites to make it more obvious that NULLCHK isn't `if`-like.
Nick Wellnhofer 31c6ce3b 2020-11-09T17:55:44 Avoid call stack overflow with XML reader and recursive XIncludes Don't process XIncludes in the result of another inclusion to avoid infinite recursion resulting in a call stack overflow. This is something the XInclude engine shouldn't allow but correct handling of intra-document includes would require major changes. Found by OSS-Fuzz.
Nick Wellnhofer 7d6837ba 2020-10-25T20:21:43 Fix caret in regexp character group Apply Per Hedeland's patch from https://bugzilla.gnome.org/show_bug.cgi?id=779751 Fixes #188.
Nick Wellnhofer 8a85263f 2020-10-25T20:08:16 Add fuzzing dictionaries to EXTRA_DIST Also add static seed corpus for the URI fuzzer.
Nick Wellnhofer 1bde1040 2020-10-25T20:02:23 Add 'fuzz' subdirectory to DIST_SUBDIRS Fixes #191.
Mike Dalessio c0c26ff2 2020-10-11T16:33:07 parser.c: xmlParseCharData peek behavior fixed wrt newlines Previously, xmlParseCharData and xmlParseComment would consider 0xA to be unhandleable when seen as the first byte of an input chunk, and fall back to xmlParseCharDataComplex and xmlParseCommentComplex, which have different memory and performance characteristics. Fixes GNOME/libxml2#192
Nick Wellnhofer b46016b8 2020-10-17T18:03:09 Allow port numbers up to INT_MAX Also return an error on overflow.
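A port parser that accepts values up to INT_MAX and reports overflow as an error, as b46016b8 describes, might look like the sketch below. `parse_port` is a hypothetical helper written for illustration and is not the function changed in libxml2's URI code.

```c
#include <limits.h>

/* Parse a leading run of decimal digits from s into *port.
 * Returns 0 on success, -1 if there are no digits or the value
 * would exceed INT_MAX. Parsing stops at the first non-digit. */
int parse_port(const char *s, int *port) {
    int value = 0;

    if (*s < '0' || *s > '9')
        return -1;
    while (*s >= '0' && *s <= '9') {
        int digit = *s - '0';
        /* Test before multiplying: signed overflow is undefined in C. */
        if (value > (INT_MAX - digit) / 10)
            return -1;
        value = value * 10 + digit;
        s++;
    }
    *port = value;
    return 0;
}
```

The check runs before the multiplication, so the arithmetic never overflows even on hostile input.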
Nick Wellnhofer 46837d47 2020-10-03T01:13:35 Fix memory leaks in XPointer string-range function Found by OSS-Fuzz.
Nick Wellnhofer 0b3c64d9 2020-09-29T18:08:37 Handle dumps of corrupted documents more gracefully Check parent pointers for NULL after the non-recursive rewrite of the serialization code. This avoids segfaults with corrupted documents which can apparently be seen with lxml, see issue #187.
Nick Wellnhofer 847a3a11 2020-09-28T12:28:29 Fix use-after-free when XIncluding text from Reader The XML Reader can free text nodes coming from the XInclude engine before parsing has finished. Cache a copy of the text string, not the included node to avoid use after free. Found by OSS-Fuzz.
yanjinjq 7929f057 2020-08-30T10:34:01 Fix SEGV in xmlSAXParseFileWithData Fixes #181.
Nick Wellnhofer e6ec58ec 2020-09-21T12:49:36 Fix null deref in XPointer expression error path Make sure that the filter functions introduced with commit c2f4da1a return node-sets without NULL pointers also in the error case. Found by OSS-Fuzz.
Nick Wellnhofer 4e9cc18b 2020-09-21T11:00:23 Fix variable name in win32/configure.js Fix copy/paste error from previous commit.
Nick Wellnhofer 5614c078 2020-09-21T10:55:45 Fix version parsing in win32/configure.js Adjust to configure.ac changes. Should fix #185.
Nick Wellnhofer 8b88503a 2020-09-18T19:15:27 Don't call xmlXPathInit directly Call xmlInitParser which uses a lock to avoid race conditions. Fixes #184.
Nick Wellnhofer b215c270 2020-09-13T12:19:48 Fix cleanup of attributes in XML reader xml:id creates ID attributes even in documents without a DTD, so the check in xmlTextReaderFreeProp must be changed to avoid use after free. Found by OSS-Fuzz.
Nick Wellnhofer f0fd1b67 2020-08-26T00:16:38 Limit size of free lists in XML reader when fuzzing Keeping objects on a free list can hide memory errors. Only allow a single node on free lists used by the XML reader when fuzzing. This should hide fewer errors while still exercising the free list logic.
Nick Wellnhofer ba589adc 2020-08-25T23:50:39 Fix double free in XML reader with XIncludes An XInclude with empty fallback could lead to a double free in xmlTextReaderRead. Found by OSS-Fuzz.
Nick Wellnhofer 6f1470a5 2020-08-25T18:50:45 Hardcode maximum XPath recursion depth Always limit nested functions calls to 5000. This avoids call stack overflows with deeply nested expressions. The expression parser produces about 10 nested function calls when parsing a subexpression in parentheses, so the effective nesting limit is about 500 which should be more than enough. Use a lower limit when fuzzing to account for increased memory usage when using sanitizers.
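The depth limit described in 6f1470a5 can be sketched with a toy recursive-descent parser that counts nested calls and fails once a maximum is reached. The grammar, `ExprParser` and `parse_expr` are hypothetical; the actual XPath code hardcodes the limit (5000 per the commit message), while this sketch takes it as a field so the behaviour is easy to exercise.

```c
/* Hypothetical parser state (not libxml2's XPath parser context). */
typedef struct {
    const char *cur;  /* next unread character */
    int depth;        /* current nesting of parse_expr calls */
    int max_depth;    /* limit on nested calls, e.g. 5000 */
} ExprParser;

/* Toy grammar: expr := 'x' | '(' expr ')'
 * Returns 0 on success, -1 on a syntax error or when the nesting
 * limit is reached. depth is restored on every return path. */
int parse_expr(ExprParser *p) {
    int ret = -1;

    if (p->depth >= p->max_depth)
        return -1;                 /* refuse to recurse any deeper */
    p->depth++;
    if (*p->cur == 'x') {
        p->cur++;
        ret = 0;
    } else if (*p->cur == '(') {
        p->cur++;
        if (parse_expr(p) == 0 && *p->cur == ')') {
            p->cur++;
            ret = 0;
        }
    }
    p->depth--;
    return ret;
}
```

Because the counter is checked before each recursive step, stack usage is bounded by the limit regardless of how deeply the input nests.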
Nick Wellnhofer 8c3ef083 2020-08-24T23:17:34 Pass URL of main entity in XML fuzzer
Nick Wellnhofer 0d5f3710 2020-08-24T16:28:54 Consolidate seed corpus generation Implement file handling in C to speed up corpus generation.
Nick Wellnhofer 0d9da029 2020-08-24T03:16:25 Test fuzz targets with dummy driver Run fuzz targets with files in seed corpus during test.
Nick Wellnhofer 3fcf3193 2020-08-22T00:43:18 Fix regression introduced with commit d88df4b Revert the commit and use a different approach. Found by OSS-Fuzz.
Nick Wellnhofer 87d20b55 2020-08-19T13:52:08 Fix regression introduced with commit 74dcc10b The code wasn't dead after all, but I can see no reason in delaying the XPointer evaluation. This could lead to nodes included earlier appearing in XPointer results.
Nick Wellnhofer fbb7fa9a 2020-08-19T13:13:20 Fix memory leak in xmlXIncludeAddNode error paths Found by OSS-Fuzz.
Nick Wellnhofer 19cae17f 2020-08-19T13:07:28 Revert "Fix quadratic runtime in xi:fallback processing" This reverts commit 27119ec33c9f6b9830efa1e0da0acfa353dfa55a. Not copying fallback children didn't fix up namespaces and could lead to use-after-free errors. Found by OSS-Fuzz.
Nick Wellnhofer d63cfeca 2020-08-17T15:40:06 Add TODO comment in xinclude.c Add some thoughts on the major remaining problems with the XInclude implementation.
Nick Wellnhofer 804c5297 2020-08-17T03:37:18 Stop using maxParserDepth in xpath.c Only use a single maxDepth value.
Nick Wellnhofer 74dcc10b 2020-08-17T03:24:56 Remove dead code in xinclude.c 'doc' is checked for NULL in xmlXIncludeLoadDoc, so several code paths can be eliminated.
Nick Wellnhofer 0ff52748 2020-08-17T02:54:28 Fix autotools warnings
Nick Wellnhofer 2c747129 2020-08-17T00:54:12 Fix error reporting with xi:fallback When reporting errors, don't use href of xi:include if xi:fallback was used. I think this can only be reproduced with "xmllint --postvalid", see the original bug report: https://bugzilla.gnome.org/show_bug.cgi?id=152623
Nick Wellnhofer 27119ec3 2020-08-17T00:05:19 Fix quadratic runtime in xi:fallback processing Copying the tree would lead to runtime quadratic in nested fallback depth, similar to naive string concatenation.
Nick Wellnhofer d88df4bd 2020-08-16T23:38:48 Fix corner case with empty xi:fallback xi:fallback could become empty after recursive expansion. Use a flag to track whether nodes should be skipped.
Nick Wellnhofer 00a86d41 2020-08-16T23:38:00 Don't add formatting newlines to XInclude nodes
Nick Wellnhofer dba82a8c 2020-08-16T23:02:20 Fix XInclude regression introduced with recent commit The change to xmlXIncludeLoadFallback in commit 11b57459 could process already freed nodes if text nodes were merged after deleting nodes with an empty fallback. Found by OSS-Fuzz.
Nick Wellnhofer e1c2d0ad 2020-08-16T22:22:57 Fix memory leak in runtest.c
Nick Wellnhofer 2b4769a6 2020-08-16T22:02:04 Make "xmllint --push --recovery" work
Nick Wellnhofer 99fc048d 2020-08-14T14:18:50 Don't use SAX1 if all element handlers are NULL Running xmllint with "--sax --noout" installs a SAX2 handler with all callbacks set to NULL. In this case or similar situations, we don't want to switch to SAX1 parsing.
Nick Wellnhofer c1ba6f54 2020-08-15T18:32:29 Revert "Do not URI escape in server side includes" This reverts commit 960f0e275616cadc29671a218d7fb9b69eb35588. This commit introduced - an infinite loop, found by OSS-Fuzz, which could be easily fixed. - an algorithm with quadratic runtime - a security issue, see https://bugzilla.gnome.org/show_bug.cgi?id=769760 A better approach is to add an option not to escape URLs at all which libxml2 should have possibly done in the first place.
Nick Wellnhofer b82fa3dd 2020-08-09T14:50:46 Fix column number accounting in xmlParse*NameAndCompare Thanks to Frederic Vancraeyveldt for the report.
Nick Wellnhofer 438e595a 2020-08-09T14:43:53 Stop counting nbChars in parser context The value was inaccurate and never used.
Nick Wellnhofer f6a9541f 2020-08-09T14:29:35 Remove unneeded progress checks in HTML parser The HTML parser should now be guaranteed to make progress, so the checks became unnecessary.
Nick Wellnhofer 9de7b94d 2020-08-08T20:37:30 Use strcmp when fuzzing This should improve data-flow-guided fuzzing.
Nick Wellnhofer 10a07948 2020-08-08T17:46:11 Fix XPath fuzzer
Nick Wellnhofer 6c128fd5 2020-06-05T13:43:45 Fuzz XInclude engine
Nick Wellnhofer 50f06b3e 2020-08-07T21:54:27 Fix out-of-bounds read with 'xmllint --htmlout' Make sure that truncated UTF-8 sequences don't cause an out-of-bounds array access. Thanks to @SuhwanSong and the Agency for Defense Development (ADD) for the report. Fixes #178.
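The class of bug fixed in 50f06b3e, reading past the end of a buffer when a multi-byte UTF-8 sequence is cut short, can be avoided by comparing the sequence length announced by the lead byte with the number of bytes actually available. The sketch below is illustrative only: `utf8_decode` is a hypothetical helper, not the xmllint code, and it deliberately skips overlong-form and surrogate checks.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence from in[0..avail-1] into *cp.
 * Returns the number of bytes consumed (1 to 4), or -1 if the
 * sequence is truncated or malformed. Never reads beyond avail. */
int utf8_decode(const unsigned char *in, size_t avail, unsigned long *cp) {
    unsigned long value;
    int len;

    if (avail == 0)
        return -1;
    if (in[0] < 0x80)                { len = 1; value = in[0]; }
    else if ((in[0] & 0xE0) == 0xC0) { len = 2; value = in[0] & 0x1F; }
    else if ((in[0] & 0xF0) == 0xE0) { len = 3; value = in[0] & 0x0F; }
    else if ((in[0] & 0xF8) == 0xF0) { len = 4; value = in[0] & 0x07; }
    else
        return -1;                   /* stray continuation or invalid lead */

    if ((size_t)len > avail)
        return -1;                   /* truncated: do not touch in[avail] */

    for (int i = 1; i < len; i++) {
        if ((in[i] & 0xC0) != 0x80)
            return -1;               /* not a continuation byte */
        value = (value << 6) | (in[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```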
Nick Wellnhofer 1abf2967 2020-08-06T17:51:57 Fix exponential runtime and memory in xi:fallback processing When creating XML_XINCLUDE_START nodes, the children of the original xi:include node must be freed, otherwise fallback content is copied twice, doubling runtime and memory consumption for each nested xi:fallback/xi:include pair. Found with libFuzzer.
Nick Wellnhofer 11b57459 2020-08-07T18:39:19 Don't process siblings of root in xmlXIncludeProcess xmlXIncludeDoProcess would follow the siblings of the tree root and also expand these nodes. When using an XML reader, this could lead to siblings of the current node being expanded without having been parsed completely.
Nick Wellnhofer 0f9817c7 2020-06-10T16:34:52 Don't recurse into xi:include children in xmlXIncludeDoProcess Otherwise, nested xi:include nodes might result in a use-after-free if XML_PARSE_NOXINCNODE is specified. Found with libFuzzer and ASan.
Nick Wellnhofer 5725c115 2020-06-10T15:11:40 Fix memory leak in xmlXIncludeIncludeNode error paths Found with libFuzzer and ASan.
Nick Wellnhofer ad26a60f 2020-08-06T13:20:01 Add XPath and XPointer fuzzer
Nick Wellnhofer 956534e0 2020-08-04T19:27:13 Check for custom free function in global destructor Calling a custom deallocation function in the global destructor could cause all kinds of unexpected problems. See for example https://github.com/sparklemotion/nokogiri/issues/2059 Only clean up if memory is managed with malloc/free.
Nick Wellnhofer 8e7c20a1 2020-08-03T17:30:41 Fix integer overflow when comparing schema dates Found by OSS-Fuzz.
Nick Wellnhofer 905820a4 2020-07-12T22:59:39 Update fuzzing code - Shorten timeouts - Align options from Makefile and options files - Add section headers to Makefile - Skip invalid UTF-8 in regexp fuzzer - Update regexp.dict - Generate HTML seed corpus in correct format
Nick Wellnhofer 68eadabd 2020-07-11T21:32:10 Fix exponential runtime in xmlFARecurseDeterminism In order to prevent visiting a state twice, states must be marked as visited for the whole duration of graph traversal because states might be reached by different paths. Otherwise state graphs like the following can lead to exponential runtime: ->O-->O-->O-->O-->O-> \ / \ / \ / \ / O O O O Reset the "visited" flag only after the graph was traversed. xmlFAComputesDeterminism still has massive performance problems when handling fuzzed input. By design, it has quadratic time complexity in the number of reachable states. Some issues might also stem from redundant epsilon transitions. With this fix, fuzzing regexes with a maximum length of 100 becomes feasible at least. Found with libFuzzer.
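The traversal rule from 68eadabd, keeping states marked as visited for the whole walk and resetting the flags only afterwards, can be sketched on a small graph type. `Automaton`, `walk` and `traverse` are hypothetical names and sizes chosen for illustration; this is not the xmlFARecurseDeterminism code.

```c
#include <string.h>

#define MAX_STATES 64
#define MAX_EDGES  4

/* Hypothetical automaton representation (not libxml2's xmlRegexp types). */
typedef struct {
    int nedges;
    int to[MAX_EDGES];   /* indices of successor states */
} State;

typedef struct {
    State states[MAX_STATES];
    unsigned char visited[MAX_STATES];
    long visits;         /* states processed by the last traversal */
} Automaton;

static void walk(Automaton *a, int s) {
    if (a->visited[s])
        return;
    /* The mark is NOT cleared when this call returns. Clearing it here
     * would let a state reachable by many paths be processed once per
     * path, which is exponential on a chain of diamonds. */
    a->visited[s] = 1;
    a->visits++;
    for (int i = 0; i < a->states[s].nedges; i++)
        walk(a, a->states[s].to[i]);
}

/* Visit every state reachable from start exactly once and return
 * how many states were processed. */
long traverse(Automaton *a, int start) {
    a->visits = 0;
    walk(a, start);
    /* Reset the flags only after the whole graph was traversed. */
    memset(a->visited, 0, sizeof a->visited);
    return a->visits;
}
```

On a chain of k diamonds like the one drawn in the commit message, this processes each of the 2k + 1 states once.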
Nick Wellnhofer 1a360c1c 2020-07-29T00:39:15 More *NodeDumpOutput fixes When leaving nodes, restrict more operations to XML_ELEMENT_NODEs.
Nick Wellnhofer 7b2e5172 2020-07-28T21:52:55 Fix *NodeDumpOutput functions Only output end tag for elements. Should fix serialization of document fragments.
Nick Wellnhofer dc6f0092 2020-07-28T19:07:19 Make xmlNodeDumpOutputInternal non-recursive Fixes stack overflow with deeply nested documents.
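A non-recursive serializer of the kind dc6f0092 describes can walk the tree with parent and sibling pointers instead of the call stack, so nesting depth no longer affects stack usage. The sketch below uses a hypothetical `Node` type and `dump_tree` function with element nodes only; it is not xmlNodeDumpOutputInternal. The NULL-parent check mirrors the defensive handling mentioned in 0b3c64d9.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical element-only tree node (not libxml2's xmlNode). */
typedef struct Node {
    struct Node *parent, *children, *next;
    const char *name;
} Node;

/* Fixed-size output buffer; text beyond its capacity is truncated. */
typedef struct {
    char text[256];
    size_t len;
} Output;

static void out_puts(Output *o, const char *s) {
    size_t room = sizeof o->text - 1 - o->len;
    size_t n = strlen(s);
    if (n > room)
        n = room;
    memcpy(o->text + o->len, s, n);
    o->len += n;
    o->text[o->len] = '\0';
}

/* Serialize root and its descendants without recursion. */
void dump_tree(const Node *root, Output *o) {
    const Node *cur = root;

    while (cur != NULL) {
        out_puts(o, "<"); out_puts(o, cur->name); out_puts(o, ">");
        if (cur->children != NULL) {
            cur = cur->children;          /* descend */
            continue;
        }
        for (;;) {
            out_puts(o, "</"); out_puts(o, cur->name); out_puts(o, ">");
            if (cur == root)
                return;                   /* finished the subtree */
            if (cur->next != NULL) {
                cur = cur->next;          /* move to next sibling */
                break;
            }
            cur = cur->parent;            /* climb and close the parent */
            if (cur == NULL)
                return;                   /* corrupted tree: stop safely */
        }
    }
}
```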
Nick Wellnhofer 5330153d 2020-07-28T18:33:50 Make xhtmlNodeDumpOutput non-recursive Fixes stack overflow with deeply nested documents.
Nick Wellnhofer b79ab6e6 2020-07-28T02:42:37 Make htmlNodeDumpFormatOutput non-recursive Fixes stack overflow with deeply nested HTML documents. Found by OSS-Fuzz.
Nick Wellnhofer 21ca8829 2020-07-25T17:57:29 Don't try to handle namespaces when building HTML documents Don't try to resolve namespace in xmlSAX2StartElement when parsing HTML documents. This useless operation could slow down the parser considerably. Found by OSS-Fuzz.
Nick Wellnhofer 93ce33c2 2020-07-23T17:34:08 Fix several quadratic runtime issues in HTML push parser Fix a few remaining cases where the HTML push parser would scan more content during lookahead than being parsed later. Make sure that htmlParseDocTypeDecl consumes all content up to the final '>' in case of errors. The old comment said "We shouldn't try to resynchronize", but ignoring invalid content is also what the HTML5 spec mandates. Likewise, make htmlParseEndTag skip to the final '>' in invalid end tags even if not in recovery mode. This is probably the most visible change in practice and leads to different output for some tests but is also more in line with HTML5. Make sure that htmlParsePI and htmlParseComment don't abort if invalid characters are encountered but log an error and ignore the character. Change some other end-of-buffer checks to test for a zero byte instead of relying on IS_CHAR. Fix usage of IS_CHAR macro in htmlParseScript.
Nick Wellnhofer 10d09472 2020-07-23T19:16:21 Fix .gitattributes The files in 'test' and 'result' have mixed line endings, so disable end-of-line conversion.