Log

Author Commit Date CI Message
Nick Wellnhofer 19cae17f 2020-08-19T13:07:28 Revert "Fix quadratic runtime in xi:fallback processing" This reverts commit 27119ec33c9f6b9830efa1e0da0acfa353dfa55a. Not copying fallback children didn't fix up namespaces and could lead to use-after-free errors. Found by OSS-Fuzz.
Nick Wellnhofer d63cfeca 2020-08-17T15:40:06 Add TODO comment in xinclude.c Add some thoughts on the major remaining problems with the XInclude implementation.
Nick Wellnhofer 804c5297 2020-08-17T03:37:18 Stop using maxParserDepth in xpath.c Only use a single maxDepth value.
Nick Wellnhofer 74dcc10b 2020-08-17T03:24:56 Remove dead code in xinclude.c 'doc' is checked for NULL in xmlXIncludeLoadDoc, so several code paths can be eliminated.
Nick Wellnhofer 0ff52748 2020-08-17T02:54:28 Fix autotools warnings
Nick Wellnhofer d88df4bd 2020-08-16T23:38:48 Fix corner case with empty xi:fallback xi:fallback could become empty after recursive expansion. Use a flag to track whether nodes should be skipped.
Nick Wellnhofer 00a86d41 2020-08-16T23:38:00 Don't add formatting newlines to XInclude nodes
Nick Wellnhofer dba82a8c 2020-08-16T23:02:20 Fix XInclude regression introduced with recent commit The change to xmlXIncludeLoadFallback in commit 11b57459 could process already freed nodes if text nodes were merged after deleting nodes with an empty fallback. Found by OSS-Fuzz.
Nick Wellnhofer e1c2d0ad 2020-08-16T22:22:57 Fix memory leak in runtest.c
Nick Wellnhofer 2c747129 2020-08-17T00:54:12 Fix error reporting with xi:fallback When reporting errors, don't use href of xi:include if xi:fallback was used. I think this can only be reproduced with "xmllint --postvalid", see the original bug report: https://bugzilla.gnome.org/show_bug.cgi?id=152623
Nick Wellnhofer 2b4769a6 2020-08-16T22:02:04 Make "xmllint --push --recovery" work
Nick Wellnhofer 99fc048d 2020-08-14T14:18:50 Don't use SAX1 if all element handlers are NULL Running xmllint with "--sax --noout" installs a SAX2 handler with all callbacks set to NULL. In this case or similar situations, we don't want to switch to SAX1 parsing.
Nick Wellnhofer 27119ec3 2020-08-17T00:05:19 Fix quadratic runtime in xi:fallback processing Copying the tree would lead to runtime quadratic in nested fallback depth, similar to naive string concatenation.
Nick Wellnhofer c1ba6f54 2020-08-15T18:32:29 Revert "Do not URI escape in server side includes" This reverts commit 960f0e275616cadc29671a218d7fb9b69eb35588. This commit introduced - an infinite loop, found by OSS-Fuzz, which could be easily fixed. - an algorithm with quadratic runtime - a security issue, see https://bugzilla.gnome.org/show_bug.cgi?id=769760 A better approach is to add an option not to escape URLs at all which libxml2 should have possibly done in the first place.
Nick Wellnhofer b82fa3dd 2020-08-09T14:50:46 Fix column number accounting in xmlParse*NameAndCompare Thanks to Frederic Vancraeyveldt for the report.
Nick Wellnhofer 438e595a 2020-08-09T14:43:53 Stop counting nbChars in parser context The value was inaccurate and never used.
Nick Wellnhofer f6a9541f 2020-08-09T14:29:35 Remove unneeded progress checks in HTML parser The HTML parser should now be guaranteed to make progress, so the checks became unnecessary.
Nick Wellnhofer 9de7b94d 2020-08-08T20:37:30 Use strcmp when fuzzing This should improve data-flow-guided fuzzing.
Nick Wellnhofer 10a07948 2020-08-08T17:46:11 Fix XPath fuzzer
Nick Wellnhofer 6c128fd5 2020-06-05T13:43:45 Fuzz XInclude engine
Nick Wellnhofer 50f06b3e 2020-08-07T21:54:27 Fix out-of-bounds read with 'xmllint --htmlout' Make sure that truncated UTF-8 sequences don't cause an out-of-bounds array access. Thanks to @SuhwanSong and the Agency for Defense Development (ADD) for the report. Fixes #178.
Nick Wellnhofer 1abf2967 2020-08-06T17:51:57 Fix exponential runtime and memory in xi:fallback processing When creating XML_XINCLUDE_START nodes, the children of the original xi:include node must be freed, otherwise fallback content is copied twice, doubling runtime and memory consumption for each nested xi:fallback/xi:include pair. Found with libFuzzer.
Nick Wellnhofer 11b57459 2020-08-07T18:39:19 Don't process siblings of root in xmlXIncludeProcess xmlXIncludeDoProcess would follow the siblings of the tree root and also expand these nodes. When using an XML reader, this could lead to siblings of the current node being expanded without having been parsed completely.
Nick Wellnhofer 0f9817c7 2020-06-10T16:34:52 Don't recurse into xi:include children in xmlXIncludeDoProcess Otherwise, nested xi:include nodes might result in a use-after-free if XML_PARSE_NOXINCNODE is specified. Found with libFuzzer and ASan.
Nick Wellnhofer 5725c115 2020-06-10T15:11:40 Fix memory leak in xmlXIncludeIncludeNode error paths Found with libFuzzer and ASan.
Nick Wellnhofer ad26a60f 2020-08-06T13:20:01 Add XPath and XPointer fuzzer
Nick Wellnhofer 956534e0 2020-08-04T19:27:13 Check for custom free function in global destructor Calling a custom deallocation function in the global destructor could cause all kinds of unexpected problems. See for example https://github.com/sparklemotion/nokogiri/issues/2059 Only clean up if memory is managed with malloc/free.
Nick Wellnhofer 8e7c20a1 2020-08-03T17:30:41 Fix integer overflow when comparing schema dates Found by OSS-Fuzz.
Nick Wellnhofer 905820a4 2020-07-12T22:59:39 Update fuzzing code - Shorten timeouts - Align options from Makefile and options files - Add section headers to Makefile - Skip invalid UTF-8 in regexp fuzzer - Update regexp.dict - Generate HTML seed corpus in correct format
Nick Wellnhofer 68eadabd 2020-07-11T21:32:10 Fix exponential runtime in xmlFARecurseDeterminism In order to prevent visiting a state twice, states must be marked as visited for the whole duration of graph traversal because states might be reached by different paths. Otherwise state graphs like the following can lead to exponential runtime:
    ->O-->O-->O-->O-->O->
       \ / \ / \ / \ /
        O   O   O   O
Reset the "visited" flag only after the graph was traversed. xmlFAComputesDeterminism still has massive performance problems when handling fuzzed input. By design, it has quadratic time complexity in the number of reachable states. Some issues might also stem from redundant epsilon transitions. With this fix, fuzzing regexes with a maximum length of 100 becomes feasible at least. Found with libFuzzer.
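The effect described in 68eadabd can be illustrated with a minimal sketch. This is not the libxml2 code: `State`, `build_diamond_chain`, `traverse` and `count_visits` are hypothetical names, and the graph is the chain of diamonds from the commit message. The sketch counts how many times states are entered when the "visited" flag is reset on return from each state versus kept for the whole traversal.

```c
#include <string.h>

#define MAX_STATES 64

/* Hypothetical state type, not the libxml2 automaton structure. */
typedef struct {
    int targets[2];
    int ntargets;
    int visited;
} State;

/* Main states are 0..n, bypass states are n+1..2n.
 * Each main state i points to i+1 and to a bypass state that also
 * points to i+1, so state i+1 is reachable by two paths. */
static void build_diamond_chain(State *states, int n) {
    memset(states, 0, sizeof(State) * (size_t) (2 * n + 1));
    for (int i = 0; i < n; i++) {
        int bypass = n + 1 + i;
        states[i].targets[0] = i + 1;
        states[i].targets[1] = bypass;
        states[i].ntargets = 2;
        states[bypass].targets[0] = i + 1;
        states[bypass].ntargets = 1;
    }
}

/* Returns the number of times a state was entered. */
static long traverse(State *states, int cur, int reset_on_return) {
    long count = 1;
    if (states[cur].visited)
        return 0;
    states[cur].visited = 1;
    for (int i = 0; i < states[cur].ntargets; i++)
        count += traverse(states, states[cur].targets[i], reset_on_return);
    /* Resetting here lets later paths re-enter the same state. */
    if (reset_on_return)
        states[cur].visited = 0;
    return count;
}

long count_visits(int n, int reset_on_return) {
    State states[MAX_STATES];
    if (n < 0 || 2 * n + 1 > MAX_STATES)
        return -1;
    build_diamond_chain(states, n);
    return traverse(states, 0, reset_on_return);
}
```

With four diamonds, `count_visits(4, 0)` enters each of the 9 states once, while `count_visits(4, 1)` enters states 46 times; the second figure roughly doubles with every additional diamond.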
Nick Wellnhofer 1a360c1c 2020-07-29T00:39:15 More *NodeDumpOutput fixes When leaving nodes, restrict more operations to XML_ELEMENT_NODEs.
Nick Wellnhofer 7b2e5172 2020-07-28T21:52:55 Fix *NodeDumpOutput functions Only output end tag for elements. Should fix serialization of document fragments.
Nick Wellnhofer dc6f0092 2020-07-28T19:07:19 Make xmlNodeDumpOutputInternal non-recursive Fixes stack overflow with deeply nested documents.
Nick Wellnhofer 5330153d 2020-07-28T18:33:50 Make xhtmlNodeDumpOutput non-recursive Fixes stack overflow with deeply nested documents.
Nick Wellnhofer b79ab6e6 2020-07-28T02:42:37 Make htmlNodeDumpFormatOutput non-recursive Fixes stack overflow with deeply nested HTML documents. Found by OSS-Fuzz.
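The three "non-recursive" commits above replace recursion with a loop over tree links. A minimal sketch of that idea, assuming a simplified hypothetical `Node` with `parent`, `children` and `next` pointers (every node is treated as an element here, and `emit`, `dump_tree` and `dump_demo` are illustrative names, not libxml2 functions):

```c
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical node type with the usual tree links. */
typedef struct Node {
    struct Node *parent, *children, *next;
    const char *name;
} Node;

/* Append open + name + close to the NUL-terminated buffer, truncating. */
static void emit(char *out, size_t size, const char *open,
                 const char *name, const char *close) {
    size_t used = strlen(out);
    snprintf(out + used, size - used, "%s%s%s", open, name, close);
}

/* Serialize the subtree at root without recursion: descend through
 * 'children', then walk 'next' and 'parent' links to find the next
 * node, emitting end tags on the way up. Stack use is constant. */
void dump_tree(const Node *root, char *out, size_t size) {
    const Node *cur = root;
    if (size == 0)
        return;
    out[0] = '\0';
    while (cur != NULL) {
        emit(out, size, "<", cur->name, ">");
        if (cur->children != NULL) {
            cur = cur->children;
            continue;
        }
        for (;;) {
            emit(out, size, "</", cur->name, ">");
            if (cur == root)
                return;
            if (cur->next != NULL) {
                cur = cur->next;
                break;
            }
            cur = cur->parent;
        }
    }
}

/* Usage example: <a> with two children <b> and <c>. */
const char *dump_demo(void) {
    static char buf[64];
    Node a = { NULL, NULL, NULL, "a" };
    Node b = { &a, NULL, NULL, "b" };
    Node c = { &a, NULL, NULL, "c" };
    a.children = &b;
    b.next = &c;
    dump_tree(&a, buf, sizeof(buf));
    return buf;
}
```

Because the loop never calls itself, nesting depth affects only the number of iterations, not the stack size.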
Nick Wellnhofer 21ca8829 2020-07-25T17:57:29 Don't try to handle namespaces when building HTML documents Don't try to resolve namespace in xmlSAX2StartElement when parsing HTML documents. This useless operation could slow down the parser considerably. Found by OSS-Fuzz.
Nick Wellnhofer 93ce33c2 2020-07-23T17:34:08 Fix several quadratic runtime issues in HTML push parser Fix a few remaining cases where the HTML push parser would scan more content during lookahead than being parsed later. Make sure that htmlParseDocTypeDecl consumes all content up to the final '>' in case of errors. The old comment said "We shouldn't try to resynchronize", but ignoring invalid content is also what the HTML5 spec mandates. Likewise, make htmlParseEndTag skip to the final '>' in invalid end tags even if not in recovery mode. This is probably the most visible change in practice and leads to different output for some tests but is also more in line with HTML5. Make sure that htmlParsePI and htmlParseComment don't abort if invalid characters are encountered but log an error and ignore the character. Change some other end-of-buffer checks to test for a zero byte instead of relying on IS_CHAR. Fix usage of IS_CHAR macro in htmlParseScript.
Nick Wellnhofer 10d09472 2020-07-23T19:16:21 Fix .gitattributes The files in 'test' and 'result' have mixed line endings, so disable end-of-line conversion.
Nick Wellnhofer 173a0830 2020-07-22T23:15:35 Fix quadratic runtime when push parsing HTML start tags Make sure that htmlParseStartTag doesn't terminate on characters for which IS_CHAR_CH is false like control chars. In htmlParseTryOrFinish, only switch to START_TAG if the next character starts a valid name. Otherwise, htmlParseStartTag might return without consuming all characters up to the final '>'. Found by OSS-Fuzz.
David Kilzer 0e5c4fec 2020-07-13T15:20:45 Reset XML parser input before reporting errors Apply changes to htmlParseChunk() in 13ba5b61 and 3f18e748 to xmlParseChunk().
Nick Wellnhofer 6995eed0 2020-07-19T13:54:52 Fix quadratic runtime when push parsing HTML entity refs The HTML push parser would look ahead for characters in "; >/" to terminate an entity reference but actual parsing could stop earlier, potentially resulting in quadratic runtime. Parse char data and references alternately in htmlParseTryOrFinish and only look ahead once for a terminating '<' character. Found by OSS-Fuzz.
Nick Wellnhofer 8e219b15 2020-07-12T21:43:44 Fix HTML push parser lookahead The parsing rules when looking for terminating chars or sequences in the push parser differed from the actual parsing code. This could cause the lookahead to overshoot and data to be rescanned, potentially leading to quadratic runtime. Comments must never be handled during lookahead. Attribute values must only be skipped for start tags and doctype declarations, not for end tags, comments, PIs and script content.
Nick Wellnhofer e050062c 2020-07-15T14:38:55 Make htmlCurrentChar always translate U+0000 The general assumption is that htmlCurrentChar only returns 0 if the end of the input buffer is reached. The UTF-8 path already logged an error if a zero byte U+0000 was found and returned a space character instead. Make the ASCII code path do the same. htmlParseTryOrFinish skips zero bytes at the beginning of a buffer, so even if 0 was returned from htmlCurrentChar, the push parser would make progress. But rescanning the input could cause performance problems. The pull parser would abort parsing and now handles zero bytes in ASCII mode the same way as the push parser or as in UTF-8 mode. It would be better to return the replacement character U+FFFD instead, but some of the client code assumes that the UTF-8 length of input and output matches.
Nick Wellnhofer dfd4e330 2020-07-15T14:22:08 Rework control flow in htmlCurrentChar Don't call xmlCurrentChar after switching encodings. Rearrange code blocks and fall through to normal UTF-8 handling.
Nick Wellnhofer 922bebcc 2020-07-15T14:20:42 Make 'xmllint --html --push -' read from stdin
Nick Wellnhofer 1493130e 2020-07-15T12:54:25 Fix UTF-8 decoder in HTML parser Reject sequences starting with a continuation byte as well as overlong sequences like the XML parser. Also fixes an infinite loop in connection with previous commit 50078922 since htmlCurrentChar would return 0 even if not at the end of the buffer. Found by OSS-Fuzz.
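The checks named in 1493130e (reject a continuation byte in lead position, reject overlong sequences) can be sketched as follows. This is an illustration under assumptions, not the htmlCurrentChar code: `decode_utf8` is a hypothetical function, it also rejects truncated input and values above U+10FFFF, and it does not check for surrogates.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence from s[0..len).
 * Returns the code point, or -1 for invalid or truncated input.
 * If seq_len is not NULL, it receives the sequence length. */
long decode_utf8(const unsigned char *s, size_t len, int *seq_len) {
    unsigned long c, min;
    int n, i;

    if (len == 0)
        return -1;
    c = s[0];
    if (c < 0x80) {
        n = 1; min = 0;
    } else if (c < 0xC0) {
        return -1;                  /* continuation byte in lead position */
    } else if (c < 0xE0) {
        n = 2; min = 0x80; c &= 0x1F;
    } else if (c < 0xF0) {
        n = 3; min = 0x800; c &= 0x0F;
    } else if (c < 0xF8) {
        n = 4; min = 0x10000; c &= 0x07;
    } else {
        return -1;                  /* not a valid lead byte */
    }
    if ((size_t) n > len)
        return -1;                  /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;              /* expected a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF)
        return -1;                  /* overlong or out of range */
    if (seq_len != NULL)
        *seq_len = n;
    return (long) c;
}
```

Note that the error return is distinct from a decoded zero, so a caller can tell invalid input apart from a U+0000 byte in the buffer.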
Nick Wellnhofer beb7d71a 2020-07-13T12:41:19 Remove misleading comments in xpath.c Fixes #169
Nick Wellnhofer 50078922 2020-07-12T20:28:47 Fix quadratic runtime when parsing HTML script content If htmlParseScript returns upon hitting an invalid character, htmlParseLookupSequence will be called again with checkIndex reset to zero, potentially resulting in quadratic runtime. Make sure that htmlParseScript consumes all input in one go and simply skips over invalid characters similar to htmlParseCharDataInternal. Found by OSS-Fuzz.
Andre Klapper d6761e70 2020-07-13T11:59:45 Update to Devhelp index file format version 2 Fixes #89
Markus Rickert d514e2bd 2020-07-12T18:42:49 Set project language to C
Markus Rickert 5ddf02f2 2020-06-07T16:06:17 Update config.h.cmake.in
Markus Rickert 8bec210d 2020-06-04T17:37:21 Add variable for working directory of XML Conformance Test Suite
Markus Rickert 270e1655 2020-06-04T14:45:48 Add additional tests and XML Conformance Test Suite
Markus Rickert e6ba4bd7 2020-06-04T11:58:04 Add command line option for temp directory in runtest
Markus Rickert 40e7ceaa 2020-06-04T11:57:28 Ensure LF line endings for test files
Markus Rickert 9ecf5ad6 2020-06-04T00:16:15 Enable runtests and testThreads
Nick Wellnhofer 3f18e748 2020-07-11T14:34:57 Reset HTML parser input before reporting error Avoid use-after-free, similar to 13ba5b61. Also make sure that xmlBufSetInputBaseCur sets valid pointers in case of buffer errors. Found by OSS-Fuzz.
Nick Wellnhofer 3da8d947 2020-07-09T16:08:38 Fix more quadratic runtime issues in HTML push parser Make sure that checkIndex is set when returning without match from inside a comment. Also track parser state in htmlParseLookupChars. Found by OSS-Fuzz.
Nick Wellnhofer 741b0d0a 2020-07-07T12:54:34 Fix regression introduced with 477c7f6a The 'inSubset' member is actually used by the SAX2 handlers. Store extra parser state in 'hasPErefs'.
Nick Wellnhofer fc842f6e 2020-07-06T15:22:12 Limit regexp nesting depth Enforce a maximum nesting depth of 50 for regular expressions. Avoids stack overflows with deeply nested regexes. Found by OSS-Fuzz.
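A depth limit like the one in fc842f6e is typically enforced by passing a counter through the recursive parser. A minimal sketch, assuming a toy grammar of nested parentheses; `parse_expr` and `check_nesting` are hypothetical names, and the exact boundary (50 accepted, 51 rejected) is an assumption of this sketch rather than a statement about xmlregexp.c:

```c
#define MAX_NESTING_DEPTH 50

/* Parse until end of string or an unmatched ')'.
 * Returns 0 on success, -1 on syntax error or excessive nesting. */
static int parse_expr(const char **cur, int depth) {
    if (depth > MAX_NESTING_DEPTH)
        return -1;                      /* bail out before the stack grows */
    while (**cur != '\0' && **cur != ')') {
        if (**cur == '(') {
            (*cur)++;
            if (parse_expr(cur, depth + 1) < 0)
                return -1;
            if (**cur != ')')
                return -1;              /* missing closing parenthesis */
        }
        (*cur)++;
    }
    return 0;
}

/* Returns 0 if s is balanced and nested at most MAX_NESTING_DEPTH deep. */
int check_nesting(const char *s) {
    const char *cur = s;
    if (parse_expr(&cur, 0) < 0)
        return -1;
    return (*cur == '\0') ? 0 : -1;
}
```

The check runs before any further recursion, so stack use is bounded by the limit regardless of input length.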
Nick Wellnhofer 1e41e4fa 2020-06-30T02:43:57 Fix return values and documentation in encoding.c Make xmlEncInputChunk and xmlEncOutputChunk return 0 on success and never a positive value. Make xmlCharEncFirstLineInt, xmlCharEncFirstLineInt and xmlCharEncOutFunc return the number of bytes written.
David Kilzer 6b4717d6 2020-07-06T12:36:27 Add regexp regression tests - Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup <https://bugzilla.gnome.org/show_bug.cgi?id=757711> - Bug 783015 - Integer-overflow in xmlFAParseQuantExact <https://bugzilla.gnome.org/show_bug.cgi?id=783015> (Regexptests): Add support for checking stderr output when running regexp tests. This makes it possible to check in test cases that fail and not see false-positive error output when running the tests. Unlike other libxml2 test suites, if there is no stderr output, no *.err file needs to be created.
Nick Wellnhofer 477c7f6a 2020-06-28T15:54:23 Fix quadratic runtime in HTML parser Commit eeb99329 removed an important optimization avoiding quadratic runtime when repeatedly scanning the input buffer for terminating characters in the HTML push parser. The related bug is https://bugzilla.gnome.org/show_bug.cgi?id=444994 Make sure that ctxt->checkIndex is always written and store additional parser state in ctxt->inSubset which is unused in the HTML parser. Found by OSS-Fuzz.
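The optimization that 477c7f6a restores can be sketched as follows: a push parser that remembers how far it has already searched for a terminator scans each byte once, while one that restarts from the beginning after every chunk scans a quadratic number of bytes. The `Lookahead` struct, `find_terminator` and `scan_cost` are hypothetical; only the idea of a saved check index comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical lookahead state; check_index plays the role of
 * ctxt->checkIndex in the commit message. */
typedef struct {
    const char *data;
    size_t len;            /* bytes available so far */
    size_t check_index;    /* bytes already known not to hold the terminator */
    size_t bytes_scanned;  /* instrumentation for this sketch only */
} Lookahead;

/* Returns the index of term, or -1 if it is not in the buffer yet. */
long find_terminator(Lookahead *la, char term) {
    for (size_t i = la->check_index; i < la->len; i++) {
        la->bytes_scanned++;
        if (la->data[i] == term) {
            la->check_index = 0;
            return (long) i;
        }
    }
    la->check_index = la->len;  /* resume here when more data arrives */
    return -1;
}

/* Feed input one byte at a time and return the total bytes scanned.
 * With remember == 0 the saved index is discarded before every lookup. */
size_t scan_cost(const char *input, size_t total, int remember) {
    Lookahead la = { input, 0, 0, 0 };
    for (size_t avail = 1; avail <= total; avail++) {
        la.len = avail;
        if (!remember)
            la.check_index = 0;
        if (find_terminator(&la, '>') >= 0)
            break;
    }
    return la.bytes_scanned;
}
```

For the 8-byte input `"aaaaaaa>"`, `scan_cost(..., 1)` scans 8 bytes and `scan_cost(..., 0)` scans 36 (1 + 2 + ... + 8).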
Nick Wellnhofer f8329fdc 2020-07-02T11:51:31 Report error for invalid regexp quantifiers
Nick Wellnhofer 13ba5b61 2020-06-28T13:16:46 Reset HTML parser input before reporting encoding error If charset conversion fails, reset the input pointers before reporting the error and bailing out. Otherwise, the input pointers are left in an invalid state which could lead to use-after-free and other memory errors. Similar to f9e7997e. Found by OSS-Fuzz.
Nick Wellnhofer 1e7851b5 2020-06-25T12:17:50 Fix integer overflow in xmlFAParseQuantExact Found by OSS-Fuzz.
Nick Wellnhofer 84bab955 2020-06-24T20:07:32 Fix return value of xmlC14NDocDumpMemory Make sure to return -1 in case of buffer errors. Fixes #174.
Martin Vidner 43a8836c 2020-05-31T18:46:21 Fix rebuilding docs, by hiding __attribute__((...)) behind a macro. When enabled via `./configure --enable-rebuild-docs`, `make -C doc libxml2-api.xml` will invoke apibuild.py to rebuild libxml2-api.xml from the sources. But the code added in 9fa3200cb366c726f7c8ef234282603bb9e8816d made it error out with
```
Parsing ../parser.c
Parse Error: parsing type : expecting a name
('Got token ', ('sep', '('))
('Last token: ', ('sep', '('))
('Token queue: ', [('name', 'destructor'), ('sep', ')'), ('sep', ')')])
('Line 14689 end: ', '')
```
Nick Wellnhofer 9f42f6ba 2020-06-24T15:33:38 Don't follow next pointer on documents in xmlXPathRunStreamEval RVTs from libxslt are document nodes which are linked using the 'next' pointer. These pointers must never be used to navigate the document tree. Otherwise, random content from other RVTs could be returned when evaluating XPath expressions. It's interesting that this seemingly long-standing bug wasn't discovered earlier. This issue could also cause severe performance degradation. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/37
Nick Wellnhofer c0440868 2020-06-22T13:08:11 Copy xs:duration parser from libexslt The duration parser in libexslt checks for integer overflows.
Nick Wellnhofer 18425d3a 2020-06-21T19:14:23 Fix integer overflow in _xmlSchemaParseGYear Found with libFuzzer and UBSan.
Nick Wellnhofer 070d635e 2020-06-21T16:26:38 Fix integer overflow when parsing {min,max}Occurs Clamp value to INT_MAX. Found with libFuzzer and UBSan.
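Clamping to INT_MAX, as 070d635e describes, means checking for overflow before each multiply-and-add step. A minimal sketch; `parse_occurs_clamped` is a hypothetical helper, not the function in xmlschemas.c:

```c
#include <limits.h>

/* Parse a run of decimal digits, saturating at INT_MAX instead of
 * overflowing. Parsing stops at the first non-digit character. */
int parse_occurs_clamped(const char *s) {
    int value = 0;
    while (*s >= '0' && *s <= '9') {
        int digit = *s - '0';
        /* value * 10 + digit > INT_MAX, tested without overflowing */
        if (value > (INT_MAX - digit) / 10)
            value = INT_MAX;
        else
            value = value * 10 + digit;
        s++;
    }
    return value;
}
```

Once the value has saturated, every further digit keeps it at INT_MAX, so arbitrarily long digit strings are safe.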
Nick Wellnhofer 50f18830 2020-06-21T15:21:45 Fix another memory leak in xmlSchemaValAtomicType Don't collapse language IDs twice. Found with libFuzzer and ASan.
Nick Wellnhofer eac1c7e2 2020-06-21T14:42:00 Fuzz target for XML Schemas This only tests the schema parser for now.
Nick Wellnhofer ffd31dbe 2020-06-21T12:14:19 Move entity recorder to fuzz.c
Nick Wellnhofer 681f094e 2020-06-15T15:23:05 Fix unsigned integer overflow in htmlParseTryOrFinish Cast to signed type before subtraction to avoid unsigned integer overflow. Also use ptrdiff_t to avoid potential integer truncation. Found with libFuzzer and UBSan.
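The pattern in 681f094e, casting to a signed type before subtracting, can be sketched as follows. `bytes_remaining` is a hypothetical helper, and the sketch assumes both sizes fit in ptrdiff_t. Subtracting two size_t values directly would wrap around to a huge positive number when the second is larger, which is what UBSan's unsigned overflow check reports.

```c
#include <stddef.h>

/* Difference between buffered and consumed byte counts.
 * Converting each operand to ptrdiff_t first keeps the result signed
 * and full-width, so a shortfall shows up as a negative number. */
ptrdiff_t bytes_remaining(size_t buffered, size_t consumed) {
    return (ptrdiff_t) buffered - (ptrdiff_t) consumed;
}
```

Using ptrdiff_t rather than int for the result also avoids truncating large differences on 64-bit platforms.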
Nick Wellnhofer 31ca4a72 2020-06-15T18:47:53 Fix integer overflow in htmlParseCharRef Fixes #115.
Nick Wellnhofer 2f938203 2020-06-15T15:45:47 Fix undefined behavior in UTF16LEToUTF8 Don't perform arithmetic on null pointer. Found with libFuzzer and UBSan.
Nick Wellnhofer 536f421d 2020-06-15T12:20:54 Fuzz target for HTML parser
Nick Wellnhofer a697ed1e 2020-06-15T14:49:22 Fix return value of xmlCharEncOutput Commit 407b393d introduced a regression caused by xmlCharEncOutput returning 0 in case of success instead of the number of bytes written. Always use its return value for nbchars in xmlOutputBufferWrite. Fixes #166.
Nick Wellnhofer af893a58 2020-06-11T16:08:16 Update GitLab CI container
Nick Wellnhofer a28f7d87 2020-06-10T13:41:13 Never expand parameter entities in text declaration When parsing the text declaration of external DTDs or entities, make sure that parameter entities are not expanded. This also fixes a memory leak in certain error cases. The change to xmlSkipBlankChars assumes that the parser state is maintained correctly when parsing external DTDs or parameter entities, and might expose bugs in the code that were hidden previously. Found by OSS-Fuzz.
Nick Wellnhofer 487871b0 2020-06-10T13:23:43 Fix undefined behavior in xmlXPathTryStreamCompile &NULL[0] is undefined behavior.
Nick Wellnhofer e98150d4 2020-06-09T13:45:31 Add options file for xml fuzzer This will be picked up by OSS-Fuzz, limiting the maximum input size to 80 KB and hopefully avoiding timeouts. Some of the timeouts seem to be related to our suboptimal handling of excessive entity expansion. The new fuzzers support external entities and make this problem even more prominent.
Nick Wellnhofer 2af3c2a8 2020-06-08T12:49:51 Fix use-after-free with validating reader Just like IDs, IDREF attributes must be removed from the document's refs table when they're freed by a reader. This bug is often hidden because xmlAttr structs are reused and strings are stored in a dictionary unless XML_PARSE_NODICT is specified. Found by OSS-Fuzz.
Nick Wellnhofer 00ed736e 2020-06-05T12:49:25 Add a couple of libFuzzer targets - XML fuzzer Currently tests the pull parser, push parser and reader, as well as serialization. Supports splitting fuzz data into multiple documents for things like external DTDs or entities. The seed corpus is built from parts of the test suite. - Regexp fuzzer Seed corpus was statically generated from test suite. - URI fuzzer Tests parsing and most other functions from uri.c.
Nick Wellnhofer 2e8cc66d 2020-05-30T15:40:08 xmlParseBalancedChunkMemory must not be called with NULL doc There is no way to avoid memory leaks without a document to hold the namespace list.
Nick Wellnhofer a0a8059b 2020-05-30T15:33:03 Revert "Fix memory leak in xmlParseBalancedChunkMemoryRecover" This reverts commit 5a02583c7e683896d84878bd90641d8d9b0d0549. Fixes #161.
Nick Wellnhofer ff009f99 2020-05-30T15:32:25 Fix memory leak in xmlXIncludeLoadDoc error path Found by OSS-Fuzz.
Michael Stahl a230b728 2020-04-10T19:22:07 win32: allow passing *FLAGS on command line nmake is a primitive tool, so this is a primitive implementation: append EXTRA_CFLAGS etc. variables. Command line variables should be appended to allow overriding flags set in the makefile. It doesn't work to pass in CFLAGS like in make because that always overrides the assignments in the makefile.
Nick Wellnhofer 4f2aee18 2020-05-04T14:03:52 Make schema validation fail with multiple top-level elements Closes #126.
Daniel Cheng 106757e8 2020-04-10T14:52:03 Guard new calls to xmlValidatePopElement in xml_reader.c Closes #154.
Łukasz Wojniłowicz 386fb276 2020-04-28T17:00:37 Add LIBXML_VALID_ENABLED to xmlreader There are already LIBXML_VALID_ENABLED in this file to guard against "--without-valid" at "./configure" step, but here they were missing.
Markus Rickert e7ff2efc 2020-04-21T21:16:07 Configure file xmlwin32version.h.in on MSVC
Markus Rickert e2f10494 2020-04-21T21:04:23 List headers individually
Markus Rickert 2a2c38f3 2020-04-21T00:53:12 Add CMake build files Closes #24.
Samuel Thibault 9fa3200c 2020-03-31T23:18:25 Call xmlCleanupParser on ELF destruction Fixes #153.
Miro Hrončok e4fb3684 2020-02-28T12:48:14 Parenthesize Py<type>_Check() in ifs In C, if expressions should be parenthesized. PyLong_Check, PyUnicode_Check etc. happened to expand to a parenthesized expression before, but that's not API to rely on. Since Python 3.9.0a4 it needs to be parenthesized explicitly. Fixes https://gitlab.gnome.org/GNOME/libxml2/issues/149
Nick Wellnhofer 20c60886 2020-03-08T17:19:42 Fix typos Resolves #133.
Nick Wellnhofer 2a7b6684 2020-03-02T11:52:52 Disable LeakSanitizer The GitLab runner doesn't run in privileged mode anymore [1], at least for projects outside the GNOME group. Disable LeakSanitizer for now as it needs the ptrace capability. [1] https://gitlab.gnome.org/Infrastructure/Infrastructure/issues/251