kmx git

Commit	Date	Message
652dd12a	2022-02-08T03:29:24	[CVE-2022-23308] Use-after-free of ID and IDREF attributes If a document is parsed with XML_PARSE_DTDVALID and without XML_PARSE_NOENT, the value of ID attributes has to be normalized after potentially expanding entities in xmlRemoveID. Otherwise, later calls to xmlGetID can return a pointer to previously freed memory. ID attributes which are empty or contain only whitespace after entity expansion are affected in a similar way. This is fixed by not storing such attributes in the ID table. The test to detect streaming mode when validating against a DTD was broken. In connection with the defects above, this could result in a use-after-free when using the xmlReader interface with validation. Fix detection of streaming mode to avoid similar issues. (This changes the expected result of a test case. But as far as I can tell, using the XML reader with XIncludes referencing the root document never worked properly, anyway.) All of these issues can result in denial of service. Using xmlReader with validation could result in disclosure of memory via the error channel, typically stderr. The security impact of xmlGetID returning a pointer to freed memory depends on the application. The typical use case of calling xmlGetID on an unmodified document is not affected.
9edc20c1	2022-02-07T20:38:30	Fix double counting of CRLF in comments Fixes #151.
5408c10c	2022-02-04T14:00:09	Don't normalize namespace URIs in XPointer xmlns() scheme Namespace URIs should be compared without escaping or unescaping: https://www.w3.org/TR/REC-xml-names/#NSNameComparison Fixes #289.
1c7d91ab	2022-02-03T23:31:19	Fix handling of XSD with empty namespace An empty namespace means no default namespace. Fixes #303.
f480f750	2022-02-03T14:43:17	Update NewsML DTD in test suite Switch to version 1.2 which has a clearer license. Fixes #291.
d85245f9	2022-01-16T21:39:04	Fix regression with PEs in external DTD Fix a regression introduced with commit a28f7d87. In some cases, parameter entity references in external DTDs wouldn't be expanded. Fixes #306.
03bb9293	2021-07-07T18:23:18	Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8(). * encoding.c: (UTF16LEToUTF8): - Fix comment to describe what the code does. (UTF16BEToUTF8): - Fix undefined behavior which was applied to UTF16LEToUTF8() in 2f9382033e. - Add bounds check to while() loop which was applied to UTF16LEToUTF8() in be803967db. - Do not return -2 when (in >= inend) to fix the bug. This was applied to UTF16LEToUTF8() in 496a1cf592. - Inline (<< 8) statements to match UTF16LEToUTF8(). Add the following tests and results: test/text-4-byte-UTF-16-BE-offset.xml test/text-4-byte-UTF-16-BE.xml test/text-4-byte-UTF-16-LE-offset.xml test/text-4-byte-UTF-16-LE.xml
2732b234	2022-01-10T13:32:14	Fix regression parsing public IDs literals in HTML Fix regression introduced when reworking htmlParsePubidLiteral in commit 93ce33c2. Fixes #318.
de5b624f	2021-05-08T20:21:29	Fix handling of unexpected EOF in xmlParseContent Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was removed in commit 62150ed2. This commit also introduced a regression for direct users of xmlParseContent. Unclosed tags weren't checked.
3e80560d	2021-05-07T10:51:38	Fix line numbers in error messages for mismatched tags Commit 62150ed2 introduced a small regression in the error messages for mismatched tags. This typically only affected messages after the first mismatch, but with custom SAX handlers all line numbers would be off. This also fixes line numbers in the SAX push parser which were never handled correctly.
01411e7c	2021-02-08T20:58:32	Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> But the XML spec clearly states that this is illegal: > If the entities lt or amp are declared, they MUST be declared as > internal entities whose replacement text is a character reference to > the respective character (less-than sign or ampersand) being escaped; > the double escaping is REQUIRED for these entities so that references > to them produce a well-formed result. Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
79301d3d	2020-12-18T12:50:21	Fix timeout when handling recursive entities Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.
a67b63d1	2020-10-11T14:15:37	use new htmlParseLookupCommentEnd to find comment ends Note that the caret in error messages generated during comment parsing may have moved by one byte. See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
29f5d20e	2020-08-03T17:36:05	htmlParseComment: treat `--!>` as if it closed the comment See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
e28d9347	2020-08-04T14:53:19	add test coverage for incorrectly-closed comments this establishes the baseline behavior so that subsequent commits which modify this behavior are clear about what's being changed.
87d20b55	2020-08-19T13:52:08	Fix regression introduced with commit 74dcc10b The code wasn't dead after all, but I can see no reason in delaying the XPointer evaluation. This could lead to nodes included earlier appearing in XPointer results.
d88df4bd	2020-08-16T23:38:48	Fix corner case with empty xi:fallback xi:fallback could become empty after recursive expansion. Use a flag to track whether nodes should be skipped.
1abf2967	2020-08-06T17:51:57	Fix exponential runtime and memory in xi:fallback processing When creating XML_XINCLUDE_START nodes, the children of the original xi:include node must be freed, otherwise fallback content is copied twice, doubling runtime and memory consumption for each nested xi:fallback/xi:include pair. Found with libFuzzer.
0f9817c7	2020-06-10T16:34:52	Don't recurse into xi:include children in xmlXIncludeDoProcess Otherwise, nested xi:include nodes might result in a use-after-free if XML_PARSE_NOXINCNODE is specified. Found with libFuzzer and ASan.
93ce33c2	2020-07-23T17:34:08	Fix several quadratic runtime issues in HTML push parser Fix a few remaining cases where the HTML push parser would scan more content during lookahead than being parsed later. Make sure that htmlParseDocTypeDecl consumes all content up to the final '>' in case of errors. The old comment said "We shouldn't try to resynchronize", but ignoring invalid content is also what the HTML5 spec mandates. Likewise, make htmlParseEndTag skip to the final '>' in invalid end tags even if not in recovery mode. This is probably the most visible change in practice and leads to different output for some tests but is also more in line with HTML5. Make sure that htmlParsePI and htmlParseComment don't abort if invalid characters are encountered but log an error and ignore the character. Change some other end-of-buffer checks to test for a zero byte instead of relying on IS_CHAR. Fix usage of IS_CHAR macro in htmlParseScript.
6b4717d6	2020-07-06T12:36:27	Add regexp regression tests - Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup <https://bugzilla.gnome.org/show_bug.cgi?id=757711> - Bug 783015 - Integer-overflow in xmlFAParseQuantExact <https://bugzilla.gnome.org/show_bug.cgi?id=783015> (Regexptests): Add support for checking stderr output when running regexp tests. This makes it possible to check in test cases that fail and not see false-positive error output when running the tests. Unlike other libxml2 test suites, if there is no stderr output, no *.err file needs to be created.
477c7f6a	2020-06-28T15:54:23	Fix quadratic runtime in HTML parser Commit eeb99329 removed an important optimization avoiding quadratic runtime when repeatedly scanning the input buffer for terminating characters in the HTML push parser. The related bug is https://bugzilla.gnome.org/show_bug.cgi?id=444994 Make sure that ctxt->checkIndex is always written and store additional parser state in ctxt->inSubset which is unused in the HTML parser. Found by OSS-Fuzz.
32cb5dcc	2020-02-11T13:16:10	Add test case for recursive external parsed entities
f20daa9e	2020-02-11T13:13:52	Enable error tests with entity substitution
eddfbc38	2020-01-22T22:03:45	Don't load external entity from xmlSAX2GetEntity Despite the comment, I can't see a reason why external entities must be loaded in the SAX handler. For external entities, the handler is typically first invoked via xmlParseReference which will later load the entity on its own if it wasn't loaded yet. The old code also lead to duplicated SAX events which makes it basically impossible to reuse xmlSAX2GetEntity for a custom SAX parser. See the change to the expected test output. Note that xmlSAX2GetEntity was loading the entity via xmlParseCtxtExternalEntity while xmlParseReference uses xmlParseExternalEntityPrivate. In the previous commit, the two functions were merged, trying to compensate for some slight differences between the two mostly identical implementations. But the more urgent reason for this change is that xmlParseReference has the facility to abort early when recursive entities are detected, avoiding what could practically amount to an infinite loop. If you want to backport this change, note that the previous three commits are required as well: f9ea1a24 Fix copying of entities in xmlParseReference 5c7e0a9a Copy some XMLReader option flags to parser context 1a3e584a Merge code paths loading external entities Found by OSS-Fuzz.
f9ea1a24	2020-02-11T16:17:34	Fix copying of entities in xmlParseReference Before, reader mode would end up in a branch that didn't handle entities with multiple children and failed to update ent->last, so the hack copying the "extra" reader data wouldn't trigger. Consequently, some empty nodes in entities are correctly detected now in the test suite. (The detection of empty nodes in entities is still buggy, though.)
2a350ee9	2019-09-30T17:04:54	Large batch of typo fixes Closes #109.
c2f209c0	2019-09-30T14:13:21	Disallow conditional sections in internal subset Conditional sections are only allowed in external parameter entities referenced from the internal subset.
c51e38cb	2019-09-30T13:50:02	Make xmlParseConditionalSections non-recursive Avoid call stack overflow in deeply nested conditional sections. Found by OSS-Fuzz.
99a864a1	2019-09-25T15:27:45	Fix Regextests - One of the bug316338 test cases is expected to succeed. - Memory leak in testRegexp.c. - Refcount handling in xmlExpHashGetEntry.
c2b0a184	2019-09-25T13:57:42	Fix empty branch in regex Fixes bug 649244: https://bugzilla.gnome.org/show_bug.cgi?id=649244 Closes #57.
62150ed2	2019-09-23T14:46:41	Make xmlParseContent and xmlParseElement non-recursive Split xmlParseElement into subfunctions. Use nameNsPush to store prefix, URI and nsNr on the heap, similar to the push parser. Closes #84.
6705f4d2	2019-09-16T15:45:27	Remove executable bit from non-executable files
eee1dd5a	2019-09-16T15:36:44	Fix expected output of test/schemas/any4 libxml2 correctly rejects any4_0.xsd as invalid schema. I can't figure out what the intent behind this test case was. Simply adjust the expected output to match the current behavior. Closes #92.
e8c9cd5c	2019-09-16T15:36:02	Fix Schema determinism check of ##other namespaces Non-compound (##local) and compound string atoms are always disjoint regardless of whether the compound atom is negated (##other). Closes #40.
01d8cf07	2019-08-15T15:15:42	Misleading error message with xs:{min\|max}Inclusive Closes #53.
ea695ac0	2019-08-09T15:09:22	Fix unability to RelaxNG-validate grammar with choice-based name class Previously, test/relaxng/ambig_name-class2.xml would fail to validate against test/relaxng/ambig_name-class2.rng: > test/relaxng/ambig_name-class2.rng:4: > element attribute: Relax-NG parser error : > Found anyName attribute without oneOrMore ancestor > Relax-NG schema test/relaxng/ambig_name-class2.rng failed to compile Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
8074b881	2019-08-08T23:33:48	Fix unability to validate ambiguously constructed interleave for RelaxNG Previously, test/relaxng/ambig_name-class.xml would fail to validate for a simple reason -- interleave within "open-name-class" context is supposed to be fine with whatever else is pending the consumption, since effectively, it's unrelated from a higher parsing perspective. Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
f9fce963	2019-05-16T21:16:01	Fix unsigned integer overflow It's defined behavior but -fsanitize=unsigned-integer-overflow is useful to discover bugs.
c2f4da1a	2017-05-21T22:08:50	Improve XPath predicate and filter evaluation Consolidate code paths evaluating XPath predicates and filters. Don't push context node on stack when evaluating predicates. I have no idea why this was done. It seems completely useless and trying to pop the context node from a corrupted stack has already caused security issues. Filter nodesets in-place and don't create node sets with NULL gaps which allows to simplify merging a great deal. Simply move matched nodes backward and create a compact node set. Merge xmlXPathCompOpEvalPositionalPredicate into xmlXPathCompOpEvalPredicate.
30a6533e	2019-03-08T12:15:17	Fix float casts in xmlXPathSubstringFunction Rewrite conversion of double to int in xmlXPathSubstringFunction, adding range checks to avoid undefined behavior. Make sure to add start and length as floating-point numbers before converting to int. Fix a bug when rounding negative start indices. Remove unneeded calls to xmlXPathIs{Inf,NaN} and rely on IEEE math instead. Avoid computing the string length. xmlUTF8Strsub works as expected if the length of the requested substring exceeds the input. Found with libFuzzer and UBSan.
c64d4efb	2018-10-13T00:12:12	Remove redefined starts and defines inside include elements When including a grammar from another grammar, we need to make sure that any redefines of starts and includes that that grammar does inside any of its include elements are also removed.
46da8fc5	2018-10-12T23:46:24	Allow choice within choice in nameClass in RELAX NG The pattern nameClass allows for nested choice elements, for example <name> <choice> <choice> <name>a</name> <name>b</name> </choice> <name>c</name> </choice> </name> which is semantically equivalent to <name> <choice> <name>a</name> <name>b</name> <name>c</name> </choice> </name> The old code didn’t handle this correctly, as it never expected a choice inside another choice. This patch fixes this by flattening any nested choices. This pattern of nested choice elements comes up in RELAX NG simplification, where all choice elements are rewritten in this nested manner, see section 4.12 of the RELAX NG specification.
4338c310	2018-10-12T22:30:26	Look inside divs for starts and defines inside include RELAX NG allows for div elements inside of include elements. We need to look inside those div elements for start and define elements that may be redefining start and define elements in the included grammar.
123234f2	2018-09-11T14:52:07	Free input buffer in xmlHaltParser This avoids miscalculation of available bytes. Thanks to Yunho Kim for the report. Closes: #26
72182550	2017-11-04T15:38:58	Add test for ICU flush and pivot buffer
5af594d8	2017-10-07T14:54:45	Fix comparison of nodesets to strings Fix two bugs in xmlXPathNodeValHash which could lead to errors when comparing nodesets to strings: - Only use contents of text nodes to compute the hash for element nodes. Comments, PIs, and other node types don't affect the string-value and must be ignored. - Reset `string` to NULL for node types other than text. Reported by Aleksei on the mailing list: https://mail.gnome.org/archives/xml/2017-September/msg00016.html
69936b12	2017-08-30T14:16:01	Revert "Print error messages for truncated UTF-8 sequences" This reverts commit 79c8a6b which caused a serious regression in streaming mode. Also reverts part of commit 52ceced "Fix infinite loops with push parser in recovery mode". Fixes bug 786554.
899a5d9f	2017-07-25T14:59:49	Detect infinite recursion in parameter entities When expanding a parameter entity in a DTD, infinite recursion could lead to an infinite loop or memory exhaustion. Thanks to Wei Lei for the first of many reports. Fixes bug 759579.
872fea94	2017-06-19T00:24:12	Get rid of "blanks wrapper" for parameter entities Now that replacement of parameter entities goes exclusively through xmlSkipBlankChars, we can account for the surrounding space characters there and remove the "blanks wrapper" hack.
24246c76	2017-06-20T12:56:36	Fix xmlHaltParser Pop all extra input streams before resetting the input. Otherwise, a call to xmlPopInput could make input available again. Also set input->end to input->cur. Changes the test output for some error tests. Unfortunately, some fuzzed test cases were added to the test suite without manual cleanup. This makes it almost impossible to review the impact of later changes on the test output.
8bbe4508	2017-06-17T16:15:09	Spelling and grammar fixes Fixes bug 743172, bug 743489, bug 769632, bug 782400 and a few other misspellings.
5f440d8c	2017-06-12T14:32:34	Rework entity boundary checks Make sure to finish all entities in the internal subset. Nevertheless, readd a sanity check in xmlParseStartTag2 that was lost in my previous commit. Also add a sanity check in xmlPopInput. Popping an input unexpectedly was the source of many recent memory bugs. The check doesn't mitigate such issues but helps with diagnosis. Always base entity boundary checks on the input ID, not the input pointer. The pointer could have been reallocated to the old address. Always throw a well-formedness error if a boundary check fails. In a few places, a validity error was thrown. Fix a few error codes and improve indentation.
dbaab1f3	2017-06-16T21:38:57	Test SAX2 callbacks with entity substitution This detects regressions like bug 760367.
67f9f9d6	2017-06-12T19:25:01	Misc fixes for 'make tests' - Silence test output. - Clean up after doc/examples tests. - Adjust expected output for script tests. - Add missing results for relaxng/pattern3 There are still two test failures I can't comment on: - regexp/bug316338 - schemas/any4_0
0b2d5c48	2017-06-12T19:10:04	Initialize keepBlanks in HTML parser This caused failures in the HTML push tests but the fix required to change the expected output of the HTML SAX tests.
85c112a0	2017-06-12T18:26:11	Add test cases for bug 758518 test/HTML/758518-entity.html exposed a bug in pushParseTest() in runtest.c which assumed that an input file was at least 4 bytes long. That test case is only 3 bytes, so we now take the minimum of 4 bytes or the length of the test input. We also now use 'chunkSize' in place of the hard-coded value '1024' later in the function.
79c8a6b1	2017-06-10T17:01:27	Print error messages for truncated UTF-8 sequences Before, truncated UTF-8 sequences at the end of a file were treated as EOF. Create an error message containing the offending bytes. xmlStringCurrentChar would also print characters from the input stream, not the string it's working on.
932cc989	2017-06-03T02:01:29	Fix buffer size checks in xmlSnprintfElementContent xmlSnprintfElementContent failed to correctly check the available buffer space in two locations. Fixes bug 781333 (CVE-2017-9047) and bug 781701 (CVE-2017-9048). Thanks to Marcel Böhme and Thuan Pham for the report.
e2663054	2017-06-05T15:37:17	Fix handling of parameter-entity references There were two bugs where parameter-entity references could lead to an unexpected change of the input buffer in xmlParseNameComplex and xmlDictLookup being called with an invalid pointer. Percent sign in DTD Names ========================= The NEXTL macro used to call xmlParserHandlePEReference. When parsing "complex" names inside the DTD, this could result in entity expansion which created a new input buffer. The fix is to simply remove the call to xmlParserHandlePEReference from the NEXTL macro. This is safe because no users of the macro require expansion of parameter entities. - xmlParseNameComplex - xmlParseNCNameComplex - xmlParseNmtoken The percent sign is not allowed in names, which are grammatical tokens. - xmlParseEntityValue Parameter-entity references in entity values are expanded but this happens in a separate step in this function. - xmlParseSystemLiteral Parameter-entity references are ignored in the system literal. - xmlParseAttValueComplex - xmlParseCharDataComplex - xmlParseCommentComplex - xmlParsePI - xmlParseCDSect Parameter-entity references are ignored outside the DTD. - xmlLoadEntityContent This function is only called from xmlStringLenDecodeEntities and entities are replaced in a separate step immediately after the function call. This bug could also be triggered with an internal subset and double entity expansion. This fixes bug 766956 initially reported by Wei Lei and independently by Chromium's ClusterFuzz, Hanno Böck, and Marco Grassi. Thanks to everyone involved. xmlParseNameComplex with XML_PARSE_OLD10 ======================================== When parsing Names inside an expanded parameter entity with the XML_PARSE_OLD10 option, xmlParseNameComplex would call xmlGROW via the GROW macro if the input buffer was exhausted. At the end of the parameter entity's replacement text, this function would then call xmlPopInput which invalidated the input buffer. There should be no need to invoke GROW in this situation because the buffer is grown periodically every XML_PARSER_CHUNK_SIZE characters and, at least for UTF-8, in xmlCurrentChar. This also matches the code path executed when XML_PARSE_OLD10 is not set. This fixes bugs 781205 (CVE-2017-9049) and 781361 (CVE-2017-9050). Thanks to Marcel Böhme and Thuan Pham for the report. Additional hardening ==================== A separate check was added in xmlParseNameComplex to validate the buffer size.
7482f41f	2017-06-01T22:00:19	Check for integer overflow in xmlXPathFormatNumber Check for overflow before casting double to int. Found with afl-fuzz and UBSan.
855c19ef	2017-06-01T01:04:08	Avoid reparsing in xmlParseStartTag2 The code in xmlParseStartTag2 must handle the case that the input buffer was grown and reallocated which can invalidate pointers to attribute values. Before, this was handled by detecting changes of the input buffer "base" pointer and, in case of a change, jumping back to the beginning of the function and reparsing the start tag. The major problem of this approach is that whether an input buffer is reallocated is nondeterministic, resulting in seemingly random test failures. See the mailing list thread "runtest mystery bug: name2.xml error case regression test" from 2012, for example. If a reallocation was detected, the code also made no attempts to continue parsing in case of errors which makes a difference in the lax "recover" mode. Now we store the current input buffer "base" pointer for each (not separately allocated) attribute in the namespace URI field, which isn't used until later. After the whole start tag was parsed, the pointers to the attribute values are reconstructed using the offset between the new and the old input buffer. This relies on arithmetic on dangling pointers which is technically undefined behavior. But it seems like the easiest and most efficient fix and a similar approach is used in xmlParserInputGrow. This changes the error output of several tests, typically making it more verbose because we try harder to continue parsing in case of errors. (Another possible solution is to check not only the "base" pointer but the size of the input buffer as well. But this would result in even more reparsing.)
f4029cd4	2016-04-21T16:37:26	Check XPath exponents for overflow Avoid undefined behavior and wrong results with huge exponents. Found with afl-fuzz and UBSan.
a58331a6	2017-05-29T21:02:21	Check for overflow in xmlXPathIsPositionalPredicate Avoid undefined behavior when casting from double to int. Found with afl-fuzz and UBSan.
a851868a	2017-05-29T20:14:42	Parse small XPath numbers more accurately Don't count leading zeros towards the fraction size limit. This allows to parse numbers like 0.0000000000000000000000000000000000000000000000000000000001 which is the only standard-conformant way to represent such numbers, as scientific notation isn't allowed in XPath 1.0. (It is allowed in XPath 2.0 and in libxml2 as an extension, though.) Overall accuracy is still bad, see bug 783238.
4bebb030	2016-04-21T13:41:09	Rework XPath rounding functions Use the C library's floor and ceil functions. The old code was overly complicated for no apparent reason and could result in undefined behavior when handling NaNs (found with afl-fuzz and UBSan). Fix wrong comment in xmlXPathRoundFunction. The implementation was already following the spec and rounding half up.
40f58521	2017-05-26T20:16:35	Fix axis traversal from attribute and namespace nodes When traversing the "preceding" axis from an attribute node, we must first go up to the attribute's containing element. Otherwise, text children of other attributes could be returned. This made it possible to hit a code path in xmlXPathNextAncestor which contained another bug: The attribute node was initialized with the context node instead of the current node. Normally, this code path is only hit via xmlXPathNextAncestorOrSelf in which case the current and context node are the same. The combination of the two bugs could result in an infinite loop, found with libFuzzer. Traversing the "following" and the "preceding" axis from namespace nodes should be handled similarly. This wasn't supported at all previously.
9ab01a27	2016-06-28T14:22:23	Fix XPointer paths beginning with range-to The old code would invoke the broken xmlXPtrRangeToFunction. range-to isn't really a function but a special kind of location step. Remove this function and always handle range-to in the XPath code. The old xmlXPtrRangeToFunction could also be abused to trigger a use-after-free error with the potential for remote code execution. Found with afl-fuzz. Fixes CVE-2016-5131.
d8083bf7	2016-06-25T12:35:50	Fix NULL pointer deref in XPointer range-to - Check for errors after evaluating first operand. - Add sanity check for empty stack. Found with afl-fuzz.
0bcd05c5	2016-03-01T15:18:04	Heap-based buffer overread in htmlCurrentChar For https://bugzilla.gnome.org/show_bug.cgi?id=758606 * parserInternals.c: (xmlNextChar): Add an test to catch other issues on ctxt->input corruption proactively. For non-UTF-8 charsets, xmlNextChar() failed to check for the end of the input buffer and would continuing reading. Fix this by pulling out the check for the end of the input buffer into common code, and return if we reach the end of the input buffer prematurely. * result/HTML/758606.html: Added. * result/HTML/758606.html.err: Added. * result/HTML/758606.html.sax: Added. * result/HTML/758606_2.html: Added. * result/HTML/758606_2.html.err: Added. * result/HTML/758606_2.html.sax: Added. * test/HTML/758606.html: Added test case. * test/HTML/758606_2.html: Added test case.
00906759	2016-01-26T16:57:03	Heap-based buffer-underreads due to xmlParseName For https://bugzilla.gnome.org/show_bug.cgi?id=759573 * parser.c: (xmlParseElementDecl): Return early on invalid input to fix non-minimized test case (759573-2.xml). Otherwise the parser gets into a bad state in SKIP(3) at the end of the function. (xmlParseConditionalSections): Halt parsing when hitting invalid input that would otherwise caused xmlParserHandlePEReference() to recurse unexpectedly. This fixes the minimized test case (759573.xml). * result/errors/759573-2.xml: Add. * result/errors/759573-2.xml.err: Add. * result/errors/759573-2.xml.str: Add. * result/errors/759573.xml: Add. * result/errors/759573.xml.err: Add. * result/errors/759573.xml.str: Add. * test/errors/759573-2.xml: Add. * test/errors/759573.xml: Add.
38eae571	2016-03-07T14:04:08	Heap use-after-free in xmlSAX2AttributeNs For https://bugzilla.gnome.org/show_bug.cgi?id=759020 * parser.c: (xmlParseStartTag2): Attribute strings are only valid if the base does not change, so add another check where the base may change. Make sure to set 'attvalue' to NULL after freeing it. * result/errors/759020.xml: Added. * result/errors/759020.xml.err: Added. * result/errors/759020.xml.str: Added. * test/errors/759020.xml: Added test case.
beca86e8	2016-05-04T11:23:49	Detect change of encoding when parsing HTML names From https://bugzilla.gnome.org/show_bug.cgi?id=758518 Happens when a file has a name getting parsed, but no valid encoding set, so libxml has to guess what the encoding is. This patch detects when the buffer location changes, and if it does, restarts the parsing of the name. This slightly change a couple of regression tests output
45752d2c	2016-03-03T11:50:34	Bug 759398: Heap use-after-free in xmlDictComputeFastKey <https://bugzilla.gnome.org/show_bug.cgi?id=759398> * parser.c: (xmlParseNCNameComplex): Store start position instead of a pointer to the name since the underlying buffer may change, resulting in a stale pointer being used. * result/errors/759398.xml: Added. * result/errors/759398.xml.err: Added. * result/errors/759398.xml.str: Added. * test/errors/759398.xml: Added test case.
a820dbea	2016-03-01T11:34:04	Bug 758605: Heap-based buffer overread in xmlDictAddString <https://bugzilla.gnome.org/show_bug.cgi?id=758605> Reviewed by David Kilzer. * HTMLparser.c: (htmlParseName): Add bounds check. (htmlParseNameComplex): Ditto. * result/HTML/758605.html: Added. * result/HTML/758605.html.err: Added. * result/HTML/758605.html.sax: Added. * runtest.c: (pushParseTest): The input for the new test case was so small (4 bytes) that htmlParseChunk() was never called after htmlCreatePushParserCtxt(), thereby creating a false positive test failure. Fixed by using a do-while loop so we always call htmlParseChunk() at least once. * test/HTML/758605.html: Added.
db07dd61	2016-02-12T09:58:29	Bug 758588: Heap-based buffer overread in xmlParserPrintFileContextInternal <https://bugzilla.gnome.org/show_bug.cgi?id=758588> * parser.c: (xmlParseEndTag2): Add bounds checks before dereferencing ctxt->input->cur past the end of the buffer, or incrementing the pointer past the end of the buffer. * result/errors/758588.xml: Add test result. * result/errors/758588.xml.err: Ditto. * result/errors/758588.xml.str: Ditto. * test/errors/758588.xml: Add regression test.
6eb0894a	2016-05-05T16:49:00	Fix memory leak with XPath namespace nodes Set hasNsNodes to 1 when adding namespace nodes via XP_TEST_HIT.
82b73039	2016-04-30T17:53:10	Fix namespace axis traversal When the namespace axis is traversed in "toBool" mode, the traversal can exit early, before visiting all nodes. In this case, the XPath context still contains a non-NULL tmpNsList. This means that - the check when to start a new traversal was wrong and - the tmpNsList could be leaked. Fixes bug #750037 and, by accident, bug #756075: https://bugzilla.gnome.org/show_bug.cgi?id=750037 https://bugzilla.gnome.org/show_bug.cgi?id=756075
839689a9	2016-04-27T18:00:12	Don't recurse into OP_VALUEs in xmlXPathOptimizeExpression The ch1 slot of OP_VALUEs contains an invalid value. Ignore it. Fixes bug #760325: https://bugzilla.gnome.org/show_bug.cgi?id=760325
f39fd66e	2016-04-27T03:01:16	Fix namespace::node() XPath expression Make sure that xmlXPathNodeSetAddNs is called for namespace nodes when matched with a namespace::node() step. This correctly sets the parent of namespace nodes. Note that xmlXPathNodeSetAddNs must only be called if working on the namespace axis. Otherwise, the context node is not the parent of the namespace node and the standard XP_TEST_HIT macro must be invoked. This explains the errors in the C14N tests that the old TODO comment mentioned.
e2893903	2016-04-21T19:19:23	Fix parsing of NCNames in XPath The NCName parser would allow any NameChar as start character. For example, the following XPath expressions would compile: self::-abc self::0abc self::.abc
cad102b8	2016-04-15T22:41:24	Do normalize string-based datatype value in RelaxNG facet checking Original patch is from Jan Pokorný <jpokorny redhat com> https://mail.gnome.org/archives/xml/2013-November/msg00028.html Improve it according to reviews and add test files.
5be1a6e8	2016-01-19T11:38:52	Bug 760861: REGRESSION (bf9c1dad): Missing results for test/schemas/regexp-char-ref_[01].xsd <https://bugzilla.gnome.org/show_bug.cgi?id=760861> Add missing test results to fix the following errors when running "make Schemastests": ## Schemas regression tests diff: ./result/schemas/regexp-char-ref_0_0.err: No such file or directory diff: ./result/schemas/regexp-char-ref_1_0.err: No such file or directory * result/schemas/regexp-char-ref_0_0.err: Added. * result/schemas/regexp-char-ref_1_0.err: Added.
49bbfdb6	2016-03-14T15:53:16	Add missing RNG test files For https://bugzilla.gnome.org/show_bug.cgi?id=760249 Add missing test results from Bug 710744 for commit 6473a41a49601da8355c4b407b99474ada170213.
4f8606c1	2016-01-05T13:38:09	Bug 760183: REGRESSION (v2.9.3): XML push parser fails with bogus UTF-8 encoding error when multi-byte character in large CDATA section is split across buffer <https://bugzilla.gnome.org/show_bug.cgi?id=760183> * parser.c: (xmlCheckCdataPush): Add 'complete' argument to describe whether the buffer passed in is the whole CDATA buffer, or if there is more data to parse. If there is more data to parse, don't return a negative value for an invalid multi-byte UTF-8 character that is split between buffers. (xmlParseTryOrFinish): Pass 'complete' argument to xmlCheckCdataPush() as appropriate. * result/cdata-2-byte-UTF-8.xml: Added. * result/cdata-2-byte-UTF-8.xml.rde: Added. * result/cdata-2-byte-UTF-8.xml.rdr: Added. * result/cdata-2-byte-UTF-8.xml.sax: Added. * result/cdata-2-byte-UTF-8.xml.sax2: Added. * result/cdata-3-byte-UTF-8.xml: Added. * result/cdata-3-byte-UTF-8.xml.rde: Added. * result/cdata-3-byte-UTF-8.xml.rdr: Added. * result/cdata-3-byte-UTF-8.xml.sax: Added. * result/cdata-3-byte-UTF-8.xml.sax2: Added. * result/cdata-4-byte-UTF-8.xml: Added. * result/cdata-4-byte-UTF-8.xml.rde: Added. * result/cdata-4-byte-UTF-8.xml.rdr: Added. * result/cdata-4-byte-UTF-8.xml.sax: Added. * result/cdata-4-byte-UTF-8.xml.sax2: Added. * result/noent/cdata-2-byte-UTF-8.xml: Added. * result/noent/cdata-3-byte-UTF-8.xml: Added. * result/noent/cdata-4-byte-UTF-8.xml: Added. * test/cdata-2-byte-UTF-8.xml: Added. * test/cdata-3-byte-UTF-8.xml: Added. * test/cdata-4-byte-UTF-8.xml: Added. - Add tests and results. Only 'make Readertests XMLPushtests' fails prior to the fix.
a7a94612	2016-02-09T12:55:29	Heap-based buffer overread in xmlNextChar For https://bugzilla.gnome.org/show_bug.cgi?id=759671 when the end of the internal subset isn't properly detected xmlParseInternalSubset should just return instead of trying to process input further.
f1063fdb	2015-11-20T16:06:59	CVE-2015-7500 Fix memory access error due to incorrect entities boundaries For https://bugzilla.gnome.org/show_bug.cgi?id=756525 handle properly the case where we popped out of the current entity while processing a start tag Reported by Kostya Serebryany @ Google This slightly modifies the output of 754946 in regression tests
4a5d80ad	2015-09-18T15:06:46	Fix a bug in CData error handling in the push parser For https://bugzilla.gnome.org/show_bug.cgi?id=754947 The checking function was returning incorrect args in some cases Adds the test to teh reg suite and fix one of the existing test output
51f02b0a	2015-09-15T16:50:32	Fix a bug on name parsing at the end of current input buffer For https://bugzilla.gnome.org/show_bug.cgi?id=754946 When hitting the end of the current input buffer while parsing a name we could end up loosing the beginning of the name, which led to various issues.
ef709ce2	2015-09-10T19:41:41	Fix the spurious ID already defined error For https://bugzilla.gnome.org/show_bug.cgi?id=737840 the fix for 724903 introduced a regression on external entities carrying IDs, revert that patch in part and add a specific test to avoid readding it
2fab235d	2015-03-16T08:38:36	Fix support for except in nameclasses For https://bugzilla.gnome.org/show_bug.cgi?id=565219 The code was imply missing even if simple, added a few regression tests.
02b252d7	2015-03-08T17:00:37	Regression test for bug #695699
342658a1	2015-03-08T16:46:04	Add a couple of XPath tests
f6aaabce	2015-03-08T16:05:26	Allow attributes on descendant-or-self axis If the context node is an attribute, the attribute itself is on the descendant-or-self axis. The principal node type of this axis is element, so the only node test that can return the attribute is "node()". In other words, "@attr/descendant-or-self::node()" is equivalent to "@attr". This matches the behavior of Saxon-CE.
df23f584	2014-10-23T13:52:47	Adding example from bugs 738805 to regression tests For https://bugzilla.gnome.org/show_bug.cgi?id=738805 Tortuous test case provided by pierre.labastie@neuf.fr
6473a41a	2013-10-23T14:51:33	Implement choice for name classes on attributes https://bugzilla.gnome.org/show_bug.cgi?id=710744
dcc19503	2013-05-22T22:56:45	Fix a parsing bug on non-ascii element and CR/LF usage https://bugzilla.gnome.org/show_bug.cgi?id=698550 Somehow the behaviour of the internal parser routine changed slightly when encountering CR/LF, which led to a bug when parsing document with non-ascii Names
483272f3	2013-03-27T13:37:14	Added a regression tests from bug 694228 data Provided by Mark Rowe <mrowe@apple.com>
a3f1e3e5	2013-03-11T13:57:53	Avoid extra processing on entities If an entity has already been checked for correctness no need to check it on every reference
a7982ce2	2012-10-25T15:39:39	Adding streaming validation to runtest checks

652dd12a

2022-02-08T03:29:24

[CVE-2022-23308] Use-after-free of ID and IDREF attributes If a document is parsed with XML_PARSE_DTDVALID and without XML_PARSE_NOENT, the value of ID attributes has to be normalized after potentially expanding entities in xmlRemoveID. Otherwise, later calls to xmlGetID can return a pointer to previously freed memory. ID attributes which are empty or contain only whitespace after entity expansion are affected in a similar way. This is fixed by not storing such attributes in the ID table. The test to detect streaming mode when validating against a DTD was broken. In connection with the defects above, this could result in a use-after-free when using the xmlReader interface with validation. Fix detection of streaming mode to avoid similar issues. (This changes the expected result of a test case. But as far as I can tell, using the XML reader with XIncludes referencing the root document never worked properly, anyway.) All of these issues can result in denial of service. Using xmlReader with validation could result in disclosure of memory via the error channel, typically stderr. The security impact of xmlGetID returning a pointer to freed memory depends on the application. The typical use case of calling xmlGetID on an unmodified document is not affected.

9edc20c1

2022-02-07T20:38:30

Fix double counting of CRLF in comments Fixes #151.

5408c10c

2022-02-04T14:00:09

Don't normalize namespace URIs in XPointer xmlns() scheme Namespace URIs should be compared without escaping or unescaping: https://www.w3.org/TR/REC-xml-names/#NSNameComparison Fixes #289.

1c7d91ab

2022-02-03T23:31:19

Fix handling of XSD with empty namespace An empty namespace means no default namespace. Fixes #303.

f480f750

2022-02-03T14:43:17

Update NewsML DTD in test suite Switch to version 1.2 which has a clearer license. Fixes #291.

d85245f9

2022-01-16T21:39:04

Fix regression with PEs in external DTD Fix a regression introduced with commit a28f7d87. In some cases, parameter entity references in external DTDs wouldn't be expanded. Fixes #306.

03bb9293

2021-07-07T18:23:18

Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8(). * encoding.c: (UTF16LEToUTF8): - Fix comment to describe what the code does. (UTF16BEToUTF8): - Fix undefined behavior which was applied to UTF16LEToUTF8() in 2f9382033e. - Add bounds check to while() loop which was applied to UTF16LEToUTF8() in be803967db. - Do not return -2 when (in >= inend) to fix the bug. This was applied to UTF16LEToUTF8() in 496a1cf592. - Inline (<< 8) statements to match UTF16LEToUTF8(). Add the following tests and results: test/text-4-byte-UTF-16-BE-offset.xml test/text-4-byte-UTF-16-BE.xml test/text-4-byte-UTF-16-LE-offset.xml test/text-4-byte-UTF-16-LE.xml

2732b234

2022-01-10T13:32:14

Fix regression parsing public IDs literals in HTML Fix regression introduced when reworking htmlParsePubidLiteral in commit 93ce33c2. Fixes #318.

de5b624f

2021-05-08T20:21:29

Fix handling of unexpected EOF in xmlParseContent Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was removed in commit 62150ed2. This commit also introduced a regression for direct users of xmlParseContent. Unclosed tags weren't checked.

3e80560d

2021-05-07T10:51:38

Fix line numbers in error messages for mismatched tags Commit 62150ed2 introduced a small regression in the error messages for mismatched tags. This typically only affected messages after the first mismatch, but with custom SAX handlers all line numbers would be off. This also fixes line numbers in the SAX push parser which were never handled correctly.

01411e7c

2021-02-08T20:58:32

Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> But the XML spec clearly states that this is illegal: > If the entities lt or amp are declared, they MUST be declared as > internal entities whose replacement text is a character reference to > the respective character (less-than sign or ampersand) being escaped; > the double escaping is REQUIRED for these entities so that references > to them produce a well-formed result. Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.

79301d3d

2020-12-18T12:50:21

Fix timeout when handling recursive entities Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.

a67b63d1

2020-10-11T14:15:37

use new htmlParseLookupCommentEnd to find comment ends Note that the caret in error messages generated during comment parsing may have moved by one byte. See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment

29f5d20e

2020-08-03T17:36:05

htmlParseComment: treat `--!>` as if it closed the comment See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment

e28d9347

2020-08-04T14:53:19

add test coverage for incorrectly-closed comments this establishes the baseline behavior so that subsequent commits which modify this behavior are clear about what's being changed.

87d20b55

2020-08-19T13:52:08

Fix regression introduced with commit 74dcc10b The code wasn't dead after all, but I can see no reason in delaying the XPointer evaluation. This could lead to nodes included earlier appearing in XPointer results.

d88df4bd

2020-08-16T23:38:48

Fix corner case with empty xi:fallback xi:fallback could become empty after recursive expansion. Use a flag to track whether nodes should be skipped.

1abf2967

2020-08-06T17:51:57

Fix exponential runtime and memory in xi:fallback processing When creating XML_XINCLUDE_START nodes, the children of the original xi:include node must be freed, otherwise fallback content is copied twice, doubling runtime and memory consumption for each nested xi:fallback/xi:include pair. Found with libFuzzer.

0f9817c7

2020-06-10T16:34:52

Don't recurse into xi:include children in xmlXIncludeDoProcess Otherwise, nested xi:include nodes might result in a use-after-free if XML_PARSE_NOXINCNODE is specified. Found with libFuzzer and ASan.

93ce33c2

2020-07-23T17:34:08

Fix several quadratic runtime issues in HTML push parser Fix a few remaining cases where the HTML push parser would scan more content during lookahead than being parsed later. Make sure that htmlParseDocTypeDecl consumes all content up to the final '>' in case of errors. The old comment said "We shouldn't try to resynchronize", but ignoring invalid content is also what the HTML5 spec mandates. Likewise, make htmlParseEndTag skip to the final '>' in invalid end tags even if not in recovery mode. This is probably the most visible change in practice and leads to different output for some tests but is also more in line with HTML5. Make sure that htmlParsePI and htmlParseComment don't abort if invalid characters are encountered but log an error and ignore the character. Change some other end-of-buffer checks to test for a zero byte instead of relying on IS_CHAR. Fix usage of IS_CHAR macro in htmlParseScript.

6b4717d6

2020-07-06T12:36:27

Add regexp regression tests - Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup <https://bugzilla.gnome.org/show_bug.cgi?id=757711> - Bug 783015 - Integer-overflow in xmlFAParseQuantExact <https://bugzilla.gnome.org/show_bug.cgi?id=783015> (Regexptests): Add support for checking stderr output when running regexp tests. This makes it possible to check in test cases that fail and not see false-positive error output when running the tests. Unlike other libxml2 test suites, if there is no stderr output, no *.err file needs to be created.

477c7f6a

2020-06-28T15:54:23

Fix quadratic runtime in HTML parser Commit eeb99329 removed an important optimization avoiding quadratic runtime when repeatedly scanning the input buffer for terminating characters in the HTML push parser. The related bug is https://bugzilla.gnome.org/show_bug.cgi?id=444994 Make sure that ctxt->checkIndex is always written and store additional parser state in ctxt->inSubset which is unused in the HTML parser. Found by OSS-Fuzz.

32cb5dcc

2020-02-11T13:16:10

Add test case for recursive external parsed entities

f20daa9e

2020-02-11T13:13:52

Enable error tests with entity substitution

eddfbc38

2020-01-22T22:03:45

Don't load external entity from xmlSAX2GetEntity Despite the comment, I can't see a reason why external entities must be loaded in the SAX handler. For external entities, the handler is typically first invoked via xmlParseReference which will later load the entity on its own if it wasn't loaded yet. The old code also lead to duplicated SAX events which makes it basically impossible to reuse xmlSAX2GetEntity for a custom SAX parser. See the change to the expected test output. Note that xmlSAX2GetEntity was loading the entity via xmlParseCtxtExternalEntity while xmlParseReference uses xmlParseExternalEntityPrivate. In the previous commit, the two functions were merged, trying to compensate for some slight differences between the two mostly identical implementations. But the more urgent reason for this change is that xmlParseReference has the facility to abort early when recursive entities are detected, avoiding what could practically amount to an infinite loop. If you want to backport this change, note that the previous three commits are required as well: f9ea1a24 Fix copying of entities in xmlParseReference 5c7e0a9a Copy some XMLReader option flags to parser context 1a3e584a Merge code paths loading external entities Found by OSS-Fuzz.

f9ea1a24

2020-02-11T16:17:34

Fix copying of entities in xmlParseReference Before, reader mode would end up in a branch that didn't handle entities with multiple children and failed to update ent->last, so the hack copying the "extra" reader data wouldn't trigger. Consequently, some empty nodes in entities are correctly detected now in the test suite. (The detection of empty nodes in entities is still buggy, though.)

2a350ee9

2019-09-30T17:04:54

Large batch of typo fixes Closes #109.

c2f209c0

2019-09-30T14:13:21

Disallow conditional sections in internal subset Conditional sections are only allowed in *external* parameter entities referenced from the internal subset.

c51e38cb

2019-09-30T13:50:02

Make xmlParseConditionalSections non-recursive Avoid call stack overflow in deeply nested conditional sections. Found by OSS-Fuzz.

99a864a1

2019-09-25T15:27:45

Fix Regextests - One of the bug316338 test cases is expected to succeed. - Memory leak in testRegexp.c. - Refcount handling in xmlExpHashGetEntry.

c2b0a184

2019-09-25T13:57:42

Fix empty branch in regex Fixes bug 649244: https://bugzilla.gnome.org/show_bug.cgi?id=649244 Closes #57.

62150ed2

2019-09-23T14:46:41

Make xmlParseContent and xmlParseElement non-recursive Split xmlParseElement into subfunctions. Use nameNsPush to store prefix, URI and nsNr on the heap, similar to the push parser. Closes #84.

6705f4d2

2019-09-16T15:45:27

Remove executable bit from non-executable files

eee1dd5a

2019-09-16T15:36:44

Fix expected output of test/schemas/any4 libxml2 correctly rejects any4_0.xsd as invalid schema. I can't figure out what the intent behind this test case was. Simply adjust the expected output to match the current behavior. Closes #92.

e8c9cd5c

2019-09-16T15:36:02

Fix Schema determinism check of ##other namespaces Non-compound (##local) and compound string atoms are always disjoint regardless of whether the compound atom is negated (##other). Closes #40.

01d8cf07

2019-08-15T15:15:42

Misleading error message with xs:{min|max}Inclusive Closes #53.

ea695ac0

2019-08-09T15:09:22

Fix unability to RelaxNG-validate grammar with choice-based name class Previously, test/relaxng/ambig_name-class2.xml would fail to validate against test/relaxng/ambig_name-class2.rng: > test/relaxng/ambig_name-class2.rng:4: > element attribute: Relax-NG parser error : > Found anyName attribute without oneOrMore ancestor > Relax-NG schema test/relaxng/ambig_name-class2.rng failed to compile Signed-off-by: Jan Pokorný <jpokorny@redhat.com>

8074b881

2019-08-08T23:33:48

Fix unability to validate ambiguously constructed interleave for RelaxNG Previously, test/relaxng/ambig_name-class.xml would fail to validate for a simple reason -- interleave within "open-name-class" context is supposed to be fine with whatever else is pending the consumption, since effectively, it's unrelated from a higher parsing perspective. Signed-off-by: Jan Pokorný <jpokorny@redhat.com>

f9fce963

2019-05-16T21:16:01

Fix unsigned integer overflow It's defined behavior but -fsanitize=unsigned-integer-overflow is useful to discover bugs.

c2f4da1a

2017-05-21T22:08:50

Improve XPath predicate and filter evaluation Consolidate code paths evaluating XPath predicates and filters. Don't push context node on stack when evaluating predicates. I have no idea why this was done. It seems completely useless and trying to pop the context node from a corrupted stack has already caused security issues. Filter nodesets in-place and don't create node sets with NULL gaps which allows to simplify merging a great deal. Simply move matched nodes backward and create a compact node set. Merge xmlXPathCompOpEvalPositionalPredicate into xmlXPathCompOpEvalPredicate.

30a6533e

2019-03-08T12:15:17

Fix float casts in xmlXPathSubstringFunction Rewrite conversion of double to int in xmlXPathSubstringFunction, adding range checks to avoid undefined behavior. Make sure to add start and length as floating-point numbers before converting to int. Fix a bug when rounding negative start indices. Remove unneeded calls to xmlXPathIs{Inf,NaN} and rely on IEEE math instead. Avoid computing the string length. xmlUTF8Strsub works as expected if the length of the requested substring exceeds the input. Found with libFuzzer and UBSan.

c64d4efb

2018-10-13T00:12:12

Remove redefined starts and defines inside include elements When including a grammar from another grammar, we need to make sure that any redefines of starts and includes that that grammar does inside any of its include elements are also removed.

46da8fc5

2018-10-12T23:46:24

Allow choice within choice in nameClass in RELAX NG The pattern nameClass allows for nested choice elements, for example <name> <choice> <choice> <name>a</name> <name>b</name> </choice> <name>c</name> </choice> </name> which is semantically equivalent to <name> <choice> <name>a</name> <name>b</name> <name>c</name> </choice> </name> The old code didn’t handle this correctly, as it never expected a choice inside another choice. This patch fixes this by flattening any nested choices. This pattern of nested choice elements comes up in RELAX NG simplification, where all choice elements are rewritten in this nested manner, see section 4.12 of the RELAX NG specification.

4338c310

2018-10-12T22:30:26

Look inside divs for starts and defines inside include RELAX NG allows for div elements inside of include elements. We need to look inside those div elements for start and define elements that may be redefining start and define elements in the included grammar.

123234f2

2018-09-11T14:52:07

Free input buffer in xmlHaltParser This avoids miscalculation of available bytes. Thanks to Yunho Kim for the report. Closes: #26

72182550

2017-11-04T15:38:58

Add test for ICU flush and pivot buffer

5af594d8

2017-10-07T14:54:45

Fix comparison of nodesets to strings Fix two bugs in xmlXPathNodeValHash which could lead to errors when comparing nodesets to strings: - Only use contents of text nodes to compute the hash for element nodes. Comments, PIs, and other node types don't affect the string-value and must be ignored. - Reset `string` to NULL for node types other than text. Reported by Aleksei on the mailing list: https://mail.gnome.org/archives/xml/2017-September/msg00016.html

69936b12

2017-08-30T14:16:01

Revert "Print error messages for truncated UTF-8 sequences" This reverts commit 79c8a6b which caused a serious regression in streaming mode. Also reverts part of commit 52ceced "Fix infinite loops with push parser in recovery mode". Fixes bug 786554.

899a5d9f

2017-07-25T14:59:49

Detect infinite recursion in parameter entities When expanding a parameter entity in a DTD, infinite recursion could lead to an infinite loop or memory exhaustion. Thanks to Wei Lei for the first of many reports. Fixes bug 759579.

872fea94

2017-06-19T00:24:12

Get rid of "blanks wrapper" for parameter entities Now that replacement of parameter entities goes exclusively through xmlSkipBlankChars, we can account for the surrounding space characters there and remove the "blanks wrapper" hack.

24246c76

2017-06-20T12:56:36

Fix xmlHaltParser Pop all extra input streams before resetting the input. Otherwise, a call to xmlPopInput could make input available again. Also set input->end to input->cur. Changes the test output for some error tests. Unfortunately, some fuzzed test cases were added to the test suite without manual cleanup. This makes it almost impossible to review the impact of later changes on the test output.

8bbe4508

2017-06-17T16:15:09

Spelling and grammar fixes Fixes bug 743172, bug 743489, bug 769632, bug 782400 and a few other misspellings.

5f440d8c

2017-06-12T14:32:34

Rework entity boundary checks Make sure to finish all entities in the internal subset. Nevertheless, readd a sanity check in xmlParseStartTag2 that was lost in my previous commit. Also add a sanity check in xmlPopInput. Popping an input unexpectedly was the source of many recent memory bugs. The check doesn't mitigate such issues but helps with diagnosis. Always base entity boundary checks on the input ID, not the input pointer. The pointer could have been reallocated to the old address. Always throw a well-formedness error if a boundary check fails. In a few places, a validity error was thrown. Fix a few error codes and improve indentation.

dbaab1f3

2017-06-16T21:38:57

Test SAX2 callbacks with entity substitution This detects regressions like bug 760367.

67f9f9d6

2017-06-12T19:25:01

Misc fixes for 'make tests' - Silence test output. - Clean up after doc/examples tests. - Adjust expected output for script tests. - Add missing results for relaxng/pattern3 There are still two test failures I can't comment on: - regexp/bug316338 - schemas/any4_0

0b2d5c48

2017-06-12T19:10:04

Initialize keepBlanks in HTML parser This caused failures in the HTML push tests but the fix required to change the expected output of the HTML SAX tests.

85c112a0

2017-06-12T18:26:11

Add test cases for bug 758518 test/HTML/758518-entity.html exposed a bug in pushParseTest() in runtest.c which assumed that an input file was at least 4 bytes long. That test case is only 3 bytes, so we now take the minimum of 4 bytes or the length of the test input. We also now use 'chunkSize' in place of the hard-coded value '1024' later in the function.

79c8a6b1

2017-06-10T17:01:27

Print error messages for truncated UTF-8 sequences Before, truncated UTF-8 sequences at the end of a file were treated as EOF. Create an error message containing the offending bytes. xmlStringCurrentChar would also print characters from the input stream, not the string it's working on.

932cc989

2017-06-03T02:01:29

Fix buffer size checks in xmlSnprintfElementContent xmlSnprintfElementContent failed to correctly check the available buffer space in two locations. Fixes bug 781333 (CVE-2017-9047) and bug 781701 (CVE-2017-9048). Thanks to Marcel Böhme and Thuan Pham for the report.

e2663054

2017-06-05T15:37:17

Fix handling of parameter-entity references There were two bugs where parameter-entity references could lead to an unexpected change of the input buffer in xmlParseNameComplex and xmlDictLookup being called with an invalid pointer. Percent sign in DTD Names ========================= The NEXTL macro used to call xmlParserHandlePEReference. When parsing "complex" names inside the DTD, this could result in entity expansion which created a new input buffer. The fix is to simply remove the call to xmlParserHandlePEReference from the NEXTL macro. This is safe because no users of the macro require expansion of parameter entities. - xmlParseNameComplex - xmlParseNCNameComplex - xmlParseNmtoken The percent sign is not allowed in names, which are grammatical tokens. - xmlParseEntityValue Parameter-entity references in entity values are expanded but this happens in a separate step in this function. - xmlParseSystemLiteral Parameter-entity references are ignored in the system literal. - xmlParseAttValueComplex - xmlParseCharDataComplex - xmlParseCommentComplex - xmlParsePI - xmlParseCDSect Parameter-entity references are ignored outside the DTD. - xmlLoadEntityContent This function is only called from xmlStringLenDecodeEntities and entities are replaced in a separate step immediately after the function call. This bug could also be triggered with an internal subset and double entity expansion. This fixes bug 766956 initially reported by Wei Lei and independently by Chromium's ClusterFuzz, Hanno Böck, and Marco Grassi. Thanks to everyone involved. xmlParseNameComplex with XML_PARSE_OLD10 ======================================== When parsing Names inside an expanded parameter entity with the XML_PARSE_OLD10 option, xmlParseNameComplex would call xmlGROW via the GROW macro if the input buffer was exhausted. At the end of the parameter entity's replacement text, this function would then call xmlPopInput which invalidated the input buffer. There should be no need to invoke GROW in this situation because the buffer is grown periodically every XML_PARSER_CHUNK_SIZE characters and, at least for UTF-8, in xmlCurrentChar. This also matches the code path executed when XML_PARSE_OLD10 is not set. This fixes bugs 781205 (CVE-2017-9049) and 781361 (CVE-2017-9050). Thanks to Marcel Böhme and Thuan Pham for the report. Additional hardening ==================== A separate check was added in xmlParseNameComplex to validate the buffer size.

7482f41f

2017-06-01T22:00:19

Check for integer overflow in xmlXPathFormatNumber Check for overflow before casting double to int. Found with afl-fuzz and UBSan.

855c19ef

2017-06-01T01:04:08

Avoid reparsing in xmlParseStartTag2 The code in xmlParseStartTag2 must handle the case that the input buffer was grown and reallocated which can invalidate pointers to attribute values. Before, this was handled by detecting changes of the input buffer "base" pointer and, in case of a change, jumping back to the beginning of the function and reparsing the start tag. The major problem of this approach is that whether an input buffer is reallocated is nondeterministic, resulting in seemingly random test failures. See the mailing list thread "runtest mystery bug: name2.xml error case regression test" from 2012, for example. If a reallocation was detected, the code also made no attempts to continue parsing in case of errors which makes a difference in the lax "recover" mode. Now we store the current input buffer "base" pointer for each (not separately allocated) attribute in the namespace URI field, which isn't used until later. After the whole start tag was parsed, the pointers to the attribute values are reconstructed using the offset between the new and the old input buffer. This relies on arithmetic on dangling pointers which is technically undefined behavior. But it seems like the easiest and most efficient fix and a similar approach is used in xmlParserInputGrow. This changes the error output of several tests, typically making it more verbose because we try harder to continue parsing in case of errors. (Another possible solution is to check not only the "base" pointer but the size of the input buffer as well. But this would result in even more reparsing.)

f4029cd4

2016-04-21T16:37:26

Check XPath exponents for overflow Avoid undefined behavior and wrong results with huge exponents. Found with afl-fuzz and UBSan.

a58331a6

2017-05-29T21:02:21

Check for overflow in xmlXPathIsPositionalPredicate Avoid undefined behavior when casting from double to int. Found with afl-fuzz and UBSan.

a851868a

2017-05-29T20:14:42

Parse small XPath numbers more accurately Don't count leading zeros towards the fraction size limit. This allows to parse numbers like 0.0000000000000000000000000000000000000000000000000000000001 which is the only standard-conformant way to represent such numbers, as scientific notation isn't allowed in XPath 1.0. (It is allowed in XPath 2.0 and in libxml2 as an extension, though.) Overall accuracy is still bad, see bug 783238.

4bebb030

2016-04-21T13:41:09

Rework XPath rounding functions Use the C library's floor and ceil functions. The old code was overly complicated for no apparent reason and could result in undefined behavior when handling NaNs (found with afl-fuzz and UBSan). Fix wrong comment in xmlXPathRoundFunction. The implementation was already following the spec and rounding half up.

40f58521

2017-05-26T20:16:35

Fix axis traversal from attribute and namespace nodes When traversing the "preceding" axis from an attribute node, we must first go up to the attribute's containing element. Otherwise, text children of other attributes could be returned. This made it possible to hit a code path in xmlXPathNextAncestor which contained another bug: The attribute node was initialized with the context node instead of the current node. Normally, this code path is only hit via xmlXPathNextAncestorOrSelf in which case the current and context node are the same. The combination of the two bugs could result in an infinite loop, found with libFuzzer. Traversing the "following" and the "preceding" axis from namespace nodes should be handled similarly. This wasn't supported at all previously.

9ab01a27

2016-06-28T14:22:23

Fix XPointer paths beginning with range-to The old code would invoke the broken xmlXPtrRangeToFunction. range-to isn't really a function but a special kind of location step. Remove this function and always handle range-to in the XPath code. The old xmlXPtrRangeToFunction could also be abused to trigger a use-after-free error with the potential for remote code execution. Found with afl-fuzz. Fixes CVE-2016-5131.

d8083bf7

2016-06-25T12:35:50

Fix NULL pointer deref in XPointer range-to - Check for errors after evaluating first operand. - Add sanity check for empty stack. Found with afl-fuzz.

0bcd05c5

2016-03-01T15:18:04

Heap-based buffer overread in htmlCurrentChar For https://bugzilla.gnome.org/show_bug.cgi?id=758606 * parserInternals.c: (xmlNextChar): Add an test to catch other issues on ctxt->input corruption proactively. For non-UTF-8 charsets, xmlNextChar() failed to check for the end of the input buffer and would continuing reading. Fix this by pulling out the check for the end of the input buffer into common code, and return if we reach the end of the input buffer prematurely. * result/HTML/758606.html: Added. * result/HTML/758606.html.err: Added. * result/HTML/758606.html.sax: Added. * result/HTML/758606_2.html: Added. * result/HTML/758606_2.html.err: Added. * result/HTML/758606_2.html.sax: Added. * test/HTML/758606.html: Added test case. * test/HTML/758606_2.html: Added test case.

00906759

2016-01-26T16:57:03

Heap-based buffer-underreads due to xmlParseName For https://bugzilla.gnome.org/show_bug.cgi?id=759573 * parser.c: (xmlParseElementDecl): Return early on invalid input to fix non-minimized test case (759573-2.xml). Otherwise the parser gets into a bad state in SKIP(3) at the end of the function. (xmlParseConditionalSections): Halt parsing when hitting invalid input that would otherwise caused xmlParserHandlePEReference() to recurse unexpectedly. This fixes the minimized test case (759573.xml). * result/errors/759573-2.xml: Add. * result/errors/759573-2.xml.err: Add. * result/errors/759573-2.xml.str: Add. * result/errors/759573.xml: Add. * result/errors/759573.xml.err: Add. * result/errors/759573.xml.str: Add. * test/errors/759573-2.xml: Add. * test/errors/759573.xml: Add.

38eae571

2016-03-07T14:04:08

Heap use-after-free in xmlSAX2AttributeNs For https://bugzilla.gnome.org/show_bug.cgi?id=759020 * parser.c: (xmlParseStartTag2): Attribute strings are only valid if the base does not change, so add another check where the base may change. Make sure to set 'attvalue' to NULL after freeing it. * result/errors/759020.xml: Added. * result/errors/759020.xml.err: Added. * result/errors/759020.xml.str: Added. * test/errors/759020.xml: Added test case.

beca86e8

2016-05-04T11:23:49

Detect change of encoding when parsing HTML names From https://bugzilla.gnome.org/show_bug.cgi?id=758518 Happens when a file has a name getting parsed, but no valid encoding set, so libxml has to guess what the encoding is. This patch detects when the buffer location changes, and if it does, restarts the parsing of the name. This slightly change a couple of regression tests output

45752d2c

2016-03-03T11:50:34

Bug 759398: Heap use-after-free in xmlDictComputeFastKey <https://bugzilla.gnome.org/show_bug.cgi?id=759398> * parser.c: (xmlParseNCNameComplex): Store start position instead of a pointer to the name since the underlying buffer may change, resulting in a stale pointer being used. * result/errors/759398.xml: Added. * result/errors/759398.xml.err: Added. * result/errors/759398.xml.str: Added. * test/errors/759398.xml: Added test case.

a820dbea

2016-03-01T11:34:04

Bug 758605: Heap-based buffer overread in xmlDictAddString <https://bugzilla.gnome.org/show_bug.cgi?id=758605> Reviewed by David Kilzer. * HTMLparser.c: (htmlParseName): Add bounds check. (htmlParseNameComplex): Ditto. * result/HTML/758605.html: Added. * result/HTML/758605.html.err: Added. * result/HTML/758605.html.sax: Added. * runtest.c: (pushParseTest): The input for the new test case was so small (4 bytes) that htmlParseChunk() was never called after htmlCreatePushParserCtxt(), thereby creating a false positive test failure. Fixed by using a do-while loop so we always call htmlParseChunk() at least once. * test/HTML/758605.html: Added.

db07dd61

2016-02-12T09:58:29

Bug 758588: Heap-based buffer overread in xmlParserPrintFileContextInternal <https://bugzilla.gnome.org/show_bug.cgi?id=758588> * parser.c: (xmlParseEndTag2): Add bounds checks before dereferencing ctxt->input->cur past the end of the buffer, or incrementing the pointer past the end of the buffer. * result/errors/758588.xml: Add test result. * result/errors/758588.xml.err: Ditto. * result/errors/758588.xml.str: Ditto. * test/errors/758588.xml: Add regression test.

6eb0894a

2016-05-05T16:49:00

Fix memory leak with XPath namespace nodes Set hasNsNodes to 1 when adding namespace nodes via XP_TEST_HIT.

82b73039

2016-04-30T17:53:10

Fix namespace axis traversal When the namespace axis is traversed in "toBool" mode, the traversal can exit early, before visiting all nodes. In this case, the XPath context still contains a non-NULL tmpNsList. This means that - the check when to start a new traversal was wrong and - the tmpNsList could be leaked. Fixes bug #750037 and, by accident, bug #756075: https://bugzilla.gnome.org/show_bug.cgi?id=750037 https://bugzilla.gnome.org/show_bug.cgi?id=756075

839689a9

2016-04-27T18:00:12

Don't recurse into OP_VALUEs in xmlXPathOptimizeExpression The ch1 slot of OP_VALUEs contains an invalid value. Ignore it. Fixes bug #760325: https://bugzilla.gnome.org/show_bug.cgi?id=760325

f39fd66e

2016-04-27T03:01:16

Fix namespace::node() XPath expression Make sure that xmlXPathNodeSetAddNs is called for namespace nodes when matched with a namespace::node() step. This correctly sets the parent of namespace nodes. Note that xmlXPathNodeSetAddNs must only be called if working on the namespace axis. Otherwise, the context node is not the parent of the namespace node and the standard XP_TEST_HIT macro must be invoked. This explains the errors in the C14N tests that the old TODO comment mentioned.

e2893903

2016-04-21T19:19:23

Fix parsing of NCNames in XPath The NCName parser would allow any NameChar as start character. For example, the following XPath expressions would compile: self::-abc self::0abc self::.abc

cad102b8

2016-04-15T22:41:24

Do normalize string-based datatype value in RelaxNG facet checking Original patch is from Jan Pokorný <jpokorny redhat com> https://mail.gnome.org/archives/xml/2013-November/msg00028.html Improve it according to reviews and add test files.

5be1a6e8

2016-01-19T11:38:52

Bug 760861: REGRESSION (bf9c1dad): Missing results for test/schemas/regexp-char-ref_[01].xsd <https://bugzilla.gnome.org/show_bug.cgi?id=760861> Add missing test results to fix the following errors when running "make Schemastests": ## Schemas regression tests diff: ./result/schemas/regexp-char-ref_0_0.err: No such file or directory diff: ./result/schemas/regexp-char-ref_1_0.err: No such file or directory * result/schemas/regexp-char-ref_0_0.err: Added. * result/schemas/regexp-char-ref_1_0.err: Added.

49bbfdb6

2016-03-14T15:53:16

Add missing RNG test files For https://bugzilla.gnome.org/show_bug.cgi?id=760249 Add missing test results from Bug 710744 for commit 6473a41a49601da8355c4b407b99474ada170213.

4f8606c1

2016-01-05T13:38:09

Bug 760183: REGRESSION (v2.9.3): XML push parser fails with bogus UTF-8 encoding error when multi-byte character in large CDATA section is split across buffer <https://bugzilla.gnome.org/show_bug.cgi?id=760183> * parser.c: (xmlCheckCdataPush): Add 'complete' argument to describe whether the buffer passed in is the whole CDATA buffer, or if there is more data to parse. If there is more data to parse, don't return a negative value for an invalid multi-byte UTF-8 character that is split between buffers. (xmlParseTryOrFinish): Pass 'complete' argument to xmlCheckCdataPush() as appropriate. * result/cdata-2-byte-UTF-8.xml: Added. * result/cdata-2-byte-UTF-8.xml.rde: Added. * result/cdata-2-byte-UTF-8.xml.rdr: Added. * result/cdata-2-byte-UTF-8.xml.sax: Added. * result/cdata-2-byte-UTF-8.xml.sax2: Added. * result/cdata-3-byte-UTF-8.xml: Added. * result/cdata-3-byte-UTF-8.xml.rde: Added. * result/cdata-3-byte-UTF-8.xml.rdr: Added. * result/cdata-3-byte-UTF-8.xml.sax: Added. * result/cdata-3-byte-UTF-8.xml.sax2: Added. * result/cdata-4-byte-UTF-8.xml: Added. * result/cdata-4-byte-UTF-8.xml.rde: Added. * result/cdata-4-byte-UTF-8.xml.rdr: Added. * result/cdata-4-byte-UTF-8.xml.sax: Added. * result/cdata-4-byte-UTF-8.xml.sax2: Added. * result/noent/cdata-2-byte-UTF-8.xml: Added. * result/noent/cdata-3-byte-UTF-8.xml: Added. * result/noent/cdata-4-byte-UTF-8.xml: Added. * test/cdata-2-byte-UTF-8.xml: Added. * test/cdata-3-byte-UTF-8.xml: Added. * test/cdata-4-byte-UTF-8.xml: Added. - Add tests and results. Only 'make Readertests XMLPushtests' fails prior to the fix.

a7a94612

2016-02-09T12:55:29

Heap-based buffer overread in xmlNextChar For https://bugzilla.gnome.org/show_bug.cgi?id=759671 when the end of the internal subset isn't properly detected xmlParseInternalSubset should just return instead of trying to process input further.

f1063fdb

2015-11-20T16:06:59

CVE-2015-7500 Fix memory access error due to incorrect entities boundaries For https://bugzilla.gnome.org/show_bug.cgi?id=756525 handle properly the case where we popped out of the current entity while processing a start tag Reported by Kostya Serebryany @ Google This slightly modifies the output of 754946 in regression tests

4a5d80ad

2015-09-18T15:06:46

Fix a bug in CData error handling in the push parser For https://bugzilla.gnome.org/show_bug.cgi?id=754947 The checking function was returning incorrect args in some cases Adds the test to teh reg suite and fix one of the existing test output

51f02b0a

2015-09-15T16:50:32

Fix a bug on name parsing at the end of current input buffer For https://bugzilla.gnome.org/show_bug.cgi?id=754946 When hitting the end of the current input buffer while parsing a name we could end up loosing the beginning of the name, which led to various issues.

ef709ce2

2015-09-10T19:41:41

Fix the spurious ID already defined error For https://bugzilla.gnome.org/show_bug.cgi?id=737840 the fix for 724903 introduced a regression on external entities carrying IDs, revert that patch in part and add a specific test to avoid readding it

2fab235d

2015-03-16T08:38:36

Fix support for except in nameclasses For https://bugzilla.gnome.org/show_bug.cgi?id=565219 The code was imply missing even if simple, added a few regression tests.

02b252d7

2015-03-08T17:00:37

Regression test for bug #695699

342658a1

2015-03-08T16:46:04

Add a couple of XPath tests

f6aaabce

2015-03-08T16:05:26

Allow attributes on descendant-or-self axis If the context node is an attribute, the attribute itself is on the descendant-or-self axis. The principal node type of this axis is element, so the only node test that can return the attribute is "node()". In other words, "@attr/descendant-or-self::node()" is equivalent to "@attr". This matches the behavior of Saxon-CE.

df23f584

2014-10-23T13:52:47

Adding example from bugs 738805 to regression tests For https://bugzilla.gnome.org/show_bug.cgi?id=738805 Tortuous test case provided by pierre.labastie@neuf.fr

6473a41a

2013-10-23T14:51:33

Implement choice for name classes on attributes https://bugzilla.gnome.org/show_bug.cgi?id=710744

dcc19503

2013-05-22T22:56:45

Fix a parsing bug on non-ascii element and CR/LF usage https://bugzilla.gnome.org/show_bug.cgi?id=698550 Somehow the behaviour of the internal parser routine changed slightly when encountering CR/LF, which led to a bug when parsing document with non-ascii Names

483272f3

2013-03-27T13:37:14

Added a regression tests from bug 694228 data Provided by Mark Rowe <mrowe@apple.com>

a3f1e3e5

2013-03-11T13:57:53

Avoid extra processing on entities If an entity has already been checked for correctness no need to check it on every reference

a7982ce2

2012-10-25T15:39:39

Adding streaming validation to runtest checks

kc3-lang/libxml2/result

result

Log