parser.c


Log

Author Commit Date CI Message
Nick Wellnhofer b90d8989 2017-09-19T15:45:35 Fix regression with librsvg Instead of using xmlCreateIOParserCtxt, librsvg pushes its own xmlParserInput on top of a memory push parser. This incorrect use of the API confuses several parser checks and, since 2.9.5, completely breaks documents with internal subsets. Work around the problem with internal subsets. Thanks to Petr Sumbera for the report: https://mail.gnome.org/archives/xml/2017-September/msg00011.html Also see https://bugzilla.gnome.org/show_bug.cgi?id=787895
Nick Wellnhofer abbda93c 2017-09-11T01:14:16 Handle more invalid entity values in recovery mode In attribute content, don't emit entity references if there are problems with the entity value. Otherwise some illegal entity values like <!ENTITY a '&#38;#x123456789;'> would later cause problems like integer overflow. Make xmlStringLenDecodeEntities return NULL on more error conditions including invalid char refs and errors from recursive calls. Remove some fragile error checks based on lastError that shouldn't be needed now. Clear the entity content in xmlParseAttValueComplex if an error was found. Found by OSS-Fuzz. Should fix bug 783052. Also see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3343
Nick Wellnhofer 0fcab658 2017-09-07T18:25:11 Handle illegal entity values in recovery mode Make xmlParseEntityValue always return NULL on error. Otherwise some illegal entity values like <!ENTITY e '&%#4294967298;'> would later cause problems like integer overflow. Found by OSS-Fuzz. Should fix bug 783052. Also see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=592 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2732
Nick Wellnhofer 69936b12 2017-08-30T14:16:01 Revert "Print error messages for truncated UTF-8 sequences" This reverts commit 79c8a6b which caused a serious regression in streaming mode. Also reverts part of commit 52ceced "Fix infinite loops with push parser in recovery mode". Fixes bug 786554.
Stéphane Michaut 454e397e 2017-08-28T14:30:43 Porting libxml2 on zOS encoding of code First set of patches for zOS - entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c: ask conversion of code to ISO Latin 1 to avoid having the compiler assume EBCDIC codepoint for characters. - xmlmodule.c: make sure we have support for modules - xmlIO.c: zOS path names are special, avoid some of the expectations from Unix/Windows
Nick Wellnhofer 899a5d9f 2017-07-25T14:59:49 Detect infinite recursion in parameter entities When expanding a parameter entity in a DTD, infinite recursion could lead to an infinite loop or memory exhaustion. Thanks to Wei Lei for the first of many reports. Fixes bug 759579.
Nick Wellnhofer 52ceced6 2017-07-01T17:49:30 Fix infinite loops with push parser in recovery mode Make sure that the input pointer advances in case of errors. Otherwise, the push parser can loop infinitely. Found with libFuzzer.
Nick Wellnhofer 3eef3f39 2017-06-20T16:13:57 Fix NULL deref in xmlParseExternalEntityPrivate If called from xmlParseExternalEntity, oldctxt is NULL which leads to a NULL deref if an error occurs. This only affects external code that calls xmlParseExternalEntity. Patch from David Kilzer with minor changes. Fixes bug 780159.
Nick Wellnhofer 872fea94 2017-06-19T00:24:12 Get rid of "blanks wrapper" for parameter entities Now that replacement of parameter entities goes exclusively through xmlSkipBlankChars, we can account for the surrounding space characters there and remove the "blanks wrapper" hack.
Nick Wellnhofer d9e43c7d 2017-06-19T18:01:23 Make sure not to call IS_BLANK_CH when parsing the DTD This is required to get rid of the "blanks wrapper" hack. Checking the return value of xmlSkipBlankChars is more efficient, too.
Nick Wellnhofer 453dff1e 2017-06-19T17:55:20 Remove unnecessary calls to xmlPopInput It's enough if xmlPopInput is called from xmlSkipBlankChars. Since the replacement text of a parameter entity is surrounded with space characters, that's the only place where the replacement can end in a well-formed document. This is also required to get rid of the "blanks wrapper" hack.
Nick Wellnhofer aa267cd1 2017-06-18T23:29:51 Simplify handling of parameter entity references There are only two places where parameter entity references must be handled. For the internal subset in xmlParseInternalSubset. For the external subset or content from other external PEs in xmlSkipBlankChars. Make sure that xmlSkipBlankChars skips over sequences of PEs and whitespace. Rely on xmlSkipBlankChars instead of calling xmlParsePEReference directly when in the external subset or a conditional section. xmlParserHandlePEReference is unused now.
Nick Wellnhofer 24246c76 2017-06-20T12:56:36 Fix xmlHaltParser Pop all extra input streams before resetting the input. Otherwise, a call to xmlPopInput could make input available again. Also set input->end to input->cur. Changes the test output for some error tests. Unfortunately, some fuzzed test cases were added to the test suite without manual cleanup. This makes it almost impossible to review the impact of later changes on the test output.
Nick Wellnhofer 8bbe4508 2017-06-17T16:15:09 Spelling and grammar fixes Fixes bug 743172, bug 743489, bug 769632, bug 782400 and a few other misspellings.
Nick Wellnhofer 5f440d8c 2017-06-12T14:32:34 Rework entity boundary checks Make sure to finish all entities in the internal subset. Nevertheless, readd a sanity check in xmlParseStartTag2 that was lost in my previous commit. Also add a sanity check in xmlPopInput. Popping an input unexpectedly was the source of many recent memory bugs. The check doesn't mitigate such issues but helps with diagnosis. Always base entity boundary checks on the input ID, not the input pointer. The pointer could have been reallocated to the old address. Always throw a well-formedness error if a boundary check fails. In a few places, a validity error was thrown. Fix a few error codes and improve indentation.
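The reasoning behind basing boundary checks on the input ID can be illustrated with a minimal sketch. The `Input` struct, `input_new`, and `same_entity` below are hypothetical stand-ins, not libxml2's types or functions; the sketch only assumes that each input receives a fresh ID when it is created, so the ID stays distinct even if the allocator hands back a previously used address.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a parser input stream. */
typedef struct {
    int id;           /* unique identifier assigned at creation */
    const char *cur;  /* current read position */
} Input;

static int next_id = 1;

/* Create an input with a fresh ID. Returns NULL on allocation failure. */
Input *input_new(const char *text) {
    Input *in = malloc(sizeof *in);
    if (in == NULL)
        return NULL;
    in->id = next_id++;
    in->cur = text;
    return in;
}

/*
 * Returns 1 if a construct that started in the input with ID `start_id`
 * is still being parsed in the same input, 0 otherwise. Comparing IDs
 * stays reliable even when a new input is allocated at the address of
 * a freed one, which a pointer comparison cannot distinguish.
 */
int same_entity(int start_id, const Input *now) {
    assert(now != NULL);
    return start_id == now->id;
}
```

A caller would record `in->id` when a construct opens and call `same_entity` when it closes, reporting a well-formedness error if the result is 0.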
Nick Wellnhofer 46dc9890 2017-06-08T02:24:56 Don't switch encoding for internal parameter entities This is only needed for external entities. Trying to switch the encoding for internal entities could also cause a memory leak in recovery mode.
Nick Wellnhofer 03904159 2017-06-05T21:16:00 Merge duplicate code paths handling PE references xmlParsePEReference is essentially a subset of xmlParserHandlePEReference, so make xmlParserHandlePEReference call xmlParsePEReference. The code paths in these functions differed slightly, but the code from xmlParserHandlePEReference seems more solid and tested.
David Kilzer 3f0627a1 2017-06-16T21:30:42 Fix duplicate SAX callbacks for entity content Reset 'was_checked' to prevent entity from being parsed twice and SAX callbacks being invoked twice if XML_PARSE_NOENT was set. This regressed in version 2.9.3 and caused problems with WebKit. Fixes bug 760367.
Nick Wellnhofer fb2f518c 2017-06-10T17:06:16 Fix potential infinite loop in xmlStringLenDecodeEntities Make sure that xmlParseStringPEReference advances the "str" pointer even if the parser was stopped. Otherwise xmlStringLenDecodeEntities can loop infinitely.
Nick Wellnhofer 4ba8cc85 2017-06-10T02:33:58 Remove useless check in xmlParseAttributeListDecl Since we already successfully parsed the attribute name and other items, it is guaranteed that we made progress in the input stream. Comparing the input pointer to a previous value also looks fragile to me. What if the input buffer was reallocated and the new "cur" pointer happens to be the same as the old one? There are a couple of similar checks which also take "consumed" into account. This seems to be safer but I'm not convinced that it couldn't lead to false alarms in rare situations.
Nick Wellnhofer bedbef80 2017-06-09T15:10:13 Fix memory leak in xmlParseEntityDecl error path When parsing the entity value, it can happen that an external entity with an unsupported encoding is loaded and the parser is stopped. This would lead to a memory leak. A custom SAX callback could also stop the parser. Found with libFuzzer and ASan.
Nick Wellnhofer 030b1f7a 2017-06-06T15:53:42 Revert "Add an XML_PARSE_NOXXE flag to block all entities loading even local" This reverts commit 2304078555896cf1638c628f50326aeef6f0e0d0. The new flag doesn't work and the change even broke the XML_PARSE_NONET option.
Nick Wellnhofer e2663054 2017-06-05T15:37:17 Fix handling of parameter-entity references There were two bugs where parameter-entity references could lead to an unexpected change of the input buffer in xmlParseNameComplex and xmlDictLookup being called with an invalid pointer. Percent sign in DTD Names ========================= The NEXTL macro used to call xmlParserHandlePEReference. When parsing "complex" names inside the DTD, this could result in entity expansion which created a new input buffer. The fix is to simply remove the call to xmlParserHandlePEReference from the NEXTL macro. This is safe because no users of the macro require expansion of parameter entities. - xmlParseNameComplex - xmlParseNCNameComplex - xmlParseNmtoken The percent sign is not allowed in names, which are grammatical tokens. - xmlParseEntityValue Parameter-entity references in entity values are expanded but this happens in a separate step in this function. - xmlParseSystemLiteral Parameter-entity references are ignored in the system literal. - xmlParseAttValueComplex - xmlParseCharDataComplex - xmlParseCommentComplex - xmlParsePI - xmlParseCDSect Parameter-entity references are ignored outside the DTD. - xmlLoadEntityContent This function is only called from xmlStringLenDecodeEntities and entities are replaced in a separate step immediately after the function call. This bug could also be triggered with an internal subset and double entity expansion. This fixes bug 766956 initially reported by Wei Lei and independently by Chromium's ClusterFuzz, Hanno Böck, and Marco Grassi. Thanks to everyone involved. xmlParseNameComplex with XML_PARSE_OLD10 ======================================== When parsing Names inside an expanded parameter entity with the XML_PARSE_OLD10 option, xmlParseNameComplex would call xmlGROW via the GROW macro if the input buffer was exhausted. 
At the end of the parameter entity's replacement text, this function would then call xmlPopInput which invalidated the input buffer. There should be no need to invoke GROW in this situation because the buffer is grown periodically every XML_PARSER_CHUNK_SIZE characters and, at least for UTF-8, in xmlCurrentChar. This also matches the code path executed when XML_PARSE_OLD10 is not set. This fixes bugs 781205 (CVE-2017-9049) and 781361 (CVE-2017-9050). Thanks to Marcel Böhme and Thuan Pham for the report. Additional hardening ==================== A separate check was added in xmlParseNameComplex to validate the buffer size.
Nick Wellnhofer 855c19ef 2017-06-01T01:04:08 Avoid reparsing in xmlParseStartTag2 The code in xmlParseStartTag2 must handle the case that the input buffer was grown and reallocated which can invalidate pointers to attribute values. Before, this was handled by detecting changes of the input buffer "base" pointer and, in case of a change, jumping back to the beginning of the function and reparsing the start tag. The major problem of this approach is that whether an input buffer is reallocated is nondeterministic, resulting in seemingly random test failures. See the mailing list thread "runtest mystery bug: name2.xml error case regression test" from 2012, for example. If a reallocation was detected, the code also made no attempts to continue parsing in case of errors which makes a difference in the lax "recover" mode. Now we store the current input buffer "base" pointer for each (not separately allocated) attribute in the namespace URI field, which isn't used until later. After the whole start tag was parsed, the pointers to the attribute values are reconstructed using the offset between the new and the old input buffer. This relies on arithmetic on dangling pointers which is technically undefined behavior. But it seems like the easiest and most efficient fix and a similar approach is used in xmlParserInputGrow. This changes the error output of several tests, typically making it more verbose because we try harder to continue parsing in case of errors. (Another possible solution is to check not only the "base" pointer but the size of the input buffer as well. But this would result in even more reparsing.)
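The idea of reconstructing attribute value pointers after the buffer moves can be sketched as follows. This is a variant, not the commit's code: instead of doing arithmetic on the old and new "base" pointers (which the message notes is technically undefined behavior), the sketch stores integer offsets from the base and rebuilds pointers from them. `Buf`, `buf_append`, `value_offset`, and `value_at` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer. */
typedef struct {
    char *base;   /* start of the buffer, may change on growth */
    size_t len;   /* number of bytes stored, excluding the NUL */
} Buf;

/* Append text; realloc may move the buffer. Returns 0 or -1. */
int buf_append(Buf *b, const char *text) {
    size_t n = strlen(text);
    char *p = realloc(b->base, b->len + n + 1);
    if (p == NULL)
        return -1;
    memcpy(p + b->len, text, n + 1);
    b->base = p;
    b->len += n;
    return 0;
}

/* Remember where a value starts as an offset from the current base. */
size_t value_offset(const Buf *b, const char *value) {
    assert(value >= b->base && value <= b->base + b->len);
    return (size_t)(value - b->base);
}

/* Rebuild the pointer against whatever the base is now. */
const char *value_at(const Buf *b, size_t offset) {
    assert(offset <= b->len);
    return b->base + offset;
}
```

Offsets stay valid across any number of reallocations, so no reparsing is needed; the trade-off is one extra addition each time a value is accessed.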
Nick Wellnhofer 07b7428b 2017-06-01T00:19:14 Simplify control flow in xmlParseStartTag2 Remove some goto labels and deduplicate a bit of code after handling namespaces. Before: loop { parseAttribute if (ok) { if (defaultNamespace) { handleDefaultNamespace if (error) goto skip_default_ns; handleDefaultNamespace skip_default_ns: freeAttr nextAttr continue; } if (namespace) { handleNamespace if (error) goto skip_ns; handleNamespace skip_ns: freeAttr nextAttr; continue; } handleAttr } else { freeAttr } nextAttr } After: loop { parseAttribute if (!ok) goto next_attr; if (defaultNamespace) { handleDefaultNamespace if (error) goto next_attr; handleDefaultNamespace } else if (namespace) { handleNamespace if (error) goto next_attr; handleNamespace } else { handleAttr } next_attr: freeAttr nextAttr }
Nick Wellnhofer 47496724 2017-05-31T16:46:39 Avoid spurious UBSan errors in parser.c If available, use a C99 flexible array member to avoid spurious UBSan errors.
Nick Wellnhofer 8627e4ed 2017-05-23T18:11:08 Fix memory leak in parser error path Triggered in mixed content ELEMENT declarations if there's an invalid name after the first valid name: <!ELEMENT para (#PCDATA|a|<invalid>)*> Found with libFuzzer and ASan.
Neel Mehta 90ccb582 2017-04-07T17:43:02 Prevent unwanted external entity reference For https://bugzilla.gnome.org/show_bug.cgi?id=780691 * parser.c: add a specific check to avoid PE reference
Doran Moppert 23040785 2017-04-07T16:45:56 Add an XML_PARSE_NOXXE flag to block all entities loading even local For https://bugzilla.gnome.org/show_bug.cgi?id=772726 * include/libxml/parser.h: Add a new parser flag XML_PARSE_NOXXE * elfgcchack.h, xmlIO.h, xmlIO.c: associated loading routine * include/libxml/xmlerror.h: new error raised * xmllint.c: adds --noxxe flag to activate the option
Daniel Veillard bdd66182 2016-05-23T12:27:58 Avoid building recursive entities For https://bugzilla.gnome.org/show_bug.cgi?id=762100 When we detect a recursive entity we should really not build the associated data; moreover, if someone bypasses libxml2 fatal errors and still tries to serialize a broken entity, make sure we don't risk getting into a recursion * parser.c: xmlParserEntityCheck() don't build if an entity loop was found and remove the associated text content * tree.c: xmlStringGetNodeList() avoid a potential recursion
David Kilzer 00906759 2016-01-26T16:57:03 Heap-based buffer-underreads due to xmlParseName For https://bugzilla.gnome.org/show_bug.cgi?id=759573 * parser.c: (xmlParseElementDecl): Return early on invalid input to fix non-minimized test case (759573-2.xml). Otherwise the parser gets into a bad state in SKIP(3) at the end of the function. (xmlParseConditionalSections): Halt parsing when hitting invalid input that would otherwise cause xmlParserHandlePEReference() to recurse unexpectedly. This fixes the minimized test case (759573.xml). * result/errors/759573-2.xml: Add. * result/errors/759573-2.xml.err: Add. * result/errors/759573-2.xml.str: Add. * result/errors/759573.xml: Add. * result/errors/759573.xml.err: Add. * result/errors/759573.xml.str: Add. * test/errors/759573-2.xml: Add. * test/errors/759573.xml: Add.
Pranjal Jumde 38eae571 2016-03-07T14:04:08 Heap use-after-free in xmlSAX2AttributeNs For https://bugzilla.gnome.org/show_bug.cgi?id=759020 * parser.c: (xmlParseStartTag2): Attribute strings are only valid if the base does not change, so add another check where the base may change. Make sure to set 'attvalue' to NULL after freeing it. * result/errors/759020.xml: Added. * result/errors/759020.xml.err: Added. * result/errors/759020.xml.str: Added. * test/errors/759020.xml: Added test case.
David Kilzer 4472c3a5 2016-05-13T15:13:17 Fix some format string warnings with possible format string vulnerability For https://bugzilla.gnome.org/show_bug.cgi?id=761029 Decorate every method in libxml2 with the appropriate LIBXML_ATTR_FORMAT(fmt,args) macro and add some cleanups following the reports.
Daniel Veillard b1d34de4 2016-03-14T17:19:44 Fix inappropriate fetch of entities content For https://bugzilla.gnome.org/show_bug.cgi?id=761430 libfuzzer regression testing exposed another case where the parser would fetch content of an external entity while not in validating mode. Plug that hole
Pranjal Jumde 45752d2c 2016-03-03T11:50:34 Bug 759398: Heap use-after-free in xmlDictComputeFastKey <https://bugzilla.gnome.org/show_bug.cgi?id=759398> * parser.c: (xmlParseNCNameComplex): Store start position instead of a pointer to the name since the underlying buffer may change, resulting in a stale pointer being used. * result/errors/759398.xml: Added. * result/errors/759398.xml.err: Added. * result/errors/759398.xml.str: Added. * test/errors/759398.xml: Added test case.
David Kilzer db07dd61 2016-02-12T09:58:29 Bug 758588: Heap-based buffer overread in xmlParserPrintFileContextInternal <https://bugzilla.gnome.org/show_bug.cgi?id=758588> * parser.c: (xmlParseEndTag2): Add bounds checks before dereferencing ctxt->input->cur past the end of the buffer, or incrementing the pointer past the end of the buffer. * result/errors/758588.xml: Add test result. * result/errors/758588.xml.err: Ditto. * result/errors/758588.xml.str: Ditto. * test/errors/758588.xml: Add regression test.
Peter Simons 8f30bdff 2016-04-15T11:56:55 Add missing increments of recursion depth counter to XML parser. For https://bugzilla.gnome.org/show_bug.cgi?id=765207 CVE-2016-3705 The functions xmlParserEntityCheck() and xmlParseAttValueComplex() used to call xmlStringDecodeEntities() in a recursive context without incrementing the 'depth' counter in the parser context. Because of that omission, the parser failed to detect attribute recursions in certain documents before running out of stack space.
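Why the depth counter must be incremented on every recursive call can be shown with a toy expander. Everything here is an assumption made for illustration: the `Entity` table, the `MAX_DEPTH` value, and `expanded_length` are hypothetical and do not mirror libxml2's internals.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 40  /* assumed nesting limit for this sketch */

/* Hypothetical entity table entry: name -> replacement text. */
typedef struct {
    const char *name;
    const char *value;
} Entity;

static const Entity *find_entity(const Entity *tab, size_t n,
                                 const char *name, size_t len) {
    for (size_t i = 0; i < n; i++)
        if (strlen(tab[i].name) == len && strncmp(tab[i].name, name, len) == 0)
            return &tab[i];
    return NULL;
}

/*
 * Returns the length of `text` after expanding every "&name;" reference,
 * or -1 on an unknown entity, an unterminated reference, or nesting
 * deeper than MAX_DEPTH.
 */
long expanded_length(const Entity *tab, size_t n, const char *text, int depth) {
    long total = 0;
    assert(text != NULL);
    if (depth > MAX_DEPTH)
        return -1;
    while (*text != '\0') {
        if (*text != '&') {
            total++;
            text++;
            continue;
        }
        const char *end = strchr(text, ';');
        if (end == NULL)
            return -1;
        const Entity *e = find_entity(tab, n, text + 1, (size_t)(end - text - 1));
        if (e == NULL)
            return -1;
        /* Passing depth + 1 is the step that must not be omitted:
         * without it the limit check above never fires. */
        long sub = expanded_length(tab, n, e->value, depth + 1);
        if (sub < 0)
            return -1;
        total += sub;
        text = end + 1;
    }
    return total;
}
```

With two entities that reference each other, the call returns -1 after a bounded number of nested calls instead of exhausting the stack.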
Jan Pokorný bb654feb 2016-04-13T16:56:07 Fix typos: dictio{ nn -> n }ar{y,ies} Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
David Kilzer 4f8606c1 2016-01-05T13:38:09 Bug 760183: REGRESSION (v2.9.3): XML push parser fails with bogus UTF-8 encoding error when multi-byte character in large CDATA section is split across buffer <https://bugzilla.gnome.org/show_bug.cgi?id=760183> * parser.c: (xmlCheckCdataPush): Add 'complete' argument to describe whether the buffer passed in is the whole CDATA buffer, or if there is more data to parse. If there is more data to parse, don't return a negative value for an invalid multi-byte UTF-8 character that is split between buffers. (xmlParseTryOrFinish): Pass 'complete' argument to xmlCheckCdataPush() as appropriate. * result/cdata-2-byte-UTF-8.xml: Added. * result/cdata-2-byte-UTF-8.xml.rde: Added. * result/cdata-2-byte-UTF-8.xml.rdr: Added. * result/cdata-2-byte-UTF-8.xml.sax: Added. * result/cdata-2-byte-UTF-8.xml.sax2: Added. * result/cdata-3-byte-UTF-8.xml: Added. * result/cdata-3-byte-UTF-8.xml.rde: Added. * result/cdata-3-byte-UTF-8.xml.rdr: Added. * result/cdata-3-byte-UTF-8.xml.sax: Added. * result/cdata-3-byte-UTF-8.xml.sax2: Added. * result/cdata-4-byte-UTF-8.xml: Added. * result/cdata-4-byte-UTF-8.xml.rde: Added. * result/cdata-4-byte-UTF-8.xml.rdr: Added. * result/cdata-4-byte-UTF-8.xml.sax: Added. * result/cdata-4-byte-UTF-8.xml.sax2: Added. * result/noent/cdata-2-byte-UTF-8.xml: Added. * result/noent/cdata-3-byte-UTF-8.xml: Added. * result/noent/cdata-4-byte-UTF-8.xml: Added. * test/cdata-2-byte-UTF-8.xml: Added. * test/cdata-3-byte-UTF-8.xml: Added. * test/cdata-4-byte-UTF-8.xml: Added. - Add tests and results. Only 'make Readertests XMLPushtests' fails prior to the fix.
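The role of a 'complete' argument can be sketched with a simplified UTF-8 check. `check_utf8` and its return convention are assumptions for this example, not the signature of xmlCheckCdataPush; the check also skips overlong and surrogate detection to stay short.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the number of leading bytes that form whole, structurally
 * valid UTF-8 sequences, or -1 on an invalid byte. A sequence cut off
 * by the end of the buffer is an error only when `complete` is nonzero,
 * i.e. when the caller knows no more data will follow.
 */
long check_utf8(const unsigned char *buf, size_t len, int complete) {
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;
        if (c < 0x80)
            need = 1;
        else if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return -1;  /* stray continuation byte or invalid lead byte */
        if (need > len - i) {
            /* Truncated sequence: verify the bytes that are present. */
            for (size_t k = 1; k < len - i; k++)
                if ((buf[i + k] & 0xC0) != 0x80)
                    return -1;
            return complete ? -1 : (long)i;
        }
        for (size_t k = 1; k < need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return -1;
        i += need;
    }
    return (long)i;
}
```

A push-style caller would consume the returned number of bytes and keep the remainder until the next chunk arrives, passing a nonzero `complete` only for the final chunk.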
Daniel Veillard a7a94612 2016-02-09T12:55:29 Heap-based buffer overread in xmlNextChar For https://bugzilla.gnome.org/show_bug.cgi?id=759671 when the end of the internal subset isn't properly detected xmlParseInternalSubset should just return instead of trying to process input further.
Daniel Veillard f1063fdb 2015-11-20T16:06:59 CVE-2015-7500 Fix memory access error due to incorrect entities boundaries For https://bugzilla.gnome.org/show_bug.cgi?id=756525 handle properly the case where we popped out of the current entity while processing a start tag Reported by Kostya Serebryany @ Google This slightly modifies the output of 754946 in regression tests
Daniel Veillard 3bd6ae14 2015-11-20T15:06:02 Fix some loop issues embedding NEXT Next can switch the parser back to XML_PARSER_EOF state, we need to consider those in loops consuming input
Daniel Veillard 35bcb1d7 2015-11-20T15:04:09 Detect incoherency on GROW the current pointer to the input has to be between the base and end if not stop everything we have an internal state error.
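The invariant described here, that the current pointer must lie between the base and the end, can be written as a small check. The `Input` struct, its `halted` flag, and `input_check` are hypothetical names for this sketch; the actual parser halts through its own routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of an input buffer. */
typedef struct {
    const char *base;  /* start of the buffer */
    const char *cur;   /* current read position */
    const char *end;   /* one past the last valid byte */
    int halted;        /* set when an internal state error is detected */
} Input;

/*
 * Returns 0 if base <= cur <= end. Otherwise marks the input as halted
 * and returns -1 so the caller stops consuming data.
 */
int input_check(Input *in) {
    assert(in != NULL);
    if (in->cur < in->base || in->cur > in->end) {
        in->halted = 1;
        return -1;
    }
    return 0;
}
```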
Daniel Veillard e3b15974 2015-11-20T14:59:30 Reuse xmlHaltParser() where it makes sense Unify the various place where either xmlStopParser was called (which resets the error as a side effect) and places where we used ctxt->instate = XML_PARSER_EOF to stop further processing
Daniel Veillard 28cd9cb7 2015-11-20T14:55:30 Add xmlHaltParser() to stop the parser The problem is doing it in a consistent and safe fashion It's more complex than just setting ctxt->instate = XML_PARSER_EOF Update the public function to reuse that new internal routine
David Drysdale 69030714 2015-11-20T11:13:45 CVE-2015-5312 Another entity expansion issue For https://bugzilla.gnome.org/show_bug.cgi?id=756733 It is one case where the code in place to detect entities expansions failed to exit when the situation was detected, leading to DoS Problem reported by Kostya Serebryany @ Google Patch provided by David Drysdale @ Google
Daniel Veillard 53ac9c96 2015-11-09T18:16:00 xmlStopParser reset errNo I had used it in contexts where that information ought to be preserved
Daniel Veillard afd27c21 2015-11-09T18:07:18 Avoid processing entities after encoding conversion failures For https://bugzilla.gnome.org/show_bug.cgi?id=756527 and was also raised by Chromium team in the past When we hit a conversion failure when switching encoding it is better to stop parsing there, this was treated as a fatal error but the parser was continuing to process to extract more errors, unfortunately that makes little sense as the data is obviously corrupt and can potentially lead to unexpected behaviour.
Hugh Davenport ab2b9a93 2015-11-03T20:40:49 Avoid extra processing of MarkupDecl when EOF For https://bugzilla.gnome.org/show_bug.cgi?id=756263 One place where ctxt->instate == XML_PARSER_EOF, which was set up by entity detection issues, doesn't get noticed, and is even overridden
Daniel Veillard 41ac9049 2015-10-27T10:53:44 Fix an error in previous Conditional section patch an off by one mistake in the change, led to error on correct document where the end of the included entity was exactly the end of the conditional section, leading to regtest failure
Daniel Veillard bd0526e6 2015-10-23T19:02:28 Another variation of overflow in Conditional sections Which happen after the previous fix to https://bugzilla.gnome.org/show_bug.cgi?id=756456 But stopping the parser and exiting we didn't pop the intermediary entities and doing the SKIP there applies on an input which may be too small
Gaurav Gupta cf77e605 2015-09-30T14:46:29 Add missing Null check in xmlParseExternalEntityPrivate For https://bugzilla.gnome.org/show_bug.cgi?id=755857 a case where we check for NULL but not everywhere
Daniel Veillard 4a5d80ad 2015-09-18T15:06:46 Fix a bug in CData error handling in the push parser For https://bugzilla.gnome.org/show_bug.cgi?id=754947 The checking function was returning incorrect args in some cases Adds the test to the reg suite and fixes one of the existing test outputs
Daniel Veillard 51f02b0a 2015-09-15T16:50:32 Fix a bug on name parsing at the end of current input buffer For https://bugzilla.gnome.org/show_bug.cgi?id=754946 When hitting the end of the current input buffer while parsing a name we could end up losing the beginning of the name, which led to various issues.
Daniel Veillard 709a9521 2015-06-29T16:10:26 Fail parsing early on if encoding conversion failed For https://bugzilla.gnome.org/show_bug.cgi?id=751631 If we fail converting the current input stream while processing the encoding declaration of the XMLDecl then it's safer to just abort there and not try to report further errors.
Daniel Veillard 9aa37588 2015-06-29T09:08:25 Do not process encoding values if the declaration is broken For https://bugzilla.gnome.org/show_bug.cgi?id=751603 If the string is not properly terminated do not try to convert to the given encoding.
Daniel Veillard 9b851233 2015-02-23T11:29:20 Cleanup conditional section error handling For https://bugzilla.gnome.org/show_bug.cgi?id=744980 The error handling of Conditional Section also need to be straightened as the structure of the document can't be guessed on a failure there and it's better to stop parsing as further errors are likely to be irrelevant.
Daniel Veillard a7dfab74 2015-02-23T11:17:35 Stop parsing on entities boundaries errors For https://bugzilla.gnome.org/show_bug.cgi?id=744980 There are times, like on unterminated entities that it's preferable to stop parsing, even if that means less error reporting. Entities are feeding the parser on further processing, and if they are ill defined then it's possible to get the parser to bug. Also do the same on Conditional Sections if the input is broken, as the structure of the document can't be guessed.
Daniel Veillard 72a46a51 2014-10-23T11:35:36 Fix missing entities after CVE-2014-3660 fix For https://bugzilla.gnome.org/show_bug.cgi?id=738805 The fix for CVE-2014-3660 introduced a regression in some cases where entity substitution is required and the entity is used first in another entity referenced from an attribute value
Daniel Veillard f65128f3 2014-10-17T17:13:41 Revert "Missing initialization for the catalog module" This reverts commit 054c716ea1bf001544127a4ab4f4346d1b9947e7. As this breaks the xmlcatalog command https://bugzilla.redhat.com/show_bug.cgi?id=1153753
Daniel Veillard be2a7eda 2014-10-16T13:59:47 Fix for CVE-2014-3660 Issues related to the billion laugh entity expansion which happened to escape the initial set of fixes
Bart De Schuymer 500c54ef 2014-10-16T12:17:20 fix memory leak xml header encoding field with XML_PARSE_IGNORE_ENC When the xml parser encounters an xml encoding in an xml header while configured with option XML_PARSE_IGNORE_ENC, it fails to free memory allocated for storing the encoding. The patch below fixes this. How to reproduce: 1. Change doc/examples/parse4.c to add xmlCtxtUseOptions(ctxt, XML_PARSE_IGNORE_ENC); after the call to xmlCreatePushParserCtxt. 2. Rebuild 3. run the following command from the top libxml2 directory: LD_LIBRARY_PATH=.libs/ valgrind --leak-check=full ./doc/examples/.libs/parse4 ./test.xml , where test.xml contains following input: <?xml version="1.0" encoding="UTF-81" ?><hi/> valgrind will report: ==1964== 10 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==1964== at 0x4C272DB: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==1964== by 0x4E88497: xmlParseEncName (parser.c:10224) ==1964== by 0x4E888FE: xmlParseEncodingDecl (parser.c:10295) ==1964== by 0x4E89630: xmlParseXMLDecl (parser.c:10534) ==1964== by 0x4E8B737: xmlParseTryOrFinish (parser.c:11293) ==1964== by 0x4E8E775: xmlParseChunk (parser.c:12283) Signed-off-by: Bart De Schuymer <bart at amplidata com>
Daniel Veillard 7cf57380 2014-10-08T16:09:56 Parser error on repeated recursive entity expansion containing &lt; For https://bugzilla.gnome.org/show_bug.cgi?id=736417 basically a weird side effect and a failure to properly parenthesize a boolean expression led to this bug
Dennis Filder 7e9bbdf8 2014-10-06T20:34:14 parser bug on misformed namespace attributes For https://bugzilla.gnome.org/show_bug.cgi?id=672539 Reported by Axel Miller <axel.miller@ppi.de> Consider the following start-tag: <x xmlns=""version=""> The start-tag does not conform to the rule [40] STag ::= '<' Name (S Attribute)* S? '>' since there is no whitespace in front of the attribute "version". Thus, libxml2 should reject the start-tag. But it doesn't: $ echo '<x xmlns=""version=""/>' | xmllint - <?xml version="1.0"?> <x xmlns="" version=""/> The error seems to happen only if there is a namespace declaration in front of the attribute. A missing whitespace between other attributes is handled correctly: $ echo '<x someattr=""version=""/>' | xmllint - -:1: parser error : attributes construct error <x someattr=""version=""/> ^ [...]
Juergen Keil 24fb4c32 2014-10-06T18:19:12 wrong error column in structured error when parsing end tag For https://bugzilla.gnome.org/show_bug.cgi?id=734283 libxml2 reports wrong error column numbers (field int2 in xmlError) in structured error handler, after parsing an end tag.
Juergen Keil 33f658c9 2014-08-07T17:30:36 wrong error column in structured error when parsing attribute values For https://bugzilla.gnome.org/show_bug.cgi?id=734280 libxml2 reports wrong error column numbers (field int2 in xmlError) in structured error handler, after parsing XML attribute values. Example XML: <?xml version="1.0" encoding="UTF-8"?> <root xmlns="urn:colbug">&</root> <!-- 1 2 3 4 1234567890123456789012345678901234567890 --> Expected location of the error would be line 3, column 21. The actual location of the error is line 3, column 9: $ ./xmlparse colbug2.xml colbug2.xml:3:9: xmlParseEntityRef: no name The 12 characters of the xmlns attribute value "urn:colbug" are not accounted for in the error column value.
Juergen Keil 5d4310af 2014-08-07T16:28:09 wrong error column in structured error when skipping whitespace in xml decl For https://bugzilla.gnome.org/show_bug.cgi?id=734276 libxml2 reports wrong error column numbers (field int2 in xmlError) in structured error handler, after an XML declaration containing whitespace. Example XML: <?xml version="1.0" encoding="UTF-8" ?><root>&</root> <!-- 1 2 3 4 5 6 123456789012345678901234567890123456789012345678901234567890 --> Expected location of the error would be line 1, column 53. The actual location of the error is line 1, column 44: $ ./xmlparse colbug1.xml colbug1.xml:1:44: xmlParseEntityRef: no name
Daniel Veillard 2f9b126a 2014-07-26T20:29:36 typo in error messages "colon are forbidden from..." For https://bugzilla.gnome.org/show_bug.cgi?id=731511 Pointed out by Vincent Lefevre
Daniel Veillard c836ba66 2014-07-14T16:39:50 Fix a potential NULL dereference For https://bugzilla.gnome.org/show_bug.cgi?id=733040 xmlDictLookup() may return NULL in case of allocation error, though very unlikely it need to be checked.
Daniel Veillard dd8367da 2014-06-11T16:54:32 Fix regressions introduced by CVE-2014-0191 patch A number of issues have been raised after the fix, and this patch tries to correct all of them, though most were related to postvalidation. https://bugzilla.gnome.org/show_bug.cgi?id=730290 and other reports on list, off-list and on Red Hat bugzilla
Daniel Veillard 9cd1c3cf 2014-04-22T15:30:56 Do not fetch external parameter entities Unless explicitly asked for when validating or replacing entities with their value. Problem pointed out by Daniel Berrange <berrange@redhat.com>
Daniel Veillard 6faa126f 2014-03-21T17:05:51 Fix xmlParseInNodeContext() if node is not element We really need to have ctxt->instate == XML_PARSER_CONTENT when jumping in content parsing Bug reported by Frank Gross
Longstreth Jon 190a0b89 2014-02-06T10:58:17 Fix a portability issue on Windows Apparently an overflow when comparing macro and unsigned long
Daniel Veillard 054c716e 2014-01-26T15:02:25 Missing initialization for the catalog module
Daniel Veillard 4e1476c5 2013-12-09T15:23:40 adding init calls to xml and html Read parsing entry points As pointed out by "Tassyns, Bram <BramT@enfocus.com>" on the list, some calls had it and others didn't; clean it up and add it to all the missing ones
Jan Pokorný 9a85d40c 2013-11-29T23:26:25 Fix incorrect spelling entites->entities Partially, a follow-up of 81d7a8245cf9a31a49499a5a195c2b89e6f91180. Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Daniel Veillard dcc19503 2013-05-22T22:56:45 Fix a parsing bug on non-ascii element and CR/LF usage https://bugzilla.gnome.org/show_bug.cgi?id=698550 Somehow the behaviour of the internal parser routine changed slightly when encountering CR/LF, which led to a bug when parsing document with non-ascii Names
Daniel Veillard 63588f47 2013-05-10T14:01:46 Fix a regression in xmlGetDocCompressMode() A consequence of the switch to xzlib was that the compression level of the input was no longer gathered in ctxt->input->buf; the parser compression flag was then left at -1 and propagated to the resulting document. Fix the I/O layer to get compression detection in xzlib, then carry it in the input buffer and the resulting document. This should fix https://lsbbugs.linuxfoundation.org/show_bug.cgi?id=3456
Nikolay Sivov d4a5d981 2013-04-30T17:45:36 Cast encoding name to char pointer to match arg type
Alexander Pastukhov 704d8c5e 2013-04-23T13:02:11 Fix an error in xmlCleanupParser https://bugzilla.gnome.org/show_bug.cgi?id=698582 xmlCleanupParser calls xmlCleanupGlobals() and then xmlResetLastError(), but the latter reallocates the global data freed by the previous call. Just swap the two calls.
Jüri Aedla 9ca816b3 2013-04-16T22:00:13 Fix a couple of return without value Error introduced in previous commit !
Daniel Veillard e50ba816 2013-04-11T15:54:51 Improve handling of xmlStopParser() Add a specific parser error Try to stop parsing as quickly as possible
Daniel Veillard cff2546f 2013-03-11T15:57:55 Cache presence of '<' in entities content slightly modify how ent->checked is used, and use the lowest bit to keep the information
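Keeping a boolean in the lowest bit of an existing integer field, as the entry above describes for `ent->checked`, can be sketched as follows. This is an illustration under assumptions rather than the libxml2 implementation: `demo_entity` and its helpers are hypothetical names, and the sketch assumes the remaining bits hold a counter shifted left by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entity record; NOT the libxml2 xmlEntity struct. */
typedef struct {
    /* bit 0: content contains '<'; bits 1..n: counter value */
    unsigned int checked;
} demo_entity;

/* Store a counter without disturbing the flag in bit 0. */
static void demo_set_count(demo_entity *ent, unsigned int count) {
    assert(ent != NULL);
    ent->checked = (count << 1) | (ent->checked & 1u);
}

/* Read the counter back by discarding the flag bit. */
static unsigned int demo_get_count(const demo_entity *ent) {
    assert(ent != NULL);
    return ent->checked >> 1;
}

/* Record that the entity content contains '<'. */
static void demo_mark_has_lt(demo_entity *ent) {
    assert(ent != NULL);
    ent->checked |= 1u;
}

/* Query the cached flag without rescanning the content. */
static bool demo_has_lt(const demo_entity *ent) {
    assert(ent != NULL);
    return (ent->checked & 1u) != 0;
}
```

Packing the flag this way avoids adding a field to the structure, at the cost of halving the range available to the counter.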
Daniel Veillard a3f1e3e5 2013-03-11T13:57:53 Avoid extra processing on entities If an entity has already been checked for correctness no need to check it on every reference
Daniel Veillard 23f05e0c 2013-02-19T10:21:49 Detect excessive entities expansion upon replacement If entities expansion in the XML parser is asked for, it is possible to craft a relatively small input document leading to excessive on-the-fly content generation. This patch accounts for those replacements and stops parsing after a given threshold. It can be bypassed as usual with the HUGE parser option.
Daniel Veillard bf058dce 2013-02-13T18:19:42 Fix the flushing out of raw buffers on encoding conversions https://bugzilla.gnome.org/show_bug.cgi?id=692915 the new set of converting functions tried to limit the encoding conversion of the raw buffer to the consumption one to work in a more progressive fashion. Unfortunately this was bad for performance and led to errors on progressive parsing when a very large chunk was close to the end of the document. Fix the new internal function and switch back to the old way of converting. Fix another bug in the process.
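The accounting idea in the entry above (count the bytes produced by entity replacement and stop past a threshold unless the caller opted out) can be sketched roughly as below. All names and the limit value are assumptions made for illustration; the actual threshold and bookkeeping in libxml2 are not stated in the log.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit chosen for the sketch; not the libxml2 value. */
#define DEMO_MAX_EXPANSION 10000000UL

/* Hypothetical parser state; NOT the libxml2 xmlParserCtxt struct. */
typedef struct {
    unsigned long expanded; /* bytes generated by replacement so far */
    bool huge;              /* caller opted out of the limit */
    bool stopped;           /* parsing was aborted */
} demo_parser;

/*
 * Account for `len` bytes of replacement text.
 * Returns true if parsing may continue, false once the budget is
 * exceeded (or if parsing was already stopped).
 */
static bool demo_account_expansion(demo_parser *p, unsigned long len) {
    assert(p != NULL);
    if (p->stopped)
        return false;

    /* Saturating add so the counter itself cannot wrap around. */
    if (len > ULONG_MAX - p->expanded)
        p->expanded = ULONG_MAX;
    else
        p->expanded += len;

    if (!p->huge && p->expanded > DEMO_MAX_EXPANSION) {
        p->stopped = true;
        return false;
    }
    return true;
}
```

A caller would invoke the accounting function each time an entity reference is replaced and abandon the parse as soon as it returns false.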
Daniel Veillard de0cc20c 2013-02-12T16:55:34 Fix some buffer conversion issues https://bugzilla.gnome.org/show_bug.cgi?id=690202 Buffer overflow errors originating from xmlBufGetInputBase in 2.9.0 The pointers from the context input were not properly reset after that call which can do reallocations.
Patrick Gansterer 9c8eaabe 2013-01-04T12:41:53 Fix compiler warning after 153cf15905cf4ec080612ada6703757d10caba1e Add missing cast for xmlNop to silence a compiler warning.
Dan Winship cf8f0424 2012-12-21T11:13:31 Fix an error in the progressive DTD parsing code For https://bugzilla.gnome.org/show_bug.cgi?id=689958 We were looking for the wrong character in the input stream
Michael Wood fb27e2cd 2012-09-28T08:59:33 Fix spelling of "length".
Daniel Veillard 6a36fbe3 2012-10-29T10:39:55 Fix potential out of bound access
Daniel Veillard 153cf159 2012-10-26T13:50:47 Fix large parse of file from memory https://bugzilla.redhat.com/show_bug.cgi?id=862969 The new code trying to detect excessive input lookup would just get wrong sometimes in the case of very large file parsed directly from memory.
Daniel Veillard 711b15d5 2012-10-25T19:23:26 Fix a bug in the nsclean option of the parser Raised as a side effect of: https://bugzilla.gnome.org/show_bug.cgi?id=663844
Daniel Veillard 6c91aa38 2012-10-25T15:33:59 Fix a regression in 2.9.0 breaking validation while streaming https://bugzilla.gnome.org/show_bug.cgi?id=684774 with help from Kjell Ahlstedt <kjell.ahlstedt@bredband.net>
Jan Pokorný 81d7a824 2012-09-13T15:56:51 Fix typos in parser comments Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Daniel Veillard f8e3db04 2012-09-11T13:26:36 Big space and tab cleanup Remove all space before tabs and space and tabs at end of lines.
Daniel Veillard 28f5e1a2 2012-09-04T11:18:39 Fix potential crash on entities errors Related to https://bugs.launchpad.net/lxml/+bug/502959 Basically the core of the issue is that if an entity references another entity, then in case we are replacing entities content, we should always do so by copying the referenced content as long as the reference is done within the entity. Otherwise, if for some reason there is a later parsing error, that entity content may be freed. Complex scenario exposed by command: thinkpad:~/XML/diveintopython-5.4/xml -> valgrind --db-attach=yes ../../xmllint --loaddtd --noout --noent diveintopython.xml (1) Document references &a; (2) a references &b; (3) we reference b content directly by linking it into the a content (4) a has an error further down (5) we free a, freeing the chunk from b (6) Document references &b; after &a; (7) we try to copy b content, but it was freed already => segfault * parser.c: never reference directly entity content without copying if we aren't in the document main entity
Daniel Veillard 1f972e9f 2012-08-15T10:16:37 Cleanup some of the parser code Prefetching assumptions about the amount of data read in GROW should be backed up with test for 0 termination when at the end of the buffer.
Daniel Veillard 968a03a2 2012-08-13T12:41:33 Add support for big line numbers in error reporting Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com> * parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser option not switched on by default, it's an opt-in * SAX2.c: if XML_PARSE_BIG_LINES is set store the long line numbers in the psvi field of text nodes * tree.c: expand xmlGetLineNo to extract that information, also make sure we can't fail on recursive behaviour * error.c: in __xmlRaiseError, if a node is provided, call xmlGetLineNo() if we can't get a valid line number. * xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint
Daniel Veillard 5353bbf7 2012-08-03T12:03:31 More fixups on the push parser behaviour