parser.c


Log

Author Commit Date Message
David Kilzer 44e9118c 2022-04-08T12:33:17 Prevent integer-overflow in htmlSkipBlankChars() and xmlSkipBlankChars() * HTMLparser.c: (htmlSkipBlankChars): * parser.c: (xmlSkipBlankChars): - Cap the return value at INT_MAX. - The commit range that OSS-Fuzz listed for the fix didn't make any changes to xmlSkipBlankChars(), so it seems like this issue may still exist. Found by OSS-Fuzz Issue 44803.
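Here is a minimal sketch of the capping described in this commit. It assumes a plain NUL-terminated in-memory string rather than libxml2's input buffers, and `skip_blanks` is a hypothetical name. The counter stops at INT_MAX while the pointer keeps advancing.

```c
#include <limits.h>

/* Hypothetical helper, not the libxml2 function: advance *cur past
 * XML blank characters (space, tab, LF, CR) and return how many
 * were skipped, saturating at INT_MAX instead of overflowing. */
static int skip_blanks(const char **cur)
{
    const char *p = *cur;
    int res = 0;

    while (*p == 0x20 || *p == 0x09 || *p == 0x0A || *p == 0x0D) {
        p++;
        /* Signed overflow is undefined behavior, so stop counting
         * once the maximum is reached but keep skipping input. */
        if (res < INT_MAX)
            res++;
    }
    *cur = p;
    return res;
}
```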
David Kilzer 21561e83 2016-05-20T15:21:43 Mark more static data as `const` Similar to 8f5710379, mark more static data structures with `const` keyword. Also fix placement of `const` in encoding.c. Original patch by Sarah Wilkin.
Nick Wellnhofer 92bff866 2022-03-29T14:18:31 Fix calls to deprecated init/cleanup functions Only use xmlInitParser/xmlCleanupParser.
Nick Wellnhofer 96849544 2022-03-22T19:10:51 Revert "Continue to parse entity refs in recovery mode" This reverts commit 84823b86344fb530790a8787b80abf62715ea885 which exposed several other, potentially serious bugs. Fixes #356.
Nick Wellnhofer 7d02c729 2022-03-06T00:49:02 Fix parser progress checks Testing the current input pointer for modification is unreliable since the input buffer could have been freed and realloced. Check whether the input id and the up-to-date number of bytes consumed match.
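The idea behind this progress check can be sketched as follows. The `input_state` struct and `made_progress` function are hypothetical simplifications, not libxml2 types. They only model comparing an input id and an up-to-date consumed-byte count instead of raw pointers, which can repeat after a buffer is freed and reallocated.

```c
#include <stddef.h>

/* Hypothetical, simplified input state: a unique id per input
 * stream plus enough bookkeeping to compute the absolute position. */
struct input_state {
    int id;                 /* changes whenever a new input is pushed */
    unsigned long consumed; /* bytes already discarded from the buffer */
    size_t offset;          /* position of cur relative to base */
};

/* Progress test based on (id, consumed + offset) instead of pointers:
 * a reallocated buffer may reuse the old address, but the absolute
 * number of bytes consumed only changes if the parser advanced. */
static int made_progress(const struct input_state *before,
                         const struct input_state *after)
{
    if (before->id != after->id)
        return 1;
    return (before->consumed + before->offset) !=
           (after->consumed + after->offset);
}
```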
Nick Wellnhofer 84823b86 2022-03-05T22:48:11 Continue to parse entity refs in recovery mode There doesn't seem to be a good reason to abort in xmlParseReference if a well-formedness error was detected. Removing this check allows entity references to be parsed after an error in recovery mode. Fixes #270.
Nick Wellnhofer d99ddd9b 2022-03-05T21:46:40 Improve buffer allocation scheme In most places, we really need the double-it scheme to avoid quadratic behavior. The hybrid scheme still can cause many reallocations and the bounded scheme doesn't seem to provide meaningful protection in xmlreader.c.
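To illustrate why the double-it scheme avoids quadratic behavior, here is a minimal growable buffer. The `buf` struct and `buf_append` are hypothetical, not libxml2 code. Because capacity doubles, appending n bytes triggers only O(log n) reallocations and O(n) total copying.

```c
#include <stdlib.h>

/* Hypothetical growable byte buffer using the "double-it" scheme. */
struct buf {
    char *data;
    size_t len;
    size_t cap;
    size_t reallocs; /* counts calls to realloc, for illustration */
};

static int buf_append(struct buf *b, char c)
{
    if (b->len + 1 > b->cap) {
        size_t newcap = b->cap ? b->cap * 2 : 16;
        char *tmp;

        if (newcap < b->cap) /* doubling wrapped around */
            return -1;
        tmp = realloc(b->data, newcap);
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = newcap;
        b->reallocs++;
    }
    b->data[b->len++] = c;
    return 0;
}
```

With a fixed increment of 16 bytes instead, the same 1000 appends would need 63 reallocations, each of which may copy the whole buffer.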
Nick Wellnhofer ebb17970 2022-03-04T02:31:59 Remove unneeded #includes
Nick Wellnhofer 776d15d3 2022-03-02T00:29:17 Don't check for standard C89 headers Don't check for - ctype.h - errno.h - float.h - limits.h - math.h - signal.h - stdarg.h - stdlib.h - string.h - time.h Stop including non-standard headers - malloc.h - strings.h
Nick Wellnhofer 89d9ef3e 2022-03-01T15:14:00 Reset last error in xmlCleanupGlobals Before, we tried to reset the last error in xmlCleanupParser. But if xmlCleanupParser wasn't called from the main thread, this would reset the thread-local error object. xmlCleanupGlobals has access to the error object of the main thread and can reset it reliably.
Nick Wellnhofer 2489c1d0 2022-02-28T22:42:10 Remove useless __CYGWIN__ checks From what I can tell, some really early Cygwin versions from around 1998-2000 used to erroneously define _WIN32. This was eventually fixed, but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is unnecessary. Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether to use __declspec.
Nick Wellnhofer c41bc10d 2022-02-22T19:57:12 Fix unused variable warnings with disabled features
Nick Wellnhofer 346c3a93 2022-02-20T18:46:42 Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.
Nick Wellnhofer 9edc20c1 2022-02-07T20:38:30 Fix double counting of CRLF in comments Fixes #151.
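A small sketch of CRLF-aware line counting follows. The function name is hypothetical. It shows the general shape of a fix for double counting: when a CR is immediately followed by LF, the pair advances the line number only once. (XML end-of-line handling normalizes both CRLF and a lone CR to LF.)

```c
#include <stddef.h>

/* Hypothetical line counter: "\r\n" counts as one line break, and a
 * lone "\r" or "\n" also counts as one. */
static int count_lines(const char *s, size_t len)
{
    int line = 1;
    size_t i;

    for (i = 0; i < len; i++) {
        if (s[i] == '\r') {
            line++;
            if (i + 1 < len && s[i + 1] == '\n')
                i++; /* swallow the LF of a CRLF pair */
        } else if (s[i] == '\n') {
            line++;
        }
    }
    return line;
}
```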
Nick Wellnhofer 96535657 2022-02-07T15:26:33 Make sure to grow input buffer in xmlParseMisc Otherwise, a large amount of whitespace could lead to documents not being parsed correctly. Fixes #299.
Nick Wellnhofer d85245f9 2022-01-16T21:39:04 Fix regression with PEs in external DTD Fix a regression introduced with commit a28f7d87. In some cases, parameter entity references in external DTDs wouldn't be expanded. Fixes #306.
Yulin Li 46c658b0 2021-08-06T08:48:24 Move current position before possible call of ctxt->sax->characters.
David King fe564967 2021-07-14T14:35:17 Fix memory leak in xmlCreateIOParserCtxt Found by Coverity. https://bugzilla.redhat.com/show_bug.cgi?id=1938806
Mike Dalessio a7b9f3eb 2021-05-20T13:38:54 fix: avoid segfault at exit when using custom memory functions This extends the fix introduced by 956534e to Windows processes dynamically loading libxml2. Closes #256.
Daniel Veillard 8598060b 2021-05-13T14:55:12 Patch for security issue CVE-2021-3541 This is related to parameter entity expansion and follows the line of the billion laughs attack. Somehow in that path the counting of parameters was missed and the normal algorithm based on entities "density" was useless.
Nick Wellnhofer bfd2f430 2021-05-09T18:56:57 Fix null deref in legacy SAX1 parser Always call nameNsPush instead of namePush. The latter is unused now and should probably be removed from the public API. I can't see how it could be used reasonably from client code and the unprefixed name has always polluted the global namespace. Fixes a null pointer dereference introduced with de5b624f when parsing in SAX1 mode. Found by OSS-Fuzz.
Nick Wellnhofer ce00c36e 2021-05-08T21:20:05 Store per-element parser state in a struct Make the parser context's "pushTab" point to an array of structs instead of void pointers. This avoids casting unrelated types to void pointers, improving readability and portability, and allows for more efficient packing. Ultimately, the struct could be extended to include the contents of "nameTab" and "spaceTab", further simplifying the code. Historically, "pushTab" was only used by the push parser (hence the name), so the change to the public headers should be safe. Also remove an unused parameter from xmlParseEndTag2.
Nick Wellnhofer de5b624f 2021-05-08T20:21:29 Fix handling of unexpected EOF in xmlParseContent Re-add the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was removed in commit 62150ed2. This commit also introduced a regression for direct users of xmlParseContent. Unclosed tags weren't checked.
Nick Wellnhofer 3e80560d 2021-05-07T10:51:38 Fix line numbers in error messages for mismatched tags Commit 62150ed2 introduced a small regression in the error messages for mismatched tags. This typically only affected messages after the first mismatch, but with custom SAX handlers all line numbers would be off. This also fixes line numbers in the SAX push parser which were never handled correctly.
Nick Wellnhofer babe7503 2021-05-01T16:53:33 Propagate error in xmlParseElementChildrenContentDeclPriv Check return value of recursive calls to xmlParseElementChildrenContentDeclPriv and return immediately in case of errors. Otherwise, struct xmlElementContent could contain unexpected null pointers, leading to a null deref when post-validating documents which aren't well-formed and parsed in recovery mode. Fixes #243.
Nick Wellnhofer c3fd8c42 2021-03-13T17:19:32 Fix exponential behavior with recursive entities Fix another case where only recursion depth was limited, but entities would still be expanded over and over again. The test case discovered by fuzzing only affected parsing in recovery mode with XML_PARSE_RECOVER. Found by OSS-Fuzz.
Mike Dalessio afad3721 2021-01-31T09:53:56 parser.c: shrink the input buffer when appropriate Fixes GNOME/libxml2#200 Also see discussions at: - GNOME/libxml2#192 - https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e - https://github.com/sparklemotion/nokogiri/issues/2132
Nick Wellnhofer 79301d3d 2020-12-18T12:50:21 Fix timeout when handling recursive entities Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.
Nick Wellnhofer 45da175c 2020-12-18T12:14:52 Fix memory leak in xmlParseElementMixedContentDecl Free parsed content if malloc fails to avoid a memory leak. Found with libFuzzer.
Mike Dalessio c0c26ff2 2020-10-11T16:33:07 parser.c: xmlParseCharData peek behavior fixed wrt newlines Previously, xmlParseCharData and xmlParseComment would consider 0xA to be unhandleable when seen as the first byte of an input chunk, and fall back to xmlParseCharDataComplex and xmlParseCommentComplex, which have different memory and performance characteristics. Fixes GNOME/libxml2#192
yanjinjq 7929f057 2020-08-30T10:34:01 Fix SEGV in xmlSAXParseFileWithData Fixes #181.
Nick Wellnhofer 99fc048d 2020-08-14T14:18:50 Don't use SAX1 if all element handlers are NULL Running xmllint with "--sax --noout" installs a SAX2 handler with all callbacks set to NULL. In this case or similar situations, we don't want to switch to SAX1 parsing.
Nick Wellnhofer b82fa3dd 2020-08-09T14:50:46 Fix column number accounting in xmlParse*NameAndCompare Thanks to Frederic Vancraeyveldt for the report.
Nick Wellnhofer 438e595a 2020-08-09T14:43:53 Stop counting nbChars in parser context The value was inaccurate and never used.
Nick Wellnhofer 956534e0 2020-08-04T19:27:13 Check for custom free function in global destructor Calling a custom deallocation function in the global destructor could cause all kinds of unexpected problems. See for example https://github.com/sparklemotion/nokogiri/issues/2059 Only clean up if memory is managed with malloc/free.
David Kilzer 0e5c4fec 2020-07-13T15:20:45 Reset XML parser input before reporting errors Apply changes to htmlParseChunk() in 13ba5b61 and 3f18e748 to xmlParseChunk().
Martin Vidner 43a8836c 2020-05-31T18:46:21 Fix rebuilding docs, by hiding __attribute__((...)) behind a macro. When enabled via `./configure --enable-rebuild-docs`, `make -C doc libxml2-api.xml` will invoke apibuild.py to rebuild libxml2-api.xml from the sources. But the code added in 9fa3200cb366c726f7c8ef234282603bb9e8816d made it error out with:
```
Parsing ../parser.c
Parse Error: parsing type : expecting a name
('Got token ', ('sep', '('))
('Last token: ', ('sep', '('))
('Token queue: ', [('name', 'destructor'), ('sep', ')'), ('sep', ')')])
('Line 14689 end: ', '')
```
Nick Wellnhofer a28f7d87 2020-06-10T13:41:13 Never expand parameter entities in text declaration When parsing the text declaration of external DTDs or entities, make sure that parameter entities are not expanded. This also fixes a memory leak in certain error cases. The change to xmlSkipBlankChars assumes that the parser state is maintained correctly when parsing external DTDs or parameter entities, and might expose bugs in the code that were hidden previously. Found by OSS-Fuzz.
Nick Wellnhofer 2e8cc66d 2020-05-30T15:40:08 xmlParseBalancedChunkMemory must not be called with NULL doc There is no way to avoid memory leaks without a document to hold the namespace list.
Nick Wellnhofer a0a8059b 2020-05-30T15:33:03 Revert "Fix memory leak in xmlParseBalancedChunkMemoryRecover" This reverts commit 5a02583c7e683896d84878bd90641d8d9b0d0549. Fixes #161.
Samuel Thibault 9fa3200c 2020-03-31T23:18:25 Call xmlCleanupParser on ELF destruction Fixes #153.
Nick Wellnhofer 20c60886 2020-03-08T17:19:42 Fix typos Resolves #133.
Nick Wellnhofer 1a3e584a 2020-01-21T22:12:42 Merge code paths loading external entities Merge xmlParseCtxtExternalEntity into xmlParseExternalEntityPrivate.
Nick Wellnhofer f9ea1a24 2020-02-11T16:17:34 Fix copying of entities in xmlParseReference Before, reader mode would end up in a branch that didn't handle entities with multiple children and failed to update ent->last, so the hack copying the "extra" reader data wouldn't trigger. Consequently, some empty nodes in entities are correctly detected now in the test suite. (The detection of empty nodes in entities is still buggy, though.)
Kevin Puetz c7c526d6 2020-01-13T18:49:01 Fix memory leak when shared libxml.dll is unloaded When multiple modules (processes/plugins) all link to libxml2.dll, they will in fact share a single loaded instance of it. It is unsafe for any of them to call xmlCleanupParser, as this would deinitialize the shared state and break others that might still have ongoing use. However, on Windows atexit is per-module (rather than process-wide), so if used *within* libxml2 it is possible to register a cleanup when all users are done and libxml2.dll is about to actually unload. This allows multiple plugins to link with and share libxml2 without a premature cleanup if one is unloaded, while still cleaning up if *all* such callers are themselves unloaded.
Nick Wellnhofer 9bd7abfb 2020-01-02T14:14:48 Remove useless comparisons Found by lgtm.com
Zhipeng Xie 0e1a49c8 2019-12-12T17:30:55 Fix infinite loop in xmlStringLenDecodeEntities When ctxt->instate == XML_PARSER_EOF, xmlParseStringEntityRef returns NULL, which causes an infinite loop in xmlStringLenDecodeEntities. Found with libFuzzer. Signed-off-by: Zhipeng Xie <xiezhipeng1@huawei.com>
Nick Wellnhofer 9737ec07 2019-10-29T16:19:37 Another fix for conditional sections at end of document The previous fix introduced an uninitialized read.
Nick Wellnhofer c1035664 2019-10-23T11:40:34 Fix for conditional sections at end of document Parsing conditional sections would fail if the final ']]>' was at the end of the document. Short-lived regression caused by commit c51e38cb.
Jared Yanovich 2a350ee9 2019-09-30T17:04:54 Large batch of typo fixes Closes #109.
Nick Wellnhofer c2f209c0 2019-09-30T14:13:21 Disallow conditional sections in internal subset Conditional sections are only allowed in *external* parameter entities referenced from the internal subset.
Nick Wellnhofer c51e38cb 2019-09-30T13:50:02 Make xmlParseConditionalSections non-recursive Avoid call stack overflow in deeply nested conditional sections. Found by OSS-Fuzz.
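The non-recursive approach can be illustrated with a heavily simplified scanner that only skips nested sections, as for IGNORE. `skip_conditional` is hypothetical and ignores everything else the real parser handles, such as INCLUDE sections and parameter entities. It simply replaces recursion with a depth counter, so nesting depth no longer consumes call stack.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scanner: given text that starts just after an opening
 * "<![", return the offset just past the matching "]]>", or -1 if
 * the input ends first.  Nesting is tracked with a counter. */
static long skip_conditional(const char *s, size_t len)
{
    size_t depth = 1;
    size_t i = 0;

    while (i < len) {
        if (i + 3 <= len && memcmp(s + i, "<![", 3) == 0) {
            depth++;          /* entering a nested section */
            i += 3;
        } else if (i + 3 <= len && memcmp(s + i, "]]>", 3) == 0) {
            i += 3;
            if (--depth == 0) /* closed the outermost section */
                return (long) i;
        } else {
            i++;
        }
    }
    return -1; /* unterminated section */
}
```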
Nick Wellnhofer 62150ed2 2019-09-23T14:46:41 Make xmlParseContent and xmlParseElement non-recursive Split xmlParseElement into subfunctions. Use nameNsPush to store prefix, URI and nsNr on the heap, similar to the push parser. Closes #84.
Nick Wellnhofer a28bc751 2019-09-20T13:46:58 Fix integer overflow in entity recursion check
Nick Wellnhofer e91cbcf6 2019-09-20T12:44:17 Don't read external entities or XIncludes from stdin The file input callbacks try to read from stdin if "-" is passed as URL. This should never be done when loading indirect resources like external entities or XIncludes. Unfortunately, the stdin substitution happens deep inside the IO code, so we simply replace "-" with "./-" in specific locations. This issue also affects other users of the library like libxslt. Ideally, stdin should only be substituted on explicit request. But more intrusive changes could break existing code. Closes #90 and #102.
Zhipeng Xie 5a02583c 2019-08-07T17:39:17 Fix memory leak in xmlParseBalancedChunkMemoryRecover When doc is NULL, the namespace created in xmlTreeEnsureXMLDecl is bound to newDoc->oldNs. In this case, setting newDoc->oldNs to NULL and freeing newDoc causes a memory leak. Found with libFuzzer. Closes #82.
Stephen Chenney 87125732 2019-07-08T12:54:21 Switched from unsigned long to ptrdiff_t in parser.c Using unsigned long instead of ptrdiff_t results in non-zero pointer deltas being stored as zero delta, giving incorrect offsets into arrays and hence out of bounds reads. This patch fixes the issue in all places in parser.c and adds a macro to reduce the chances of cut-and-paste errors. Only affects platforms where 'sizeof(long) < sizeof(size_t)' like 64-bit Windows. See https://bugs.chromium.org/p/chromium/issues/detail?id=894933 Closes #44.
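A sketch of the safe pattern, using a hypothetical helper name: capture the cursor offset as ptrdiff_t before reallocating, then rebuild the pointer from the new base. Unlike unsigned long, which is only 32 bits on 64-bit Windows, ptrdiff_t is in practice as wide as a pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: grow a heap buffer while keeping a cursor at
 * the same logical position.  The offset is taken before realloc,
 * so no arithmetic is ever done on the freed block. */
static int grow_keep_cursor(char **buf, char **cur, size_t newsize)
{
    ptrdiff_t off = *cur - *buf; /* not truncated on LLP64 platforms */
    char *tmp = realloc(*buf, newsize);

    if (tmp == NULL)
        return -1; /* *buf and *cur are still valid */
    *buf = tmp;
    *cur = tmp + off;
    return 0;
}
```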
Nick Wellnhofer 01ea9c5a 2019-07-08T11:29:40 Fix another code path in xmlParseQName Check for buffer errors in another code path missed in the previous commit. Found by OSS-Fuzz.
Nick Wellnhofer 5ccac8ce 2019-06-27T10:23:36 Make sure that xmlParseQName returns NULL in error case If there's an error growing the input buffer when recovering from invalid QNames, make sure to return NULL. Otherwise, callers could be confused. In xmlParseStartTag2, for example, `tlen` could become negative. Found by OSS-Fuzz.
Nick Wellnhofer f9fce963 2019-05-16T21:16:01 Fix unsigned integer overflow It's defined behavior but -fsanitize=unsigned-integer-overflow is useful to discover bugs.
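One way to make intent explicit, and keep -fsanitize=unsigned-integer-overflow quiet, is to check for wraparound before adding. `checked_add_size` is a hypothetical helper, not part of libxml2.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned wraparound is defined behavior in C but usually a logic
 * error.  Returns 0 and stores a + b on success, or -1 without
 * touching *out if the sum would wrap. */
static int checked_add_size(size_t a, size_t b, size_t *out)
{
    if (b > SIZE_MAX - a)
        return -1;
    *out = a + b;
    return 0;
}
```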
David Warring 3c0d62b4 2019-05-13T07:15:44 Fix parser termination from "Double hyphen within comment" error The patch fixes the parser not halting immediately when the error handler attempts to stop the parser. Rather it was running on and continuing to reference the freed buffer in the while loop termination test. This is only a problem if xmlStopParser is called from an error handler. Probably caused by commit 123234f2. Fixes #58.
Nick Wellnhofer b48226f7 2019-01-07T17:58:32 Fix memory leaks in xmlParseStartTag2 error paths Found by OSS-Fuzz.
Nick Wellnhofer 8919885f 2019-01-01T16:30:38 Fix -Wformat-truncation warnings (GCC 8)
Nick Wellnhofer 123234f2 2018-09-11T14:52:07 Free input buffer in xmlHaltParser This avoids miscalculation of available bytes. Thanks to Yunho Kim for the report. Closes: #26
Nick Wellnhofer 707ad080 2018-01-23T16:37:54 Fix xmlParserEntityCheck A previous commit removed the check for XML_ERR_ENTITY_LOOP which is required to abort early in case of excessive entity recursion.
Nick Wellnhofer ab362ab0 2018-01-22T15:40:05 Halt parser in case of encoding error Should fix crbug.com/793715, although I wasn't able to reproduce the issue.
Nick Wellnhofer 60dded12 2018-01-22T15:04:58 Clear entity content in case of errors This only affects recovery mode and avoids integer overflow in xmlStringGetNodeList and possibly other nasty surprises. See bug 783052 and https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3874 https://bugs.chromium.org/p/chromium/issues/detail?id=796804
Nick Wellnhofer 132af1a0 2018-01-08T18:48:01 Fix buffer over-read in xmlParseNCNameComplex Calling GROW can halt the parser if the buffer grows too large. This will set the buffer to an empty string. Return immediately in this case, otherwise the "current" pointer is advanced leading to a buffer over-read. Found with OSS-Fuzz. See https://oss-fuzz.com/testcase?key=6683819592646656 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5031
Daniel Veillard ad88b54f 2017-12-08T09:42:31 Improve handling of context input_id For https://bugzilla.gnome.org/show_bug.cgi?id=772726 This was used in xmlsec to detect issues with accessing external entities and prevent them, but was unreliable, based on a patch from Aleksey Sanin * parser.c: make sure input_id is incremented when creating sub-entities for parsing or when parsing out of context
Nick Wellnhofer cb5541c9 2017-11-13T17:08:38 Fix libz and liblzma detection If libz or liblzma are detected with pkg-config, AC_CHECK_HEADERS must not be run because the correct CPPFLAGS aren't set. It is actually not required to have separate checks for LIBXML_ZLIB_ENABLED and HAVE_ZLIB_H. Only check for LIBXML_ZLIB_ENABLED and remove HAVE_ZLIB_H macro. Fixes bug 764657, bug 787041.
Nick Wellnhofer e03f0a19 2017-11-09T16:42:47 Fix hash callback signatures Make sure that all parameters and return values of hash callback functions exactly match the callback function type. This is required to pass clang's Control Flow Integrity checks and to allow compilation to asm.js with Emscripten. Fixes bug 784861.
Vlad Tsyrklevich 28f52fe8 2017-08-10T15:08:48 Refactor name and type signature for xmlNop Update xmlNop's name to xmlInputReadCallbackNop and its type signature to match xmlInputReadCallback. Fixes bug 786134.
Nick Wellnhofer e3890546 2017-10-09T00:20:01 Fix the Windows header mess Don't include windows.h and wsockcompat.h from config.h but only when needed. Don't define _WINSOCKAPI_ manually. This was apparently done to stop windows.h from including winsock.h which is a problem if winsock2.h wasn't included first. But on MinGW, this causes compiler warnings. Define WIN32_LEAN_AND_MEAN instead which has the same effect. Always use the compiler-defined _WIN32 macro instead of WIN32.
Nick Wellnhofer d422b954 2017-10-09T13:37:42 Fix pointer/int cast warnings on 64-bit Windows On 64-bit Windows, `long` is 32 bits wide and can't hold a pointer. Switch to ptrdiff_t instead which should be the same size as a pointer on every somewhat sane platform without requiring C99 types like intptr_t. Fixes bug 788312. Thanks to J. Peter Mugaas for the report and initial patch.
Nick Wellnhofer b90d8989 2017-09-19T15:45:35 Fix regression with librsvg Instead of using xmlCreateIOParserCtxt, librsvg pushes its own xmlParserInput on top of a memory push parser. This incorrect use of the API confuses several parser checks and, since 2.9.5, completely breaks documents with internal subsets. Work around the problem with internal subsets. Thanks to Petr Sumbera for the report: https://mail.gnome.org/archives/xml/2017-September/msg00011.html Also see https://bugzilla.gnome.org/show_bug.cgi?id=787895
Nick Wellnhofer abbda93c 2017-09-11T01:14:16 Handle more invalid entity values in recovery mode In attribute content, don't emit entity references if there are problems with the entity value. Otherwise some illegal entity values like <!ENTITY a '&#38;#x123456789;'> would later cause problems like integer overflow. Make xmlStringLenDecodeEntities return NULL on more error conditions including invalid char refs and errors from recursive calls. Remove some fragile error checks based on lastError that shouldn't be needed now. Clear the entity content in xmlParseAttValueComplex if an error was found. Found by OSS-Fuzz. Should fix bug 783052. Also see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3343
Nick Wellnhofer 0fcab658 2017-09-07T18:25:11 Handle illegal entity values in recovery mode Make xmlParseEntityValue always return NULL on error. Otherwise some illegal entity values like <!ENTITY e '&%#4294967298;'> would later cause problems like integer overflow. Found by OSS-Fuzz. Should fix bug 783052. Also see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=592 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2732
Nick Wellnhofer 69936b12 2017-08-30T14:16:01 Revert "Print error messages for truncated UTF-8 sequences" This reverts commit 79c8a6b which caused a serious regression in streaming mode. Also reverts part of commit 52ceced "Fix infinite loops with push parser in recovery mode". Fixes bug 786554.
Stéphane Michaut 454e397e 2017-08-28T14:30:43 Porting libxml2 on zOS encoding of code First set of patches for zOS - entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c: ask conversion of code to ISO Latin 1 to avoid having the compiler assume EBCDIC codepoint for characters. - xmlmodule.c: make sure we have support for modules - xmlIO.c: zOS path names are special, avoid some of the expectations from Unix/Windows
Nick Wellnhofer 899a5d9f 2017-07-25T14:59:49 Detect infinite recursion in parameter entities When expanding a parameter entity in a DTD, infinite recursion could lead to an infinite loop or memory exhaustion. Thanks to Wei Lei for the first of many reports. Fixes bug 759579.
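A common technique for detecting this kind of recursion is to mark an entity while it is being expanded and fail if the mark is seen again. The sketch below uses a hypothetical table in which each entity references at most one other entity. It illustrates the general idea and is not necessarily how this commit implements the check.

```c
/* Hypothetical entity table entry. */
struct entity {
    int ref;       /* index of the referenced entity, or -1 for none */
    int expanding; /* set while this entity is being expanded */
};

/* Expand entity i; return -1 if a cycle is found, otherwise the
 * number of entities expanded.  Marks are always cleared on return. */
static int expand(struct entity *tab, int i)
{
    int n;

    if (tab[i].expanding)
        return -1; /* entity refers back to itself: infinite recursion */
    tab[i].expanding = 1;
    n = 1;
    if (tab[i].ref >= 0) {
        int sub = expand(tab, tab[i].ref);
        n = (sub < 0) ? -1 : n + sub;
    }
    tab[i].expanding = 0;
    return n;
}
```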
Nick Wellnhofer 52ceced6 2017-07-01T17:49:30 Fix infinite loops with push parser in recovery mode Make sure that the input pointer advances in case of errors. Otherwise, the push parser can loop infinitely. Found with libFuzzer.
Nick Wellnhofer 3eef3f39 2017-06-20T16:13:57 Fix NULL deref in xmlParseExternalEntityPrivate If called from xmlParseExternalEntity, oldctxt is NULL which leads to a NULL deref if an error occurs. This only affects external code that calls xmlParseExternalEntity. Patch from David Kilzer with minor changes. Fixes bug 780159.
Nick Wellnhofer 872fea94 2017-06-19T00:24:12 Get rid of "blanks wrapper" for parameter entities Now that replacement of parameter entities goes exclusively through xmlSkipBlankChars, we can account for the surrounding space characters there and remove the "blanks wrapper" hack.
Nick Wellnhofer d9e43c7d 2017-06-19T18:01:23 Make sure not to call IS_BLANK_CH when parsing the DTD This is required to get rid of the "blanks wrapper" hack. Checking the return value of xmlSkipBlankChars is more efficient, too.
Nick Wellnhofer 453dff1e 2017-06-19T17:55:20 Remove unnecessary calls to xmlPopInput It's enough if xmlPopInput is called from xmlSkipBlankChars. Since the replacement text of a parameter entity is surrounded with space characters, that's the only place where the replacement can end in a well-formed document. This is also required to get rid of the "blanks wrapper" hack.
Nick Wellnhofer aa267cd1 2017-06-18T23:29:51 Simplify handling of parameter entity references There are only two places where parameter entity references must be handled. For the internal subset in xmlParseInternalSubset. For the external subset or content from other external PEs in xmlSkipBlankChars. Make sure that xmlSkipBlankChars skips over sequences of PEs and whitespace. Rely on xmlSkipBlankChars instead of calling xmlParsePEReference directly when in the external subset or a conditional section. xmlParserHandlePEReference is unused now.
Nick Wellnhofer 24246c76 2017-06-20T12:56:36 Fix xmlHaltParser Pop all extra input streams before resetting the input. Otherwise, a call to xmlPopInput could make input available again. Also set input->end to input->cur. Changes the test output for some error tests. Unfortunately, some fuzzed test cases were added to the test suite without manual cleanup. This makes it almost impossible to review the impact of later changes on the test output.
Nick Wellnhofer 8bbe4508 2017-06-17T16:15:09 Spelling and grammar fixes Fixes bug 743172, bug 743489, bug 769632, bug 782400 and a few other misspellings.
Nick Wellnhofer 5f440d8c 2017-06-12T14:32:34 Rework entity boundary checks Make sure to finish all entities in the internal subset. Nevertheless, re-add a sanity check in xmlParseStartTag2 that was lost in my previous commit. Also add a sanity check in xmlPopInput. Popping an input unexpectedly was the source of many recent memory bugs. The check doesn't mitigate such issues but helps with diagnosis. Always base entity boundary checks on the input ID, not the input pointer. The pointer could have been reallocated to the old address. Always throw a well-formedness error if a boundary check fails. In a few places, a validity error was thrown. Fix a few error codes and improve indentation.
Nick Wellnhofer 46dc9890 2017-06-08T02:24:56 Don't switch encoding for internal parameter entities This is only needed for external entities. Trying to switch the encoding for internal entities could also cause a memory leak in recovery mode.
Nick Wellnhofer 03904159 2017-06-05T21:16:00 Merge duplicate code paths handling PE references xmlParsePEReference is essentially a subset of xmlParserHandlePEReference, so make xmlParserHandlePEReference call xmlParsePEReference. The code paths in these functions differed slightly, but the code from xmlParserHandlePEReference seems more solid and tested.
David Kilzer 3f0627a1 2017-06-16T21:30:42 Fix duplicate SAX callbacks for entity content Reset 'was_checked' to prevent entity from being parsed twice and SAX callbacks being invoked twice if XML_PARSE_NOENT was set. This regressed in version 2.9.3 and caused problems with WebKit. Fixes bug 760367.
Nick Wellnhofer fb2f518c 2017-06-10T17:06:16 Fix potential infinite loop in xmlStringLenDecodeEntities Make sure that xmlParseStringPEReference advances the "str" pointer even if the parser was stopped. Otherwise xmlStringLenDecodeEntities can loop infinitely.
Nick Wellnhofer 4ba8cc85 2017-06-10T02:33:58 Remove useless check in xmlParseAttributeListDecl Since we already successfully parsed the attribute name and other items, it is guaranteed that we made progress in the input stream. Comparing the input pointer to a previous value also looks fragile to me. What if the input buffer was reallocated and the new "cur" pointer happens to be the same as the old one? There are a couple of similar checks which also take "consumed" into account. This seems to be safer but I'm not convinced that it couldn't lead to false alarms in rare situations.
Nick Wellnhofer bedbef80 2017-06-09T15:10:13 Fix memory leak in xmlParseEntityDecl error path When parsing the entity value, it can happen that an external entity with an unsupported encoding is loaded and the parser is stopped. This would lead to a memory leak. A custom SAX callback could also stop the parser. Found with libFuzzer and ASan.
Nick Wellnhofer 030b1f7a 2017-06-06T15:53:42 Revert "Add an XML_PARSE_NOXXE flag to block all entities loading even local" This reverts commit 2304078555896cf1638c628f50326aeef6f0e0d0. The new flag doesn't work and the change even broke the XML_PARSE_NONET option.
Nick Wellnhofer e2663054 2017-06-05T15:37:17 Fix handling of parameter-entity references There were two bugs where parameter-entity references could lead to an unexpected change of the input buffer in xmlParseNameComplex and xmlDictLookup being called with an invalid pointer.

Percent sign in DTD Names
=========================
The NEXTL macro used to call xmlParserHandlePEReference. When parsing "complex" names inside the DTD, this could result in entity expansion which created a new input buffer. The fix is to simply remove the call to xmlParserHandlePEReference from the NEXTL macro. This is safe because no users of the macro require expansion of parameter entities.
- xmlParseNameComplex
- xmlParseNCNameComplex
- xmlParseNmtoken
  The percent sign is not allowed in names, which are grammatical tokens.
- xmlParseEntityValue
  Parameter-entity references in entity values are expanded but this happens in a separate step in this function.
- xmlParseSystemLiteral
  Parameter-entity references are ignored in the system literal.
- xmlParseAttValueComplex
- xmlParseCharDataComplex
- xmlParseCommentComplex
- xmlParsePI
- xmlParseCDSect
  Parameter-entity references are ignored outside the DTD.
- xmlLoadEntityContent
  This function is only called from xmlStringLenDecodeEntities and entities are replaced in a separate step immediately after the function call.
This bug could also be triggered with an internal subset and double entity expansion. This fixes bug 766956 initially reported by Wei Lei and independently by Chromium's ClusterFuzz, Hanno Böck, and Marco Grassi. Thanks to everyone involved.

xmlParseNameComplex with XML_PARSE_OLD10
========================================
When parsing Names inside an expanded parameter entity with the XML_PARSE_OLD10 option, xmlParseNameComplex would call xmlGROW via the GROW macro if the input buffer was exhausted. At the end of the parameter entity's replacement text, this function would then call xmlPopInput which invalidated the input buffer. There should be no need to invoke GROW in this situation because the buffer is grown periodically every XML_PARSER_CHUNK_SIZE characters and, at least for UTF-8, in xmlCurrentChar. This also matches the code path executed when XML_PARSE_OLD10 is not set. This fixes bugs 781205 (CVE-2017-9049) and 781361 (CVE-2017-9050). Thanks to Marcel Böhme and Thuan Pham for the report.

Additional hardening
====================
A separate check was added in xmlParseNameComplex to validate the buffer size.
Nick Wellnhofer 855c19ef 2017-06-01T01:04:08 Avoid reparsing in xmlParseStartTag2 The code in xmlParseStartTag2 must handle the case that the input buffer was grown and reallocated which can invalidate pointers to attribute values. Before, this was handled by detecting changes of the input buffer "base" pointer and, in case of a change, jumping back to the beginning of the function and reparsing the start tag. The major problem of this approach is that whether an input buffer is reallocated is nondeterministic, resulting in seemingly random test failures. See the mailing list thread "runtest mystery bug: name2.xml error case regression test" from 2012, for example. If a reallocation was detected, the code also made no attempts to continue parsing in case of errors which makes a difference in the lax "recover" mode. Now we store the current input buffer "base" pointer for each (not separately allocated) attribute in the namespace URI field, which isn't used until later. After the whole start tag was parsed, the pointers to the attribute values are reconstructed using the offset between the new and the old input buffer. This relies on arithmetic on dangling pointers which is technically undefined behavior. But it seems like the easiest and most efficient fix and a similar approach is used in xmlParserInputGrow. This changes the error output of several tests, typically making it more verbose because we try harder to continue parsing in case of errors. (Another possible solution is to check not only the "base" pointer but the size of the input buffer as well. But this would result in even more reparsing.)
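An alternative that avoids arithmetic on dangling pointers is to record each attribute value as an offset from the buffer base and to convert offsets into pointers only after the tag has been fully parsed. The following is a hypothetical sketch of that alternative, not the approach this commit takes; `attr_span` and `resolve_spans` are illustrative names.

```c
#include <stddef.h>

/* Hypothetical record for one attribute value: an offset from the
 * buffer start instead of a raw pointer into a buffer that may be
 * reallocated while the rest of the start tag is parsed. */
struct attr_span {
    ptrdiff_t start; /* offset of the value from the buffer base */
    size_t len;      /* length of the value in bytes */
};

/* Once the tag is complete and the buffer can no longer move, turn
 * the stored offsets into pointers against the *current* base. */
static void resolve_spans(const char *base, const struct attr_span *spans,
                          size_t n, const char **out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = base + spans[i].start;
}
```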
Nick Wellnhofer 07b7428b 2017-06-01T00:19:14 Simplify control flow in xmlParseStartTag2 Remove some goto labels and deduplicate a bit of code after handling namespaces. Before:
```
loop {
    parseAttribute
    if (ok) {
        if (defaultNamespace) {
            handleDefaultNamespace
            if (error) goto skip_default_ns;
            handleDefaultNamespace
skip_default_ns:
            freeAttr
            nextAttr
            continue;
        }
        if (namespace) {
            handleNamespace
            if (error) goto skip_ns;
            handleNamespace
skip_ns:
            freeAttr
            nextAttr;
            continue;
        }
        handleAttr
    } else {
        freeAttr
    }
    nextAttr
}
```
After:
```
loop {
    parseAttribute
    if (!ok) goto next_attr;
    if (defaultNamespace) {
        handleDefaultNamespace
        if (error) goto next_attr;
        handleDefaultNamespace
    } else if (namespace) {
        handleNamespace
        if (error) goto next_attr;
        handleNamespace
    } else {
        handleAttr
    }
next_attr:
    freeAttr
    nextAttr
}
```
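The "After" control flow of this commit can be rendered as compilable C. Everything here is hypothetical: attributes are only names, "handling" means incrementing counters, and an empty name stands in for a parse failure. The point is the single `next_attr` label that every path reaches, so per-attribute cleanup lives in one place.

```c
#include <string.h>

/* Hypothetical counters standing in for the real handling code. */
struct tag_stats {
    int default_ns;
    int ns;
    int attrs;
    int errors;
};

static void handle_attrs(const char **names, int n, struct tag_stats *st)
{
    int i;

    for (i = 0; i < n; i++) {
        const char *name = names[i];

        if (name == NULL || name[0] == '\0') { /* "parse" failed */
            st->errors++;
            goto next_attr;
        }
        if (strcmp(name, "xmlns") == 0) {
            st->default_ns++;
        } else if (strncmp(name, "xmlns:", 6) == 0) {
            if (name[6] == '\0') { /* empty prefix treated as an error */
                st->errors++;
                goto next_attr;
            }
            st->ns++;
        } else {
            st->attrs++;
        }
next_attr:
        ; /* single cleanup point: real code would free the value here */
    }
}
```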
Nick Wellnhofer 47496724 2017-05-31T16:46:39 Avoid spurious UBSan errors in parser.c If available, use a C99 flexible array member to avoid spurious UBSan errors.
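Here is a minimal example of a C99 flexible array member, with hypothetical names. Declaring the trailing array without a size, instead of the older `char data[1]` idiom, tells the compiler that the storage is sized at allocation time. Bounds instrumentation such as UBSan should then not treat accesses beyond index 0 as out of bounds.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string node with a flexible array member. */
struct node {
    size_t len;
    char data[]; /* flexible array member: must be the last member */
};

static struct node *node_new(const char *s)
{
    size_t len = strlen(s);
    /* sizeof(*n) excludes data[], so add room for the bytes + NUL */
    struct node *n = malloc(sizeof(*n) + len + 1);

    if (n == NULL)
        return NULL;
    n->len = len;
    memcpy(n->data, s, len + 1);
    return n;
}
```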