parser.c


Log

Author Commit Date CI Message
Nick Wellnhofer f8852184 2023-02-14T13:03:13 malloc-fail: Fix memory leak in xmlParseEntityDecl Found with libFuzzer, see #344.
Nick Wellnhofer e6d22f92 2023-01-23T01:48:37 malloc-fail: Fix reallocation in inputPush Store xmlRealloc result in temporary variable to avoid null deref in error handler. Found with libFuzzer, see #344.
Nick Wellnhofer 6fd89041 2023-01-22T19:42:41 malloc-fail: Fix use-after-free in xmlParseStartTag2 Fix error handling in xmlCtxtGrowAttrs. Found with libFuzzer, see #344.
Nick Wellnhofer d1b87856 2023-01-22T17:42:09 malloc-fail: Fix infinite loop in xmlParseTextDecl Memory errors can set `instate` to `XML_PARSER_EOF` which results in `NEXT` making no progress. Found with libFuzzer, see #344.
Nick Wellnhofer bd9de3a3 2023-01-22T16:52:39 malloc-fail: Fix null deref in xmlAddDefAttrs Found with libFuzzer, see #344.
Nick Wellnhofer 33d4a0fe 2023-01-22T15:41:00 parser: Fix progress check in xmlParseExternalSubset Avoid infinite loop. Short-lived regression from f61b8a62. Found with libFuzzer.
Nick Wellnhofer 74aa61e0 2023-01-22T13:09:03 parser: Halt parser on DTD errors If we try to continue parsing after an error in the internal or external subset, entity expansion accounting gets more complicated. Simply halt the parser. Found with libFuzzer.
Nick Wellnhofer d320a683 2023-01-17T13:50:51 parser: Fix entity check in attributes Don't set the "checked" flag when checking entities in default attribute values. These entities could reference other entities which weren't defined yet, so the check isn't reliable. This fixes a short-lived regression which could lead to a call stack overflow later in xmlStringGetNodeList.
Nick Wellnhofer 59b33661 2022-12-27T14:15:51 error: Limit number of parser errors Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of its time on error handling. Limit the number of errors and warnings to 100.
Nick Wellnhofer 66e9fd66 2022-12-25T21:26:17 parser: Fix infinite loop with push parser in recovery mode Short-lived regression from commit b1f9c193. Found by OSS-Fuzz.
Nick Wellnhofer 49b54d7e 2022-12-25T15:06:51 parser: Fix null deref in xmlStringDecodeEntitiesInt Short-lived regression.
Nick Wellnhofer 1865668b 2022-12-23T22:44:40 parser: Fix accounting of consumed input bytes Only add consumed bytes if we're not parsing an entity, or if we're parsing an external parameter entity for the first time. Always ignore internal parameter entities.
Nick Wellnhofer bc18f4a6 2022-12-23T21:55:38 parser: Lower entity nesting limit with XML_PARSE_HUGE The old limit of 1024 could lead to excessively deep call stacks. This could probably be set much lower without causing issues.
Nick Wellnhofer dd62e541 2022-12-23T21:53:30 parser: Don't increase depth twice when parsing internal entities Fix xmlParseBalancedChunkMemoryInternal.
Nick Wellnhofer a41b09c7 2022-12-23T21:29:28 parser: Improve detection of entity loops Set a flag to detect entity loops immediately instead of processing until the depth limit is exceeded.
Nick Wellnhofer d972393f 2022-12-23T21:01:20 parser: Only report a single entity error Don't report errors multiple times for nested entity references.
Nick Wellnhofer 077df27e 2022-12-22T15:22:01 parser: Fix integer overflow of input ID Applies a patch from Chromium. Also stop incrementing input ID of subcontexts. This isn't necessary. Fixes #465.
David Kilzer 0bd4e4e0 2022-12-21T19:21:30 xmlParseStartTag2() contains typo when checking for default definitions for an attribute in a namespace * parser.c: (xmlParseStartTag2): - Fix index into defaults->values. It is only correct the first time through the loop when i == 0. Fixes #467.
Nick Wellnhofer b47ebf04 2022-12-21T00:02:47 parser: Deprecate xmlString*DecodeEntities These are internal functions.
Nick Wellnhofer ec6633af 2022-12-20T03:09:11 parser: Remove useless ent->etype test in xmlParseReference If ent->etype is invalid, ret can't equal XML_ERR_OK.
Nick Wellnhofer 7ee7f036 2022-12-20T02:06:38 parser: Remove useless ent->children tests in xmlParseReference The if-block before always returns if ent->children == NULL.
Nick Wellnhofer ce76ebfd 2022-12-19T20:56:23 entities: Stop counting entities This was only used in the old version of xmlParserEntityCheck.
Nick Wellnhofer a3c8b180 2022-12-19T20:51:52 entities: Add entity flag for loop check
Nick Wellnhofer 463bbeec 2022-12-19T18:39:45 entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long", which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA, which have caused problems in the past, work. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.
Nick Wellnhofer 7e3f469b 2022-12-19T15:59:49 entities: Use flags to store '<' check results Instead of abusing the LSB of the "checked" member, store the result of testing for occurrence of '<' character in "flags". Also use the flags in xmlParseStringEntityRef instead of rescanning every time.
Nick Wellnhofer 481d79d4 2022-12-19T15:26:46 entities: Add XML_ENT_PARSED flag To check whether an entity was already parsed, the code previously tested whether "checked" was non-zero or "children" was non-null. The "children" check could be unreliable because an empty entity also results in an empty (NULL) node list. Use a separate flag to make this check more reliable.
Alex Richardson 4b959ee1 2022-12-01T13:23:09 Remove hacky heuristic from b2dc5675e94aa6b5557ba63f7d66b0f08dd17e4d Checking whether the context is close to the parent context by hardcoding 250 is not portable (I noticed tests were failing on Morello since the value is 288 there due to pointers being 128 bits). Instead we should ensure that the XML_VCTXT_USE_PCTXT flag is not set in cases where the user data is not actually a parser context (or ideally add a separate field, but that would be an ABI break). From what I can see in the source, the XML_VCTXT_USE_PCTXT flag is only set if the userData field points to a valid context, and if this is not the case the flag should be cleared when changing userData rather than relying on the offset between the two. Looking at the history, I think d7cb33cf44aa688f24215c9cd398c1a26f0d25ff fixed most of the need for this workaround, but it looks like there are a few more locations that need updating. This commit changes two more places to set/clear/copy the XML_VCTXT_USE_PCTXT flag, so this heuristic should not be needed anymore. I've also dropped two `= NULL` assignments in xmllint since they are not needed after a call to memset(). There was also an uninitialized vctxt.flags (and other fields) in `xmlShellValidate()`, which I've fixed by adding a memset() call.
Alex Richardson c62c0d82 2022-12-01T12:58:11 Correctly relocate internal pointers after realloc() Adding an offset to a deallocated pointer and assuming that it can be dereferenced is undefined behaviour. When running libxml2 on CHERI-enabled systems such as Arm Morello this results in the creation of an out-of-bounds pointer that cannot be dereferenced and therefore crashes at runtime. The effect of this UB is not just limited to architectures such as CHERI, incorrect relocation of pointers after realloc can in fact cause FORTIFY_SOURCE errors with recent GCC: https://developers.redhat.com/articles/2022/09/17/gccs-new-fortification-level
Nick Wellnhofer c16fd705 2022-11-25T14:52:37 xpath: Make init function private
Nick Wellnhofer 53ab3840 2022-11-25T14:26:59 encoding: Make init function private
Nick Wellnhofer 05c3a458 2022-11-25T14:15:43 tests: Check that xmlInitParser doesn't allocate memory
Nick Wellnhofer 78c0391b 2022-11-25T13:55:39 parser: Register atexit handler in locked section
Nick Wellnhofer ed053c50 2022-11-25T12:27:14 dict: Make init/cleanup functions private
Nick Wellnhofer 7010d877 2022-11-25T12:06:27 threads: Rework initialization Make init/cleanup functions private. Merge xmlOnceInit into xmlInitThreadsInternal.
Nick Wellnhofer 9dbf1374 2022-11-24T20:52:57 parser: Make some module init/cleanup functions private
Nick Wellnhofer cecd364d 2022-11-24T16:38:47 parser: Don't call *DefaultSAXHandlerInit from xmlInitParser Change the default handler definitions to match the result after calling the initialization functions. This makes sure that no thread-local variables are accessed when calling xmlInitParser.
Nick Wellnhofer b1f9c193 2022-11-22T21:39:01 parser: Fix push parser with unterminated CDATA sections Short-lived regression found by OSS-Fuzz.
Nick Wellnhofer 0e193f0d 2022-11-21T22:09:19 parser: Remove dangerous check in xmlParseCharData If this check succeeds, xmlParseCharData could be called over and over again without making progress, resulting in an infinite loop. It's only important to check for XML_PARSER_EOF which is done later. Related to #441.
Nick Wellnhofer 94ca36c2 2022-11-21T22:07:11 parser: Restore parser state in xmlParseCDSect Fixes #441.
Nick Wellnhofer a8b31e68 2022-11-21T21:35:01 parser: Fix progress check when parsing character data Skip over zero bytes to guarantee progress. Short-lived regression.
Nick Wellnhofer c63900fb 2022-11-21T20:11:35 parser: Check terminate flag when push parsing CDATA sections Found by OSS-Fuzz.
Nick Wellnhofer a781ee33 2022-11-21T20:10:42 Revert "parser: Add overflow checks to xmlParseLookup functions" This reverts commit bfc55d688427972d093be010a8c2ef265375fcb2. It's better to fix the root cause.
Nick Wellnhofer bfc55d68 2022-11-21T18:29:54 parser: Add overflow checks to xmlParseLookup functions Short-lived regression found by OSS-Fuzz.
Nick Wellnhofer 9e4a46ac 2022-11-20T22:03:08 parser: Merge misc, prolog and epilog cases in push parser
Nick Wellnhofer 55fb8f72 2022-11-20T15:35:49 parser: Fix push parser with 1-3 byte initial chunk Make sure that ctxt->charset is initialized properly.
Nick Wellnhofer 68a6518c 2022-11-15T18:23:33 parser: Rewrite push parser boundary checks Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.
Nick Wellnhofer 2059df53 2022-11-14T22:27:58 buf: Deprecate static/immutable buffers
Nick Wellnhofer 4955e0c9 2022-11-14T20:16:22 io: Don't shrink memory input buffers
Nick Wellnhofer 117bab22 2022-11-14T20:15:59 parser: Don't call xmlSHRINK from push parser xmlSHRINK also calls xmlParserInputGrow which isn't needed in the push parser.
Nick Wellnhofer f00739c1 2022-11-14T00:18:39 parser: Ignore cdata argument in xmlParseCharData It never could be used to parse CDATA sections.
Nick Wellnhofer e4f56a72 2022-11-13T23:42:10 parser: Simplify xmlParseConditionalSections
Nick Wellnhofer 3582b07b 2022-11-13T22:57:32 parser: Fix content parser progress checks This is another attempt at fixing parser progress checks. Instead of relying on in->consumed, which could overflow, change some content parser functions to make guaranteed progress on certain byte sequences.
Nick Wellnhofer f7ad338e 2022-11-13T21:59:23 parser: Fix attribute parser progress checks This is another attempt at fixing parser progress checks. Instead of relying on in->consumed, which could overflow, make the attribute parser functions return a NULL name only if they don't make progress.
Nick Wellnhofer f61b8a62 2022-11-13T21:47:03 parser: Fix DTD parser progress checks This is another attempt at fixing parser progress checks. Instead of relying on in->consumed, which could overflow, change some DTD parser functions to make guaranteed progress on certain byte sequences.
Nick Wellnhofer 46cd7d22 2022-11-13T16:30:46 io: Remove xmlInputReadCallbackNop In some cases, for example when using encoders, the read callback was set to NULL, in other cases it was set to xmlInputReadCallbackNop. xmlGROW only tested for xmlInputReadCallbackNop, resulting in errors when parsing large encoded content from memory. Always use a NULL callback for memory buffers to avoid ambiguities. Fixes #262.
Nick Wellnhofer a70f7d47 2022-11-04T14:03:31 parser: Fix error message in xmlParseCommentComplex Fixes #421.
Nick Wellnhofer afc7e3a7 2022-11-02T16:11:00 malloc-fail: Fix memory leak in xmlParseReference Found with libFuzzer, see #344.
Nick Wellnhofer e129c1d1 2022-11-02T16:02:39 malloc-fail: Fix infinite loop in xmlSkipBlankChars Found with libFuzzer, see #344.
Nick Wellnhofer 865e142c 2022-11-02T15:46:11 malloc-fail: Fix memory leak in xmlCreatePushParserCtxt Found with libFuzzer, see #344.
Nick Wellnhofer ffaec758 2022-08-25T17:43:08 Fix integer overflows with XML_PARSE_HUGE Also impose size limits when XML_PARSE_HUGE is set. Limit size of names to XML_MAX_TEXT_LENGTH (10 million bytes) and other content to XML_MAX_HUGE_LENGTH (1 billion bytes). Move some of the length checks to the end of the respective loop to make them strict. xmlParseEntityValue didn't have a length limitation at all. But without XML_PARSE_HUGE, this should eventually trigger an error in xmlGROW. Thanks to Maddie Stone working with Google Project Zero for the report!
Nick Wellnhofer 1a2d8ddc 2022-10-11T13:02:47 parser: Fix potential memory leak in xmlParseAttValueInternal Fix memory leak in case xmlParseAttValueInternal is called with a NULL `len` and a non-NULL `alloc` argument. This static function is never called with such arguments internally, but the misleading code should be fixed nevertheless. Fixes #422.
Nick Wellnhofer a9669679 2022-09-09T01:44:00 error: Don't use initGenericErrorDefaultFunc The code in xmlInitParser only set the error handler if it was NULL, which should never happen.
Nick Wellnhofer 59f2f60e 2022-09-02T00:27:57 Remove "runtime debugging" This doesn't seem useful as a configuration option.
Nick Wellnhofer 884e142d 2022-09-01T22:44:02 Fix --with-schemas --without-xpath build xmlXPathInit must be called for schemas.
Nick Wellnhofer 6843fc72 2022-09-01T02:58:00 Remove or annotate char casts
Nick Wellnhofer 2cac6269 2022-09-01T03:14:13 Don't use sizeof(xmlChar) or sizeof(char)
Nick Wellnhofer ad338ca7 2022-09-01T01:18:30 Remove explicit integer casts Remove explicit integer casts as the final operation in assignments, when passing arguments, and when returning values. Remove casts to the same type and from certain range-bound values. The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.
Nick Wellnhofer 0f568c0b 2022-08-26T01:22:33 Consolidate private header files Private functions were previously declared in header files in the root directory, in public headers guarded with IN_LIBXML, in libxml.h, and redundantly in source files that used them. Consolidate all private header files in include/private.
Nick Wellnhofer 48f84ea8 2022-08-25T21:31:08 Remove internal macros from parserInternals.h Replace MOVETO_ENDTAG with code that updates line and column numbers.
Nick Wellnhofer 58fc89e8 2022-08-25T20:57:30 Deprecate internal parser functions
Nick Wellnhofer 34a050cd 2022-08-24T16:35:58 Move some HTML functions to correct header file
Nick Wellnhofer fd85b566 2022-08-24T15:12:24 Mark more parser functions as deprecated No compiler warnings generated yet.
Nick Wellnhofer 0e49f882 2022-08-24T05:25:37 Mark most SAX1 functions as deprecated No compiler warnings generated yet.
Nick Wellnhofer 9a82b94a 2022-08-24T04:21:58 Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt Add API functions to create a parser context with a custom SAX handler without having to mess with ctxt->sax manually.
Nick Wellnhofer 5b2d07a7 2022-08-20T17:00:50 Use xmlStrlen in *CtxtReadDoc xmlStrlen handles buffers larger than INT_MAX more gracefully.
Nick Wellnhofer 4ad71c2d 2022-08-20T16:19:34 Fix xmlCtxtReadDoc with encoding xmlCtxtReadDoc used to create an input stream involving xmlNewStringInputStream. This would create a stream without an input buffer, causing problems with encodings (see #34). After commit aab584dc3, an error was returned even with UTF-8 encodings which happened to work before. Make xmlCtxtReadDoc call xmlCtxtReadMemory which doesn't suffer from these issues. Also fix htmlCtxtReadDoc. Fixes #397.
Nick Wellnhofer 5930fe01 2022-07-18T20:59:45 Reset nsNr in xmlCtxtReset
Nick Wellnhofer ca2c91f1 2022-06-28T19:24:14 Fix memory leak in xmlLoadEntityContent error path Free the input stream if pushing it fails. Found by OSS-Fuzz. https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43743
Nick Wellnhofer ecba4cbd 2022-06-28T19:22:31 Avoid double-free if malloc fails in inputPush It's the caller's responsibility to free the input stream if this function fails.
Nick Wellnhofer 3e7b4f37 2022-05-20T23:28:25 Avoid calling xmlSetTreeDoc Create text nodes with xmlNewDocText or set the document directly to avoid xmlSetTreeDoc being called when the node is inserted.
David Kilzer 44e9118c 2022-04-08T12:33:17 Prevent integer-overflow in htmlSkipBlankChars() and xmlSkipBlankChars() * HTMLparser.c: (htmlSkipBlankChars): * parser.c: (xmlSkipBlankChars): - Cap the return value at INT_MAX. - The commit range that OSS-Fuzz listed for the fix didn't make any changes to xmlSkipBlankChars(), so it seems like this issue may still exist. Found by OSS-Fuzz Issue 44803.
David Kilzer 21561e83 2016-05-20T15:21:43 Mark more static data as `const` Similar to 8f5710379, mark more static data structures with `const` keyword. Also fix placement of `const` in encoding.c. Original patch by Sarah Wilkin.
Nick Wellnhofer 92bff866 2022-03-29T14:18:31 Fix calls to deprecated init/cleanup functions Only use xmlInitParser/xmlCleanupParser.
Nick Wellnhofer 96849544 2022-03-22T19:10:51 Revert "Continue to parse entity refs in recovery mode" This reverts commit 84823b86344fb530790a8787b80abf62715ea885 which exposed several other, potentially serious bugs. Fixes #356.
Nick Wellnhofer 7d02c729 2022-03-06T00:49:02 Fix parser progress checks Testing the current input pointer for modification is unreliable since the input buffer could have been freed and realloced. Check whether the input id and the up-to-date number of bytes consumed match.
Nick Wellnhofer 84823b86 2022-03-05T22:48:11 Continue to parse entity refs in recovery mode There doesn't seem to be a good reason to abort in xmlParseReference if a well-formedness error was detected. Removing this check allows entity references to be parsed after an error in recovery mode. Fixes #270.
Nick Wellnhofer d99ddd9b 2022-03-05T21:46:40 Improve buffer allocation scheme In most places, we really need the double-it scheme to avoid quadratic behavior. The hybrid scheme still can cause many reallocations and the bounded scheme doesn't seem to provide meaningful protection in xmlreader.c.
Nick Wellnhofer ebb17970 2022-03-04T02:31:59 Remove unneeded #includes
Nick Wellnhofer 776d15d3 2022-03-02T00:29:17 Don't check for standard C89 headers Don't check for ctype.h, errno.h, float.h, limits.h, math.h, signal.h, stdarg.h, stdlib.h, string.h, or time.h. Stop including the non-standard headers malloc.h and strings.h.
Nick Wellnhofer 89d9ef3e 2022-03-01T15:14:00 Reset last error in xmlCleanupGlobals Before, we tried to reset the last error in xmlCleanupParser. But if xmlCleanupParser wasn't called from the main thread, this would reset the thread-local error object. xmlCleanupGlobals has access to the error object of the main thread and can reset it reliably.
Nick Wellnhofer 2489c1d0 2022-02-28T22:42:10 Remove useless __CYGWIN__ checks From what I can tell, some really early Cygwin versions from around 1998-2000 used to erroneously define _WIN32. This was eventually fixed, but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is unnecessary. Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether to use __declspec.
Nick Wellnhofer c41bc10d 2022-02-22T19:57:12 Fix unused variable warnings with disabled features
Nick Wellnhofer 346c3a93 2022-02-20T18:46:42 Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.
Nick Wellnhofer 9edc20c1 2022-02-07T20:38:30 Fix double counting of CRLF in comments Fixes #151.
Nick Wellnhofer 96535657 2022-02-07T15:26:33 Make sure to grow input buffer in xmlParseMisc Otherwise, a large amount of whitespace could lead to documents not being parsed correctly. Fixes #299.
Nick Wellnhofer d85245f9 2022-01-16T21:39:04 Fix regression with PEs in external DTD Fix a regression introduced with commit a28f7d87. In some cases, parameter entity references in external DTDs wouldn't be expanded. Fixes #306.
Yulin Li 46c658b0 2021-08-06T08:48:24 Move current position before possibly calling ctxt->sax->characters.
David King fe564967 2021-07-14T14:35:17 Fix memory leak in xmlCreateIOParserCtxt Found by Coverity. https://bugzilla.redhat.com/show_bug.cgi?id=1938806
Mike Dalessio a7b9f3eb 2021-05-20T13:38:54 fix: avoid segfault at exit when using custom memory functions This extends the fix introduced by 956534e to Windows processes dynamically loading libxml2. Closes #256.
Daniel Veillard 8598060b 2021-05-13T14:55:12 Patch for security issue CVE-2021-3541 This is related to parameter entity expansion and follows the line of the billion laughs attack. Somehow the counting of parameter entities was missed in that path, so the normal algorithm based on entity "density" was ineffective.
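Commit e6d22f92 stores the xmlRealloc result in a temporary variable, and commit ecba4cbd makes the caller responsible for freeing the input stream when inputPush fails. A minimal sketch of both rules, using a hypothetical pointer stack (`PtrStack` and `ptr_stack_push` are illustrative names, not libxml2's actual inputPush code):

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stack of pointers (illustrative, not libxml2's inputTab). */
typedef struct {
    void **tab; /* array of entries */
    int nr;     /* number of entries in use */
    int max;    /* allocated capacity */
} PtrStack;

/* Push an entry, doubling the array when it is full.
 * Returns 0 on success, -1 on allocation failure. On failure the stack is
 * unchanged, and the caller still owns `item` and must free it. */
static int ptr_stack_push(PtrStack *st, void *item) {
    if (st->nr >= st->max) {
        if (st->max > INT_MAX / 2)
            return -1;
        int new_max = st->max > 0 ? st->max * 2 : 4;
        /* Keep the result in a temporary. Writing it straight to st->tab
         * would overwrite the old array with NULL on failure, leaking it
         * and leaving a NULL pointer for any error handler to trip over. */
        void **tmp = realloc(st->tab, (size_t)new_max * sizeof(*tmp));
        if (tmp == NULL)
            return -1;
        st->tab = tmp;
        st->max = new_max;
    }
    st->tab[st->nr++] = item;
    return 0;
}
```

Because the old array survives a failed realloc, the stack stays consistent and the caller can clean up `item` without risking a double free.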
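Commit 59b33661 limits the number of reported errors and warnings to 100 so that abusive inputs can't make the parser spend most of its time on error handling. A sketch of such a cap, assuming a hypothetical per-context counter (`ErrState`, `report_error` and `MAX_REPORTED_ERRORS` are illustrative, not libxml2 names):

```c
#include <stdio.h>

/* Cap taken from the commit message; the macro name is hypothetical. */
#define MAX_REPORTED_ERRORS 100

typedef struct {
    int nb_errors; /* errors and warnings reported so far */
} ErrState;

/* Returns 1 if the message was reported, 0 if it was suppressed. */
static int report_error(ErrState *st, const char *msg) {
    if (st->nb_errors >= MAX_REPORTED_ERRORS)
        return 0; /* over the limit: skip the expensive formatting/output */
    st->nb_errors++;
    fprintf(stderr, "error: %s\n", msg);
    return 1;
}
```

The check runs before any formatting, so once the cap is reached each further error costs only a comparison.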
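Commit 463bbeec limits entity substitution to 10 times the input size, always allows 10 MB of substitutions, charges a fixed cost per entity reference, and uses saturation arithmetic because the counters are `unsigned long`. A minimal sketch of that accounting under stated assumptions: the constant names are hypothetical, "10 MB" is assumed to mean 10,000,000 bytes, and the fixed cost of 20 is an arbitrary placeholder, not a value from the commit:

```c
#include <limits.h>

#define AMPLIFICATION_FACTOR 10UL                   /* from the commit message */
#define ALWAYS_ALLOWED (10UL * 1000UL * 1000UL)     /* "10 MB", assumed decimal */
#define ENTITY_FIXED_COST 20UL                      /* placeholder value */

/* Saturating addition: clamps at ULONG_MAX instead of wrapping around. */
static unsigned long sat_add(unsigned long a, unsigned long b) {
    return (a > ULONG_MAX - b) ? ULONG_MAX : a + b;
}

/* Saturating multiplication: clamps at ULONG_MAX on overflow. */
static unsigned long sat_mul(unsigned long a, unsigned long b) {
    if (b != 0 && a > ULONG_MAX / b)
        return ULONG_MAX;
    return a * b;
}

/* Charge one entity reference: its expanded length plus a fixed cost,
 * so that many tiny references are still accounted for. */
static void account_reference(unsigned long *copied, unsigned long len) {
    *copied = sat_add(*copied, sat_add(len, ENTITY_FIXED_COST));
}

/* consumed: input bytes parsed; copied: bytes produced by substitution.
 * Returns 1 if the expansion is acceptable, 0 if it looks like an attack. */
static int expansion_ok(unsigned long consumed, unsigned long copied) {
    if (copied <= ALWAYS_ALLOWED)
        return 1;
    return copied <= sat_mul(consumed, AMPLIFICATION_FACTOR);
}
```

Saturating instead of wrapping means an overflowing counter can only make the check stricter, never let an attack slip through on a 32-bit `unsigned long`.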
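Commit a41b09c7 sets a flag to detect entity loops immediately instead of recursing until the depth limit is exceeded. The idea resembles cycle detection in a depth-first search with an "in progress" mark. A sketch with a hypothetical `Entity` record and flag (libxml2's real structures and flag names differ):

```c
#include <stddef.h>

/* Hypothetical flag: set while this entity's content is being expanded. */
#define ENT_EXPANDING 0x1

typedef struct Entity {
    struct Entity **refs; /* entities referenced from this entity's value */
    size_t nrefs;
    int flags;
} Entity;

/* Returns 0 on success, -1 as soon as an entity loop is found. */
static int expand_entity(Entity *ent) {
    if (ent->flags & ENT_EXPANDING)
        return -1; /* reached an entity that is still being expanded: loop */
    ent->flags |= ENT_EXPANDING;
    for (size_t i = 0; i < ent->nrefs; i++) {
        if (expand_entity(ent->refs[i]) < 0) {
            ent->flags &= ~ENT_EXPANDING;
            return -1;
        }
    }
    ent->flags &= ~ENT_EXPANDING; /* done: the entity may be referenced again */
    return 0;
}
```

Clearing the flag on the way out keeps legitimate repeated references (an entity used twice by the same parent) from being mistaken for loops.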
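Commit c62c0d82 explains that adding an offset to a pointer that realloc() has already freed is undefined behaviour. The portable fix is to compute the offset before reallocating and rebase it onto the new block. A sketch with a hypothetical `Input` struct loosely modeled on a base/cursor pair (not libxml2's actual xmlParserInput):

```c
#include <stdlib.h>

/* Hypothetical buffer with an interior cursor (illustrative only). */
typedef struct {
    char *base;  /* start of the allocated buffer */
    char *cur;   /* current position, points into base */
    size_t size; /* allocated bytes */
} Input;

/* Grow the buffer while keeping `cur` valid. Assumes new_size is at least
 * the current offset. The offset is computed BEFORE realloc: afterwards the
 * old `base` may be freed, and computing `cur - old_base` would be UB. */
static int input_grow(Input *in, size_t new_size) {
    size_t off = (size_t)(in->cur - in->base);
    char *tmp = realloc(in->base, new_size);
    if (tmp == NULL)
        return -1; /* old buffer and cursor are still valid */
    in->base = tmp;
    in->cur = tmp + off; /* rebase onto the new block */
    in->size = new_size;
    return 0;
}
```

Only integer offsets cross the realloc() call, which is what keeps this valid on capability hardware such as CHERI and under FORTIFY_SOURCE.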
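Commit d99ddd9b switches most buffers to the "double-it" scheme to avoid quadratic behavior. Growing by a fixed increment copies the whole buffer every few appends. Doubling keeps the number of reallocations logarithmic and the cost per byte amortized constant. A sketch of a growable byte buffer (`Buf` and `buf_append` are hypothetical names, not libxml2's xmlBuf API; the initial capacity of 64 is arbitrary):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;
    size_t len;        /* bytes in use */
    size_t cap;        /* bytes allocated */
    unsigned reallocs; /* number of reallocations, for illustration */
} Buf;

/* Append n bytes; returns 0 on success, -1 on overflow or allocation failure. */
static int buf_append(Buf *b, const char *src, size_t n) {
    if (n == 0)
        return 0;
    if (n > SIZE_MAX - b->len)
        return -1; /* length would overflow */
    size_t need = b->len + n;
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) { cap = need; break; }
            cap *= 2; /* geometric growth: amortized O(1) per byte */
        }
        char *tmp = realloc(b->data, cap);
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = cap;
        b->reallocs++;
    }
    memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}
```

Appending one million single bytes this way takes 15 reallocations (64 up to 2^20). A fixed 64-byte increment would need more than 15,000.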
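Commit 44e9118c caps the return value of xmlSkipBlankChars and htmlSkipBlankChars at INT_MAX so that a huge run of whitespace can't overflow the `int` result. A sketch of the same idea over a NUL-terminated buffer (`skip_blanks` and `is_blank` are hypothetical helpers, not the libxml2 functions):

```c
#include <limits.h>

/* Blank characters per the XML S production: space, tab, LF, CR. */
static int is_blank(unsigned char c) {
    return c == 0x20 || c == 0x09 || c == 0x0A || c == 0x0D;
}

/* Advance *cur past all blanks and return how many were skipped,
 * saturating at INT_MAX. Assumes the input is NUL-terminated. */
static int skip_blanks(const unsigned char **cur) {
    const unsigned char *p = *cur;
    int res = 0;
    while (is_blank(*p)) {
        p++;
        if (res < INT_MAX)
            res++; /* count saturates; the cursor still advances */
    }
    *cur = p;
    return res;
}
```

Callers that only test whether any blanks were skipped (`res > 0`) behave the same with the saturated count.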