parser.c


Log

Author Commit Date CI Message
Nick Wellnhofer 3ffcc03b 2023-03-13T19:38:41 parser: Deprecate more internal functions
Nick Wellnhofer 250faf3c 2023-04-20T12:35:21 parser: Fix regression in xmlParserNodeInfo accounting Commit 62150ed2 broke begin_pos and begin_line when extra node info was recorded. Fixes #523.
Nick Wellnhofer 9282b084 2023-04-19T21:55:24 parser: Fix regression in memory pull parser with encoding Revert another change from commit 98840d40. Decode the whole buffer when reading from memory and switching to the initial encoding. Add some comments about potential improvements.
David Kilzer 86105c04 2023-04-15T18:04:03 Fix use-after-free in xmlParseContentInternal() * parser.c: (xmlParseCharData): - Check if the parser has stopped before advancing `ctxt->input->cur`. This only occurs if a custom SAX error handler calls xmlStopParser() on fatal errors. Fixes #518.
Nick Wellnhofer b4d46cee 2023-04-12T15:10:01 parser: Remove first line handling in xmlParseChunk After reworking EBCDIC detection, this isn't necessary.
Nick Wellnhofer 98840d40 2023-03-21T19:07:12 parser: Rework EBCDIC code page detection To detect EBCDIC code pages, we used to switch the encoding twice and had to be very careful not to decode data after the XML declaration before the second switch. This relied on a hard-coded expected size of the XML declaration and was complicated and unreliable. Now we convert the first 200 bytes to EBCDIC-US and parse the encoding declaration manually.
Nick Wellnhofer 3eb9f5ca 2023-03-21T13:19:31 parser: Limit name length in xmlParseEncName
Nick Wellnhofer 04d1bedd 2023-03-21T13:08:44 parser: Rework shrinking of input buffers Don't try to grow the input buffer in xmlParserShrink. This makes sure that no memory allocations are made and the function always succeeds. Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of DTD parsing loops. Shrink before growing.
Nick Wellnhofer 067986fa 2023-03-18T14:44:28 parser: Fix regressions from previous commits - Fix memory leak in xmlParseNmtoken. - Fix buffer overread after htmlParseCharDataInternal.
Nick Wellnhofer 3e85d7b7 2023-03-17T13:15:35 parser: Rely on CUR_CHAR/NEXT to grow the input buffer The input buffer is now grown reliably when calling CUR_CHAR (xmlCurrentChar) or NEXT (xmlNextChar). This allows to remove many other invocations of GROW.
Nick Wellnhofer c81d0d04 2023-03-17T12:39:35 malloc-fail: Add more error checks when parsing names xmlParseName and similar functions must return NULL if an error occurs. Found by OSS-Fuzz, see #344.
Nick Wellnhofer b167c731 2023-03-14T14:42:36 parser: Fix short-lived regression causing infinite loops Fix 3eb6bf03. We really have to halt the parser, so the input buffer gets reset.
Nick Wellnhofer 2099441f 2023-03-13T17:51:13 parser: Stop calling xmlParserInputShrink Introduce xmlParserShrink which takes a parser context to simplify error handling.
Nick Wellnhofer cabde70f 2023-03-12T19:07:23 parser: Simplify calculation of available buffer space
Nick Wellnhofer b75976e0 2023-03-12T19:06:19 parser: Use size_t when subtracting input buffer pointers Avoid integer overflows.
Nick Wellnhofer 9a6ca816 2023-03-12T19:03:11 parser: Check for integer overflow when updating checkIndex Unfortunately, checkIndex is a long, not a size_t. Check for integer overflow before updating the value.
Nick Wellnhofer bd63d730 2023-03-12T17:40:55 html: Impose some length limits Impose length limits on names, attribute values, PIs and comments, similar to the XML parser.
Nick Wellnhofer 3eb6bf03 2023-03-12T16:47:15 parser: Stop calling xmlParserInputGrow Introduce xmlParserGrow which takes a parser context to simplify error handling.
Nick Wellnhofer 207ebdfd 2023-03-12T14:43:01 malloc-fail: Fix out-of-bounds read in xmlGROW Short-lived regression from 56cc2211.
Nick Wellnhofer 56cc2211 2023-03-09T22:27:58 parser: Merge xmlParserInputGrow into xmlGROW Simplifies the code and makes error handling easier.
Nick Wellnhofer 14604a44 2023-03-09T22:10:44 malloc-fail: Fix out-of-bounds read in xmlCurrentChar Found by OSS-Fuzz.
Nick Wellnhofer 3f69fc80 2023-03-08T13:58:49 parser: Tighten expansion limits - Lower the amount of expansion which is always allowed from 10MB to 1MB. - Lower the maximum amplification factor from 10 to 5. - Lower the "fixed cost" from 50 to 20.
Nick Wellnhofer 5d55315e 2023-02-18T17:29:07 parser: Fix OOB read when formatting error message Don't try to print characters beyond the end of the buffer. Found by OSS-Fuzz.
Nick Wellnhofer f8852184 2023-02-14T13:03:13 malloc-fail: Fix memory leak in xmlParseEntityDecl Found with libFuzzer, see #344.
Nick Wellnhofer e6d22f92 2023-01-23T01:48:37 malloc-fail: Fix reallocation in inputPush Store xmlRealloc result in temporary variable to avoid null deref in error handler. Found with libFuzzer, see #344.
Nick Wellnhofer 6fd89041 2023-01-22T19:42:41 malloc-fail: Fix use-after-free in xmlParseStartTag2 Fix error handling in xmlCtxtGrowAttrs. Found with libFuzzer, see #344.
Nick Wellnhofer d1b87856 2023-01-22T17:42:09 malloc-fail: Fix infinite loop in xmlParseTextDecl Memory errors can set `instate` to `XML_PARSER_EOF` which results in `NEXT` making no progress. Found with libFuzzer, see #344.
Nick Wellnhofer bd9de3a3 2023-01-22T16:52:39 malloc-fail: Fix null deref in xmlAddDefAttrs Found with libFuzzer, see #344.
Nick Wellnhofer 33d4a0fe 2023-01-22T15:41:00 parser: Fix progress check in xmlParseExternalSubset Avoid infinite loop. Short-lived regression from f61b8a62. Found with libFuzzer.
Nick Wellnhofer 74aa61e0 2023-01-22T13:09:03 parser: Halt parser on DTD errors If we try to continue parsing after an error in the internal or external subset, entity expansion accounting gets more complicated. Simply halt the parser. Found with libFuzzer.
Nick Wellnhofer d320a683 2023-01-17T13:50:51 parser: Fix entity check in attributes Don't set the "checked" flag when checking entities in default attribute values. These entities could reference other entities which weren't defined yet, so the check isn't reliable. This fixes a short-lived regression which could lead to a call stack overflow later in xmlStringGetNodeList.
Nick Wellnhofer 59b33661 2022-12-27T14:15:51 error: Limit number of parser errors Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of the time with error handling. Limit the number of errors and warnings to 100.
Nick Wellnhofer 66e9fd66 2022-12-25T21:26:17 parser: Fix infinite loop with push parser in recovery mode Short-lived regression from commit b1f9c193. Found by OSS-Fuzz.
Nick Wellnhofer 49b54d7e 2022-12-25T15:06:51 parser: Fix null deref in xmlStringDecodeEntitiesInt Short-lived regression.
Nick Wellnhofer 1865668b 2022-12-23T22:44:40 parser: Fix accounting of consumed input bytes Only add consumed bytes if - we're not parsing an entity - we're parsing external parameter entities for the first time. Always ignore internal parameter entities.
Nick Wellnhofer bc18f4a6 2022-12-23T21:55:38 parser: Lower entity nesting limit with XML_PARSE_HUGE The old limit of 1024 could lead to excessively deep call stacks. This could probably be set much lower without causing issues.
Nick Wellnhofer dd62e541 2022-12-23T21:53:30 parser: Don't increase depth twice when parsing internal entities Fix xmlParseBalancedChunkMemoryInternal.
Nick Wellnhofer a41b09c7 2022-12-23T21:29:28 parser: Improve detection of entity loops Set a flag to detect entity loops at once instead of processing until the depth limit is exceeded.
Nick Wellnhofer d972393f 2022-12-23T21:01:20 parser: Only report a single entity error Don't report errors multiple times for nested entity references.
Nick Wellnhofer 077df27e 2022-12-22T15:22:01 parser: Fix integer overflow of input ID Applies a patch from Chromium. Also stop incrementing input ID of subcontexts. This isn't necessary. Fixes #465.
David Kilzer 0bd4e4e0 2022-12-21T19:21:30 xmlParseStartTag2() contains typo when checking for default definitions for an attribute in a namespace * parser.c: (xmlParseStartTag2): - Fix index into defaults->values. It is only correct the first time through the loop when i == 0. Fixes #467.
Nick Wellnhofer b47ebf04 2022-12-21T00:02:47 parser: Deprecate xmlString*DecodeEntities These are internal functions.
Nick Wellnhofer ec6633af 2022-12-20T03:09:11 parser: Remove useless ent->etype test in xmlParseReference If ent->etype is invalid, ret can't equal XML_ERR_OK.
Nick Wellnhofer 7ee7f036 2022-12-20T02:06:38 parser: Remove useless ent->children tests in xmlParseReference The if-block before always returns if ent->children == NULL.
Nick Wellnhofer ce76ebfd 2022-12-19T20:56:23 entities: Stop counting entities This was only used in the old version of xmlParserEntityCheck.
Nick Wellnhofer a3c8b180 2022-12-19T20:51:52 entities: Add entity flag for loop check
Nick Wellnhofer 463bbeec 2022-12-19T18:39:45 entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.
Nick Wellnhofer 7e3f469b 2022-12-19T15:59:49 entities: Use flags to store '<' check results Instead of abusing the LSB of the "checked" member, store the result of testing for occurrence of '<' character in "flags". Also use the flags in xmlParseStringEntityRef instead of rescanning every time.
Nick Wellnhofer 481d79d4 2022-12-19T15:26:46 entities: Add XML_ENT_PARSED flag To check whether an entity was already parsed, the code previously tested whether "checked" was non-zero or "children" was non-null. The "children" check could be unreliable because an empty entity also results in an empty (NULL) node list. Use a separate flag to make this check more reliable.
Alex Richardson 4b959ee1 2022-12-01T13:23:09 Remove hacky heuristic from b2dc5675e94aa6b5557ba63f7d66b0f08dd17e4d Checking whether the context is close to the parent context by hardcoding 250 is not portable (I noticed tests were failing on Morello since the value is 288 there due to pointers being 128 bits). Instead we should ensure that the XML_VCTXT_USE_PCTXT flag is not set in cases where the user data is not actually a parser context (or ideally add a separate field but that would be an ABI break. From what I can see in the source, the XML_VCTXT_USE_PCTXT is only set if the userData field points to a valid context, and if this is not the case the flag should be cleared when changing userData rather than relying on the offset between the two. Looking at the history, I think d7cb33cf44aa688f24215c9cd398c1a26f0d25ff fixed most of the need for this workaround, but it looks like there are a few more locations that need updating; This commit changes two more places to set/clear/copy the XML_VCTXT_USE_PCTXT flag, so this heuristic should not be needed anymore. I've also drop two = NULL assignment in xmllint since this is not needed after a call to memset(). There was also an uninitialized vctxt.flags (and other fields) in `xmlShellValidate()`, which I've fixed by adding a memset() call.
Alex Richardson c62c0d82 2022-12-01T12:58:11 Correctly relocate internal pointers after realloc() Adding an offset to a deallocated pointer and assuming that it can be dereferenced is undefined behaviour. When running libxml2 on CHERI-enabled systems such as Arm Morello this results in the creation of an out-of-bounds pointer that cannot be dereferenced and therefore crashes at runtime. The effect of this UB is not just limited to architectures such as CHERI, incorrect relocation of pointers after realloc can in fact cause FORTIFY_SOURCE errors with recent GCC: https://developers.redhat.com/articles/2022/09/17/gccs-new-fortification-level
Nick Wellnhofer 53ab3840 2022-11-25T14:26:59 encoding: Make init function private
Nick Wellnhofer 05c3a458 2022-11-25T14:15:43 tests: Check that xmlInitParser doesn't allocate memory
Nick Wellnhofer c16fd705 2022-11-25T14:52:37 xpath: Make init function private
Nick Wellnhofer 78c0391b 2022-11-25T13:55:39 parser: Register atexit handler in locked section
Nick Wellnhofer ed053c50 2022-11-25T12:27:14 dict: Make init/cleanup functions private
Nick Wellnhofer 7010d877 2022-11-25T12:06:27 threads: Rework initialization Make init/cleanup functions private. Merge xmlOnceInit into xmlInitThreadsInternal.
Nick Wellnhofer 9dbf1374 2022-11-24T20:52:57 parser: Make some module init/cleanup functions private
Nick Wellnhofer cecd364d 2022-11-24T16:38:47 parser: Don't call *DefaultSAXHandlerInit from xmlInitParser Change the default handler definitions to match the result after calling the initialization functions. This makes sure that no thread-local variables are accessed when calling xmlInitParser.
Nick Wellnhofer b1f9c193 2022-11-22T21:39:01 parser: Fix push parser with unterminated CDATA sections Short-lived regression found by OSS-Fuzz.
Nick Wellnhofer 0e193f0d 2022-11-21T22:09:19 parser: Remove dangerous check in xmlParseCharData If this check succeeds, xmlParseCharData could be called over and over again without making progress, resulting in an infinite loop. It's only important to check for XML_PARSER_EOF which is done later. Related to #441.
Nick Wellnhofer 94ca36c2 2022-11-21T22:07:11 parser: Restore parser state in xmlParseCDSect Fixes #441.
Nick Wellnhofer a8b31e68 2022-11-21T21:35:01 parser: Fix progress check when parsing character data Skip over zero bytes to guarantee progress. Short-lived regression.
Nick Wellnhofer c63900fb 2022-11-21T20:11:35 parser: Check terminate flag when push parsing CDATA sections Found by OSS-Fuzz.
Nick Wellnhofer a781ee33 2022-11-21T20:10:42 Revert "parser: Add overflow checks to xmlParseLookup functions" This reverts commit bfc55d688427972d093be010a8c2ef265375fcb2. It's better to fix the root cause.
Nick Wellnhofer bfc55d68 2022-11-21T18:29:54 parser: Add overflow checks to xmlParseLookup functions Short-lived regression found by OSS-Fuzz.
Nick Wellnhofer 9e4a46ac 2022-11-20T22:03:08 parser: Merge misc, prolog and epilog cases in push parser
Nick Wellnhofer 55fb8f72 2022-11-20T15:35:49 parser: Fix push parser with 1-3 byte initial chunk Make sure that ctxt->charset is initialized properly.
Nick Wellnhofer 68a6518c 2022-11-15T18:23:33 parser: Rewrite push parser boundary checks Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.
Nick Wellnhofer 2059df53 2022-11-14T22:27:58 buf: Deprecate static/immutable buffers
Nick Wellnhofer 4955e0c9 2022-11-14T20:16:22 io: Don't shrink memory input buffers
Nick Wellnhofer 117bab22 2022-11-14T20:15:59 parser: Don't call xmlSHRINK from push parser xmlSHRINK also calls xmlParserInputGrow which isn't needed in the push parser.
Nick Wellnhofer f00739c1 2022-11-14T00:18:39 parser: Ignore cdata argument in xmlParseCharData It never could be used to parse CDATA sections.
Nick Wellnhofer e4f56a72 2022-11-13T23:42:10 parser: Simplify xmlParseConditionalSections
Nick Wellnhofer 3582b07b 2022-11-13T22:57:32 parser: Fix content parser progress checks This is another attempt at fixing parser progress checks. Instead of relying on in->consumed, which could overflow, change some content parser functions to make guaranteed progress on certain byte sequences.
Nick Wellnhofer f7ad338e 2022-11-13T21:59:23 parser: Fix attribute parser progress checks This is another attempt at fixing parser progress checks. Instead of relying on in->consumed, which could overflow, make the attribute parser functions return a NULL name only if they don't make progress.
Nick Wellnhofer f61b8a62 2022-11-13T21:47:03 parser: Fix DTD parser progress checks This is another attempt at fixing parser progress checks. Instead of relying on in->consumed, which could overflow, change some DTD parser functions to make guaranteed progress on certain byte sequences.
Nick Wellnhofer 46cd7d22 2022-11-13T16:30:46 io: Remove xmlInputReadCallbackNop In some cases, for example when using encoders, the read callback was set to NULL, in other cases it was set to xmlInputReadCallbackNop. xmlGROW only tested for xmlInputReadCallbackNop, resulting in errors when parsing large encoded content from memory. Always use a NULL callback for memory buffers to avoid ambiguities. Fixes #262.
Nick Wellnhofer a70f7d47 2022-11-04T14:03:31 parser: Fix error message in xmlParseCommentComplex Fixes #421.
Nick Wellnhofer afc7e3a7 2022-11-02T16:11:00 malloc-fail: Fix memory leak in xmlParseReference Found with libFuzzer, see #344.
Nick Wellnhofer e129c1d1 2022-11-02T16:02:39 malloc-fail: Fix infinite loop in xmlSkipBlankChars Found with libFuzzer, see #344.
Nick Wellnhofer 865e142c 2022-11-02T15:46:11 malloc-fail: Fix memory leak in xmlCreatePushParserCtxt Found with libFuzzer, see #344.
Nick Wellnhofer ffaec758 2022-08-25T17:43:08 Fix integer overflows with XML_PARSE_HUGE Also impose size limits when XML_PARSE_HUGE is set. Limit size of names to XML_MAX_TEXT_LENGTH (10 million bytes) and other content to XML_MAX_HUGE_LENGTH (1 billion bytes). Move some the length checks to the end of the respective loop to make them strict. xmlParseEntityValue didn't have a length limitation at all. But without XML_PARSE_HUGE, this should eventually trigger an error in xmlGROW. Thanks to Maddie Stone working with Google Project Zero for the report!
Nick Wellnhofer 1a2d8ddc 2022-10-11T13:02:47 parser: Fix potential memory leak in xmlParseAttValueInternal Fix memory leak in case xmlParseAttValueInternal is called with a NULL `len` a non-NULL `alloc` argument. This static function is never called with such arguments internally, but the misleading code should be fixed nevertheless. Fixes #422.
Nick Wellnhofer a9669679 2022-09-09T01:44:00 error: Don't use initGenericErrorDefaultFunc The code in xmlInitParser did only set the error handler if it was NULL which should never happen.
Nick Wellnhofer 59f2f60e 2022-09-02T00:27:57 Remove "runtime debugging" This doesn't seem useful as configuration option.
Nick Wellnhofer 884e142d 2022-09-01T22:44:02 Fix --with-schemas --without-xpath build xmlXPathInit must be called for schemas.
Nick Wellnhofer 6843fc72 2022-09-01T02:58:00 Remove or annotate char casts
Nick Wellnhofer 2cac6269 2022-09-01T03:14:13 Don't use sizeof(xmlChar) or sizeof(char)
Nick Wellnhofer ad338ca7 2022-09-01T01:18:30 Remove explicit integer casts Remove explicit integer casts as final operation - in assignments - when passing arguments - when returning values Remove casts - to the same type - from certain range-bound values The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.
Nick Wellnhofer 0f568c0b 2022-08-26T01:22:33 Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.
Nick Wellnhofer 48f84ea8 2022-08-25T21:31:08 Remove internal macros from parserInternals.h Replace MOVETO_ENDTAG with code that updates line and column numbers.
Nick Wellnhofer 58fc89e8 2022-08-25T20:57:30 Deprecate internal parser functions
Nick Wellnhofer 34a050cd 2022-08-24T16:35:58 Move some HTML functions to correct header file
Nick Wellnhofer fd85b566 2022-08-24T15:12:24 Mark more parser functions as deprecated No compiler warnings generated yet.
Nick Wellnhofer 0e49f882 2022-08-24T05:25:37 Mark most SAX1 functions as deprecated No compiler warnings generated yet.
Nick Wellnhofer 9a82b94a 2022-08-24T04:21:58 Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt Add API functions to create a parser context with a custom SAX handler without having to mess with ctxt->sax manually.
Nick Wellnhofer 5b2d07a7 2022-08-20T17:00:50 Use xmlStrlen in *CtxtReadDoc xmlStrlen handles buffers larger than INT_MAX more gracefully.
Nick Wellnhofer 4ad71c2d 2022-08-20T16:19:34 Fix xmlCtxtReadDoc with encoding xmlCtxtReadDoc used to create an input stream involving xmlNewStringInputStream. This would create a stream without an input buffer, causing problems with encodings (see #34). After commit aab584dc3, an error was returned even with UTF-8 encodings which happened to work before. Make xmlCtxtReadDoc call xmlCtxtReadMemory which doesn't suffer from these issues. Also fix htmlCtxtReadDoc. Fixes #397.
Nick Wellnhofer 5930fe01 2022-07-18T20:59:45 Reset nsNr in xmlCtxtReset