result


Log

Author Commit Date CI Message
Nick Wellnhofer 45fe9924 2024-04-22T17:12:54 parser: Don't create reference in xmlLookupGeneralEntity This should only be done in xmlParseReference. The handling of undeclared entities is still somewhat inconsistent. In element content we create references even if entity substitution is enabled. In attribute values undeclared entities are always ignored.
Nick Wellnhofer b717abdd 2024-04-22T15:42:39 parser: Consolidate error handling for undeclared entities Always use XML_WAR_UNDECLARED_ENTITY with warning error level in documents with external subset or parameter entities. Use XML_ERR_UNDECLARED_ENTITY otherwise.
Nick Wellnhofer f506ec66 2024-04-15T11:27:44 parser: Always decode entities in namespace URIs Also decode entities in namespace URIs if entity substitution wasn't requested. This should fix some corner cases when comparing namespace URIs. The Namespaces in XML 1.0 spec says: > In a namespace declaration, the URI reference is the normalized value > of the attribute, so replacement of XML character and entity > references has already been done before any comparison. Make the serialization code escape special characters in namespace URIs like in attribute values. This fixes serialization if entities were substituted when parsing. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
Seiya Nakata 5bb84b47 2024-04-04T11:55:28 relaxng: Fix tree corruption in xmlRelaxNGParseNameClass Don't create cycles in tree structure. This will lead to an infinite loop or call stack overflow later. Closes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/711
Nick Wellnhofer f43197fc 2024-03-29T11:16:45 tree: Don't coalesce text nodes in xmlAdd{Prev,Next}Sibling Commit 9e1c72da from 2001 introduced a bug where xmlAddPrevSibling and xmlAddNextSibling would only try to merge text nodes with one of its new siblings. Commit 4ccd3eb8 fixed this bug but unfortunately, lxml and possibly other downstream code depend on text nodes not being merged. To avoid breaking downstream code while still having somewhat consistent API behavior, it's probably best to make these functions never coalesce text nodes.
Nick Wellnhofer 4ccd3eb8 2024-03-11T19:43:56 tree: Refactor node insertion Also fixes a text coalescing bug.
Nick Wellnhofer 186562a1 2024-03-12T19:55:33 parser: Fix detection of duplicate attributes in XML namespace Fixes a regression from commit e0dd330b, resulting in duplicate attributes in the predefined XML namespace not being detected or extraneous default attributes being passed. Fixes #704.
Nick Wellnhofer 63986c45 2024-01-22T21:02:16 parser: Report fatal error if document entity couldn't be loaded Only lower error level when loading entities. Fixes #667.
Nick Wellnhofer 29beef65 2024-01-02T21:50:38 parser: Pop inputs if parsing DTD failed This should provide some statistics in ctxt->sizeentcopy even in the error or recovery case.
Nick Wellnhofer f237e5b9 2024-01-05T15:40:23 parser: Avoid duplicate namespace errors Don't report an extra attribute uniqueness error if a namespace is undeclared. This matches old behavior.
Nick Wellnhofer 07c05546 2024-01-04T02:48:02 error: Make xmlFormatError public This is a useful function to get a verbose error report. Allows to remove duplicated code from runtest.c. Also reactivate check for schema parser failures.
Nick Wellnhofer d0eb5a7e 2024-01-03T18:12:29 parser: Remove xmlErrEncodingInt Convert the last user to xmlFatalErr.
Nick Wellnhofer e8fb3d63 2024-01-02T17:45:54 parser: Convert some "internal errors" to meaningful codes
Nick Wellnhofer 37c6618b 2023-12-30T02:50:34 parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.
Nick Wellnhofer f0dc52d0 2023-12-29T06:00:20 parser: Move cleanup of element stacks to xmlParseContent
Nick Wellnhofer d025cfbb 2023-12-27T03:53:24 parser: Always copy content from entity to target. Make sure that references from IDs are updated. Note that if there are IDs with the same value in a document, the last one will now be returned. IDs should be unique, but maybe this should be addressed.
Nick Wellnhofer 4ecc85d2 2023-12-27T00:44:16 parser: Push general entity input streams on the stack This allows the error handler to give more context.
Nick Wellnhofer d944a415 2023-12-26T02:10:35 parser: Fix in-parameter-entity and in-external-dtd checks Use in ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.
Nick Wellnhofer b8313b58 2023-12-26T21:59:08 xpath: Rewrite substring-before and substring-after Don't use buffers. Check malloc failures.
Nick Wellnhofer f3fa34dc 2023-12-26T22:37:26 parser: Fix general entity parsing Clear namespace database. Ignore non-fatal errors.
Nick Wellnhofer ecfbcc8a 2023-12-25T04:33:00 parser: Rework general entity parsing Don't create a new parser context but reuse the existing one. This exposes bug #601 in a more obvious way.
Nick Wellnhofer 6e3a2ac6 2023-12-22T21:38:50 xinclude: Rework xml:base fixup The xml:base fixup was broken in more complex cases. Also avoid parsing and building the included URI multiple times.
Nick Wellnhofer f0df3e6d 2023-12-21T14:35:18 tests: Try to fix RelaxNG test cases These were added recently in ea695ac0 and 8074b881 but were a total mess of symbolic links and apparently mixed up files. Symbolic links don't work on Windows. Try to salvage one of the tests.
Nick Wellnhofer 8d0aaf4b 2023-12-19T20:47:36 parser: Remove xmlErrEncoding Use xmlFatalErr or xmlCtxtErrIO.
Nick Wellnhofer 7e511f35 2023-12-19T15:41:37 io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile This allows to report the reason why opening a file failed to the parser context and improve error messages. Now we can also remove the stat call before opening a file.
Nick Wellnhofer 83c6aeef 2023-12-18T21:12:29 relaxng: Improve error handling Pass RelaxNG structured error handler to XML parser. Handle malloc failure from xmlRaiseError. Remove argument from memory error handler. Use xmlRaiseMemoryError. Don't use xmlGenericError. Remove TODO macro.
Nick Wellnhofer 157df344 2023-12-10T18:23:53 xmlreader: Report malloc failures Fix many places where malloc failures aren't reported. Introduce a new API function xmlTextReaderGetLastError.
Nick Wellnhofer e58ea29f 2023-12-10T18:10:42 SAX2: Report malloc failures Fix many places where malloc failures aren't reported. Improve error handling when parsing entity declarations. Fixes #308.
Nick Wellnhofer a1f7ecae 2023-12-10T15:25:42 entities: Report malloc failures Fix places where malloc failures aren't reported. Introduce new API function xmlAddEntity that returns separate error codes. Don't invoke global error handler for low-level errors which should be handled by higher layers. Invalid redelcaration warnings will be fixed later.
Nick Wellnhofer 7d446e97 2023-12-08T12:13:49 parser: Fix namespaces redefined from default attributes This regressed in commit e0dd330b. Also fixes a long-standing issue where namespaces from default attributes weren't added if they match an existing namespace. Fixes #643.
Nick Wellnhofer e3959461 2023-11-30T16:15:46 html: Reenable buggy detection of XML declarations Switch to UTF-8 if a document starts with '<?xm' to match old behavior. Also enable this check in the push parser. Fixes #637.
Nick Wellnhofer 43b511fa 2023-11-26T14:31:39 parser: Make CRLF increment line number Partial revert of cb927e85 fixing CRLFs not incrementing the line number. This requires to rework xmlParseQNameHashed. The original implementation prompted the change to xmlCurrentChar which really shouldn't modify the 'cur' pointer as side effect. But the NEXTL macro relies on this behavior. Ultimately, we should reintroduce the change to xmlCurrentChar and fix the NEXTL macro. This will lead to single CRs incrementing the line number as well which seems more consistent. Fixes #628.
Nick Wellnhofer a2b5c90a 2023-11-21T14:35:54 hash: Fix deletion of entries during scan Functions like xmlCleanSpecialAttr scan a hash table and possibly delete entries in the callback. xmlHashScanFull must detect such deletions and rescan the entry. This regressed when rewriting the hash table code in 4a513d56. Fixes #626.
Nick Wellnhofer 7a2d412f 2023-10-31T20:15:38 parser: Copy default namespace in xmlParseBalancedChunkMemory
Nick Wellnhofer e0c2f14d 2023-10-31T13:53:15 parser: Copy namespaces in xmlParseBalancedChunkMemory Reenable copying of namespaces but don't set SAX data. This should match the old behavior.
Nick Wellnhofer b76d81da 2023-10-06T11:50:29 parser: Fix regression when push parsing parameter entities Short-lived regression from 834b8123. Also shrink parameter entity buffers when push parsing.
Nick Wellnhofer 134d2ad8 2023-10-06T00:31:44 parser: Protect against quadratic default attribute expansion
Nick Wellnhofer 0ba22c05 2023-10-05T22:05:04 parser: Support encoded external PEs in entity values Corner case which was never supported.
Nick Wellnhofer 6337a14a 2023-10-06T10:44:38 tests: Handle entities in SAX tests
Nick Wellnhofer e48f3d8e 2023-09-27T16:47:37 tests: Add more tests for redefined attributes
Nick Wellnhofer a873191c 2023-09-25T14:51:35 parser: Introduce xmlParseQNameHashed
Nick Wellnhofer 53050b1d 2023-08-29T20:06:43 parser: More fixes to push parser error handling
Nick Wellnhofer bbd918b2 2023-08-29T15:56:37 parser: Fix detection of null bytes Also suppress misleading extra errors. Fixes #122.
Nick Wellnhofer c6083a32 2023-08-29T16:30:22 parser: Improve error handling in push parser - Report errors earlier - Align error messages with pull parser
Nick Wellnhofer 855818bd 2023-08-08T15:21:37 parser: Check for truncated multi-byte sequences When decoding input data, check whether the "raw" buffer is empty after parsing the document. Otherwise, the input ends with a truncated multi-byte sequence which shouldn't be silently ignored.
Nick Wellnhofer 0ffc2d82 2023-04-30T20:28:47 runtest: Skip element name in schema error messages This makes sure that memory and streaming tests will report the same messages.
Nick Wellnhofer e4f85f1b 2023-04-07T11:46:35 [CVE-2023-28484] Fix null deref in xmlSchemaFixupComplexType Fix a null pointer dereference when parsing (invalid) XML schemas. Thanks to Robby Simpson for the report! Fixes #491.
David Kilzer cb1b8b85 2023-04-10T13:06:18 xmlValidatePopElement() can return invalid value (-1) Covered by: test/VC/ElementValid5 This only affects XML Reader API with LIBXML_REGEXP_ENABLED and LIBXML_VALID_ENABLED turned on. * result/VC/ElementValid5.rdr: - Update result to add missing error message. * python/tests/reader2.py: * result/VC/ElementValid6.rdr: * result/VC/ElementValid7.rdr: * result/valid/781333.xml.err.rdr: - Update result to fix grammar issue. * valid.c: (xmlValidatePopElement): - Check return value of xmlRegExecPushString() to handle -1, and assign 'ret = 0;' to return 0 from xmlValidatePopElement(). This change affects xmlTextReaderValidatePop() from xmlreader.c. - Fix grammar of error message by changing 'child' to 'children'.
Nick Wellnhofer d7d0bc65 2023-03-31T16:47:48 SAX2: Ignore namespaces in HTML documents In commit 21ca8829, we started to ignore namespaces in HTML element names but we still called xmlSplitQName, effectively stripping the namespace prefix. This would cause elements like <o:p> being parsed as <p>. Now we leave the name untouched. Fixes #508.
Nick Wellnhofer e20f4d7a 2023-02-13T14:38:05 xinclude: Fix quadratic behavior in xmlXIncludeLoadTxt Also make text inclusions work with memory buffers, for example when using a custom entity loader, and fix a memory leak in case of invalid characters. Fixes #483.
Nick Wellnhofer be0ec005 2023-02-03T14:37:49 xinclude: Abort immediately if max depth was exceeded Avoids resource exhaustion if the maximum recursion depth was exceeded. Note that the XInclude engine offers no protection against other "billion laughs"-style amplification attacks as long as they stay below the maximum depth.
Nick Wellnhofer 74aa61e0 2023-01-22T13:09:03 parser: Halt parser on DTD errors If we try to continue parsing after an error in the internal or external subset, entity expansion accounting gets more complicated. Simply halt the parser. Found with libFuzzer.
Nick Wellnhofer 608c65bb 2023-01-18T15:15:41 xpath: number('-') should return NaN Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/81
Nick Wellnhofer d320a683 2023-01-17T13:50:51 parser: Fix entity check in attributes Don't set the "checked" flag when checking entities in default attribute values. These entities could reference other entities which weren't defined yet, so the check isn't reliable. This fixes a short-lived regression which could lead to a call stack overflow later in xmlStringGetNodeList.
Nick Wellnhofer a41b09c7 2022-12-23T21:29:28 parser: Improve detection of entity loops Set a flag to detect entity loops at once instead of processing until the depth limit is exceeded.
Nick Wellnhofer d972393f 2022-12-23T21:01:20 parser: Only report a single entity error Don't report errors multiple times for nested entity references.
Nick Wellnhofer ae0c9cfa 2022-12-12T23:54:39 uri: Fix handling of port numbers Allow port number without host, real fix for #71. Also compare port numbers in xmlBuildRelativeURI. Fix handling of port numbers in xmlUriEscape.
Nick Wellnhofer 76c6da42 2022-12-04T23:01:00 error: Make sure that error messages are valid UTF-8 This has caused issues with the Python bindings for a long time. Should fix #64.
Nick Wellnhofer 9c63cea5 2022-11-20T15:36:41 test: Add test for push parser boundaries
Nick Wellnhofer 68a6518c 2022-11-15T18:23:33 parser: Rewrite push parser boundary checks Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.
Nick Wellnhofer 76d6b0d7 2022-11-14T21:02:15 html: Don't escape ASCII chars in href attributes In several cases, href attributes can contain ASCII characters which are illegal in URIs. Escaping them often does more harm than good. Fixes #321.
Nick Wellnhofer f61b8a62 2022-11-13T21:47:03 parser: Fix DTD parser progress checks This is another attempt at fixing parser progress checks. Instead of relying on in->consumed, which could overflow, change some DTD parser functions to make guaranteed progress on certain byte sequences.
Nick Wellnhofer b456e3bb 2022-10-30T20:28:20 xinclude: Always allow XPtr expressions in external documents
Nick Wellnhofer eef0a739 2022-10-30T12:21:20 xinclude: Implement "streaming" mode When using xmlreader, XPointer expressions in XIncludes simply cannot work. Expressions can reference nodes which weren't parsed yet or which were already deleted. After fixing nested XIncludes, we reference includes which were parsed previously. When streaming, these nodes could have been deleted, leading to use-after-free errors. Disallow XPointer expressions and truncate the include table in streaming mode.
Nick Wellnhofer 20e2fb4c 2022-10-23T17:52:29 xinclude: Avoid creation of subcontexts Don't create subcontext in xmlXIncludeRecurseDoc. Save and restore 'doc' and 'incTab' instead. Make xmlXIncludeLoadFallback call xmlXIncludeCopyNode which seems safer than xmlXIncludeDoProcess since the latter may modify the document. This should also be more performant since we need to copy the whole fallback subtree anyway. Also make sure to avoid replacements in fallback elements in xmlXIncludeDoProcess.
Nick Wellnhofer d2ed1e4f 2022-10-22T16:50:18 xinclude: Limit recursion depth This avoids call stack overflows.
Nick Wellnhofer ea7c9fb5 2022-10-22T16:48:58 xinclude: Don't create result doc for test with errors
Nick Wellnhofer 34496f26 2022-10-22T16:09:21 xinclude: Test for inclusion loops
Nick Wellnhofer bc267cb9 2022-10-22T02:19:22 xinclude: Expand includes in xmlXIncludeCopyNode This should make nested includes work reliably. Fixes #424.
Nick Wellnhofer c99cde3f 2022-10-22T16:59:35 xinclude: Also test error messages The reader interface with XIncludes is somewhat broken and can generate different error messages. Start to move tests which are sketchy with reader to a separate directory.
Nick Wellnhofer 938105b5 2022-10-21T15:56:12 Revert "xinclude: Fix regression with nested includes" This reverts commit 7f04e297318b1b908cec20711f74f75625afed7f which caused memory errors. See #424.
Nick Wellnhofer 7f04e297 2022-10-18T18:40:00 xinclude: Fix regression with nested includes This reverts commits 74dcc10b and 87d20b55. Fixes #424.
Nick Wellnhofer 1d4f5d24 2022-09-13T16:40:31 schemas: Fix null-pointer-deref in xmlSchemaCheckCOSSTDerivedOK Found by OSS-Fuzz.
Nick Wellnhofer c7149792 2022-09-01T23:15:35 Fix --with-valid --without-regexps build This build config resulted in segfaults in 'runtest' because a special xmlElementContentPtr showed up in a few places. I'm not sure if this is the right fix. An error message was changed to conform to the --with-regexps build. There are still a few missing validity errors, so the tests don't pass.
Nick Wellnhofer e986d09c 2022-07-15T14:02:26 Skip incorrectly opened HTML comments Commit 4fd69f3e fixed handling of '<' characters not followed by an ASCII letter. But a '<!' sequence followed by invalid characters should be treated as bogus comment and skipped. Fixes #380.
Nick Wellnhofer 14517012 2022-04-23T19:19:33 Fix parsing of subtracted regex character classes Fixes #370.
Nick Wellnhofer 4612ce30 2022-04-21T03:52:52 Implement xpath1() XPointer scheme See https://www.w3.org/2005/04/xpointer-schemes/
Nick Wellnhofer 41afa89f 2022-04-10T14:09:29 Fix short-lived regression in xmlStaticCopyNode Commit 7618a3b1 didn't account for coalesced text nodes. I think it would be better if xmlStaticCopyNode didn't try to coalesce text nodes at all. This code path can only be triggered if some other code doesn't coalesce text nodes properly. In this case, OSS-Fuzz found such behavior in xinclude.c.
Nick Wellnhofer 4de7f2ac 2022-04-04T03:28:21 Remove unused result files
Nick Wellnhofer f1c32b4c 2020-07-09T03:19:13 Allow missing result files in runtest Treat missing files as empty.
Nick Wellnhofer 95c7f315 2022-04-03T21:39:14 Move SVG tests to runtest.c Also update the test results for the first time since 2000.
Nick Wellnhofer 48b03c84 2022-04-03T20:36:38 Remove major parts of old test suite Remove all the parts of the old test suite which are covered by runtest.c for quite some time. The following test programs are removed: - testC14N - testHTML - testReader - testRelax - testSAX - testSchemas - testURI - testXPath This also removes a few results of unimportant tests only run by the old test suite.
Nick Wellnhofer 57b81c20 2022-03-05T18:20:29 Normalize XPath strings in-place Simplify the code and fix a potential memory leak. Fixes #343.
Nick Wellnhofer bc06a522 2022-03-02T02:57:49 Fix recursion check in xinclude.c Compare the included URL with the document's URL to detect local inclusions. Fixes #348.
Mike Dalessio d7b287b9 2021-07-17T14:36:53 htmlParseComment: handle abruptly-closed comments See guidance provided on abrutply-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment
Mike Dalessio 24cdc890 2021-07-17T14:06:49 test coverage for abruptly-closed comments These establish baseline behavior so that the subsequent commit is clear about the behavior it will modify.
Nick Wellnhofer ea6e8f99 2021-12-20T00:34:58 Fix certain combinations of regex range quantifiers Fix regex transitions that have both min/max and a counter. In this case, we want to save the regex state before incrementing the counter. Fixes #301 and the issue reported here: https://mail.gnome.org/archives/xml/2016-April/msg00017.html
Nick Wellnhofer 382fb056 2021-12-20T00:31:41 Fix range quantifier on subregex Make sure to add counted exit transitions before other counter transitions. Otherwise, we won't backtrack correctly. Fixes #65.
Nick Wellnhofer ce0871e1 2022-02-20T16:44:41 Only warn on invalid redeclarations of predefined entities Downgrade the error message to a warning since the error was ignored, anyway. Also print the name of redeclared entity. For a proper fix that also shows filename and line number of the invalid redeclaration, we'd have to - pass the parser context to the entity functions somehow, or - make these functions return distinct error codes. Partial fix for #308.
Nick Wellnhofer 652dd12a 2022-02-08T03:29:24 [CVE-2022-23308] Use-after-free of ID and IDREF attributes If a document is parsed with XML_PARSE_DTDVALID and without XML_PARSE_NOENT, the value of ID attributes has to be normalized after potentially expanding entities in xmlRemoveID. Otherwise, later calls to xmlGetID can return a pointer to previously freed memory. ID attributes which are empty or contain only whitespace after entity expansion are affected in a similar way. This is fixed by not storing such attributes in the ID table. The test to detect streaming mode when validating against a DTD was broken. In connection with the defects above, this could result in a use-after-free when using the xmlReader interface with validation. Fix detection of streaming mode to avoid similar issues. (This changes the expected result of a test case. But as far as I can tell, using the XML reader with XIncludes referencing the root document never worked properly, anyway.) All of these issues can result in denial of service. Using xmlReader with validation could result in disclosure of memory via the error channel, typically stderr. The security impact of xmlGetID returning a pointer to freed memory depends on the application. The typical use case of calling xmlGetID on an unmodified document is not affected.
Nick Wellnhofer 9edc20c1 2022-02-07T20:38:30 Fix double counting of CRLF in comments Fixes #151.
Nick Wellnhofer 5408c10c 2022-02-04T14:00:09 Don't normalize namespace URIs in XPointer xmlns() scheme Namespace URIs should be compared without escaping or unescaping: https://www.w3.org/TR/REC-xml-names/#NSNameComparison Fixes #289.
Nick Wellnhofer 1c7d91ab 2022-02-03T23:31:19 Fix handling of XSD with empty namespace An empty namespace means no default namespace. Fixes #303.
Nick Wellnhofer f480f750 2022-02-03T14:43:17 Update NewsML DTD in test suite Switch to version 1.2 which has a clearer license. Fixes #291.
Nick Wellnhofer d85245f9 2022-01-16T21:39:04 Fix regression with PEs in external DTD Fix a regression introduced with commit a28f7d87. In some cases, parameter entity references in external DTDs wouldn't be expanded. Fixes #306.
David Kilzer 03bb9293 2021-07-07T18:23:18 Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8(). * encoding.c: (UTF16LEToUTF8): - Fix comment to describe what the code does. (UTF16BEToUTF8): - Fix undefined behavior which was applied to UTF16LEToUTF8() in 2f9382033e. - Add bounds check to while() loop which was applied to UTF16LEToUTF8() in be803967db. - Do not return -2 when (in >= inend) to fix the bug. This was applied to UTF16LEToUTF8() in 496a1cf592. - Inline (<< 8) statements to match UTF16LEToUTF8(). Add the following tests and results: test/text-4-byte-UTF-16-BE-offset.xml test/text-4-byte-UTF-16-BE.xml test/text-4-byte-UTF-16-LE-offset.xml test/text-4-byte-UTF-16-LE.xml
Nick Wellnhofer 2732b234 2022-01-10T13:32:14 Fix regression parsing public IDs literals in HTML Fix regression introduced when reworking htmlParsePubidLiteral in commit 93ce33c2. Fixes #318.
Nick Wellnhofer de5b624f 2021-05-08T20:21:29 Fix handling of unexpected EOF in xmlParseContent Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was removed in commit 62150ed2. This commit also introduced a regression for direct users of xmlParseContent. Unclosed tags weren't checked.
Nick Wellnhofer 3e80560d 2021-05-07T10:51:38 Fix line numbers in error messages for mismatched tags Commit 62150ed2 introduced a small regression in the error messages for mismatched tags. This typically only affected messages after the first mismatch, but with custom SAX handlers all line numbers would be off. This also fixes line numbers in the SAX push parser which were never handled correctly.
Nick Wellnhofer 01411e7c 2021-02-08T20:58:32 Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> But the XML spec clearly states that this is illegal: > If the entities lt or amp are declared, they MUST be declared as > internal entities whose replacement text is a character reference to > the respective character (less-than sign or ampersand) being escaped; > the double escaping is REQUIRED for these entities so that references > to them produce a well-formed result. Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.