result

Branch


Log

Author Commit Date CI Message
Nick Wellnhofer 149c04c0 2025-08-02T14:59:02 html: Escape < and > when serializing attributes This reverts the change in cdaf657f. Coincidentally, the HTML spec just changed to mandate the old escaping behavior: https://github.com/whatwg/html/issues/6235 Fixes #957.
Nick Wellnhofer 0c948334 2025-07-10T11:23:44 html: Add newline to error message
Nick Wellnhofer bc0bb67b 2025-07-10T11:20:22 html: Don't abort on encoding errors Always enable recovery mode when parsing HTML, so we don't raise fatal errors. Regressed with 462bf0b7. Fixes #947.
Nick Wellnhofer 71e1e8af 2025-07-04T14:28:26 schematron: Fix memory safety issues in xmlSchematronReportOutput Fix use-after-free (CVE-2025-49794) and type confusion (CVE-2025-49796) in xmlSchematronReportOutput. Fixes #931. Fixes #933.
Nick Wellnhofer 24d7e159 2025-07-04T12:19:20 schematron: Complete fix for CVE-2025-49795 - Fix memory leaks - Fix tests
Michael Mann 499bcb78 2025-06-21T12:11:30 Schematron: Fix null pointer dereference leading to DoS (CVE-2025-49795) Fixes #932
Michael Mann 069bcda1 2025-06-20T23:05:00 Fix potential buffer overflows of interactive shell CVE-2025-6170 Fixes #941
Omar Siam 9760a14f 2025-06-30T13:47:33 relaxng: In the simplification step also unlink notAllowed refs from choice This fixes false reports of non allowed content compared to notAllowed as tag within the choice tag.
Nick Wellnhofer ad0f5d27 2025-06-24T13:02:13 tree: Fix xmlGetNodePath - Fix quadratic behavior - Don't truncate names Fixes #715.
Nick Wellnhofer ab06bfa1 2025-05-26T15:03:07 parser: Fix error return in xmlParseElementContentDecl Avoid internal error later in xmlValidBuildAContentModel after 2a60ca06c. Also avoids some unnecessary error messages.
Nick Wellnhofer 5ec83f77 2025-05-20T03:21:27 valid: Remove duplicate #FIXED check for namespaces Unlike the comment indicates, this is already checked.
Nick Wellnhofer 7c10fff2 2025-05-20T22:48:25 valid: Don't validate twice in xmlAddAttributeDecl This should only be done in xmlValidateAttributeDecl.
Nick Wellnhofer 2f3655c9 2025-05-20T19:40:06 parser: Pop PEs that start markup declarations explicitly We currently only handle "Validity constraint: Proper Declaration/PE Nesting", but we must detect "Well-formedness constraint: PE Between Declarations" separately: > The replacement text of a parameter entity reference in a DeclSep must > match the production extSubsetDecl. PEs in DeclSeps are PEs that start with a full markup declaration (or another PE). These are handled in xmParse{Internal|External}Subset. We set a flag on these PEs and don't close them implicitly in xmlSkipBlankCharsPE. This will make unterminated declarations in such PEs cause a parser error. The PEs are closed explicitly in xmParse{Internal|External}Subset, the only location where they are allowed to end.
Nick Wellnhofer dd1961e0 2025-05-20T16:37:18 valid: Skip more validity checks if not validating
Nick Wellnhofer 3a68d0b7 2025-05-19T18:59:51 SAX2: Handle xml:id errors separately
Nick Wellnhofer 87087def 2025-05-13T16:19:42 tests: Remove result files committed by accident
Nick Wellnhofer f0983199 2025-05-12T13:00:20 html: Map some encodings according to HTML5 Windows-1252 is a superset of ISO-8859-1 and should be used instead. Same for ASCII. Also map UCS-2 and UTF-16 to UTF-16LE.
Nick Wellnhofer 825f3a9d 2025-05-11T21:38:16 html: Always serialize attributes with double quotes Align with HTML5.
Nick Wellnhofer cdaf657f 2025-05-09T23:02:32 html: Don't escape < and > when serializing attribute values Align with HTML5. This will break some test suites.
Nick Wellnhofer c8cea39d 2025-05-09T21:31:07 save: Fix serialization of attribute defaults containing &lt; Long-standing bug that produced invalid XML.
Nick Wellnhofer 46f05ea4 2025-05-09T00:21:47 html: Rework meta charset handling Don't use encoding from meta tags when serializing. Only use the value in `doc->encoding`, matching the XML serializer. This is the actual encoding used when parsing. Stop modifying the input document by setting meta tags before serializing. Meta tags are now injected during serialization. Add full support for <meta charset=""> which is also used when adding meta tags. Align with HTML5 and implement the "algorithm for extracting a character encoding from a meta element". Only modify the encoding substring in Content-Type meta tags. Only switch encoding once when parsing. Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading UTF-8 charset. Fixes #909.
Nick Wellnhofer f3a080bc 2025-05-07T14:32:42 html: Ignore U+0000 in body text Align with HTML5. Fixes #908.
Nick Wellnhofer 6896f478 2025-04-18T17:22:36 Revert "valid: Remove duplicate error messages when streaming" This reverts commit cd220b93d8ffffd2fb7cac0ec792bebb7e082521. This commit broke the xmstarlet tests.
Nick Wellnhofer 69b83bb6 2025-03-10T02:18:51 encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi- byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.
Nick Wellnhofer 05bd1720 2025-03-01T10:25:29 parser: Fix parsing of DTD content Regressed in 2.11. Fixes #868.
Nick Wellnhofer 9f86dae9 2024-12-15T14:27:05 test: Add test case for UAF in xmlSchemaIDCFillNodeTables
Nick Wellnhofer 8cf6129b 2025-02-13T18:20:46 html: Stop implying <p> start tags Only <html>, <head> or <body> should be implied. Opening extra <p> tags has always been a libxml2 quirk.
Nick Wellnhofer 71122421 2025-02-13T14:04:10 html: Make implied <p> tags more deterministic libxml2's HTML parser adds <p> start tags in some situations. This behavior, which doesn't follow any standard, was added in 2000, see here: http://veillard.com/XML/messages/0655.html Text nodes that only contain whitespace don't imply a <p> tag, but the whitespace check cannot work reliably if we're parsing partial text data which can happen with both pull and push parser. The logic in `areBlanks` is hard to follow. The checks involving `CUR` depend on the position of the input pointer and seem dubious. It's also possible that the behavior changed inadvertently with a later commit. As a result, it's hard to come up with good test cases. We now process leading whitespace before creating implied tags. This is more in line with HTML5 and should avoid at least some issues with partial text data. For example, parsing the string "<head> x" used to result in: <html> <head></head> <body><p> x</p></body> </html> And now results in: <html> <head> </head> <body><p>x</p></body> </html> Except for the implied <p> tag, this matches HTML5.
Nick Wellnhofer b4d3d87e 2025-02-01T22:02:33 parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.
Nick Wellnhofer 08028572 2025-02-01T18:21:47 html: Make data parsing modes work with push parser This can't be solved with a simple scan for a terminator. Instead, we make htmlParseCharData handle incomplete data if the "partial" flag is set.
Nick Wellnhofer cd220b93 2024-12-27T14:55:43 valid: Remove duplicate error messages when streaming
Nick Wellnhofer 45914614 2024-11-05T12:05:14 xpath: Fix parsing of non-ASCII names Fix a long-standing issue where QNames starting with a non-ASCII character would be rejected. This became more visible after "streaming" XPath evaluation was disabled since the latter handled non-ASCII names correctly. Fixes #818.
Nick Wellnhofer ffb058f4 2024-10-28T20:12:52 parser: Fix detection of duplicate attributes We really need a second scan if more than one namespace clash was detected.
Nick Wellnhofer f77ec16d 2024-09-12T01:45:34 html: Optimize htmlParseCharData
Nick Wellnhofer 575be6c1 2024-09-12T01:40:07 html: Fix line numbers with CRs
Nick Wellnhofer e179f3ec 2024-09-11T17:29:59 html: Stop reporting syntax errors It doesn't make much sense to keep the old syntax error handling which doesn't conform to HTML5. Handling HTML5 parser errors is rather involved and not essential for parsers.
Nick Wellnhofer c6af1017 2024-09-08T20:45:48 html: Test tokenizer against html5lib test suite
Nick Wellnhofer 9678163f 2024-09-09T02:01:19 html: Don't check for valid XML characters
Nick Wellnhofer 4eeac309 2024-09-08T22:20:20 html: Start to fix EOF and U+0000 handling
Nick Wellnhofer 17da54c5 2024-09-08T19:16:12 html: Normalize newlines
Nick Wellnhofer 3adb396d 2024-09-07T15:18:13 html: Parse bogus comments instead of ignoring them Also treat XML processing instructions as bogus comments.
Nick Wellnhofer e1834745 2024-09-07T00:54:25 html: Add character data tests
Nick Wellnhofer f9ed30e9 2024-09-06T17:49:04 html: HTML5 character data states
Nick Wellnhofer 59511792 2024-09-03T15:52:44 html: Parse named character references according to HTML5
Nick Wellnhofer a80f8b64 2023-05-04T15:59:31 html: Allow attributes in end tags Attribute are syntactically allowed in HTML5 end tags but otherwise ignored.
Nick Wellnhofer dcb2abb2 2023-05-04T15:16:29 html: Parse tag and attribute names according to HTML5 HTML5 allows bascially all characters in tag and attribute names.
Nick Wellnhofer bd9eed46 2024-09-02T18:37:41 parser: Make unsupported encodings an error in declarations This was changed in 45157261, but in encoding declarations, unsupported encodings should raise a fatal error. Fixes #794.
Nick Wellnhofer 8ae06d52 2024-08-29T00:07:27 SAX2: Don't merge CDATA sections The Document Object Model (DOM) Level 3 Core Specification says: > Adjacent CDATASection nodes are not merged by use of the normalize > method of the Node interface. Fixes #412.
Nick Wellnhofer 322e733b 2024-07-18T19:27:43 xinclude: Fix fallback for text includes Fixes #772.
Nick Wellnhofer 842a0448 2024-07-03T11:46:06 valid: Restore ID lookup Revert a change from d025cfbb and don't overwrite ID table entries, so that the first attribute will be returned if there are duplicate IDs. This requires two other changes: - Attributes in entity content are never added to the ID table. This seems reasonable. - Remove the optimization to skip ID lookup when copying and the target document has an empty ID table. This also seems more correct since the document could have ID declarations nevertheless or we could be copying xml:ids into the document for the first time. Fixes #757.
Nick Wellnhofer 30be984a 2024-06-28T20:37:47 encoding: Rework ISO-8859-X conversion Optimize code. Pass tables as context parameter. Check for XML_ENC_ERR_SPACE.
Nick Wellnhofer 7c11da2d 2024-06-27T12:47:47 tests: Clarify licence of test/intsubset2.xml
Nick Wellnhofer b8903b9e 2024-06-22T17:55:46 runtest: Remove result handling from schemasOneTest We only care about errors.
Nick Wellnhofer e68ccfa9 2024-06-22T16:42:36 tests: Port Schematron tests to C
Nick Wellnhofer 1dd5e76a 2024-06-17T21:06:46 xinclude: Don't remove root element Don't replace include element at root with empty nodeset.
Nick Wellnhofer 52ce0d70 2024-06-17T17:35:12 tests: Add XInclude test for issue #733
Nick Wellnhofer 2608baaf 2024-06-14T19:42:40 parser: Make failure to load main document a warning Revert the change that made failures to load the main document an error. This fixes the --path option of xmllint and xsltproc. Should fix #733.
Nick Wellnhofer 669bd349 2024-06-12T18:20:01 xpointer: Remove support for XPointer locations The latest spec for what it essentially an XPath extension seems to be this working draft from 2002: https://www.w3.org/TR/xptr-xpointer/ The xpointer() scheme is listed as "being reviewed" in the XPointer registry since at least 2006. libxml2 seems to be the only modern software that tries to implement this spec, but the code has many bugs and quality issues. If you configure --with-legacy, old symbols are retained for ABI compatibility.
Nick Wellnhofer 4fefba4c 2024-05-15T17:52:20 parser: Rework handling of undeclared entities Throw an error if entity substitution was requested. Now we only downgrade to a warning if - XML_PARSE_DTDLOAD wasn't specified, and - entity aren't substituted or XML_PARSE_NO_XXE was specified. Should fix #724.
Nick Wellnhofer fdc5ff36 2024-05-02T16:23:04 parser: Always throw entity errors if external DTD is loaded When parsing with XML_PARSE_DTDLOAD, missing entities are always an error. Also consolidate behavior when validating. See b717abdd.
Nick Wellnhofer 39e5b35b 2024-05-02T22:06:19 parser: Don't create undeclared entity refs in substitution mode We never want to create entity reference nodes if entity substitution is enabled. This also applies to undeclared entities.
Nick Wellnhofer 45fe9924 2024-04-22T17:12:54 parser: Don't create reference in xmlLookupGeneralEntity This should only be done in xmlParseReference. The handling of undeclared entities is still somewhat inconsistent. In element content we create references even if entity substitution is enabled. In attribute values undeclared entities are always ignored.
Nick Wellnhofer b717abdd 2024-04-22T15:42:39 parser: Consolidate error handling for undeclared entities Always use XML_WAR_UNDECLARED_ENTITY with warning error level in documents with external subset or parameter entities. Use XML_ERR_UNDECLARED_ENTITY otherwise.
Nick Wellnhofer f506ec66 2024-04-15T11:27:44 parser: Always decode entities in namespace URIs Also decode entities in namespace URIs if entity substitution wasn't requested. This should fix some corner cases when comparing namespace URIs. The Namespaces in XML 1.0 spec says: > In a namespace declaration, the URI reference is the normalized value > of the attribute, so replacement of XML character and entity > references has already been done before any comparison. Make the serialization code escape special characters in namespace URIs like in attribute values. This fixes serialization if entities were substituted when parsing. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
Seiya Nakata 5bb84b47 2024-04-04T11:55:28 relaxng: Fix tree corruption in xmlRelaxNGParseNameClass Don't create cycles in tree structure. This will lead to an infinite loop or call stack overflow later. Closes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/711
Nick Wellnhofer f43197fc 2024-03-29T11:16:45 tree: Don't coalesce text nodes in xmlAdd{Prev,Next}Sibling Commit 9e1c72da from 2001 introduced a bug where xmlAddPrevSibling and xmlAddNextSibling would only try to merge text nodes with one of its new siblings. Commit 4ccd3eb8 fixed this bug but unfortunately, lxml and possibly other downstream code depend on text nodes not being merged. To avoid breaking downstream code while still having somewhat consistent API behavior, it's probably best to make these functions never coalesce text nodes.
Nick Wellnhofer 4ccd3eb8 2024-03-11T19:43:56 tree: Refactor node insertion Also fixes a text coalescing bug.
Nick Wellnhofer 186562a1 2024-03-12T19:55:33 parser: Fix detection of duplicate attributes in XML namespace Fixes a regression from commit e0dd330b, resulting in duplicate attributes in the predefined XML namespace not being detected or extraneous default attributes being passed. Fixes #704.
Nick Wellnhofer 63986c45 2024-01-22T21:02:16 parser: Report fatal error if document entity couldn't be loaded Only lower error level when loading entities. Fixes #667.
Nick Wellnhofer 29beef65 2024-01-02T21:50:38 parser: Pop inputs if parsing DTD failed This should provide some statistics in ctxt->sizeentcopy even in the error or recovery case.
Nick Wellnhofer f237e5b9 2024-01-05T15:40:23 parser: Avoid duplicate namespace errors Don't report an extra attribute uniqueness error if a namespace is undeclared. This matches old behavior.
Nick Wellnhofer 07c05546 2024-01-04T02:48:02 error: Make xmlFormatError public This is a useful function to get a verbose error report. Allows to remove duplicated code from runtest.c. Also reactivate check for schema parser failures.
Nick Wellnhofer d0eb5a7e 2024-01-03T18:12:29 parser: Remove xmlErrEncodingInt Convert the last user to xmlFatalErr.
Nick Wellnhofer e8fb3d63 2024-01-02T17:45:54 parser: Convert some "internal errors" to meaningful codes
Nick Wellnhofer 37c6618b 2023-12-30T02:50:34 parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.
Nick Wellnhofer f0dc52d0 2023-12-29T06:00:20 parser: Move cleanup of element stacks to xmlParseContent
Nick Wellnhofer d025cfbb 2023-12-27T03:53:24 parser: Always copy content from entity to target. Make sure that references from IDs are updated. Note that if there are IDs with the same value in a document, the last one will now be returned. IDs should be unique, but maybe this should be addressed.
Nick Wellnhofer 4ecc85d2 2023-12-27T00:44:16 parser: Push general entity input streams on the stack This allows the error handler to give more context.
Nick Wellnhofer d944a415 2023-12-26T02:10:35 parser: Fix in-parameter-entity and in-external-dtd checks Use in ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.
Nick Wellnhofer b8313b58 2023-12-26T21:59:08 xpath: Rewrite substring-before and substring-after Don't use buffers. Check malloc failures.
Nick Wellnhofer f3fa34dc 2023-12-26T22:37:26 parser: Fix general entity parsing Clear namespace database. Ignore non-fatal errors.
Nick Wellnhofer ecfbcc8a 2023-12-25T04:33:00 parser: Rework general entity parsing Don't create a new parser context but reuse the existing one. This exposes bug #601 in a more obvious way.
Nick Wellnhofer 6e3a2ac6 2023-12-22T21:38:50 xinclude: Rework xml:base fixup The xml:base fixup was broken in more complex cases. Also avoid parsing and building the included URI multiple times.
Nick Wellnhofer f0df3e6d 2023-12-21T14:35:18 tests: Try to fix RelaxNG test cases These were added recently in ea695ac0 and 8074b881 but were a total mess of symbolic links and apparently mixed up files. Symbolic links don't work on Windows. Try to salvage one of the tests.
Nick Wellnhofer 8d0aaf4b 2023-12-19T20:47:36 parser: Remove xmlErrEncoding Use xmlFatalErr or xmlCtxtErrIO.
Nick Wellnhofer 7e511f35 2023-12-19T15:41:37 io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile This allows to report the reason why opening a file failed to the parser context and improve error messages. Now we can also remove the stat call before opening a file.
Nick Wellnhofer 83c6aeef 2023-12-18T21:12:29 relaxng: Improve error handling Pass RelaxNG structured error handler to XML parser. Handle malloc failure from xmlRaiseError. Remove argument from memory error handler. Use xmlRaiseMemoryError. Don't use xmlGenericError. Remove TODO macro.
Nick Wellnhofer 157df344 2023-12-10T18:23:53 xmlreader: Report malloc failures Fix many places where malloc failures aren't reported. Introduce a new API function xmlTextReaderGetLastError.
Nick Wellnhofer e58ea29f 2023-12-10T18:10:42 SAX2: Report malloc failures Fix many places where malloc failures aren't reported. Improve error handling when parsing entity declarations. Fixes #308.
Nick Wellnhofer a1f7ecae 2023-12-10T15:25:42 entities: Report malloc failures Fix places where malloc failures aren't reported. Introduce new API function xmlAddEntity that returns separate error codes. Don't invoke global error handler for low-level errors which should be handled by higher layers. Invalid redelcaration warnings will be fixed later.
Nick Wellnhofer 7d446e97 2023-12-08T12:13:49 parser: Fix namespaces redefined from default attributes This regressed in commit e0dd330b. Also fixes a long-standing issue where namespaces from default attributes weren't added if they match an existing namespace. Fixes #643.
Nick Wellnhofer e3959461 2023-11-30T16:15:46 html: Reenable buggy detection of XML declarations Switch to UTF-8 if a document starts with '<?xm' to match old behavior. Also enable this check in the push parser. Fixes #637.
Nick Wellnhofer 43b511fa 2023-11-26T14:31:39 parser: Make CRLF increment line number Partial revert of cb927e85 fixing CRLFs not incrementing the line number. This requires to rework xmlParseQNameHashed. The original implementation prompted the change to xmlCurrentChar which really shouldn't modify the 'cur' pointer as side effect. But the NEXTL macro relies on this behavior. Ultimately, we should reintroduce the change to xmlCurrentChar and fix the NEXTL macro. This will lead to single CRs incrementing the line number as well which seems more consistent. Fixes #628.
Nick Wellnhofer a2b5c90a 2023-11-21T14:35:54 hash: Fix deletion of entries during scan Functions like xmlCleanSpecialAttr scan a hash table and possibly delete entries in the callback. xmlHashScanFull must detect such deletions and rescan the entry. This regressed when rewriting the hash table code in 4a513d56. Fixes #626.
Nick Wellnhofer 7a2d412f 2023-10-31T20:15:38 parser: Copy default namespace in xmlParseBalancedChunkMemory
Nick Wellnhofer e0c2f14d 2023-10-31T13:53:15 parser: Copy namespaces in xmlParseBalancedChunkMemory Reenable copying of namespaces but don't set SAX data. This should match the old behavior.
Nick Wellnhofer b76d81da 2023-10-06T11:50:29 parser: Fix regression when push parsing parameter entities Short-lived regression from 834b8123. Also shrink parameter entity buffers when push parsing.
Nick Wellnhofer 134d2ad8 2023-10-06T00:31:44 parser: Protect against quadratic default attribute expansion
Nick Wellnhofer 0ba22c05 2023-10-05T22:05:04 parser: Support encoded external PEs in entity values Corner case which was never supported.
Nick Wellnhofer 6337a14a 2023-10-06T10:44:38 tests: Handle entities in SAX tests