kmx git

Commit	Date	Message
149c04c0	2025-08-02T14:59:02	html: Escape < and > when serializing attributes This reverts the change in cdaf657f. Coincidentally, the HTML spec just changed to mandate the old escaping behavior: https://github.com/whatwg/html/issues/6235 Fixes #957.
0c948334	2025-07-10T11:23:44	html: Add newline to error message
bc0bb67b	2025-07-10T11:20:22	html: Don't abort on encoding errors Always enable recovery mode when parsing HTML, so we don't raise fatal errors. Regressed with 462bf0b7. Fixes #947.
71e1e8af	2025-07-04T14:28:26	schematron: Fix memory safety issues in xmlSchematronReportOutput Fix use-after-free (CVE-2025-49794) and type confusion (CVE-2025-49796) in xmlSchematronReportOutput. Fixes #931. Fixes #933.
24d7e159	2025-07-04T12:19:20	schematron: Complete fix for CVE-2025-49795 - Fix memory leaks - Fix tests
499bcb78	2025-06-21T12:11:30	Schematron: Fix null pointer dereference leading to DoS (CVE-2025-49795) Fixes #932
069bcda1	2025-06-20T23:05:00	Fix potential buffer overflows in the interactive shell (CVE-2025-6170) Fixes #941
9760a14f	2025-06-30T13:47:33	relaxng: In the simplification step also unlink notAllowed refs from choice This fixes false reports of disallowed content that did not occur when notAllowed was used directly as an element within choice.
ad0f5d27	2025-06-24T13:02:13	tree: Fix xmlGetNodePath - Fix quadratic behavior - Don't truncate names Fixes #715.
ab06bfa1	2025-05-26T15:03:07	parser: Fix error return in xmlParseElementContentDecl Avoid internal error later in xmlValidBuildAContentModel after 2a60ca06c. Also avoids some unnecessary error messages.
5ec83f77	2025-05-20T03:21:27	valid: Remove duplicate #FIXED check for namespaces Contrary to what the comment indicates, this is already checked.
7c10fff2	2025-05-20T22:48:25	valid: Don't validate twice in xmlAddAttributeDecl This should only be done in xmlValidateAttributeDecl.
2f3655c9	2025-05-20T19:40:06	parser: Pop PEs that start markup declarations explicitly We currently only handle "Validity constraint: Proper Declaration/PE Nesting", but we must detect "Well-formedness constraint: PE Between Declarations" separately: "The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl." PEs in DeclSeps are PEs that start with a full markup declaration (or another PE). These are handled in xmlParse{Internal|External}Subset. We set a flag on these PEs and don't close them implicitly in xmlSkipBlankCharsPE. This will make unterminated declarations in such PEs cause a parser error. The PEs are closed explicitly in xmlParse{Internal|External}Subset, the only location where they are allowed to end.
dd1961e0	2025-05-20T16:37:18	valid: Skip more validity checks if not validating
3a68d0b7	2025-05-19T18:59:51	SAX2: Handle xml:id errors separately
87087def	2025-05-13T16:19:42	tests: Remove result files committed by accident
f0983199	2025-05-12T13:00:20	html: Map some encodings according to HTML5 Windows-1252 is a superset of ISO-8859-1 and should be used instead. Same for ASCII. Also map UCS-2 and UTF-16 to UTF-16LE.
825f3a9d	2025-05-11T21:38:16	html: Always serialize attributes with double quotes Align with HTML5.
cdaf657f	2025-05-09T23:02:32	html: Don't escape < and > when serializing attribute values Align with HTML5. This will break some test suites.
c8cea39d	2025-05-09T21:31:07	save: Fix serialization of attribute defaults containing < Long-standing bug that produced invalid XML.
46f05ea4	2025-05-09T00:21:47	html: Rework meta charset handling Don't use encoding from meta tags when serializing. Only use the value in `doc->encoding`, matching the XML serializer. This is the actual encoding used when parsing. Stop modifying the input document by setting meta tags before serializing. Meta tags are now injected during serialization. Add full support for <meta charset=""> which is also used when adding meta tags. Align with HTML5 and implement the "algorithm for extracting a character encoding from a meta element". Only modify the encoding substring in Content-Type meta tags. Only switch encoding once when parsing. Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading UTF-8 charset. Fixes #909.
f3a080bc	2025-05-07T14:32:42	html: Ignore U+0000 in body text Align with HTML5. Fixes #908.
6896f478	2025-04-18T17:22:36	Revert "valid: Remove duplicate error messages when streaming" This reverts commit cd220b93d8ffffd2fb7cac0ec792bebb7e082521. This commit broke the xmlstarlet tests.
69b83bb6	2025-03-10T02:18:51	encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.
05bd1720	2025-03-01T10:25:29	parser: Fix parsing of DTD content Regressed in 2.11. Fixes #868.
9f86dae9	2024-12-15T14:27:05	test: Add test case for UAF in xmlSchemaIDCFillNodeTables
8cf6129b	2025-02-13T18:20:46	html: Stop implying <p> start tags Only <html>, <head> or <body> should be implied. Opening extra <p> tags has always been a libxml2 quirk.
71122421	2025-02-13T14:04:10	html: Make implied <p> tags more deterministic libxml2's HTML parser adds <p> start tags in some situations. This behavior, which doesn't follow any standard, was added in 2000, see here: http://veillard.com/XML/messages/0655.html Text nodes that only contain whitespace don't imply a <p> tag, but the whitespace check cannot work reliably if we're parsing partial text data which can happen with both pull and push parser. The logic in `areBlanks` is hard to follow. The checks involving `CUR` depend on the position of the input pointer and seem dubious. It's also possible that the behavior changed inadvertently with a later commit. As a result, it's hard to come up with good test cases. We now process leading whitespace before creating implied tags. This is more in line with HTML5 and should avoid at least some issues with partial text data. For example, parsing the string "<head> x" used to result in: <html> <head></head> <body><p> x</p></body> </html> And now results in: <html> <head> </head> <body><p>x</p></body> </html> Except for the implied <p> tag, this matches HTML5.
b4d3d87e	2025-02-01T22:02:33	parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.
08028572	2025-02-01T18:21:47	html: Make data parsing modes work with push parser This can't be solved with a simple scan for a terminator. Instead, we make htmlParseCharData handle incomplete data if the "partial" flag is set.
cd220b93	2024-12-27T14:55:43	valid: Remove duplicate error messages when streaming
45914614	2024-11-05T12:05:14	xpath: Fix parsing of non-ASCII names Fix a long-standing issue where QNames starting with a non-ASCII character would be rejected. This became more visible after "streaming" XPath evaluation was disabled since the latter handled non-ASCII names correctly. Fixes #818.
ffb058f4	2024-10-28T20:12:52	parser: Fix detection of duplicate attributes We really need a second scan if more than one namespace clash was detected.
f77ec16d	2024-09-12T01:45:34	html: Optimize htmlParseCharData
575be6c1	2024-09-12T01:40:07	html: Fix line numbers with CRs
e179f3ec	2024-09-11T17:29:59	html: Stop reporting syntax errors It doesn't make much sense to keep the old syntax error handling which doesn't conform to HTML5. Handling HTML5 parser errors is rather involved and not essential for parsers.
c6af1017	2024-09-08T20:45:48	html: Test tokenizer against html5lib test suite
9678163f	2024-09-09T02:01:19	html: Don't check for valid XML characters
4eeac309	2024-09-08T22:20:20	html: Start to fix EOF and U+0000 handling
17da54c5	2024-09-08T19:16:12	html: Normalize newlines
3adb396d	2024-09-07T15:18:13	html: Parse bogus comments instead of ignoring them Also treat XML processing instructions as bogus comments.
e1834745	2024-09-07T00:54:25	html: Add character data tests
f9ed30e9	2024-09-06T17:49:04	html: HTML5 character data states
59511792	2024-09-03T15:52:44	html: Parse named character references according to HTML5
a80f8b64	2023-05-04T15:59:31	html: Allow attributes in end tags Attributes are syntactically allowed in HTML5 end tags but otherwise ignored.
dcb2abb2	2023-05-04T15:16:29	html: Parse tag and attribute names according to HTML5 HTML5 allows basically all characters in tag and attribute names.
bd9eed46	2024-09-02T18:37:41	parser: Make unsupported encodings an error in declarations This was changed in 45157261, but in encoding declarations, unsupported encodings should raise a fatal error. Fixes #794.
8ae06d52	2024-08-29T00:07:27	SAX2: Don't merge CDATA sections The Document Object Model (DOM) Level 3 Core Specification says: "Adjacent CDATASection nodes are not merged by use of the normalize method of the Node interface." Fixes #412.
322e733b	2024-07-18T19:27:43	xinclude: Fix fallback for text includes Fixes #772.
842a0448	2024-07-03T11:46:06	valid: Restore ID lookup Revert a change from d025cfbb and don't overwrite ID table entries, so that the first attribute will be returned if there are duplicate IDs. This requires two other changes: - Attributes in entity content are never added to the ID table. This seems reasonable. - Remove the optimization to skip ID lookup when copying and the target document has an empty ID table. This also seems more correct since the document could have ID declarations nevertheless or we could be copying xml:ids into the document for the first time. Fixes #757.
30be984a	2024-06-28T20:37:47	encoding: Rework ISO-8859-X conversion Optimize code. Pass tables as context parameter. Check for XML_ENC_ERR_SPACE.
7c11da2d	2024-06-27T12:47:47	tests: Clarify licence of test/intsubset2.xml
b8903b9e	2024-06-22T17:55:46	runtest: Remove result handling from schemasOneTest We only care about errors.
e68ccfa9	2024-06-22T16:42:36	tests: Port Schematron tests to C
1dd5e76a	2024-06-17T21:06:46	xinclude: Don't remove root element Don't replace include element at root with empty nodeset.
52ce0d70	2024-06-17T17:35:12	tests: Add XInclude test for issue #733
2608baaf	2024-06-14T19:42:40	parser: Make failure to load main document a warning Revert the change that made failures to load the main document an error. This fixes the --path option of xmllint and xsltproc. Should fix #733.
669bd349	2024-06-12T18:20:01	xpointer: Remove support for XPointer locations The latest spec for what is essentially an XPath extension seems to be this working draft from 2002: https://www.w3.org/TR/xptr-xpointer/ The xpointer() scheme has been listed as "being reviewed" in the XPointer registry since at least 2006. libxml2 seems to be the only modern software that tries to implement this spec, but the code has many bugs and quality issues. If you configure --with-legacy, old symbols are retained for ABI compatibility.
4fefba4c	2024-05-15T17:52:20	parser: Rework handling of undeclared entities Throw an error if entity substitution was requested. Now we only downgrade to a warning if - XML_PARSE_DTDLOAD wasn't specified, and - entities aren't substituted or XML_PARSE_NO_XXE was specified. Should fix #724.
fdc5ff36	2024-05-02T16:23:04	parser: Always throw entity errors if external DTD is loaded When parsing with XML_PARSE_DTDLOAD, missing entities are always an error. Also consolidate behavior when validating. See b717abdd.
39e5b35b	2024-05-02T22:06:19	parser: Don't create undeclared entity refs in substitution mode We never want to create entity reference nodes if entity substitution is enabled. This also applies to undeclared entities.
45fe9924	2024-04-22T17:12:54	parser: Don't create reference in xmlLookupGeneralEntity This should only be done in xmlParseReference. The handling of undeclared entities is still somewhat inconsistent. In element content we create references even if entity substitution is enabled. In attribute values undeclared entities are always ignored.
b717abdd	2024-04-22T15:42:39	parser: Consolidate error handling for undeclared entities Always use XML_WAR_UNDECLARED_ENTITY with warning error level in documents with external subset or parameter entities. Use XML_ERR_UNDECLARED_ENTITY otherwise.
f506ec66	2024-04-15T11:27:44	parser: Always decode entities in namespace URIs Also decode entities in namespace URIs if entity substitution wasn't requested. This should fix some corner cases when comparing namespace URIs. The Namespaces in XML 1.0 spec says: "In a namespace declaration, the URI reference is the normalized value of the attribute, so replacement of XML character and entity references has already been done before any comparison." Make the serialization code escape special characters in namespace URIs like in attribute values. This fixes serialization if entities were substituted when parsing. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
5bb84b47	2024-04-04T11:55:28	relaxng: Fix tree corruption in xmlRelaxNGParseNameClass Don't create cycles in tree structure. This will lead to an infinite loop or call stack overflow later. Closes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/711
f43197fc	2024-03-29T11:16:45	tree: Don't coalesce text nodes in xmlAdd{Prev,Next}Sibling Commit 9e1c72da from 2001 introduced a bug where xmlAddPrevSibling and xmlAddNextSibling would only try to merge text nodes with one of its new siblings. Commit 4ccd3eb8 fixed this bug but unfortunately, lxml and possibly other downstream code depend on text nodes not being merged. To avoid breaking downstream code while still having somewhat consistent API behavior, it's probably best to make these functions never coalesce text nodes.
4ccd3eb8	2024-03-11T19:43:56	tree: Refactor node insertion Also fixes a text coalescing bug.
186562a1	2024-03-12T19:55:33	parser: Fix detection of duplicate attributes in XML namespace Fixes a regression from commit e0dd330b, resulting in duplicate attributes in the predefined XML namespace not being detected or extraneous default attributes being passed. Fixes #704.
63986c45	2024-01-22T21:02:16	parser: Report fatal error if document entity couldn't be loaded Only lower error level when loading entities. Fixes #667.
29beef65	2024-01-02T21:50:38	parser: Pop inputs if parsing DTD failed This should provide some statistics in ctxt->sizeentcopy even in the error or recovery case.
f237e5b9	2024-01-05T15:40:23	parser: Avoid duplicate namespace errors Don't report an extra attribute uniqueness error if a namespace is undeclared. This matches old behavior.
07c05546	2024-01-04T02:48:02	error: Make xmlFormatError public This is a useful function to get a verbose error report. It also allows removing duplicated code from runtest.c. Also reactivate check for schema parser failures.
d0eb5a7e	2024-01-03T18:12:29	parser: Remove xmlErrEncodingInt Convert the last user to xmlFatalErr.
e8fb3d63	2024-01-02T17:45:54	parser: Convert some "internal errors" to meaningful codes
37c6618b	2023-12-30T02:50:34	parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.
f0dc52d0	2023-12-29T06:00:20	parser: Move cleanup of element stacks to xmlParseContent
d025cfbb	2023-12-27T03:53:24	parser: Always copy content from entity to target. Make sure that references from IDs are updated. Note that if there are IDs with the same value in a document, the last one will now be returned. IDs should be unique, but maybe this should be addressed.
4ecc85d2	2023-12-27T00:44:16	parser: Push general entity input streams on the stack This allows the error handler to give more context.
d944a415	2023-12-26T02:10:35	parser: Fix in-parameter-entity and in-external-dtd checks Use ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.
b8313b58	2023-12-26T21:59:08	xpath: Rewrite substring-before and substring-after Don't use buffers. Check malloc failures.
f3fa34dc	2023-12-26T22:37:26	parser: Fix general entity parsing Clear namespace database. Ignore non-fatal errors.
ecfbcc8a	2023-12-25T04:33:00	parser: Rework general entity parsing Don't create a new parser context but reuse the existing one. This exposes bug #601 in a more obvious way.
6e3a2ac6	2023-12-22T21:38:50	xinclude: Rework xml:base fixup The xml:base fixup was broken in more complex cases. Also avoid parsing and building the included URI multiple times.
f0df3e6d	2023-12-21T14:35:18	tests: Try to fix RelaxNG test cases These were added recently in ea695ac0 and 8074b881 but were a total mess of symbolic links and apparently mixed up files. Symbolic links don't work on Windows. Try to salvage one of the tests.
8d0aaf4b	2023-12-19T20:47:36	parser: Remove xmlErrEncoding Use xmlFatalErr or xmlCtxtErrIO.
7e511f35	2023-12-19T15:41:37	io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile This makes it possible to report to the parser context why opening a file failed, and to improve error messages. Now we can also remove the stat call before opening a file.
83c6aeef	2023-12-18T21:12:29	relaxng: Improve error handling Pass RelaxNG structured error handler to XML parser. Handle malloc failure from xmlRaiseError. Remove argument from memory error handler. Use xmlRaiseMemoryError. Don't use xmlGenericError. Remove TODO macro.
157df344	2023-12-10T18:23:53	xmlreader: Report malloc failures Fix many places where malloc failures aren't reported. Introduce a new API function xmlTextReaderGetLastError.
e58ea29f	2023-12-10T18:10:42	SAX2: Report malloc failures Fix many places where malloc failures aren't reported. Improve error handling when parsing entity declarations. Fixes #308.
a1f7ecae	2023-12-10T15:25:42	entities: Report malloc failures Fix places where malloc failures aren't reported. Introduce new API function xmlAddEntity that returns separate error codes. Don't invoke global error handler for low-level errors which should be handled by higher layers. Invalid redeclaration warnings will be fixed later.
7d446e97	2023-12-08T12:13:49	parser: Fix namespaces redefined from default attributes This regressed in commit e0dd330b. Also fixes a long-standing issue where namespaces from default attributes weren't added if they match an existing namespace. Fixes #643.
e3959461	2023-11-30T16:15:46	html: Reenable buggy detection of XML declarations Switch to UTF-8 if a document starts with '<?xm' to match old behavior. Also enable this check in the push parser. Fixes #637.
43b511fa	2023-11-26T14:31:39	parser: Make CRLF increment line number Partial revert of cb927e85 fixing CRLFs not incrementing the line number. This requires reworking xmlParseQNameHashed. The original implementation prompted the change to xmlCurrentChar which really shouldn't modify the 'cur' pointer as a side effect. But the NEXTL macro relies on this behavior. Ultimately, we should reintroduce the change to xmlCurrentChar and fix the NEXTL macro. This will lead to single CRs incrementing the line number as well which seems more consistent. Fixes #628.
a2b5c90a	2023-11-21T14:35:54	hash: Fix deletion of entries during scan Functions like xmlCleanSpecialAttr scan a hash table and possibly delete entries in the callback. xmlHashScanFull must detect such deletions and rescan the entry. This regressed when rewriting the hash table code in 4a513d56. Fixes #626.
7a2d412f	2023-10-31T20:15:38	parser: Copy default namespace in xmlParseBalancedChunkMemory
e0c2f14d	2023-10-31T13:53:15	parser: Copy namespaces in xmlParseBalancedChunkMemory Reenable copying of namespaces but don't set SAX data. This should match the old behavior.
b76d81da	2023-10-06T11:50:29	parser: Fix regression when push parsing parameter entities Short-lived regression from 834b8123. Also shrink parameter entity buffers when push parsing.
134d2ad8	2023-10-06T00:31:44	parser: Protect against quadratic default attribute expansion
0ba22c05	2023-10-05T22:05:04	parser: Support encoded external PEs in entity values Corner case which was never supported.
6337a14a	2023-10-06T10:44:38	tests: Handle entities in SAX tests
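Commits 149c04c0 and 825f3a9d together define how HTML attributes are serialized: `<` and `>` are escaped again, and values always use double quotes. The short Python sketch below illustrates that rule. It is not libxml2's C serializer, and the function name `serialize_attribute` is hypothetical. It assumes that only `&`, `<`, `>` and `"` need escaping and ignores any other characters the real code may treat specially.

```python
def serialize_attribute(name: str, value: str) -> str:
    """Hypothetical helper: serialize one HTML attribute as name="value".

    Assumes only &, <, > and " are escaped. Values are always wrapped in
    double quotes, so single quotes are left untouched.
    """
    escaped = (
        value.replace("&", "&amp;")  # must run first to avoid double-escaping
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )
    return f'{name}="{escaped}"'


print(serialize_attribute("title", 'a<b & "c">'))
# prints: title="a&lt;b &amp; &quot;c&quot;&gt;"
```

The only ordering constraint is that `&` must be escaped first. Escaping it last would turn the `&lt;` produced for `<` into `&amp;lt;`.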

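Commit 46f05ea4 says libxml2 now implements the HTML Standard's "algorithm for extracting a character encoding from a meta element". The Python sketch below follows that algorithm as we read the spec. It is hedged in three ways:

- It returns the raw charset label and skips the final label-to-encoding lookup.
- It is not libxml2's code.
- The name `extract_meta_charset` is hypothetical.

```python
# ASCII whitespace as defined by the WHATWG Infra standard.
ASCII_WHITESPACE = "\t\n\f\r "
_UPPER_TO_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)


def extract_meta_charset(content: str):
    """Return the charset label from a Content-Type value, or None.

    Hypothetical helper sketching the HTML "algorithm for extracting a
    character encoding from a meta element". The label is returned as-is
    and is not resolved to an encoding.
    """
    # Lowercase ASCII letters only, so indices match the original string.
    lowered = content.translate(_UPPER_TO_LOWER)
    n = len(content)
    pos = 0
    while True:
        # Find the next case-insensitive "charset" at or after pos.
        idx = lowered.find("charset", pos)
        if idx < 0:
            return None
        pos = idx + len("charset")
        while pos < n and content[pos] in ASCII_WHITESPACE:
            pos += 1
        if pos >= n or content[pos] != "=":
            # Not followed by "=": resume the search from this position.
            continue
        pos += 1
        while pos < n and content[pos] in ASCII_WHITESPACE:
            pos += 1
        if pos >= n:
            return None
        quote = content[pos]
        if quote in "\"'":
            end = content.find(quote, pos + 1)
            # An unmatched quote means no encoding is returned.
            return None if end < 0 else content[pos + 1:end]
        # Unquoted: take everything up to whitespace, ";" or the end.
        end = pos
        while end < n and content[end] not in ASCII_WHITESPACE and content[end] != ";":
            end += 1
        return content[pos:end]


print(extract_meta_charset("text/html; charset=ISO-8859-1"))  # prints: ISO-8859-1
```

The search restarts after any "charset" that is not followed by "=". As a result, a value such as `charsetx; charset=koi8-r` still yields `koi8-r`.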
149c04c0

2025-08-02T14:59:02

html: Escape < and > when serializing attributes This reverts the change in cdaf657f. Coincidentally, the HTML spec just changed to mandate the old escaping behavior: https://github.com/whatwg/html/issues/6235 Fixes #957.

0c948334

2025-07-10T11:23:44

html: Add newline to error message

bc0bb67b

2025-07-10T11:20:22

html: Don't abort on encoding errors Always enable recovery mode when parsing HTML, so we don't raise fatal errors. Regressed with 462bf0b7. Fixes #947.

71e1e8af

2025-07-04T14:28:26

schematron: Fix memory safety issues in xmlSchematronReportOutput Fix use-after-free (CVE-2025-49794) and type confusion (CVE-2025-49796) in xmlSchematronReportOutput. Fixes #931. Fixes #933.

24d7e159

2025-07-04T12:19:20

schematron: Complete fix for CVE-2025-49795 - Fix memory leaks - Fix tests

499bcb78

2025-06-21T12:11:30

Schematron: Fix null pointer dereference leading to DoS (CVE-2025-49795) Fixes #932

069bcda1

2025-06-20T23:05:00

Fix potential buffer overflows of interactive shell CVE-2025-6170 Fixes #941

9760a14f

2025-06-30T13:47:33

relaxng: In the simplification step also unlink notAllowed refs from choice This fixes false reports of non allowed content compared to notAllowed as tag within the choice tag.

ad0f5d27

2025-06-24T13:02:13

tree: Fix xmlGetNodePath - Fix quadratic behavior - Don't truncate names Fixes #715.

ab06bfa1

2025-05-26T15:03:07

parser: Fix error return in xmlParseElementContentDecl Avoid internal error later in xmlValidBuildAContentModel after 2a60ca06c. Also avoids some unnecessary error messages.

5ec83f77

2025-05-20T03:21:27

valid: Remove duplicate #FIXED check for namespaces Unlike the comment indicates, this is already checked.

7c10fff2

2025-05-20T22:48:25

valid: Don't validate twice in xmlAddAttributeDecl This should only be done in xmlValidateAttributeDecl.

2f3655c9

2025-05-20T19:40:06

parser: Pop PEs that start markup declarations explicitly We currently only handle "Validity constraint: Proper Declaration/PE Nesting", but we must detect "Well-formedness constraint: PE Between Declarations" separately: > The replacement text of a parameter entity reference in a DeclSep must > match the production extSubsetDecl. PEs in DeclSeps are PEs that start with a full markup declaration (or another PE). These are handled in xmParse{Internal|External}Subset. We set a flag on these PEs and don't close them implicitly in xmlSkipBlankCharsPE. This will make unterminated declarations in such PEs cause a parser error. The PEs are closed explicitly in xmParse{Internal|External}Subset, the only location where they are allowed to end.

dd1961e0

2025-05-20T16:37:18

valid: Skip more validity checks if not validating

3a68d0b7

2025-05-19T18:59:51

SAX2: Handle xml:id errors separately

87087def

2025-05-13T16:19:42

tests: Remove result files committed by accident

f0983199

2025-05-12T13:00:20

html: Map some encodings according to HTML5 Windows-1252 is a superset of ISO-8859-1 and should be used instead. Same for ASCII. Also map UCS-2 and UTF-16 to UTF-16LE.

825f3a9d

2025-05-11T21:38:16

html: Always serialize attributes with double quotes Align with HTML5.

cdaf657f

2025-05-09T23:02:32

html: Don't escape < and > when serializing attribute values Align with HTML5. This will break some test suites.

c8cea39d

2025-05-09T21:31:07

save: Fix serialization of attribute defaults containing < Long-standing bug that produced invalid XML.

46f05ea4

2025-05-09T00:21:47

html: Rework meta charset handling Don't use encoding from meta tags when serializing. Only use the value in `doc->encoding`, matching the XML serializer. This is the actual encoding used when parsing. Stop modifying the input document by setting meta tags before serializing. Meta tags are now injected during serialization. Add full support for <meta charset=""> which is also used when adding meta tags. Align with HTML5 and implement the "algorithm for extracting a character encoding from a meta element". Only modify the encoding substring in Content-Type meta tags. Only switch encoding once when parsing. Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading UTF-8 charset. Fixes #909.

f3a080bc

2025-05-07T14:32:42

html: Ignore U+0000 in body text Align with HTML5. Fixes #908.

6896f478

2025-04-18T17:22:36

Revert "valid: Remove duplicate error messages when streaming" This reverts commit cd220b93d8ffffd2fb7cac0ec792bebb7e082521. This commit broke the xmstarlet tests.

69b83bb6

2025-03-10T02:18:51

encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi- byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.

05bd1720

2025-03-01T10:25:29

parser: Fix parsing of DTD content Regressed in 2.11. Fixes #868.

9f86dae9

2024-12-15T14:27:05

test: Add test case for UAF in xmlSchemaIDCFillNodeTables

8cf6129b

2025-02-13T18:20:46

html: Stop implying start tags Only <html>, <head> or <body> should be implied. Opening extra tags has always been a libxml2 quirk.

71122421

2025-02-13T14:04:10

html: Make implied tags more deterministic libxml2's HTML parser adds start tags in some situations. This behavior, which doesn't follow any standard, was added in 2000, see here: http://veillard.com/XML/messages/0655.html Text nodes that only contain whitespace don't imply a tag, but the whitespace check cannot work reliably if we're parsing partial text data which can happen with both pull and push parser. The logic in `areBlanks` is hard to follow. The checks involving `CUR` depend on the position of the input pointer and seem dubious. It's also possible that the behavior changed inadvertently with a later commit. As a result, it's hard to come up with good test cases. We now process leading whitespace before creating implied tags. This is more in line with HTML5 and should avoid at least some issues with partial text data. For example, parsing the string "<head> x" used to result in: <html> <head></head> <body> x</body> </html> And now results in: <html> <head> </head> <body>x</body> </html> Except for the implied tag, this matches HTML5.

b4d3d87e

2025-02-01T22:02:33

parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.

08028572

2025-02-01T18:21:47

html: Make data parsing modes work with push parser This can't be solved with a simple scan for a terminator. Instead, we make htmlParseCharData handle incomplete data if the "partial" flag is set.

cd220b93

2024-12-27T14:55:43

valid: Remove duplicate error messages when streaming

45914614

2024-11-05T12:05:14

xpath: Fix parsing of non-ASCII names Fix a long-standing issue where QNames starting with a non-ASCII character would be rejected. This became more visible after "streaming" XPath evaluation was disabled since the latter handled non-ASCII names correctly. Fixes #818.

ffb058f4

2024-10-28T20:12:52

parser: Fix detection of duplicate attributes We really need a second scan if more than one namespace clash was detected.

f77ec16d

2024-09-12T01:45:34

html: Optimize htmlParseCharData

575be6c1

2024-09-12T01:40:07

html: Fix line numbers with CRs

e179f3ec

2024-09-11T17:29:59

html: Stop reporting syntax errors It doesn't make much sense to keep the old syntax error handling which doesn't conform to HTML5. Handling HTML5 parser errors is rather involved and not essential for parsers.

c6af1017

2024-09-08T20:45:48

html: Test tokenizer against html5lib test suite

9678163f

2024-09-09T02:01:19

html: Don't check for valid XML characters

4eeac309

2024-09-08T22:20:20

html: Start to fix EOF and U+0000 handling

17da54c5

2024-09-08T19:16:12

html: Normalize newlines

3adb396d

2024-09-07T15:18:13

html: Parse bogus comments instead of ignoring them Also treat XML processing instructions as bogus comments.

e1834745

2024-09-07T00:54:25

html: Add character data tests

f9ed30e9

2024-09-06T17:49:04

html: HTML5 character data states

59511792

2024-09-03T15:52:44

html: Parse named character references according to HTML5

a80f8b64

2023-05-04T15:59:31

html: Allow attributes in end tags Attribute are syntactically allowed in HTML5 end tags but otherwise ignored.

dcb2abb2

2023-05-04T15:16:29

html: Parse tag and attribute names according to HTML5 HTML5 allows bascially all characters in tag and attribute names.

bd9eed46

2024-09-02T18:37:41

parser: Make unsupported encodings an error in declarations This was changed in 45157261, but in encoding declarations, unsupported encodings should raise a fatal error. Fixes #794.

8ae06d52

2024-08-29T00:07:27

SAX2: Don't merge CDATA sections The Document Object Model (DOM) Level 3 Core Specification says: > Adjacent CDATASection nodes are not merged by use of the normalize > method of the Node interface. Fixes #412.

322e733b

2024-07-18T19:27:43

xinclude: Fix fallback for text includes Fixes #772.

842a0448

2024-07-03T11:46:06

valid: Restore ID lookup Revert a change from d025cfbb and don't overwrite ID table entries, so that the first attribute will be returned if there are duplicate IDs. This requires two other changes: - Attributes in entity content are never added to the ID table. This seems reasonable. - Remove the optimization to skip ID lookup when copying and the target document has an empty ID table. This also seems more correct since the document could have ID declarations nevertheless or we could be copying xml:ids into the document for the first time. Fixes #757.

30be984a

2024-06-28T20:37:47

encoding: Rework ISO-8859-X conversion Optimize code. Pass tables as context parameter. Check for XML_ENC_ERR_SPACE.

7c11da2d

2024-06-27T12:47:47

tests: Clarify licence of test/intsubset2.xml

b8903b9e

2024-06-22T17:55:46

runtest: Remove result handling from schemasOneTest We only care about errors.

e68ccfa9

2024-06-22T16:42:36

tests: Port Schematron tests to C

1dd5e76a

2024-06-17T21:06:46

xinclude: Don't remove root element Don't replace include element at root with empty nodeset.

52ce0d70

2024-06-17T17:35:12

tests: Add XInclude test for issue #733

2608baaf

2024-06-14T19:42:40

parser: Make failure to load main document a warning Revert the change that made failures to load the main document an error. This fixes the --path option of xmllint and xsltproc. Should fix #733.

669bd349

2024-06-12T18:20:01

xpointer: Remove support for XPointer locations The latest spec for what it essentially an XPath extension seems to be this working draft from 2002: https://www.w3.org/TR/xptr-xpointer/ The xpointer() scheme is listed as "being reviewed" in the XPointer registry since at least 2006. libxml2 seems to be the only modern software that tries to implement this spec, but the code has many bugs and quality issues. If you configure --with-legacy, old symbols are retained for ABI compatibility.

4fefba4c

2024-05-15T17:52:20

parser: Rework handling of undeclared entities Throw an error if entity substitution was requested. Now we only downgrade to a warning if - XML_PARSE_DTDLOAD wasn't specified, and - entities aren't substituted or XML_PARSE_NO_XXE was specified. Should fix #724.
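Read as a single condition, the rule above downgrades to a warning only when both parts hold. The function below is a hypothetical restatement of the commit message using plain booleans (`dtd_load` for XML_PARSE_DTDLOAD, `substitute` for entity substitution, `no_xxe` for XML_PARSE_NO_XXE). It is not the parser's actual code.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    UNDECLARED_ENTITY_WARNING,
    UNDECLARED_ENTITY_ERROR
} Severity;

/* Hypothetical restatement of the commit message, not libxml2 code:
   warn only if the external DTD wasn't requested AND entities are
   either not substituted or external entity loading is disabled. */
Severity undeclared_entity_severity(bool dtd_load, bool substitute,
                                    bool no_xxe) {
    if (!dtd_load && (!substitute || no_xxe))
        return UNDECLARED_ENTITY_WARNING;
    return UNDECLARED_ENTITY_ERROR;
}
```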

fdc5ff36

2024-05-02T16:23:04

parser: Always throw entity errors if external DTD is loaded When parsing with XML_PARSE_DTDLOAD, missing entities are always an error. Also consolidate behavior when validating. See b717abdd.

39e5b35b

2024-05-02T22:06:19

parser: Don't create undeclared entity refs in substitution mode We never want to create entity reference nodes if entity substitution is enabled. This also applies to undeclared entities.

45fe9924

2024-04-22T17:12:54

parser: Don't create reference in xmlLookupGeneralEntity This should only be done in xmlParseReference. The handling of undeclared entities is still somewhat inconsistent. In element content we create references even if entity substitution is enabled. In attribute values undeclared entities are always ignored.

b717abdd

2024-04-22T15:42:39

parser: Consolidate error handling for undeclared entities Always use XML_WAR_UNDECLARED_ENTITY with warning error level in documents with external subset or parameter entities. Use XML_ERR_UNDECLARED_ENTITY otherwise.

f506ec66

2024-04-15T11:27:44

parser: Always decode entities in namespace URIs Also decode entities in namespace URIs if entity substitution wasn't requested. This should fix some corner cases when comparing namespace URIs. The Namespaces in XML 1.0 spec says: > In a namespace declaration, the URI reference is the normalized value > of the attribute, so replacement of XML character and entity > references has already been done before any comparison. Make the serialization code escape special characters in namespace URIs like in attribute values. This fixes serialization if entities were substituted when parsing. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
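Escaping namespace URIs like attribute values matters once entities have been substituted. A URI written as `http://e.org/?a=1&amp;b=2` in the source is stored with a bare `&` and must be re-escaped on output. Below is a minimal sketch that assumes a hypothetical `escape_attr_value` helper rather than libxml2's serializer. It also escapes tab, CR and LF as character references. That part is an extra assumption, made so that those characters survive attribute-value normalization when the output is parsed again.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replacement text for characters that must not appear literally in a
   double-quoted attribute value; NULL means "copy as-is". */
static const char *replacement(char c) {
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '"':  return "&quot;";
    case '\n': return "&#10;"; /* assumption: keep whitespace intact */
    case '\r': return "&#13;";
    case '\t': return "&#9;";
    default:   return NULL;
    }
}

/* Hypothetical helper: escape a decoded attribute value or namespace
   URI. Returns a malloc'd string (caller frees) or NULL on OOM. */
char *escape_attr_value(const char *in) {
    assert(in != NULL);
    size_t len = 0;
    for (const char *p = in; *p; p++) {
        const char *r = replacement(*p);
        len += r ? strlen(r) : 1;
    }
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    char *q = out;
    for (const char *p = in; *p; p++) {
        const char *r = replacement(*p);
        if (r) {
            size_t n = strlen(r);
            memcpy(q, r, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```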

5bb84b47

2024-04-04T11:55:28

relaxng: Fix tree corruption in xmlRelaxNGParseNameClass Don't create cycles in tree structure. This will lead to an infinite loop or call stack overflow later. Closes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/711

f43197fc

2024-03-29T11:16:45

tree: Don't coalesce text nodes in xmlAdd{Prev,Next}Sibling Commit 9e1c72da from 2001 introduced a bug where xmlAddPrevSibling and xmlAddNextSibling would only try to merge text nodes with one of its new siblings. Commit 4ccd3eb8 fixed this bug but unfortunately, lxml and possibly other downstream code depend on text nodes not being merged. To avoid breaking downstream code while still having somewhat consistent API behavior, it's probably best to make these functions never coalesce text nodes.

4ccd3eb8

2024-03-11T19:43:56

tree: Refactor node insertion Also fixes a text coalescing bug.

186562a1

2024-03-12T19:55:33

parser: Fix detection of duplicate attributes in XML namespace Fixes a regression from commit e0dd330b, resulting in duplicate attributes in the predefined XML namespace not being detected or extraneous default attributes being passed. Fixes #704.

63986c45

2024-01-22T21:02:16

parser: Report fatal error if document entity couldn't be loaded Only lower error level when loading entities. Fixes #667.

29beef65

2024-01-02T21:50:38

parser: Pop inputs if parsing DTD failed This should provide some statistics in ctxt->sizeentcopy even in the error or recovery case.

f237e5b9

2024-01-05T15:40:23

parser: Avoid duplicate namespace errors Don't report an extra attribute uniqueness error if a namespace is undeclared. This matches old behavior.

07c05546

2024-01-04T02:48:02

error: Make xmlFormatError public This is a useful function to get a verbose error report. It allows removing duplicated code from runtest.c. Also reactivate check for schema parser failures.

d0eb5a7e

2024-01-03T18:12:29

parser: Remove xmlErrEncodingInt Convert the last user to xmlFatalErr.

e8fb3d63

2024-01-02T17:45:54

parser: Convert some "internal errors" to meaningful codes

37c6618b

2023-12-30T02:50:34

parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.
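The normalization step itself follows XML 1.0 section 3.3.3. Every literal whitespace character becomes a space. For attributes not declared as CDATA, leading and trailing spaces are also dropped, and runs of spaces collapse to one. Below is a single-pass, in-place sketch of just that step. `normalize_attr_value` is a hypothetical name. Entity and character reference expansion, which the commit folds into the same pass, is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: normalize literal attribute text in place and
   return the new length. is_cdata selects CDATA-only normalization. */
size_t normalize_attr_value(char *s, int is_cdata) {
    assert(s != NULL);
    size_t out = 0;
    int pending_space = 0; /* saw whitespace after some output */
    for (size_t in = 0; s[in] != '\0'; in++) {
        char c = s[in];
        int ws = (c == ' ' || c == '\t' || c == '\n' || c == '\r');
        if (is_cdata) {
            s[out++] = ws ? ' ' : c; /* each whitespace char -> space */
        } else if (ws) {
            if (out > 0)
                pending_space = 1; /* leading whitespace is dropped */
        } else {
            if (pending_space) {   /* collapse a run to one space */
                s[out++] = ' ';
                pending_space = 0;
            }
            s[out++] = c;
        }
    }
    /* a trailing pending_space is never emitted, which trims the end */
    s[out] = '\0';
    return out;
}
```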

f0dc52d0

2023-12-29T06:00:20

parser: Move cleanup of element stacks to xmlParseContent

d025cfbb

2023-12-27T03:53:24

parser: Always copy content from entity to target. Make sure that references from IDs are updated. Note that if there are IDs with the same value in a document, the last one will now be returned. IDs should be unique, but maybe this should be addressed.

4ecc85d2

2023-12-27T00:44:16

parser: Push general entity input streams on the stack This allows the error handler to give more context.

d944a415

2023-12-26T02:10:35

parser: Fix in-parameter-entity and in-external-dtd checks Use ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.

b8313b58

2023-12-26T21:59:08

xpath: Rewrite substring-before and substring-after Don't use buffers. Check malloc failures.

f3fa34dc

2023-12-26T22:37:26

parser: Fix general entity parsing Clear namespace database. Ignore non-fatal errors.

ecfbcc8a

2023-12-25T04:33:00

parser: Rework general entity parsing Don't create a new parser context but reuse the existing one. This exposes bug #601 in a more obvious way.

6e3a2ac6

2023-12-22T21:38:50

xinclude: Rework xml:base fixup The xml:base fixup was broken in more complex cases. Also avoid parsing and building the included URI multiple times.

f0df3e6d

2023-12-21T14:35:18

tests: Try to fix RelaxNG test cases These were added recently in ea695ac0 and 8074b881 but were a total mess of symbolic links and apparently mixed up files. Symbolic links don't work on Windows. Try to salvage one of the tests.

8d0aaf4b

2023-12-19T20:47:36

parser: Remove xmlErrEncoding Use xmlFatalErr or xmlCtxtErrIO.

7e511f35

2023-12-19T15:41:37

io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile This allows us to report the reason why opening a file failed to the parser context and to improve error messages. Now we can also remove the stat call before opening a file.

83c6aeef

2023-12-18T21:12:29

relaxng: Improve error handling Pass RelaxNG structured error handler to XML parser. Handle malloc failure from xmlRaiseError. Remove argument from memory error handler. Use xmlRaiseMemoryError. Don't use xmlGenericError. Remove TODO macro.

157df344

2023-12-10T18:23:53

xmlreader: Report malloc failures Fix many places where malloc failures aren't reported. Introduce a new API function xmlTextReaderGetLastError.

e58ea29f

2023-12-10T18:10:42

SAX2: Report malloc failures Fix many places where malloc failures aren't reported. Improve error handling when parsing entity declarations. Fixes #308.

a1f7ecae

2023-12-10T15:25:42

entities: Report malloc failures Fix places where malloc failures aren't reported. Introduce new API function xmlAddEntity that returns separate error codes. Don't invoke global error handler for low-level errors which should be handled by higher layers. Invalid redeclaration warnings will be fixed later.

7d446e97

2023-12-08T12:13:49

parser: Fix namespaces redefined from default attributes This regressed in commit e0dd330b. Also fixes a long-standing issue where namespaces from default attributes weren't added if they match an existing namespace. Fixes #643.

e3959461

2023-11-30T16:15:46

html: Reenable buggy detection of XML declarations Switch to UTF-8 if a document starts with '<?xm' to match old behavior. Also enable this check in the push parser. Fixes #637.

43b511fa

2023-11-26T14:31:39

parser: Make CRLF increment line number Partial revert of cb927e85 fixing CRLFs not incrementing the line number. This requires reworking xmlParseQNameHashed. The original implementation prompted the change to xmlCurrentChar which really shouldn't modify the 'cur' pointer as a side effect. But the NEXTL macro relies on this behavior. Ultimately, we should reintroduce the change to xmlCurrentChar and fix the NEXTL macro. This will lead to single CRs incrementing the line number as well which seems more consistent. Fixes #628.
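The commit names consistent handling of single CRs as the eventual goal. A line counter that follows XML 1.0 end-of-line handling (section 2.11) treats CRLF as one line break and a lone CR as a break of its own. The sketch below assumes a hypothetical `line_at_end` helper over a complete buffer and is not libxml2's code. A CR at the end of one push-parser chunk followed by an LF at the start of the next would need extra state, which this sketch does not handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the 1-based line number reached after
   scanning buf[0..len). CRLF counts once; a lone CR or LF counts once. */
unsigned long line_at_end(const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    unsigned long line = 1;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            line++;
        } else if (buf[i] == '\r') {
            line++;
            if (i + 1 < len && buf[i + 1] == '\n')
                i++; /* swallow the LF of a CRLF pair */
        }
    }
    return line;
}
```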

a2b5c90a

2023-11-21T14:35:54

hash: Fix deletion of entries during scan Functions like xmlCleanSpecialAttr scan a hash table and possibly delete entries in the callback. xmlHashScanFull must detect such deletions and rescan the entry. This regressed when rewriting the hash table code in 4a513d56. Fixes #626.
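Why a scan must rescan can be shown with a toy open-addressing table that uses linear probing and backward-shift deletion. The table layout is an assumption here, since the commit does not describe it, and `Table`, `table_scan` and the other names are hypothetical. Deleting an entry pulls a later entry of the same probe chain back into the freed slot. A scan that simply advanced to the next slot would therefore skip it. The scan below starts just after an empty slot so that no chain wraps past its starting point. If the current slot no longer holds the entry just visited, it is visited again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8 /* power of two */

typedef struct { const char *key; int used; } Slot;
typedef struct { Slot slots[TABLE_SIZE]; } Table;
typedef void (*ScanFn)(Table *t, const char *key, void *data);

static size_t home_slot(const char *s) {
    size_t h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h & (TABLE_SIZE - 1);
}

/* Insert with linear probing; assumes the table never fills up. */
void table_insert(Table *t, const char *key) {
    size_t i = home_slot(key);
    while (t->slots[i].used) {
        if (strcmp(t->slots[i].key, key) == 0)
            return;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    t->slots[i].key = key;
    t->slots[i].used = 1;
}

/* Backward-shift deletion: later chain entries move into the hole. */
void table_delete(Table *t, const char *key) {
    size_t i = home_slot(key);
    while (t->slots[i].used && strcmp(t->slots[i].key, key) != 0)
        i = (i + 1) & (TABLE_SIZE - 1);
    if (!t->slots[i].used)
        return;
    for (size_t j = (i + 1) & (TABLE_SIZE - 1); t->slots[j].used;
         j = (j + 1) & (TABLE_SIZE - 1)) {
        size_t dist_home = (j - home_slot(t->slots[j].key)) & (TABLE_SIZE - 1);
        size_t dist_hole = (j - i) & (TABLE_SIZE - 1);
        if (dist_home >= dist_hole) { /* still reachable from hole i */
            t->slots[i] = t->slots[j];
            i = j;
        }
    }
    t->slots[i].key = NULL;
    t->slots[i].used = 0;
}

/* Visit every entry once even if the callback deletes the entry it is
   given. Assumes at least one empty slot and no inserts during scan. */
void table_scan(Table *t, ScanFn fn, void *data) {
    size_t start = 0;
    while (start < TABLE_SIZE && t->slots[start].used)
        start++;
    assert(start < TABLE_SIZE);
    for (size_t n = 1; n <= TABLE_SIZE; n++) {
        size_t i = (start + n) & (TABLE_SIZE - 1);
        while (t->slots[i].used) {
            const char *key = t->slots[i].key;
            fn(t, key, data);
            if (t->slots[i].used && t->slots[i].key == key)
                break; /* not deleted: move on to the next slot */
            /* deleted: a later entry may have shifted in, rescan slot i */
        }
    }
}

/* Example callback: count each visit and delete the visited entry. */
void delete_and_count(Table *t, const char *key, void *data) {
    (*(int *)data)++;
    table_delete(t, key);
}
```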

7a2d412f

2023-10-31T20:15:38

parser: Copy default namespace in xmlParseBalancedChunkMemory

e0c2f14d

2023-10-31T13:53:15

parser: Copy namespaces in xmlParseBalancedChunkMemory Reenable copying of namespaces but don't set SAX data. This should match the old behavior.

b76d81da

2023-10-06T11:50:29

parser: Fix regression when push parsing parameter entities Short-lived regression from 834b8123. Also shrink parameter entity buffers when push parsing.

134d2ad8

2023-10-06T00:31:44

parser: Protect against quadratic default attribute expansion

0ba22c05

2023-10-05T22:05:04

parser: Support encoded external PEs in entity values Corner case which was never supported.

6337a14a

2023-10-06T10:44:38

tests: Handle entities in SAX tests
