|
149c04c0
|
2025-08-02T14:59:02
|
|
html: Escape < and > when serializing attributes
This reverts the change in cdaf657f. Coincidentally, the HTML spec just
changed to mandate the old escaping behavior:
https://github.com/whatwg/html/issues/6235
Fixes #957.
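The restored escaping rule for attribute values written between double quotes can be sketched as below. The helper name `escape_attr` and its caller-supplied buffer are hypothetical. This is an illustration of the rule, not libxml2's serializer code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: escape an attribute value that will be written
 * between double quotes. &, <, > and " become entity references.
 * Returns the number of bytes written (without the trailing NUL), or
 * (size_t)-1 if out is too small. */
size_t escape_attr(const char *in, char *out, size_t outsize) {
    size_t n = 0;

    if (outsize == 0)
        return (size_t)-1;
    for (; *in != '\0'; in++) {
        const char *rep = NULL;

        switch (*in) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;"; break;
        case '>': rep = "&gt;"; break;
        case '"': rep = "&quot;"; break;
        default: break;
        }
        size_t len = rep != NULL ? strlen(rep) : 1;
        if (n + len + 1 > outsize)   /* keep room for the NUL */
            return (size_t)-1;
        if (rep != NULL)
            memcpy(out + n, rep, len);
        else
            out[n] = *in;
        n += len;
    }
    out[n] = '\0';
    return n;
}
```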
|
|
0c948334
|
2025-07-10T11:23:44
|
|
html: Add newline to error message
|
|
bc0bb67b
|
2025-07-10T11:20:22
|
|
html: Don't abort on encoding errors
Always enable recovery mode when parsing HTML, so we don't raise fatal
errors.
Regressed with 462bf0b7. Fixes #947.
|
|
71e1e8af
|
2025-07-04T14:28:26
|
|
schematron: Fix memory safety issues in xmlSchematronReportOutput
Fix use-after-free (CVE-2025-49794) and type confusion (CVE-2025-49796)
in xmlSchematronReportOutput.
Fixes #931.
Fixes #933.
|
|
24d7e159
|
2025-07-04T12:19:20
|
|
schematron: Complete fix for CVE-2025-49795
- Fix memory leaks
- Fix tests
|
|
499bcb78
|
2025-06-21T12:11:30
|
|
Schematron: Fix null pointer dereference leading to DoS
(CVE-2025-49795)
Fixes #932
|
|
069bcda1
|
2025-06-20T23:05:00
|
|
Fix potential buffer overflows of interactive shell
CVE-2025-6170
Fixes #941
|
|
9760a14f
|
2025-06-30T13:47:33
|
|
relaxng: In the simplification step also unlink notAllowed refs from choice
This fixes false reports of disallowed content when notAllowed appears within a choice element.
|
|
ad0f5d27
|
2025-06-24T13:02:13
|
|
tree: Fix xmlGetNodePath
- Fix quadratic behavior
- Don't truncate names
Fixes #715.
|
|
ab06bfa1
|
2025-05-26T15:03:07
|
|
parser: Fix error return in xmlParseElementContentDecl
Avoid internal error later in xmlValidBuildAContentModel after
2a60ca06c.
Also avoids some unnecessary error messages.
|
|
5ec83f77
|
2025-05-20T03:21:27
|
|
valid: Remove duplicate #FIXED check for namespaces
Contrary to what the comment indicates, this is already checked.
|
|
7c10fff2
|
2025-05-20T22:48:25
|
|
valid: Don't validate twice in xmlAddAttributeDecl
This should only be done in xmlValidateAttributeDecl.
|
|
2f3655c9
|
2025-05-20T19:40:06
|
|
parser: Pop PEs that start markup declarations explicitly
We currently only handle "Validity constraint: Proper Declaration/PE
Nesting", but we must detect "Well-formedness constraint: PE Between
Declarations" separately:
> The replacement text of a parameter entity reference in a DeclSep must
> match the production extSubsetDecl.
PEs in DeclSeps are PEs that start with a full markup declaration (or
another PE). These are handled in xmlParse{Internal|External}Subset. We
set a flag on these PEs and don't close them implicitly in
xmlSkipBlankCharsPE. This will make unterminated declarations in such
PEs cause a parser error. The PEs are closed explicitly in
xmlParse{Internal|External}Subset, the only location where they are
allowed to end.
|
|
dd1961e0
|
2025-05-20T16:37:18
|
|
valid: Skip more validity checks if not validating
|
|
3a68d0b7
|
2025-05-19T18:59:51
|
|
SAX2: Handle xml:id errors separately
|
|
87087def
|
2025-05-13T16:19:42
|
|
tests: Remove result files committed by accident
|
|
f0983199
|
2025-05-12T13:00:20
|
|
html: Map some encodings according to HTML5
Windows-1252 is a superset of ISO-8859-1 and should be used instead.
Same for ASCII.
Also map UCS-2 and UTF-16 to UTF-16LE.
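The mapping can be pictured as a small lookup table. The function `map_html_encoding` is hypothetical, and the table only covers the names mentioned above plus `US-ASCII` as an assumed alias of ASCII; the label list in the WHATWG Encoding Standard is much longer.

```c
#include <stddef.h>

/* ASCII case-insensitive string equality; returns 1 if equal. */
static int ascii_eq_nocase(const char *a, const char *b) {
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        char ca = *a, cb = *b;
        if (ca >= 'A' && ca <= 'Z') ca = (char)(ca + ('a' - 'A'));
        if (cb >= 'A' && cb <= 'Z') cb = (char)(cb + ('a' - 'A'));
        if (ca != cb)
            return 0;
    }
    return *a == *b;
}

/* Hypothetical HTML5-style remapping of a declared encoding name.
 * Names not in the table are returned unchanged. */
const char *map_html_encoding(const char *name) {
    static const struct { const char *from, *to; } table[] = {
        { "ISO-8859-1", "windows-1252" },  /* superset */
        { "US-ASCII",   "windows-1252" },
        { "ASCII",      "windows-1252" },
        { "UCS-2",      "UTF-16LE" },
        { "UTF-16",     "UTF-16LE" },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (ascii_eq_nocase(name, table[i].from))
            return table[i].to;
    return name;
}
```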
|
|
825f3a9d
|
2025-05-11T21:38:16
|
|
html: Always serialize attributes with double quotes
Align with HTML5.
|
|
cdaf657f
|
2025-05-09T23:02:32
|
|
html: Don't escape < and > when serializing attribute values
Align with HTML5.
This will break some test suites.
|
|
c8cea39d
|
2025-05-09T21:31:07
|
|
save: Fix serialization of attribute defaults containing <
Long-standing bug that produced invalid XML.
|
|
46f05ea4
|
2025-05-09T00:21:47
|
|
html: Rework meta charset handling
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.
Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.
Add full support for <meta charset=""> which is also used when adding
meta tags.
Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.
Only switch encoding once when parsing.
Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading
UTF-8 charset.
Fixes #909.
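A sketch of the HTML "algorithm for extracting a character encoding from a meta element", applied to the value of a Content-Type `content` attribute. `extract_meta_charset` is a hypothetical name. It returns the position and length of the label so that a caller could replace only that substring, and it leaves out the final step of looking up the label as an encoding.

```c
#include <stddef.h>
#include <string.h>

/* ASCII whitespace as defined by HTML: TAB, LF, FF, CR, SPACE. */
static int is_ascii_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Case-insensitive match of "charset" at p; stops safely at a NUL. */
static int match_charset(const char *p) {
    const char *w = "charset";
    for (int i = 0; i < 7; i++) {
        char c = p[i];
        if (c >= 'A' && c <= 'Z') c = (char)(c + ('a' - 'A'));
        if (c != w[i])
            return 0;
    }
    return 1;
}

/* On success, points *start at the label inside s, sets *len, returns 1. */
int extract_meta_charset(const char *s, const char **start, size_t *len) {
    const char *pos = s;

    for (;;) {
        /* Loop: find "charset" at or after pos. */
        while (*pos != '\0' && !match_charset(pos))
            pos++;
        if (*pos == '\0')
            return 0;
        pos += 7;
        while (is_ascii_ws(*pos))
            pos++;
        if (*pos != '=')
            continue;           /* resume the search at this character */
        pos++;
        while (is_ascii_ws(*pos))
            pos++;
        if (*pos == '"' || *pos == '\'') {
            const char *end = strchr(pos + 1, *pos);
            if (end == NULL)
                return 0;       /* unmatched quote */
            *start = pos + 1;
            *len = (size_t)(end - pos - 1);
            return 1;
        }
        if (*pos == '\0')
            return 0;
        /* Unquoted: up to whitespace, ';' or the end of the string. */
        *start = pos;
        *len = strcspn(pos, " \t\n\f\r;");
        return 1;
    }
}
```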
|
|
f3a080bc
|
2025-05-07T14:32:42
|
|
html: Ignore U+0000 in body text
Align with HTML5. Fixes #908.
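In UTF-8 the byte 0x00 only ever encodes U+0000, so dropping it from body text can be done bytewise. A hypothetical in-place filter, assuming the text run is already UTF-8:

```c
#include <stddef.h>

/* Removes U+0000 from a UTF-8 text run in place and returns the new
 * length. Safe bytewise because 0x00 never occurs inside a multi-byte
 * UTF-8 sequence. */
size_t drop_nul_chars(char *text, size_t len) {
    size_t out = 0;

    for (size_t i = 0; i < len; i++)
        if (text[i] != '\0')
            text[out++] = text[i];
    return out;
}
```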
|
|
6896f478
|
2025-04-18T17:22:36
|
|
Revert "valid: Remove duplicate error messages when streaming"
This reverts commit cd220b93d8ffffd2fb7cac0ec792bebb7e082521.
This commit broke the xmlstarlet tests.
|
|
69b83bb6
|
2025-03-10T02:18:51
|
|
encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
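The idea behind the `flush` flag can be illustrated with a plain UTF-8 length check instead of ICU or iconv. In this hypothetical sketch, an incomplete sequence at the end of a chunk is left unconsumed. Without `flush`, the caller keeps those bytes and waits for more input; with `flush`, leftover bytes mean the input was truncated. Continuation bytes are deliberately not validated here.

```c
#include <stddef.h>

enum conv_result { CONV_OK, CONV_NEED_MORE, CONV_TRUNCATED, CONV_INVALID };

/* Sequence length from a UTF-8 lead byte; 0 for an invalid lead byte. */
static size_t utf8_seq_len(unsigned char c) {
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;
}

/* Consumes complete sequences and stores the count in *consumed.
 * flush != 0 means no more input will follow. */
enum conv_result utf8_check_chunk(const unsigned char *buf, size_t len,
                                  int flush, size_t *consumed) {
    size_t i = 0;

    while (i < len) {
        size_t n = utf8_seq_len(buf[i]);
        if (n == 0) {
            *consumed = i;
            return CONV_INVALID;
        }
        if (i + n > len)
            break;              /* incomplete sequence at the end */
        i += n;
    }
    *consumed = i;
    if (i == len)
        return CONV_OK;
    return flush ? CONV_TRUNCATED : CONV_NEED_MORE;
}
```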
|
|
05bd1720
|
2025-03-01T10:25:29
|
|
parser: Fix parsing of DTD content
Regressed in 2.11. Fixes #868.
|
|
9f86dae9
|
2024-12-15T14:27:05
|
|
test: Add test case for UAF in xmlSchemaIDCFillNodeTables
|
|
8cf6129b
|
2025-02-13T18:20:46
|
|
html: Stop implying <p> start tags
Only <html>, <head> or <body> should be implied. Opening extra <p> tags
has always been a libxml2 quirk.
|
|
71122421
|
2025-02-13T14:04:10
|
|
html: Make implied <p> tags more deterministic
libxml2's HTML parser adds <p> start tags in some situations. This
behavior, which doesn't follow any standard, was added in 2000, see
here: http://veillard.com/XML/messages/0655.html
Text nodes that only contain whitespace don't imply a <p> tag, but the
whitespace check cannot work reliably if we're parsing partial text data
which can happen with both the pull and the push parser.
The logic in `areBlanks` is hard to follow. The checks involving `CUR`
depend on the position of the input pointer and seem dubious. It's also
possible that the behavior changed inadvertently with a later commit.
As a result, it's hard to come up with good test cases.
We now process leading whitespace before creating implied tags. This is
more in line with HTML5 and should avoid at least some issues with
partial text data.
For example, parsing the string "<head> x" used to result in:
<html>
<head></head>
<body><p> x</p></body>
</html>
And now results in:
<html>
<head> </head>
<body><p>x</p></body>
</html>
Except for the implied <p> tag, this matches HTML5.
|
|
b4d3d87e
|
2025-02-01T22:02:33
|
|
parser: Fix parsing of doctype declarations
Fix some long-standing issues.
Fixes #504.
|
|
08028572
|
2025-02-01T18:21:47
|
|
html: Make data parsing modes work with push parser
This can't be solved with a simple scan for a terminator. Instead, we
make htmlParseCharData handle incomplete data if the "partial" flag is
set.
|
|
cd220b93
|
2024-12-27T14:55:43
|
|
valid: Remove duplicate error messages when streaming
|
|
45914614
|
2024-11-05T12:05:14
|
|
xpath: Fix parsing of non-ASCII names
Fix a long-standing issue where QNames starting with a non-ASCII
character would be rejected. This became more visible after "streaming"
XPath evaluation was disabled since the latter handled non-ASCII names
correctly.
Fixes #818.
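The first character of a name has to be checked against the full Unicode ranges, not only ASCII letters. The sketch below uses the NameStartChar production of XML 1.0 Fifth Edition without the colon, as in NCName. Whether libxml2's XPath parser uses exactly these ranges is an assumption, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if code point cp may start an NCName. */
int is_ncname_start_char(uint32_t cp) {
    static const struct { uint32_t lo, hi; } ranges[] = {
        { 'A', 'Z' }, { '_', '_' }, { 'a', 'z' },
        { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
        { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
        { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
        { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF },
    };
    for (size_t i = 0; i < sizeof ranges / sizeof ranges[0]; i++)
        if (cp >= ranges[i].lo && cp <= ranges[i].hi)
            return 1;
    return 0;
}
```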
|
|
ffb058f4
|
2024-10-28T20:12:52
|
|
parser: Fix detection of duplicate attributes
We really need a second scan if more than one namespace clash was
detected.
|
|
f77ec16d
|
2024-09-12T01:45:34
|
|
html: Optimize htmlParseCharData
|
|
575be6c1
|
2024-09-12T01:40:07
|
|
html: Fix line numbers with CRs
|
|
e179f3ec
|
2024-09-11T17:29:59
|
|
html: Stop reporting syntax errors
It doesn't make much sense to keep the old syntax error handling which
doesn't conform to HTML5.
Handling HTML5 parser errors is rather involved and not essential for
parsers.
|
|
c6af1017
|
2024-09-08T20:45:48
|
|
html: Test tokenizer against html5lib test suite
|
|
9678163f
|
2024-09-09T02:01:19
|
|
html: Don't check for valid XML characters
|
|
4eeac309
|
2024-09-08T22:20:20
|
|
html: Start to fix EOF and U+0000 handling
|
|
17da54c5
|
2024-09-08T19:16:12
|
|
html: Normalize newlines
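HTML5 newline normalization replaces each CR LF pair with LF and every remaining CR with LF. A minimal in-place sketch with a hypothetical function name; a real push parser must also handle a CR at the end of one chunk followed by an LF at the start of the next, which this version does not.

```c
#include <stddef.h>

/* Normalizes CR LF and lone CR to LF in place; returns the new length. */
size_t normalize_newlines(char *buf, size_t len) {
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            buf[out++] = '\n';
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;            /* swallow the LF of a CR LF pair */
        } else {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```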
|
|
3adb396d
|
2024-09-07T15:18:13
|
|
html: Parse bogus comments instead of ignoring them
Also treat XML processing instructions as bogus comments.
|
|
e1834745
|
2024-09-07T00:54:25
|
|
html: Add character data tests
|
|
f9ed30e9
|
2024-09-06T17:49:04
|
|
html: HTML5 character data states
|
|
59511792
|
2024-09-03T15:52:44
|
|
html: Parse named character references according to HTML5
|
|
a80f8b64
|
2023-05-04T15:59:31
|
|
html: Allow attributes in end tags
Attributes are syntactically allowed in HTML5 end tags but otherwise
ignored.
|
|
dcb2abb2
|
2023-05-04T15:16:29
|
|
html: Parse tag and attribute names according to HTML5
HTML5 allows basically all characters in tag and attribute names.
|
|
bd9eed46
|
2024-09-02T18:37:41
|
|
parser: Make unsupported encodings an error in declarations
This was changed in 45157261, but in encoding declarations, unsupported
encodings should raise a fatal error.
Fixes #794.
|
|
8ae06d52
|
2024-08-29T00:07:27
|
|
SAX2: Don't merge CDATA sections
The Document Object Model (DOM) Level 3 Core Specification says:
> Adjacent CDATASection nodes are not merged by use of the normalize
> method of the Node interface.
Fixes #412.
|
|
322e733b
|
2024-07-18T19:27:43
|
|
xinclude: Fix fallback for text includes
Fixes #772.
|
|
842a0448
|
2024-07-03T11:46:06
|
|
valid: Restore ID lookup
Revert a change from d025cfbb and don't overwrite ID table entries, so
that the first attribute will be returned if there are duplicate IDs.
This requires two other changes:
- Attributes in entity content are never added to the ID table. This
seems reasonable.
- Remove the optimization to skip ID lookup when copying and the target
document has an empty ID table. This also seems more correct since the
document could have ID declarations nevertheless or we could be
copying xml:ids into the document for the first time.
Fixes #757.
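The "first attribute wins" rule amounts to an insert that refuses to overwrite an existing key. A toy sketch using a fixed-size array instead of libxml2's hash-based ID table; all names are hypothetical, and `node` stands in for the attribute pointer.

```c
#include <stddef.h>
#include <string.h>

#define MAX_IDS 16

/* `node` is a hypothetical stand-in for the attribute carrying the ID. */
struct id_entry { const char *id; int node; };
struct id_table { struct id_entry entries[MAX_IDS]; size_t count; };

/* Returns 1 if added, 0 for a duplicate (the first entry is kept),
 * -1 if the table is full. */
int id_add(struct id_table *t, const char *id, int node) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].id, id) == 0)
            return 0;
    if (t->count == MAX_IDS)
        return -1;
    t->entries[t->count].id = id;
    t->entries[t->count].node = node;
    t->count++;
    return 1;
}

/* Returns the node registered for id, or -1 if there is none. */
int id_lookup(const struct id_table *t, const char *id) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].id, id) == 0)
            return t->entries[i].node;
    return -1;
}
```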
|
|
30be984a
|
2024-06-28T20:37:47
|
|
encoding: Rework ISO-8859-X conversion
Optimize code. Pass tables as context parameter. Check for
XML_ENC_ERR_SPACE.
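A sketch of a table-driven ISO-8859-X to UTF-8 converter in which the table is passed as an opaque context pointer and running out of output space is reported separately from invalid input. The function name and the `ENC_*` codes are hypothetical and do not reproduce libxml2's `XML_ENC_ERR_*` values. Bytes below 0xA0 are assumed to map to themselves, as in the ISO-8859 parts, so only the upper 96 positions come from the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes (not libxml2's XML_ENC_ERR_* values). */
enum { ENC_OK = 0, ENC_ERR_SPACE = -1, ENC_ERR_INPUT = -2 };

/* ctx points to 96 code points for bytes 0xA0..0xFF (0 = unassigned).
 * On return, *inlen and *outlen hold the bytes consumed and produced. */
int iso8859x_to_utf8(const unsigned char *in, size_t *inlen,
                     unsigned char *out, size_t *outlen, const void *ctx) {
    const uint16_t *table = ctx;
    size_t i = 0, o = 0;
    int ret = ENC_OK;

    while (i < *inlen) {
        uint32_t c = in[i];

        if (c >= 0xA0) {
            c = table[c - 0xA0];
            if (c == 0) {
                ret = ENC_ERR_INPUT;
                break;
            }
        }
        /* Table entries are BMP code points, so 3 bytes suffice. */
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : 3;
        if (*outlen - o < need) {
            ret = ENC_ERR_SPACE;   /* caller should flush and retry */
            break;
        }
        if (need == 1) {
            out[o++] = (unsigned char)c;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        i++;
    }
    *inlen = i;
    *outlen = o;
    return ret;
}
```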
|
|
7c11da2d
|
2024-06-27T12:47:47
|
|
tests: Clarify licence of test/intsubset2.xml
|
|
b8903b9e
|
2024-06-22T17:55:46
|
|
runtest: Remove result handling from schemasOneTest
We only care about errors.
|
|
e68ccfa9
|
2024-06-22T16:42:36
|
|
tests: Port Schematron tests to C
|
|
1dd5e76a
|
2024-06-17T21:06:46
|
|
xinclude: Don't remove root element
Don't replace include element at root with empty nodeset.
|
|
52ce0d70
|
2024-06-17T17:35:12
|
|
tests: Add XInclude test for issue #733
|
|
2608baaf
|
2024-06-14T19:42:40
|
|
parser: Make failure to load main document a warning
Revert the change that made failures to load the main document an error.
This fixes the --path option of xmllint and xsltproc.
Should fix #733.
|
|
669bd349
|
2024-06-12T18:20:01
|
|
xpointer: Remove support for XPointer locations
The latest spec for what is essentially an XPath extension seems to be
this working draft from 2002:
https://www.w3.org/TR/xptr-xpointer/
The xpointer() scheme is listed as "being reviewed" in the XPointer
registry since at least 2006. libxml2 seems to be the only modern
software that tries to implement this spec, but the code has many bugs
and quality issues.
If you configure --with-legacy, old symbols are retained for ABI
compatibility.
|
|
4fefba4c
|
2024-05-15T17:52:20
|
|
parser: Rework handling of undeclared entities
Throw an error if entity substitution was requested.
Now we only downgrade to a warning if
- XML_PARSE_DTDLOAD wasn't specified, and
- entities aren't substituted or XML_PARSE_NO_XXE was specified.
Should fix #724.
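The downgrade condition reads more easily as a predicate. A sketch with hypothetical boolean parameters standing in for XML_PARSE_DTDLOAD, entity substitution (XML_PARSE_NOENT) and XML_PARSE_NO_XXE. It only encodes the two conditions listed above and ignores any other context the parser may take into account.

```c
#include <stdbool.h>

enum severity { SEV_WARNING, SEV_ERROR };

/* Hypothetical restatement of the rule: warn only if the external DTD
 * wasn't requested and entities aren't substituted (or XXE is off). */
enum severity undeclared_entity_severity(bool dtdload, bool substitute,
                                         bool no_xxe) {
    if (!dtdload && (!substitute || no_xxe))
        return SEV_WARNING;
    return SEV_ERROR;
}
```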
|
|
fdc5ff36
|
2024-05-02T16:23:04
|
|
parser: Always throw entity errors if external DTD is loaded
When parsing with XML_PARSE_DTDLOAD, missing entities are always an
error.
Also consolidate behavior when validating. See b717abdd.
|
|
39e5b35b
|
2024-05-02T22:06:19
|
|
parser: Don't create undeclared entity refs in substitution mode
We never want to create entity reference nodes if entity substitution
is enabled. This also applies to undeclared entities.
|
|
45fe9924
|
2024-04-22T17:12:54
|
|
parser: Don't create reference in xmlLookupGeneralEntity
This should only be done in xmlParseReference.
The handling of undeclared entities is still somewhat inconsistent. In
element content we create references even if entity substitution is
enabled. In attribute values undeclared entities are always ignored.
|
|
b717abdd
|
2024-04-22T15:42:39
|
|
parser: Consolidate error handling for undeclared entities
Always use XML_WAR_UNDECLARED_ENTITY with warning error level in
documents with external subset or parameter entities. Use
XML_ERR_UNDECLARED_ENTITY otherwise.
|
|
f506ec66
|
2024-04-15T11:27:44
|
|
parser: Always decode entities in namespace URIs
Also decode entities in namespace URIs if entity substitution wasn't
requested. This should fix some corner cases when comparing namespace
URIs. The Namespaces in XML 1.0 spec says:
> In a namespace declaration, the URI reference is the normalized value
> of the attribute, so replacement of XML character and entity
> references has already been done before any comparison.
Make the serialization code escape special characters in namespace URIs
like in attribute values. This fixes serialization if entities were
substituted when parsing.
Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
|
|
5bb84b47
|
2024-04-04T11:55:28
|
|
relaxng: Fix tree corruption in xmlRelaxNGParseNameClass
Don't create cycles in tree structure. This will lead to an infinite
loop or call stack overflow later.
Closes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/711
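Detecting a would-be cycle only requires walking up from the new parent and checking whether the node being inserted is the parent itself or one of its ancestors. A sketch with a minimal hypothetical node type, not libxml2's `xmlNode`:

```c
#include <stddef.h>

/* Hypothetical minimal node: only the parent link matters here. */
struct node { struct node *parent; };

/* Returns 1 if making child a child of parent would create a cycle. */
int would_create_cycle(const struct node *parent, const struct node *child) {
    for (const struct node *p = parent; p != NULL; p = p->parent)
        if (p == child)
            return 1;
    return 0;
}
```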
|
|
f43197fc
|
2024-03-29T11:16:45
|
|
tree: Don't coalesce text nodes in xmlAdd{Prev,Next}Sibling
Commit 9e1c72da from 2001 introduced a bug where xmlAddPrevSibling and
xmlAddNextSibling would only try to merge text nodes with one of its
new siblings. Commit 4ccd3eb8 fixed this bug but unfortunately, lxml
and possibly other downstream code depend on text nodes not being
merged.
To avoid breaking downstream code while still having somewhat
consistent API behavior, it's probably best to make these functions
never coalesce text nodes.
|
|
4ccd3eb8
|
2024-03-11T19:43:56
|
|
tree: Refactor node insertion
Also fixes a text coalescing bug.
|
|
186562a1
|
2024-03-12T19:55:33
|
|
parser: Fix detection of duplicate attributes in XML namespace
Fixes a regression from commit e0dd330b, resulting in duplicate
attributes in the predefined XML namespace not being detected or
extraneous default attributes being passed.
Fixes #704.
|
|
63986c45
|
2024-01-22T21:02:16
|
|
parser: Report fatal error if document entity couldn't be loaded
Only lower error level when loading entities.
Fixes #667.
|
|
29beef65
|
2024-01-02T21:50:38
|
|
parser: Pop inputs if parsing DTD failed
This should provide some statistics in ctxt->sizeentcopy even in the
error or recovery case.
|
|
f237e5b9
|
2024-01-05T15:40:23
|
|
parser: Avoid duplicate namespace errors
Don't report an extra attribute uniqueness error if a namespace is
undeclared. This matches old behavior.
|
|
07c05546
|
2024-01-04T02:48:02
|
|
error: Make xmlFormatError public
This is a useful function to get a verbose error report.
This allows removing duplicated code from runtest.c. Also reactivate the check
for schema parser failures.
|
|
d0eb5a7e
|
2024-01-03T18:12:29
|
|
parser: Remove xmlErrEncodingInt
Convert the last user to xmlFatalErr.
|
|
e8fb3d63
|
2024-01-02T17:45:54
|
|
parser: Convert some "internal errors" to meaningful codes
|
|
37c6618b
|
2023-12-30T02:50:34
|
|
parser: Rework parsing of attribute and entity values
Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.
Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.
Normalize attribute values in a single pass while expanding entities.
Be more lenient in recovery mode.
If no entity substitution was requested, validate entities without
expanding. Fixes #596.
Also fixes #655.
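Validating UTF-8 without decoding means checking byte ranges only. A sketch following the table of well-formed UTF-8 byte sequences in the Unicode Standard (Table 3-7), which rejects overlong forms, surrogates and values above U+10FFFF. The function name is hypothetical, and an XML parser additionally has to reject code points that aren't allowed as XML characters.

```c
#include <stddef.h>

/* Returns the length of the longest well-formed UTF-8 prefix of buf. */
size_t utf8_valid_prefix(const unsigned char *buf, size_t len) {
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned char lo = 0x80, hi = 0xBF;  /* range of the 2nd byte */
        size_t n, k;

        if (c < 0x80) {
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 2;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 3;
            if (c == 0xE0) lo = 0xA0;        /* no overlong forms */
            else if (c == 0xED) hi = 0x9F;   /* no surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 4;
            if (c == 0xF0) lo = 0x90;        /* no overlong forms */
            else if (c == 0xF4) hi = 0x8F;   /* nothing above U+10FFFF */
        } else {
            break;                           /* invalid lead byte */
        }
        if (len - i < n || buf[i + 1] < lo || buf[i + 1] > hi)
            break;
        for (k = 2; k < n; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                break;
        if (k < n)
            break;
        i += n;
    }
    return i;
}
```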
|
|
f0dc52d0
|
2023-12-29T06:00:20
|
|
parser: Move cleanup of element stacks to xmlParseContent
|
|
d025cfbb
|
2023-12-27T03:53:24
|
|
parser: Always copy content from entity to target.
Make sure that references from IDs are updated.
Note that if there are IDs with the same value in a document, the last
one will now be returned. IDs should be unique, but maybe this should be
addressed.
|
|
4ecc85d2
|
2023-12-27T00:44:16
|
|
parser: Push general entity input streams on the stack
This allows the error handler to give more context.
|
|
d944a415
|
2023-12-26T02:10:35
|
|
parser: Fix in-parameter-entity and in-external-dtd checks
Use ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.
Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
|
|
b8313b58
|
2023-12-26T21:59:08
|
|
xpath: Rewrite substring-before and substring-after
Don't use buffers. Check malloc failures.
|
|
f3fa34dc
|
2023-12-26T22:37:26
|
|
parser: Fix general entity parsing
Clear namespace database.
Ignore non-fatal errors.
|
|
ecfbcc8a
|
2023-12-25T04:33:00
|
|
parser: Rework general entity parsing
Don't create a new parser context but reuse the existing one.
This exposes bug #601 in a more obvious way.
|
|
6e3a2ac6
|
2023-12-22T21:38:50
|
|
xinclude: Rework xml:base fixup
The xml:base fixup was broken in more complex cases.
Also avoid parsing and building the included URI multiple times.
|
|
f0df3e6d
|
2023-12-21T14:35:18
|
|
tests: Try to fix RelaxNG test cases
These were added recently in ea695ac0 and 8074b881 but were a total mess
of symbolic links and apparently mixed up files.
Symbolic links don't work on Windows.
Try to salvage one of the tests.
|
|
8d0aaf4b
|
2023-12-19T20:47:36
|
|
parser: Remove xmlErrEncoding
Use xmlFatalErr or xmlCtxtErrIO.
|
|
7e511f35
|
2023-12-19T15:41:37
|
|
io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile
This allows reporting the reason why opening a file failed to the parser
context and improve error messages. Now we can also remove the stat call
before opening a file.
|
|
83c6aeef
|
2023-12-18T21:12:29
|
|
relaxng: Improve error handling
Pass RelaxNG structured error handler to XML parser.
Handle malloc failure from xmlRaiseError.
Remove argument from memory error handler.
Use xmlRaiseMemoryError.
Don't use xmlGenericError.
Remove TODO macro.
|
|
157df344
|
2023-12-10T18:23:53
|
|
xmlreader: Report malloc failures
Fix many places where malloc failures aren't reported.
Introduce a new API function xmlTextReaderGetLastError.
|
|
e58ea29f
|
2023-12-10T18:10:42
|
|
SAX2: Report malloc failures
Fix many places where malloc failures aren't reported.
Improve error handling when parsing entity declarations.
Fixes #308.
|
|
a1f7ecae
|
2023-12-10T15:25:42
|
|
entities: Report malloc failures
Fix places where malloc failures aren't reported.
Introduce new API function xmlAddEntity that returns separate error
codes.
Don't invoke global error handler for low-level errors which should be
handled by higher layers.
Invalid redeclaration warnings will be fixed later.
|
|
7d446e97
|
2023-12-08T12:13:49
|
|
parser: Fix namespaces redefined from default attributes
This regressed in commit e0dd330b.
Also fixes a long-standing issue where namespaces from default
attributes weren't added if they match an existing namespace.
Fixes #643.
|
|
e3959461
|
2023-11-30T16:15:46
|
|
html: Reenable buggy detection of XML declarations
Switch to UTF-8 if a document starts with '<?xm' to match old behavior.
Also enable this check in the push parser.
Fixes #637.
|
|
43b511fa
|
2023-11-26T14:31:39
|
|
parser: Make CRLF increment line number
Partial revert of cb927e85 fixing CRLFs not incrementing the line
number.
This requires reworking xmlParseQNameHashed. The original implementation
prompted the change to xmlCurrentChar which really shouldn't modify the
'cur' pointer as a side effect. But the NEXTL macro relies on this
behavior.
Ultimately, we should reintroduce the change to xmlCurrentChar and fix
the NEXTL macro. This will lead to single CRs incrementing the line
number as well which seems more consistent.
Fixes #628.
|
|
a2b5c90a
|
2023-11-21T14:35:54
|
|
hash: Fix deletion of entries during scan
Functions like xmlCleanSpecialAttr scan a hash table and possibly delete
entries in the callback. xmlHashScanFull must detect such deletions and
rescan the entry.
This regressed when rewriting the hash table code in 4a513d56.
Fixes #626.
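The failure mode can be reproduced with a toy open-addressing table that deletes by shifting later entries of a cluster back one slot. The sketch assumes Robin Hood hashing with backward-shift deletion; that is an assumption about the design, not a copy of libxml2's code, and all names are hypothetical. After the callback runs, the scanner checks whether the current slot still holds the same entry and, if not, looks at that slot again.

```c
#include <stddef.h>

#define HSIZE 8                 /* power of two; must never be full */
#define HMASK (HSIZE - 1)

/* Hypothetical toy table: keys are positive ints, 0 marks an empty slot. */
struct htable { int slot[HSIZE]; };

static size_t home(int key) { return (size_t)key & HMASK; }

/* Robin Hood insertion with linear probing (key must not be present). */
void ht_insert(struct htable *t, int key) {
    size_t pos = home(key), dist = 0;

    while (t->slot[pos] != 0) {
        size_t d = (pos - home(t->slot[pos])) & HMASK;
        if (d < dist) {           /* displace the entry closer to home */
            int tmp = t->slot[pos];
            t->slot[pos] = key;
            key = tmp;
            dist = d;
        }
        pos = (pos + 1) & HMASK;
        dist++;
    }
    t->slot[pos] = key;
}

/* Backward-shift deletion: following entries of the cluster that are
 * not in their home slot move back by one. */
void ht_delete(struct htable *t, int key) {
    size_t pos = home(key);

    while (t->slot[pos] != key) {
        if (t->slot[pos] == 0)
            return;               /* not present */
        pos = (pos + 1) & HMASK;
    }
    for (;;) {
        size_t next = (pos + 1) & HMASK;
        int k = t->slot[next];
        if (k == 0 || home(k) == next)
            break;
        t->slot[pos] = k;
        pos = next;
    }
    t->slot[pos] = 0;
}

typedef void (*scan_fn)(struct htable *t, int key, void *data);

/* Calls fn once per entry; fn may delete the key it was given. */
void ht_scan(struct htable *t, scan_fn fn, void *data) {
    size_t start = 0;

    while (t->slot[start] != 0)   /* begin right after an empty slot */
        start++;
    for (size_t n = 1; n <= HSIZE; n++) {
        size_t i = (start + n) & HMASK;
        while (t->slot[i] != 0) {
            int key = t->slot[i];
            fn(t, key, data);
            if (t->slot[i] == key)
                break;            /* entry still there: advance */
            /* Deleted: a later entry may have shifted into slot i. */
        }
    }
}

/* Example callback: count visits per key and delete odd keys. */
void count_and_drop_odd(struct htable *t, int key, void *data) {
    ((int *)data)[key]++;
    if (key % 2 != 0)
        ht_delete(t, key);
}
```

Starting the scan right after an empty slot matters: a cluster never spans an empty slot, so the entries shifted by a deletion are always ones the scan hasn't reached yet.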
|
|
7a2d412f
|
2023-10-31T20:15:38
|
|
parser: Copy default namespace in xmlParseBalancedChunkMemory
|
|
e0c2f14d
|
2023-10-31T13:53:15
|
|
parser: Copy namespaces in xmlParseBalancedChunkMemory
Reenable copying of namespaces but don't set SAX data. This should
match the old behavior.
|
|
b76d81da
|
2023-10-06T11:50:29
|
|
parser: Fix regression when push parsing parameter entities
Short-lived regression from 834b8123.
Also shrink parameter entity buffers when push parsing.
|
|
134d2ad8
|
2023-10-06T00:31:44
|
|
parser: Protect against quadratic default attribute expansion
|
|
0ba22c05
|
2023-10-05T22:05:04
|
|
parser: Support encoded external PEs in entity values
Corner case which was never supported.
|
|
6337a14a
|
2023-10-06T10:44:38
|
|
tests: Handle entities in SAX tests
|