|
69b83bb6
|
2025-03-10T02:18:51
|
|
encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
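
For readers unfamiliar with the condition being detected, here is a minimal sketch of what a truncated multi-byte sequence at the end of a buffer looks like, using UTF-8 as the example encoding. This is not the approach taken in this commit, which relies on the new `flush` flag and the converter's own state. The helper name `utf8TruncatedTail` is hypothetical, and the function only inspects the tail of the buffer without validating the rest of the input.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not libxml2 code: return the number of trailing
 * bytes that belong to an incomplete UTF-8 sequence at the end of buf,
 * or 0 if the buffer ends on a character boundary. Invalid lead bytes
 * are not reported as truncation.
 */
static size_t
utf8TruncatedTail(const unsigned char *buf, size_t len) {
    size_t back, need;

    /* A lead byte can be at most 3 bytes before the end of a truncated
     * sequence, so only the last 3 bytes need to be inspected. */
    for (back = 1; back <= 3 && back <= len; back++) {
        unsigned char c = buf[len - back];

        if ((c & 0xC0) == 0x80)
            continue;                   /* continuation byte, keep looking */
        if (c < 0x80)
            return 0;                   /* ASCII, nothing pending */

        if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return 0;                   /* invalid lead byte */

        /* Truncated if fewer bytes are present than the lead byte asks for */
        return (back < need) ? back : 0;
    }

    return 0;
}
```

A caller could use a non-zero result to decide whether to keep the tail bytes for the next chunk or, at end of input, to report an error.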
|
|
8696ebe1
|
2025-03-11T14:32:35
|
|
parser: Fix ignorableWhitespace callback
If ignorableWhitespace differs from the "characters" callback, we have
to check for blanks as well.
Regressed with 1f5b537.
|
|
25490528
|
2025-03-11T10:54:34
|
|
parser: Fix spurious error in SAX mode
Short-lived regression from 5f0b1378.
|
|
5f0b1378
|
2025-03-08T22:07:15
|
|
parser: Add more parser context accessors
Fixes #763.
|
|
94d8a3e2
|
2025-03-05T14:56:46
|
|
parser: Convert xmlParserMaxDepth to macro
|
|
03a8d5f9
|
2025-03-04T16:00:08
|
|
unicode: Make Unicode functions private
|
|
cdc5cfed
|
2025-03-04T13:26:51
|
|
legacy: Remove legacy symbols
|
|
c42b3227
|
2025-03-04T13:11:18
|
|
parser: Convert inputPush and inputPop to macros
|
|
361f7bff
|
2025-03-04T13:02:36
|
|
parser: Make nodePush, nodePop, namePush, namePop private
|
|
05bd1720
|
2025-03-01T10:25:29
|
|
parser: Fix parsing of DTD content
Regressed in 2.11. Fixes #868.
|
|
e50d314a
|
2025-02-25T23:07:19
|
|
build: Add separate configuration option for RELAX NG
Support for RELAX NG used to be enabled together with XML Schema support
(--with-schemas). Now there's a separate option and a new feature macro
LIBXML_RELAXNG_ENABLED.
|
|
b4d3d87e
|
2025-02-01T22:02:33
|
|
parser: Fix parsing of doctype declarations
Fix some long-standing issues.
Fixes #504.
|
|
57e4bbd8
|
2025-01-31T16:45:35
|
|
parser: Improve handling of NOCDATA option
Don't modify the callback structure. This makes sure that unsetting the
option works.
|
|
1f5b5371
|
2025-01-31T16:21:20
|
|
parser: Improve handling of NOBLANKS option
Don't change the SAX handler.
Use a helper function to invoke "characters" SAX callback.
The old code didn't advance the input pointer consistently before
invoking the callback. There was also some inconsistency with regard to
ctxt->space handling. I don't understand the ctxt->space thing, but
now we always behave like the non-complex case before.
|
|
7a8722f5
|
2025-01-31T14:55:29
|
|
parser: Document that XML_PARSE_NOBLANKS is broken
Long text content can generate multiple "characters" callbacks which can
lead to NOBLANKS removing whitespace in non-whitespace text nodes. So
the NOBLANKS option doesn't even work reliably with the pull parser.
This would be extremely hard to fix.
Unfortunately, `xmllint --format` relies on this option which is another
reason why this feature never really worked.
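
The failure mode described above can be modeled with a small, self-contained sketch. It is a simplified illustration, not libxml2 code: the names `isBlankChunk` and `deliverChunks` are hypothetical, and the fixed chunk size stands in for whatever boundaries the real parser happens to produce. The point is only that a per-callback blank check cannot tell a whitespace-only text node from a whitespace-only piece of a longer node.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the chunk consists only of XML whitespace characters. */
static int
isBlankChunk(const char *s, size_t len) {
    size_t i;

    for (i = 0; i < len; i++) {
        if (s[i] != ' ' && s[i] != '\t' && s[i] != '\n' && s[i] != '\r')
            return 0;
    }
    return 1;
}

/*
 * Hypothetical model of a NOBLANKS-style filter: split text into chunks
 * of chunkSize bytes, as if each chunk arrived in its own "characters"
 * callback, and drop every chunk that looks blank on its own. Kept
 * chunks are appended to out, which must hold strlen(text) + 1 bytes.
 * Returns the length of the result.
 */
static size_t
deliverChunks(const char *text, size_t chunkSize, char *out) {
    size_t len = strlen(text), pos = 0, outLen = 0;

    out[0] = '\0';
    if (chunkSize == 0)
        return 0;

    while (pos < len) {
        size_t n = (len - pos < chunkSize) ? len - pos : chunkSize;

        if (!isBlankChunk(text + pos, n)) {
            memcpy(out + outLen, text + pos, n);
            outLen += n;
        }
        pos += n;
    }
    out[outLen] = '\0';
    return outLen;
}
```

With the text `"ab    cd"` delivered as a single chunk, the whitespace survives. Delivered in chunks of two bytes, the two whitespace-only chunks are dropped and the result is `"abcd"`, even though the text node as a whole was never blank.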
|
|
9efe1414
|
2025-01-31T13:07:35
|
|
parser: Fix detection of ']]>' when push-parsing
Fixes #850.
|
|
115b13f9
|
2025-01-30T23:18:56
|
|
parser: Document push parser limitations
|
|
53a48468
|
2025-01-30T15:15:30
|
|
xmllint: Make --push report parse errors
The push parser leaves documents in ctxt->myDoc even if they're invalid.
Also fix documentation.
Regressed with f8ff4d86.
|
|
5535721f
|
2025-01-30T01:27:03
|
|
parser: Grow input buffer after lots of whitespace
Make sure that the input buffer is grown after consuming large amounts
of whitespace.
Also move a comment.
|
|
218264fa
|
2025-01-30T01:26:01
|
|
parser: Always shrink input buffer
Shrinking the input buffer is cheap now and should be done as soon as
possible.
|
|
93506d41
|
2025-01-29T00:17:01
|
|
parser: Make catalog PIs opt-in
This is an obscure feature that shouldn't be enabled by default.
|
|
1082d813
|
2025-01-28T23:21:34
|
|
parser: Prepare to make decompression opt-in
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
|
|
a78843be
|
2025-01-28T20:13:58
|
|
xmllint: Support compressed input from stdin
Another regression related to reading from stdin.
Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.
This now enables compressed input when using the "Fd" API functions
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.
Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
|
|
ca819160
|
2025-01-03T20:50:08
|
|
include: Use intptr_t to cast between pointers and ints
|
|
2e3a91a7
|
2024-12-26T21:05:18
|
|
doc: Fix documentation
|
|
8231c036
|
2024-12-15T23:36:04
|
|
parser: Check reallocations for overflow
|
|
6548ba11
|
2024-12-13T16:37:40
|
|
parser: Fix argument checks in xmlCtxtParse*
- Raise invalid argument error.
- Free input stream if ctxt is NULL.
|
|
eae9a1bd
|
2024-11-26T14:18:22
|
|
parser: Pop input stream in xmlCtxtValidateDtd
|
|
dafcefb2
|
2024-11-25T22:22:26
|
|
parser: Fail on catastrophic errors in recovery mode
|
|
0dc26910
|
2024-11-20T21:04:19
|
|
parser: Deprecate more internal functions
|
|
84a6eece
|
2024-11-18T20:40:47
|
|
parser: Remove unneeded call to xmlDetectEncoding
|
|
497081ba
|
2024-11-17T20:25:07
|
|
parser: Remove remaining calls to xml{Push|Pop}Input
|
|
0f4f8900
|
2024-11-17T20:13:14
|
|
parser: Rename inputPush to xmlCtxtPushInput
|
|
e2ad249c
|
2024-11-17T19:48:44
|
|
parser: Deprecate more internal symbols
- xmlParseExternalSubset
- xmlPushInput
- xmlPopInput
- xmlCopyCharMultiByte
- xmlCreateEntityParserCtxt
- xmlStringComment
|
|
631778f6
|
2024-11-17T12:11:41
|
|
parser: Check for malloc failure in xmlCtxtParseDtd
|
|
7f8c436c
|
2024-11-15T16:30:52
|
|
parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd
This allows using the context's error handler, options and other
settings.
Fixes #808.
|
|
aaecdc92
|
2024-11-12T16:42:36
|
|
parser: Assign value without if-statement
This avoids an if-statement that effectively does nothing. For example,
the binary generated by GCC with -O2 optimization settings does not
contain that if-statement; the code just uses the hprefix->name field
directly.
No functional changes intended.
Signed-off-by: Ruslan Garipov <ruslanngaripov@gmail.com>
|
|
869e3fd4
|
2024-11-01T16:52:31
|
|
parser: Fix loading of parameter entities in external DTDs
Regressed with commit 12f0bb94.
Fixes #816.
|
|
efb57ddb
|
2024-10-30T14:02:36
|
|
parser: Fix downstream code that swaps DTDs
Downstream code like the nginx xslt module can change the document's DTD
pointers in a SAX callback. If an entity from a separate DTD is parsed
lazily, its content must not reference the current document.
Regressed with commit d025cfbb.
Fixes #815.
|
|
0ec5687e
|
2024-10-28T20:41:56
|
|
parser: Rework xmlCtxtGrowAttrs
Remove unneeded argument.
Check for integer overflow. We probably hit the buffer size limit in
xmlParserGrow before, but better be safe.
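
As an illustration of the kind of check meant here, the following is a minimal sketch of growing an array with overflow checks before the reallocation. It is not the code from xmlCtxtGrowAttrs: the helper name `growArray`, the doubling strategy and the initial capacity of 8 are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not libxml2 code: double the capacity of an array
 * of elemSize-byte elements. Returns 0 on success, -1 if the new size
 * would overflow size_t or if realloc fails. On failure the array and
 * the capacity are left untouched.
 */
static int
growArray(void **array, size_t *capacity, size_t elemSize) {
    size_t newCap;
    void *tmp;

    if (elemSize == 0)
        return -1;

    if (*capacity == 0) {
        newCap = 8;                      /* arbitrary initial capacity */
    } else {
        if (*capacity > SIZE_MAX / 2)    /* doubling would overflow */
            return -1;
        newCap = *capacity * 2;
    }

    if (newCap > SIZE_MAX / elemSize)    /* byte count would overflow */
        return -1;

    tmp = realloc(*array, newCap * elemSize);
    if (tmp == NULL)
        return -1;

    *array = tmp;
    *capacity = newCap;
    return 0;
}
```

Both checks divide instead of multiplying, so the comparison itself cannot overflow.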
|
|
ffb058f4
|
2024-10-28T20:12:52
|
|
parser: Fix detection of duplicate attributes
We really need a second scan if more than one namespace clash was
detected.
|
|
b52a3044
|
2024-10-24T18:18:47
|
|
parser: Use counted_by attribute if supported
We only have a single struct with a flexible array member.
|
|
74dfc49b
|
2024-09-26T21:24:00
|
|
parser: Clarify logic in xmlParseStartTag2
|
|
0bc4608c
|
2024-09-15T20:28:49
|
|
html: Use hash table to check for duplicate attributes
|
|
0ce7bfe5
|
2024-09-12T01:44:18
|
|
html: Try to avoid passing XML options to HTML parser
|
|
16de1346
|
2024-09-11T19:05:38
|
|
parser: Make new options actually work
|
|
dde62ae5
|
2024-08-28T23:58:20
|
|
parser: Align push parsing of CDATA sections with pull parser
Remove special handling of CDATA sections in push parser. This makes
sure that only a single callback is generated for large sections.
Fixes #22 and needed for #412.
|
|
4d10e53a
|
2024-08-28T22:47:20
|
|
parser: Make sure to set and increment input id
Revert part of commits 410931e3 and b9d2f3c9.
|
|
6d365ca0
|
2024-08-28T22:09:30
|
|
doc: XML_PARSE_NO_XXE is available since 2.13.0
|
|
103aadbc
|
2024-08-14T23:15:30
|
|
parser: Suppress EDG maybe-uninitialized warning
|
|
02fcb1ef
|
2024-07-25T17:07:18
|
|
parser: Make xmlParseChunk return an error if parser was stopped
This regressed after enhancing the disableSAX member in 2.13.
Should fix #777.
|
|
1a893230
|
2024-07-06T01:03:46
|
|
[CVE-2024-40896] Fix XXE protection in downstream code
Some users set an entity's children manually in the getEntity SAX
callback to restrict entity expansion. This stopped working after
renaming the "checked" member of xmlEntity, making at least one
downstream project and its dependants susceptible to XXE attacks.
See #761.
|
|
6a3c0b0d
|
2024-07-22T12:53:00
|
|
parser: Increase XML_MAX_DICTIONARY_LIMIT
This limit is somewhat arbitrary and can be reached when fuzzing
documents up to 1 MB.
Increase limit to 100 MB and disable limit if XML_PARSE_HUGE is set.
|
|
5d36664f
|
2024-07-16T00:35:53
|
|
memory: Deprecate xmlGcMemSetup
|
|
7148b778
|
2024-07-07T16:11:08
|
|
parser: Optimize memory buffer I/O
Reenable zero-copy IO for zero-terminated static memory buffers.
Don't stream zero-terminated dynamic memory buffers on top of creating
a copy.
|
|
34c9108f
|
2024-07-07T18:38:31
|
|
encoding: Add sizeOut argument to xmlCharEncInput
When push parsing, we want to convert as much of the input as possible.
When pull parsing memory buffers, we want to convert data chunk by chunk
to save memory.
|
|
6be79014
|
2024-07-15T14:18:26
|
|
Remove unused code
|
|
fee0006a
|
2024-07-15T13:03:55
|
|
parser: Fix memory leak after malloc failure in xml*ParseDTD
|
|
8af55c8d
|
2024-07-06T22:14:21
|
|
parser: Rename new input API functions
These weren't made public yet.
|
|
d74ca594
|
2024-07-06T22:04:06
|
|
parser: Rename internal xmlNewInput functions
|
|
4f329dc5
|
2024-07-10T03:27:47
|
|
parser: Implement xmlCtxtParseContent
This implements xmlCtxtParseContent, a better alternative to
xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a
parser context and a parser input, making it a lot more versatile.
xmlParseInNodeContext is now implemented in terms of
xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never
modifies the target document, improving thread safety.
xmlParseInNodeContext is also more lenient now with regard to undeclared
entities.
Fixes #727.
|
|
f51ad063
|
2024-07-08T11:23:39
|
|
parser: Fix error return of xmlParseBalancedChunkMemory
Only return an error code if the chunk is not well-formed to match the
2.12 behavior. Return 0 on non-fatal errors like invalid namespaces.
Fixes #765.
|
|
2e63656e
|
2024-07-07T19:21:46
|
|
parser: Check return value of inputPush
inputPush typically doesn't fail because we pre-allocate the input
table. The return value should be checked nevertheless.
|
|
1e5375c1
|
2024-07-06T15:15:57
|
|
SAX2: Check return value of xmlPushInput
Fix null deref in case of malloc failure.
|
|
38195cf5
|
2024-07-06T14:58:16
|
|
parser: Don't produce names with invalid UTF-8 in recovery mode
|
|
fdfeecfe
|
2024-07-02T21:54:26
|
|
parser: Reenable ctxt->directory
Unused internally, but used in downstream code.
Should fix #753.
|
|
606f4108
|
2024-07-02T20:57:15
|
|
parser: Allow to disable catalogs with parser options
Implement XML_PARSE_NO_SYS_CATALOG and XML_PARSE_NO_CATALOG_PI.
Fixes #735.
|
|
866be54e
|
2024-07-02T04:27:53
|
|
parser: Don't use deprecated xmlSplitQName
|
|
bc793390
|
2024-06-27T16:23:14
|
|
parser: Update documentation
|
|
eca972e6
|
2024-06-26T02:22:04
|
|
parser: Add getters for XML declaration to parser context
Access to struct members will be deprecated.
|
|
bbbbbb46
|
2024-06-20T03:19:48
|
|
parser: implement xmlCtxtGetOptions
In 712a31ab, the `options` struct member was deprecated. To allow
callers to check the status of options bits, introduce
xmlCtxtGetOptions.
|
|
217e9b7a
|
2024-06-08T12:27:45
|
|
clang-tidy: don't return in void functions
Found with readability-redundant-control-flow
Signed-off-by: Rosen Penev <rosenp@gmail.com>
|
|
32cac377
|
2024-06-17T17:59:49
|
|
parser: Selectively reenable reading from "-"
Make filename "-" mean stdin for legacy SAX1 functions and xmlReadFile.
This should hopefully fix most command line utilities.
See #737.
|
|
33a1f897
|
2024-06-16T19:16:47
|
|
legacy: Merge SAX.c into legacy.c
|
|
10d60d15
|
2024-06-16T00:04:46
|
|
regexp: Stop using LIBXML_AUTOMATA_ENABLED
This macro always equals LIBXML_REGEXP_ENABLED.
|
|
b0fc67aa
|
2024-06-15T22:53:55
|
|
build: Remove --with-tree configuration option
This option would allow for a smaller, but mostly useless minimal build.
But it complicates the symbol availability logic in an insane way and
requires specialized tools like our custom C parser in doc/apibuild.py.
See #717.
|
|
039ce1e8
|
2024-06-14T16:41:43
|
|
parser: Pass global object to sax->setDocumentLocator
Revert part of commit c011e760.
Fixes #732.
|
|
dba1ed85
|
2024-06-12T18:19:55
|
|
ftp: Remove FTP support
Remove the built-in FTP client. If you configure --with-legacy, old
symbols are retained for ABI compatibility.
|
|
52384043
|
2024-06-11T19:10:41
|
|
parser: Pass resource type to resource loader
|
|
89fcae4d
|
2024-06-11T16:19:58
|
|
parser: Don't report malloc failures when creating context
We don't want messages to stderr before an error handler could be set on
a parser context.
|
|
410931e3
|
2024-06-11T00:55:38
|
|
parser: Only set input ID for PE refs
Other input streams don't require IDs.
|
|
ff3b0919
|
2024-06-11T00:00:32
|
|
parser: Implement XML_PARSE_NO_UNZIP option
|
|
47cbb6bb
|
2024-06-10T14:04:00
|
|
doc: Don't mention xmlNewInputURL
|
|
8318b5a6
|
2024-06-09T14:22:53
|
|
parser: Fix NULL checks for output arguments
|
|
0cde1b78
|
2024-06-06T23:50:03
|
|
parser: Fix "Truncated multi-byte sequence" error
Don't raise the error if decoding failed.
|
|
122b6130
|
2024-06-04T16:33:02
|
|
parser: Fix performance regression when parsing namespaces
The namespace hash table didn't reuse deleted buckets, leading to
quadratic behavior.
Also ignore deleted buckets when resizing.
Fixes #726.
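
To show why reusing deleted buckets matters, here is a minimal sketch of an open-addressing table with tombstones. It is not the namespace hash table from the parser: the fixed table size, the integer keys, linear probing and all names (`Table`, `tableInsert` and so on) are assumptions chosen to keep the example short. Without the `firstDeleted` bookkeeping, repeated insert and remove cycles would fill the table with tombstones and make every probe longer.

```c
#include <stddef.h>

#define TABLE_SIZE 8u   /* power of two, fixed for brevity */

typedef enum { SLOT_EMPTY = 0, SLOT_USED, SLOT_DELETED } SlotState;

typedef struct {
    SlotState state;
    unsigned key;
} Slot;

typedef struct {
    Slot slots[TABLE_SIZE];
} Table;

static void
tableInit(Table *t) {
    unsigned i;

    for (i = 0; i < TABLE_SIZE; i++) {
        t->slots[i].state = SLOT_EMPTY;
        t->slots[i].key = 0;
    }
}

/* Return the slot index holding key, or -1 if it is not present. */
static int
tableFind(const Table *t, unsigned key) {
    unsigned i, pos = key & (TABLE_SIZE - 1);

    for (i = 0; i < TABLE_SIZE; i++) {
        const Slot *s = &t->slots[pos];

        if (s->state == SLOT_EMPTY)
            return -1;                  /* end of the probe sequence */
        if (s->state == SLOT_USED && s->key == key)
            return (int) pos;
        pos = (pos + 1) & (TABLE_SIZE - 1);
    }
    return -1;
}

/* Insert key and return the slot index used, or -1 if the table is full. */
static int
tableInsert(Table *t, unsigned key) {
    unsigned i, pos = key & (TABLE_SIZE - 1);
    int firstDeleted = -1;

    for (i = 0; i < TABLE_SIZE; i++) {
        Slot *s = &t->slots[pos];

        if (s->state == SLOT_EMPTY)
            break;
        if (s->state == SLOT_USED && s->key == key)
            return (int) pos;           /* already present */
        if (s->state == SLOT_DELETED && firstDeleted < 0)
            firstDeleted = (int) pos;   /* remember first tombstone */
        pos = (pos + 1) & (TABLE_SIZE - 1);
    }

    if (firstDeleted >= 0)
        pos = (unsigned) firstDeleted;  /* reuse the deleted bucket */
    else if (i == TABLE_SIZE)
        return -1;                      /* no empty or deleted slot */

    t->slots[pos].state = SLOT_USED;
    t->slots[pos].key = key;
    return (int) pos;
}

/* Mark the slot holding key as deleted so later probes can pass it. */
static void
tableRemove(Table *t, unsigned key) {
    int pos = tableFind(t, key);

    if (pos >= 0)
        t->slots[pos].state = SLOT_DELETED;
}
```

A resizing table would apply the same idea when rehashing: only `SLOT_USED` entries are copied to the new array, so tombstones do not survive a resize.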
|
|
a7e26707
|
2024-06-03T14:04:44
|
|
parser: Don't overwrite OOM errors in xmlSBuf
|
|
e75e878e
|
2024-05-20T13:58:22
|
|
doc: Update and fix documentation
|
|
4fefba4c
|
2024-05-15T17:52:20
|
|
parser: Rework handling of undeclared entities
Throw an error if entity substitution was requested.
Now we only downgrade to a warning if
- XML_PARSE_DTDLOAD wasn't specified, and
- entities aren't substituted or XML_PARSE_NO_XXE was specified.
Should fix #724.
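
The two conditions listed above can be restated as a small predicate. This is only a sketch of the rule as worded in this message, not the parser's actual code: the `OPT_*` constants are stand-ins with made-up values for XML_PARSE_NOENT, XML_PARSE_DTDLOAD and XML_PARSE_NO_XXE, the function name is hypothetical, and any further preconditions from earlier commits are not modeled.

```c
/* Stand-in option bits with made-up values, not the real XML_PARSE_* ones. */
enum {
    OPT_NOENT   = 1 << 0,   /* entity substitution requested */
    OPT_DTDLOAD = 1 << 1,   /* external DTD subset is loaded */
    OPT_NO_XXE  = 1 << 2    /* external entities are disabled */
};

/*
 * Hypothetical predicate: return 1 if an undeclared entity may be
 * downgraded to a warning under the two conditions from this commit
 * message, 0 if it must be reported as an error.
 */
static int
mayDowngradeToWarning(int options) {
    if (options & OPT_DTDLOAD)
        return 0;

    return !(options & OPT_NOENT) || (options & OPT_NO_XXE) != 0;
}
```

For example, substitution without NO_XXE yields an error, while substitution combined with NO_XXE may still be downgraded as long as DTDLOAD is not set.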
|
|
4ff2dccf
|
2024-05-10T02:04:52
|
|
SAX2: Warn if URI resolution failed
|
|
4fe116eb
|
2024-05-10T00:05:44
|
|
parser: Don't report error on invalid URI
Only fragment identifiers are an error.
This removes the last user of xmlErrMsg*. Now every error reported by
the parser should result in one of ctxt->wellFormed, ctxt->nsWellFormed
or ctxt->valid being set to zero.
|
|
a4c2b723
|
2024-05-05T17:26:31
|
|
io: Don't set close callback in xmlParserInputBufferCreateFd
|
|
fdc5ff36
|
2024-05-02T16:23:04
|
|
parser: Always throw entity errors if external DTD is loaded
When parsing with XML_PARSE_DTDLOAD, missing entities are always an
error.
Also consolidate behavior when validating. See b717abdd.
|
|
39e5b35b
|
2024-05-02T22:06:19
|
|
parser: Don't create undeclared entity refs in substitution mode
We never want to create entity reference nodes if entity substitution
is enabled. This also applies to undeclared entities.
|
|
1cdfece1
|
2024-04-28T18:33:40
|
|
memory: Remove memory debugging
This is useless compared to sanitizers or valgrind and has a
considerable performance impact if enabled accidentally.
|
|
45fe9924
|
2024-04-22T17:12:54
|
|
parser: Don't create reference in xmlLookupGeneralEntity
This should only be done in xmlParseReference.
The handling of undeclared entities is still somewhat inconsistent. In
element content we create references even if entity substitution is
enabled. In attribute values undeclared entities are always ignored.
|
|
b717abdd
|
2024-04-22T15:42:39
|
|
parser: Consolidate error handling for undeclared entities
Always use XML_WAR_UNDECLARED_ENTITY with warning error level in
documents with external subset or parameter entities. Use
XML_ERR_UNDECLARED_ENTITY otherwise.
|
|
f506ec66
|
2024-04-15T11:27:44
|
|
parser: Always decode entities in namespace URIs
Also decode entities in namespace URIs if entity substitution wasn't
requested. This should fix some corner cases when comparing namespace
URIs. The Namespaces in XML 1.0 spec says:
> In a namespace declaration, the URI reference is the normalized value
> of the attribute, so replacement of XML character and entity
> references has already been done before any comparison.
Make the serialization code escape special characters in namespace URIs
like in attribute values. This fixes serialization if entities were
substituted when parsing.
Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
|
|
2840e33c
|
2024-03-04T07:34:25
|
|
tree: Allocate XML namespace statically
|
|
186562a1
|
2024-03-12T19:55:33
|
|
parser: Fix detection of duplicate attributes in XML namespace
Fixes a regression from commit e0dd330b, resulting in duplicate
attributes in the predefined XML namespace not being detected or
extraneous default attributes being passed.
Fixes #704.
|