|
25ae533b
|
2025-02-17T11:27:30
|
|
xmllint: Fix SIGBUS with --memory option
If the input file size is a multiple of page size, the byte after the
file's content is on a new page and accessing it will lead to SIGBUS.
Remove XML_INPUT_BUF_ZERO_TERMINATED hint for mmapped files.
Regressed with a221cd78.
Fixes #864.
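
The page-size condition described above can be sketched as follows. This is only an illustration of the arithmetic, not libxml2 code, and the helper name `exampleMappedTailIsReadable` is hypothetical. It relies on the fact that the last page of a file mapping is zero-filled past the end of the file, so the byte after the content is readable only when the size is not a multiple of the page size.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2: returns 1 if the byte directly
 * after the file's content lies on the same mapped page as the last content
 * byte, 0 if it would fall on a new page (or if there is no content).
 */
int
exampleMappedTailIsReadable(size_t fileSize, size_t pageSize) {
    if (pageSize == 0)
        return 0;
    /* A multiple of the page size ends exactly on a page boundary. */
    return (fileSize % pageSize) != 0;
}
```

When this returns 0, a caller cannot assume a terminating zero byte after the mapped data, which is why a zero-termination hint is unsafe for such files.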
|
|
7a61c32b
|
2025-02-13T23:09:28
|
|
html: Use enum instead of magic values for insertion modes
|
|
3793eaad
|
2025-02-16T13:54:56
|
|
fuzz: Fix build
|
|
69b91da3
|
2025-02-13T19:45:41
|
|
Revert "xpath: Make contextSize and proximityPosition default to 1"
This reverts commit afbc0a0405236de4ab8cbac94745e9885db0a198.
|
|
9c16a153
|
2025-02-13T18:41:33
|
|
Revert "include: Make most IS_* macros private"
This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
|
|
6c716d49
|
2025-02-13T16:48:53
|
|
pattern: Fix compilation of explicit child axis
The child axis is the default axis and should generate XML_OP_ELEM like
the case without an axis.
|
|
8cf6129b
|
2025-02-13T18:20:46
|
|
html: Stop implying <p> start tags
Only <html>, <head> or <body> should be implied. Opening extra <p> tags
has always been a libxml2 quirk.
|
|
71122421
|
2025-02-13T14:04:10
|
|
html: Make implied <p> tags more deterministic
libxml2's HTML parser adds <p> start tags in some situations. This
behavior, which doesn't follow any standard, was added in 2000, see
here: http://veillard.com/XML/messages/0655.html
Text nodes that only contain whitespace don't imply a <p> tag, but the
whitespace check cannot work reliably if we're parsing partial text data,
which can happen with both the pull and the push parser.
The logic in `areBlanks` is hard to follow. The checks involving `CUR`
depend on the position of the input pointer and seem dubious. It's also
possible that the behavior changed inadvertently with a later commit.
As a result, it's hard to come up with good test cases.
We now process leading whitespace before creating implied tags. This is
more in line with HTML5 and should avoid at least some issues with
partial text data.
For example, parsing the string "<head> x" used to result in:
<html>
<head></head>
<body><p> x</p></body>
</html>
And now results in:
<html>
<head> </head>
<body><p>x</p></body>
</html>
Except for the implied <p> tag, this matches HTML5.
|
|
ebbc31cc
|
2025-02-13T12:09:58
|
|
malloc-fail: Check for malloc failure in xhtmlNodeDumpOutput
|
|
79ab721c
|
2025-02-11T11:39:08
|
|
tests: Fix error return in testHugeEncodedChunk
Fixes #859.
|
|
cfc854b8
|
2025-02-11T00:21:12
|
|
fuzz: Work around glibc iconv() bug
|
|
3a1526a5
|
2025-02-10T19:32:32
|
|
xpath: Don't raise OOM error on long names
Short-lived regression.
|
|
3dcde736
|
2025-02-05T15:18:48
|
|
Use __has_attribute to check for __counted_by__ support
The initial clang patch to support __counted_by__ was landed and
reverted several times. There are some clang toolchains (e.g. the
Android toolchain) that report themselves as version 18 but do not
support __counted_by__. While it is debatable if Android should be
shipping a pre-release clang, using __has_attribute should be a bit
simpler overall.
Note that this doesn't migrate everything else to use __has_attribute:
while clang has always supported __has_attribute, gcc didn't support
it until a bit later.
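
A minimal sketch of a feature check along these lines. The macro name `EXAMPLE_COUNTED_BY` and the `ExampleIntVec` type are hypothetical and not taken from libxml2. The nested `#if` matters: a compiler without `__has_attribute` must never see the `__has_attribute(...)` expression.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Test for the attribute itself instead of comparing compiler versions. */
#if defined(__has_attribute)
  #if __has_attribute(__counted_by__)
    #define EXAMPLE_COUNTED_BY(member) __attribute__((__counted_by__(member)))
  #endif
#endif
#ifndef EXAMPLE_COUNTED_BY
  #define EXAMPLE_COUNTED_BY(member) /* unsupported: expands to nothing */
#endif

/* Hypothetical struct whose flexible array is annotated with its length. */
typedef struct {
    size_t len;
    int items[] EXAMPLE_COUNTED_BY(len);
} ExampleIntVec;

/* Allocates a zero-initialized vector, or returns NULL on failure. */
ExampleIntVec *
exampleIntVecNew(size_t len) {
    ExampleIntVec *vec;
    size_t i;

    /* Reject lengths whose byte size would overflow size_t. */
    if (len > (SIZE_MAX - sizeof(*vec)) / sizeof(int))
        return NULL;
    vec = malloc(sizeof(*vec) + len * sizeof(int));
    if (vec == NULL)
        return NULL;
    vec->len = len; /* set the count before touching the array */
    for (i = 0; i < len; i++)
        vec->items[i] = 0;
    return vec;
}
```

On toolchains that lack the attribute, the macro expands to nothing and the struct compiles unchanged.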
|
|
35d8a230
|
2025-02-06T10:14:56
|
|
tests: Fix expected errors in runxmlconf
The extra failure if regexps weren't enabled was actually a regression
fixed by the previous commit.
|
|
b466e70a
|
2025-02-05T14:11:04
|
|
Fix early return in vstateVPush in valid.c
While looking over the code in the fallback method for `vstateVPush` in
valid.c when `LIBXML_REGEXP_ENABLED` is not defined, I noticed that
there is an ungated `return(-1)` after attempting to allocate memory.
I believe this should be inside a check for whether the malloc fails.
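
The pattern in question can be sketched like this. It is not the code from valid.c: the `ExampleStack` type and `exampleStackPush` are hypothetical, and `realloc` is used for brevity. The point is that the error return sits inside the allocation check, so a successful allocation falls through to the actual push.

```c
#include <stdlib.h>

/* Hypothetical growable stack standing in for the validation state stack. */
typedef struct {
    int *tab;
    int nr;
    int max;
} ExampleStack;

/* Returns the index of the pushed value, or -1 only if allocation fails. */
int
exampleStackPush(ExampleStack *stack, int value) {
    if (stack->nr >= stack->max) {
        int newMax = (stack->max > 0) ? stack->max * 2 : 4;
        int *tmp = realloc(stack->tab, newMax * sizeof(tmp[0]));

        if (tmp == NULL)
            return -1; /* gated: taken only when the allocation failed */
        stack->tab = tmp;
        stack->max = newMax;
    }
    stack->tab[stack->nr] = value;
    return stack->nr++;
}
```

With an ungated `return -1` after the allocation, every push that needs to grow the array would fail even though memory was available.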
|
|
62d4697d
|
2025-02-02T16:43:25
|
|
gitlab-ci: Disable cmake:mingw for now
Executing /mingw64/bin/cmake.exe with any arguments fails without error
message and exit code 127 since 2025-01-21. I have no idea why.
|
|
a25dc439
|
2025-02-02T15:01:50
|
|
Debug CI failure
|
|
cd491ac0
|
2025-02-02T13:13:20
|
|
dict: Handle ENOSYS from getentropy gracefully
Also add some comments.
Should fix #854.
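
A rough sketch of what handling ENOSYS gracefully can look like. None of these names come from libxml2. The entropy source is passed in as a function pointer so the example stays portable, and the two `example*Source` functions are stand-ins for a real call such as getentropy(). How libxml2 treats other error codes is not stated here, so the -1 return is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef int (*ExampleEntropyFunc)(void *buf, size_t len);

/*
 * Returns 0 if buf was filled, 1 if the source is unavailable (ENOSYS) and
 * the caller should fall back to another seed, -1 on any other error.
 */
int
exampleFillSeed(ExampleEntropyFunc source, void *buf, size_t len) {
    for (;;) {
        if (source(buf, len) == 0)
            return 0;
        if (errno == ENOSYS)
            return 1;  /* missing system call: not fatal, use a fallback */
        if (errno != EINTR)
            return -1; /* assumption: treat everything else as an error */
        /* EINTR: interrupted by a signal, retry */
    }
}

/* Stand-in for a platform where the system call is not implemented. */
int
exampleMissingSource(void *buf, size_t len) {
    (void) buf;
    (void) len;
    errno = ENOSYS;
    return -1;
}

/* Stand-in for a working source; fills the buffer with a fixed byte. */
int
exampleFixedSource(void *buf, size_t len) {
    memset(buf, 0x2A, len);
    return 0;
}
```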
|
|
bc437868
|
2025-01-31T23:11:55
|
|
fuzz: Improve HTML fuzzer
Verify that pull and push parser produce the same result.
Fixes #849.
|
|
c4f760be
|
2025-02-01T15:29:56
|
|
encoding: Handle iconv() returning EOPNOTSUPP on Apple
iconv() really shouldn't return undocumented error codes.
|
|
8d7e38d5
|
2025-02-01T22:41:53
|
|
fuzz: Ignore encodings when fuzzing on Apple
Not long ago, Apple decided to replace GNU libiconv with a patched up
version of FreeBSD's iconv implementation in their operating systems.
Unfortunately, the quality of both the original implementation and
Apple's patches is so abysmal that you routinely find issues when
fuzzing your own code.
|
|
68be036f
|
2025-02-01T22:09:18
|
|
fuzz: Disable HTML encoding detection for now
This doesn't work with the push parser.
|
|
b4d3d87e
|
2025-02-01T22:02:33
|
|
parser: Fix parsing of doctype declarations
Fix some long-standing issues.
Fixes #504.
|
|
c13fcc19
|
2025-02-01T19:36:06
|
|
html: Chunk text data in push parser
Follow the logic of the XML parser and chunk large text nodes.
|
|
08028572
|
2025-02-01T18:21:47
|
|
html: Make data parsing modes work with push parser
This can't be solved with a simple scan for a terminator. Instead, we
make htmlParseCharData handle incomplete data if the "partial" flag is
set.
|
|
4be1e8be
|
2025-02-01T15:00:26
|
|
html: Simplify htmlParseTryOrFinish a little
|
|
12732592
|
2025-02-01T00:36:12
|
|
html: Remove unused epilog state
|
|
70bf754e
|
2025-02-01T00:17:01
|
|
html: Fix pull-parsing of incomplete end tags
Handle this HTML5 quirk in htmlParseEndTag.
|
|
4a776c78
|
2025-01-31T23:57:44
|
|
html: Use htmlParseElementInternal in push parser
|
|
ba153737
|
2025-01-31T22:51:59
|
|
html: Fix corner case when push-parsing HTML5 comments
|
|
e48fb5e4
|
2025-01-31T22:08:13
|
|
html: Handle incomplete UTF-8 when push-parsing
For now, incomplete UTF-8 is always an error in push mode.
Eventually, we could pass chunked data to the character handler when
push-parsing. Then we'd have to handle incomplete sequences.
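
To illustrate what handling incomplete sequences would involve, here is a small sketch that reports how many bytes at the end of a chunk form an incomplete UTF-8 sequence. The function name is hypothetical and this is not how libxml2 currently behaves (as stated above, incomplete UTF-8 is an error in push mode for now). It only inspects the tail and does not validate the rest of the buffer.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the number of trailing bytes (0 to 3) that
 * start a multi-byte UTF-8 sequence which is cut off by the end of the
 * buffer. Returns 0 if the buffer ends on a complete (or invalid) sequence.
 */
size_t
exampleUtf8IncompleteTail(const unsigned char *buf, size_t len) {
    size_t i, need;

    /* A lead byte can be at most three bytes before an incomplete end. */
    for (i = 1; i <= 3 && i <= len; i++) {
        unsigned char c = buf[len - i];

        if ((c & 0xC0) == 0x80)
            continue;           /* continuation byte, keep looking back */
        if (c < 0x80)
            return 0;           /* ASCII, nothing pending */
        if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return 0;           /* invalid lead byte, not merely incomplete */
        return (i < need) ? i : 0;
    }
    return 0;
}
```

A push parser using this approach would hold back the reported number of bytes and prepend them to the next chunk.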
|
|
6bb2ea8e
|
2025-02-01T14:58:06
|
|
html: Adjust xmlDetectEncoding for HTML
Don't check for UTF-32 or EBCDIC.
We now perform BOM sniffing and the first step of the HTML5 prescan
algorithm (detect UTF-16 XML declarations). The rest of the algorithm
still has to be implemented.
|
|
227d8f73
|
2025-01-31T21:05:22
|
|
html: Support encoding auto-detection in push parser
Align with pull parser.
|
|
641fb1ac
|
2025-01-31T20:41:28
|
|
html: Fix state update in push parser
|
|
a86a8ae9
|
2025-01-31T20:09:54
|
|
html: Fix push-parsing of empty documents
Also simplify end-of-document handling in push parser.
Align with pull parser.
|
|
d2fb68ed
|
2025-01-31T19:02:33
|
|
fuzz: Make large chunk size more likely
This now detects issues like 3eced32e in about 30 seconds.
|
|
cdfb54ff
|
2025-01-31T18:38:40
|
|
Fix typos
|
|
57e4bbd8
|
2025-01-31T16:45:35
|
|
parser: Improve handling of NOCDATA option
Don't modify the callback structure. This makes sure that unsetting the
option works.
|
|
1f5b5371
|
2025-01-31T16:21:20
|
|
parser: Improve handling of NOBLANKS option
Don't change the SAX handler.
Use a helper function to invoke "characters" SAX callback.
The old code didn't advance the input pointer consistently before
invoking the callback. There was also some inconsistency with respect to
ctxt->space handling. I don't understand the ctxt->space thing, but
now we always behave like the non-complex case before.
|
|
7a8722f5
|
2025-01-31T14:55:29
|
|
parser: Document that XML_PARSE_NOBLANKS is broken
Long text content can generate multiple "characters" callbacks which can
lead to NOBLANKS removing whitespace in non-whitespace text nodes. So
the NOBLANKS option doesn't even work reliably with the pull parser.
This would be extremely hard to fix.
Unfortunately, `xmllint --format` relies on this option which is another
reason why this feature never really worked.
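
The failure mode can be modeled with a simplified sketch. This is not the parser's actual blank check, which looks at more context. The functions below are hypothetical and only show why judging each "characters" chunk on its own loses whitespace once a text node is split.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the chunk consists only of XML whitespace characters. */
int
exampleIsBlankChunk(const char *chunk) {
    for (; *chunk != '\0'; chunk++) {
        if (*chunk != ' ' && *chunk != '\t' &&
            *chunk != '\n' && *chunk != '\r')
            return 0;
    }
    return 1;
}

/*
 * Models a naive per-callback filter: every chunk is checked in isolation
 * and dropped if it is blank. The kept chunks are concatenated into out.
 */
void
exampleFilterChunks(const char *const *chunks, size_t numChunks,
                    char *out, size_t outSize) {
    size_t used = 0, i;

    if (outSize == 0)
        return;
    out[0] = '\0';
    for (i = 0; i < numChunks; i++) {
        size_t len = strlen(chunks[i]);

        if (exampleIsBlankChunk(chunks[i]))
            continue;                   /* dropped as "blank" */
        if (len >= outSize - used)
            break;                      /* no room left in out */
        memcpy(out + used, chunks[i], len + 1);
        used += len;
    }
}
```

Delivered as a single chunk, the text `a   b` passes through unchanged. Delivered as the chunks `a`, three spaces and `b`, the middle chunk is dropped and the result is `ab`.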
|
|
40e423d6
|
2025-01-30T19:30:44
|
|
fuzz: Improve fuzzing of push parser
Also serialize the result of push-parsing and compare whether pull and
push parser produce the same result (differential fuzzing).
We lose the ability to inject IO errors when serializing for now, but
this isn't too important.
Use variable chunk size for push parser.
Fixes #849.
|
|
9efe1414
|
2025-01-31T13:07:35
|
|
parser: Fix detection of ']]>' when push-parsing
Fixes #850.
|
|
115b13f9
|
2025-01-30T23:18:56
|
|
parser: Document push parser limitations
|
|
53a48468
|
2025-01-30T15:15:30
|
|
xmllint: Make --push report parse errors
The push parser leaves documents in ctxt->myDoc even if they're invalid.
Also fix documentation.
Regressed with f8ff4d86.
|
|
5535721f
|
2025-01-30T01:27:03
|
|
parser: Grow input buffer after lots of whitespace
Make sure that the input buffer is grown after consuming large amounts
of whitespace.
Also move a comment.
|
|
218264fa
|
2025-01-30T01:26:01
|
|
parser: Always shrink input buffer
Shrinking the input buffer is cheap now and should be done as soon as
possible.
|
|
0de90f51
|
2025-01-30T01:25:31
|
|
parser: Define SIZE_MAX
|
|
3eced32e
|
2025-01-29T23:49:56
|
|
parser: Fix push parser with encoding and single chunk
When push-parsing with an encoding handler, we must convert the whole
buffer in the initial conversion. Otherwise, parsing a single chunk
larger than ~4KB would fail.
Regressed with commit 34c9108f.
|
|
4bd66d45
|
2025-01-29T13:11:38
|
|
Mention contributors in Copyright
To clarify that libxml2 is the work of many people, add the following
copyright notice to Copyright:
Copyright (C) The Libxml2 Contributors.
|
|
fdc73dd0
|
2025-01-29T12:58:31
|
|
README: Fix CMake example options
zlib is disabled by default now.
|
|
64bfe1f7
|
2025-01-29T12:48:50
|
|
README: Add note about security issues
|
|
93506d41
|
2025-01-29T00:17:01
|
|
parser: Make catalog PIs opt-in
This is an obscure feature that shouldn't be enabled by default.
|
|
1082d813
|
2025-01-28T23:21:34
|
|
parser: Prepare to make decompression opt-in
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
|
|
a78843be
|
2025-01-28T20:13:58
|
|
xmllint: Support compressed input from stdin
Another regression related to reading from stdin.
Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.
This now enables compressed input when using the "Fd" API functions
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.
Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
|
|
a8d8a70c
|
2025-01-27T13:31:08
|
|
uri: Fix handling of Windows drive letters
Allow drive letters in URI paths. Technically, these should be treated
as URI schemes, but this is not what users expect. This also makes sure
that paths with drive letters are resolved as filesystem paths and
unescaped, for example when used in libxslt's document() function.
Should fix #832.
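
As an illustration of the ambiguity, a single letter followed by a colon parses as a URI scheme but usually means a Windows drive. The check below is a hypothetical sketch, not the rule libxml2 implements. In particular, requiring a slash or backslash after the colon is an assumption.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if path starts with an ASCII letter, a
 * colon and a path separator (for example "C:/dir" or "c:\dir").
 */
int
exampleHasDriveLetter(const char *path) {
    char c;

    if (path == NULL)
        return 0;
    c = path[0];
    if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')))
        return 0;
    if (path[1] != ':')
        return 0;
    return (path[2] == '/') || (path[2] == '\\');
}
```

A resolver could use such a check to treat the string as a filesystem path before falling back to generic URI scheme parsing.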
|
|
6904d4c2
|
2025-01-25T13:54:15
|
|
fuzz: Fix OSS-Fuzz build of lint fuzzer
|
|
cd7299a8
|
2025-01-24T18:59:12
|
|
meson: Fix setup with ICU as sibling subproject
Meson wrapdb provides a wrap for ICU, so libxml2 and ICU could both be
built as subprojects of the same Meson parent project. In this case, with
the icu option enabled, setup was failing with:
subprojects/libxml2-2.13.5/meson.build:603:22: ERROR: Could not get an internal variable and no default provided for <InternalDependency dep228908115162702543524838879388991448872: True>
This is because we can't get a dependency variable from a subproject that
hasn't been built yet. Fall back to assuming DEFS is empty, as it is on
my system.
|
|
6ec616ba
|
2025-01-24T18:26:55
|
|
encoding: Don't allow POSIX indicator suffixes in encoding names
Suffixes like "//IGNORE" change the behavior of iconv.
Also add comment on how we currently rely on GNU libiconv behavior
which technically violates the POSIX spec.
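
A minimal sketch of such a filter, assuming that looking for "//" anywhere in the name is sufficient. The function name is hypothetical and the actual check in libxml2 may differ.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the encoding name carries an iconv
 * indicator suffix such as "//IGNORE" or "//TRANSLIT", 0 otherwise.
 */
int
exampleEncodingHasSuffix(const char *name) {
    return (name != NULL) && (strstr(name, "//") != NULL);
}
```

Rejecting these names before they reach iconv keeps an encoding declaration in a document from switching on behavior like silently dropping unconvertible input.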
|
|
9b1028c9
|
2025-01-23T20:37:37
|
|
fuzz: Fix comments
|
|
e95c4b07
|
2025-01-22T10:06:39
|
|
fuzz: Also test xmllint --repeat option
|
|
dc6270d1
|
2025-01-22T09:38:43
|
|
xmllint: Fix UAF with --push --repeat
Short-lived regression. Fixes #841.
|
|
9d7bbf19
|
2025-01-23T14:36:33
|
|
tree: Fix variable name in xmlAddChild documentation
|
|
f043bf25
|
2025-01-22T19:25:59
|
|
meson: Fix build with MSVC
Check compiler options with cc.get_supported_arguments().
Fixes #842.
|
|
b524cd7a
|
2025-01-21T17:35:04
|
|
meson: Fix build as subproject
Use add_project_arguments instead of add_global_arguments.
Should fix #840.
|
|
1c82bca6
|
2025-01-17T22:54:51
|
|
xmllint: Improve error reports from reader
|
|
16286dea
|
2025-01-17T23:03:20
|
|
xmllint: Fix memory leak in parseAndPrintFile
|
|
9cfc723c
|
2025-01-17T21:42:35
|
|
xmllint: Always reuse parser context
Also move push parsing into parseXml which makes "--sax --push" work.
|
|
5f1131dd
|
2025-01-17T19:54:04
|
|
xpath: Don't descend into OP_VALUE in debug dump
For some reason, its "ch1" value is invalid.
|
|
00167cae
|
2025-01-17T18:50:55
|
|
xmllint: Report OOM errors to stderr
For the validators, some work still has to be done, but for core
features, xmllint should now report OOM errors reliably.
|
|
67b738d9
|
2025-01-17T17:59:21
|
|
fuzz: Check whether xmllint reports malloc failures correctly
This relies on xmllint's "maxmem" option.
|
|
bfe6af2e
|
2025-01-17T17:09:04
|
|
fuzz: Remove hacks to build lint fuzzer
Don't include source file directly.
|
|
bf1d8b9c
|
2025-01-17T18:13:35
|
|
xmllint: Report malloc failures from parsing patterns
|
|
255fd5f3
|
2025-01-17T16:52:06
|
|
xmllint: Store error stream in global state
|
|
e42ded42
|
2025-01-17T16:00:35
|
|
xmllint: Stop using global variables
The only exception is "maxmem". The custom malloc functions don't
support an extra context.
|
|
e4194110
|
2025-01-17T16:00:05
|
|
schemas: Make ValidateStream take a const SAXHandler
|
|
d39e5714
|
2025-01-17T13:12:36
|
|
xmllint: Fix memory leak in parseFile
Short-lived regression.
|
|
0f4d36e0
|
2025-01-17T13:04:35
|
|
xmllint: Fix memory leak in error case
|
|
fbaacfe2
|
2025-01-16T15:57:35
|
|
encoding: Clean up UCS-4 encodings
Use "UCS-*" instead of "ISO-10646-UCS-*". While the XML spec recommends
"ISO-10646-UCS-2" and "ISO-10646-UCS-4", GNU iconv doesn't understand
these names.
Ignore UCS4_2143 and UCS4_3412 which were never supported.
|
|
be579a26
|
2025-01-15T12:52:53
|
|
reader: Fix return value of xmlTextReaderReadString again
Make sure to return NULL for node types except elements or text to match
the old behavior.
Note that CDATA sections are still treated like text nodes and will have
their content returned.
Fixes #838.
|
|
86401cc3
|
2025-01-07T19:01:57
|
|
xmllint: Make --shell ignore some other options
When the shell should be launched with the --shell option, don't
post-validate, stream or dump the document. Ignore the --repeat option.
|
|
c0c69cb8
|
2025-01-07T18:55:35
|
|
xmllint: Always reuse parser context
Simplifies "repeat" logic.
|
|
a5be2cc3
|
2025-01-04T22:52:19
|
|
xmllint: Support --xpath --debug
Dump compiled expression if --debug was supplied.
|
|
f22707f4
|
2024-12-30T23:21:56
|
|
xmllint: Use xmlXPathOrderDocElems for XPath queries
|
|
ca819160
|
2025-01-03T20:50:08
|
|
include: Use intptr_t to cast between pointers and ints
|
|
41c10c0c
|
2025-01-03T19:49:37
|
|
io: Don't cast file descriptors to pointers
This doesn't work if open() returns 0 which is rare but can happen. Wrap
the fd in a context struct.
Fixes #835.
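
The problem and the fix can be sketched as follows. The names are hypothetical and not the ones used in libxml2. On mainstream platforms, casting descriptor 0 to a pointer yields NULL, which cannot be told apart from "no context". A small heap-allocated struct avoids this.

```c
#include <stdint.h>
#include <stdlib.h>

/* Old idea: smuggle the descriptor through a void pointer. */
void *
exampleFdAsPointer(int fd) {
    return (void *) (intptr_t) fd; /* fd 0 becomes a null pointer */
}

/* New idea: wrap the descriptor in a context struct. */
typedef struct {
    int fd;
} ExampleFdCtxt;

/* Returns a non-NULL context for any descriptor, or NULL if out of memory. */
ExampleFdCtxt *
exampleFdCtxtNew(int fd) {
    ExampleFdCtxt *ctxt = malloc(sizeof(*ctxt));

    if (ctxt == NULL)
        return NULL;
    ctxt->fd = fd;
    return ctxt;
}
```

The cost is one small allocation per stream, which the close callback has to free.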
|
|
71c37a56
|
2024-12-30T11:41:44
|
|
malloc-fail: Fix memory leak in xmlValidateElementContent
|
|
ab62fc27
|
2024-12-27T14:58:30
|
|
gitlab-ci: Add --with-valid to medium config
Building --with-valid --without-regexps enables some rarely tested code.
There's an additional test failure in runxmlconf without regexps.
|
|
cd220b93
|
2024-12-27T14:55:43
|
|
valid: Remove duplicate error messages when streaming
|
|
bd2a1648
|
2024-12-27T13:44:10
|
|
valid: Fix build --without-regexps
|
|
41aed089
|
2024-12-24T23:50:39
|
|
automake: Only build testdso when testing
|
|
0cf25b3d
|
2024-12-26T20:32:35
|
|
Regenerate docs and testapi.c
|
|
2e3a91a7
|
2024-12-26T21:05:18
|
|
doc: Fix documentation
|
|
53c131f6
|
2024-12-26T20:29:58
|
|
doc: Make apibuild.py work again
|
|
260954c5
|
2024-12-26T18:17:45
|
|
autotools: Set AC_CONFIG_AUX_DIR
This should make sure that autoreconf doesn't mess with parent
directories.
Should fix #833.
|
|
b3871dd1
|
2024-12-21T21:50:13
|
|
io: Fix memory leaks of encoding handler in error cases
xmlOutputBufferCreate* must always free the encoding handler.
|
|
afeff9c5
|
2024-12-21T20:47:40
|
|
xinclude: Allow build without XPath
This disables XPath queries and makes the tests fail, but might be
useful.
|
|
c134e8b4
|
2024-12-19T21:05:49
|
|
include: Make INPUT_CHUNK macro private
|
|
84a6c82f
|
2024-12-19T20:59:10
|
|
include: Make most IS_* macros private
Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.
|
|
0d4a17af
|
2024-12-18T12:02:36
|
|
valid: Fix and check return value of nodeVPush
|
|
3f0bac48
|
2024-12-11T16:23:30
|
|
malloc-fail: Handle more malloc failures in schema code
These issues can only arise after a memory allocation failed.
- WXS_ADD_*: Add NULL check and raise error
- XML_SCHEMA_*: Make macros safe
- xmlSchemaParseUnion: Fix leak, raise error, commit after success to
avoid memory corruption
- xmlSchemaVAddNodeQName: Restore nbItems after partial success,
raise error
- xmlSchemaIDCAcquireTargetList: Raise error
- xmlSchemaXPathProcessHistory: Handle errors
- xmlSchemaIDCFillNodeTables: Fix leak
- xmlSchemaCheckCVCIDCKeyRef: Handle errors
- xmlSchemaVPushText: Reset flag to avoid memory corruption
- xmlSchemaNewValidCtxt: Handle errors
- xmlSchemaVDocWalk: Fix leak
- xmlSchemaInitBasicType: Handle error
- xmlSchemaCleanupTypesInternal: Fix null deref
- xmlSchemaWhiteSpaceReplace: Handle error
- xmlSchemaParseUInt: Handle error
- xmlSchemaValAtomicType: Fix leak, handle error
- xmlSchemaDateNormalize: Fix leak
|