Log

Author Commit Date CI Message
Nick Wellnhofer 25ae533b 2025-02-17T11:27:30 xmllint: Fix SIGBUS with --memory option If the input file size is a multiple of page size, the byte after the file's content is on a new page and accessing it will lead to SIGBUS. Remove XML_INPUT_BUF_ZERO_TERMINATED hint for mmapped files. Regressed with a221cd78. Fixes #864.
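The condition behind this crash can be sketched with plain arithmetic. A file mapping covers whole pages, and the bytes between the end of the file and the end of its last page read as zero. When the size is an exact multiple of the page size, no such byte exists and the next address belongs to an unmapped page. The helper name below is hypothetical and not part of libxml2.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not libxml2 code. Returns 1 if a file of
 * file_size bytes mapped with mmap() has at least one readable byte
 * after its content, 0 otherwise.
 */
int
mapped_file_has_trailing_byte(size_t file_size, size_t page_size) {
    if (page_size == 0 || file_size == 0)
        return 0;
    /* Only a partially filled last page leaves padding behind the data. */
    return (file_size % page_size) != 0;
}
```

A caller that cannot rely on this padding has to treat the mapped buffer as not zero-terminated, which is what dropping the hint amounts to.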
Nick Wellnhofer 7a61c32b 2025-02-13T23:09:28 html: Use enum instead of magic values for insertion modes
Nick Wellnhofer 3793eaad 2025-02-16T13:54:56 fuzz: Fix build
Nick Wellnhofer 69b91da3 2025-02-13T19:45:41 Revert "xpath: Make contextSize and proximityPosition default to 1" This reverts commit afbc0a0405236de4ab8cbac94745e9885db0a198.
Nick Wellnhofer 9c16a153 2025-02-13T18:41:33 Revert "include: Make most IS_* macros private" This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
Nick Wellnhofer 6c716d49 2025-02-13T16:48:53 pattern: Fix compilation of explicit child axis The child axis is the default axis and should generate XML_OP_ELEM like the case without an axis.
Nick Wellnhofer 8cf6129b 2025-02-13T18:20:46 html: Stop implying <p> start tags Only <html>, <head> or <body> should be implied. Opening extra <p> tags has always been a libxml2 quirk.
Nick Wellnhofer 71122421 2025-02-13T14:04:10 html: Make implied <p> tags more deterministic libxml2's HTML parser adds <p> start tags in some situations. This behavior, which doesn't follow any standard, was added in 2000, see here: http://veillard.com/XML/messages/0655.html Text nodes that only contain whitespace don't imply a <p> tag, but the whitespace check cannot work reliably if we're parsing partial text data which can happen with both pull and push parser. The logic in `areBlanks` is hard to follow. The checks involving `CUR` depend on the position of the input pointer and seem dubious. It's also possible that the behavior changed inadvertently with a later commit. As a result, it's hard to come up with good test cases. We now process leading whitespace before creating implied tags. This is more in line with HTML5 and should avoid at least some issues with partial text data. For example, parsing the string "<head> x" used to result in: <html> <head></head> <body><p> x</p></body> </html> And now results in: <html> <head> </head> <body><p>x</p></body> </html> Except for the implied <p> tag, this matches HTML5.
Nick Wellnhofer ebbc31cc 2025-02-13T12:09:58 malloc-fail: Check for malloc failure in xhtmlNodeDumpOutput
Nick Wellnhofer 79ab721c 2025-02-11T11:39:08 tests: Fix error return in testHugeEncodedChunk Fixes #859.
Nick Wellnhofer cfc854b8 2025-02-11T00:21:12 fuzz: Work around glibc iconv() bug
Nick Wellnhofer 3a1526a5 2025-02-10T19:32:32 xpath: Don't raise OOM error on long names Short-lived regression.
Daniel Cheng 3dcde736 2025-02-05T15:18:48 Use __has_attribute to check for __counted_by__ support The initial clang patch to support __counted_by__ was landed and reverted several times. There are some clang toolchains (e.g. the Android toolchain) that report themselves as version 18 but do not support __counted_by__. While it is debatable if Android should be shipping a pre-release clang, using __has_attribute should be a bit simpler overall. Note that this doesn't migrate everything else to use __has_attribute: while clang has always supported __has_attribute, gcc didn't support it until a bit later.
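A minimal sketch of the feature-test approach described in this commit. The macro name `MY_COUNTED_BY` and the struct are hypothetical and do not reflect the names used in libxml2. The check for `__has_attribute` is nested because a compiler that lacks it cannot parse `__has_attribute(...)` inside an `#if` expression.

```c
#include <stddef.h>
#include <stdlib.h>

/* Test for __has_attribute itself first, then for the attribute. */
#if defined(__has_attribute)
  #if __has_attribute(__counted_by__)
    #define MY_COUNTED_BY(member) __attribute__((__counted_by__(member)))
  #endif
#endif
#ifndef MY_COUNTED_BY
  /* Fallback: the annotation expands to nothing. */
  #define MY_COUNTED_BY(member)
#endif

/* Hypothetical buffer whose flexible array is sized by `len`. */
struct my_buffer {
    size_t len;
    unsigned char data[] MY_COUNTED_BY(len);
};

struct my_buffer *
my_buffer_new(size_t len) {
    struct my_buffer *buf = calloc(1, sizeof(*buf) + len);

    /* Set the count before any access to data[]. */
    if (buf != NULL)
        buf->len = len;
    return buf;
}
```

Unlike a version check, this does not depend on what a toolchain reports as its release number.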
Nick Wellnhofer 35d8a230 2025-02-06T10:14:56 tests: Fix expected errors in runxmlconf The extra failure if regexps weren't enabled was actually a regression fixed by the previous commit.
Zak Ridouh b466e70a 2025-02-05T14:11:04 Fix early return in vstateVPush in valid.c While looking over the code in the fallback method for `vstateVPush` in valid.c when `LIBXML_REGEXP_ENABLED` is not defined, I noticed that there is an ungated `return(-1)` after attempting to allocate memory. I believe this should be inside a check, for if the malloc fails.
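The bug pattern can be illustrated with a generic growable stack. This is a sketch with hypothetical names, not the code from valid.c. The point is that the error return must sit inside the allocation-failure check; if it were unconditional, every push that needs to grow the stack would fail.

```c
#include <stdlib.h>

/* Hypothetical stack, standing in for the validation state stack. */
struct int_stack {
    int *items;
    size_t nr;
    size_t max;
};

/* Returns 0 on success and -1 only if the allocation fails. */
int
int_stack_push(struct int_stack *s, int value) {
    if (s->nr >= s->max) {
        size_t new_max = s->max ? s->max * 2 : 4;
        int *tmp = realloc(s->items, new_max * sizeof(*tmp));

        /* Gated return: bail out only when realloc really failed. */
        if (tmp == NULL)
            return -1;
        s->items = tmp;
        s->max = new_max;
    }
    s->items[s->nr++] = value;
    return 0;
}
```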
Nick Wellnhofer 62d4697d 2025-02-02T16:43:25 gitlab-ci: Disable cmake:mingw for now Executing /mingw64/bin/cmake.exe with any arguments fails without error message and exit code 127 since 2025-01-21. I have no idea why.
Nick Wellnhofer a25dc439 2025-02-02T15:01:50 Debug CI failure
Nick Wellnhofer cd491ac0 2025-02-02T13:13:20 dict: Handle ENOSYS from getentropy gracefully Also add some comments. Should fix #854.
Nick Wellnhofer bc437868 2025-01-31T23:11:55 fuzz: Improve HTML fuzzer Verify that pull and push parser produce the same result. Fixes #849.
Nick Wellnhofer c4f760be 2025-02-01T15:29:56 encoding: Handle iconv() returning EOPNOTSUPP on Apple iconv() really shouldn't return undocumented error codes.
Nick Wellnhofer 8d7e38d5 2025-02-01T22:41:53 fuzz: Ignore encodings when fuzzing on Apple Not long ago, Apple decided to replace GNU libiconv with a patched up version of FreeBSD's iconv implementation in their operating systems. Unfortunately, the quality of both the original implementation as well as Apple's patches is so abysmal that you routinely find issues when fuzzing your own code.
Nick Wellnhofer 68be036f 2025-02-01T22:09:18 fuzz: Disable HTML encoding detection for now This doesn't work with the push parser.
Nick Wellnhofer b4d3d87e 2025-02-01T22:02:33 parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.
Nick Wellnhofer c13fcc19 2025-02-01T19:36:06 html: Chunk text data in push parser Follow the logic of the XML parser and chunk large text nodes.
Nick Wellnhofer 08028572 2025-02-01T18:21:47 html: Make data parsing modes work with push parser This can't be solved with a simple scan for a terminator. Instead, we make htmlParseCharData handle incomplete data if the "partial" flag is set.
Nick Wellnhofer 4be1e8be 2025-02-01T15:00:26 html: Simplify htmlParseTryOrFinish a little
Nick Wellnhofer 12732592 2025-02-01T00:36:12 html: Remove unused epilog state
Nick Wellnhofer 70bf754e 2025-02-01T00:17:01 html: Fix pull-parsing of incomplete end tags Handle this HTML5 quirk in htmlParseEndTag.
Nick Wellnhofer 4a776c78 2025-01-31T23:57:44 html: Use htmlParseElementInternal in push parser
Nick Wellnhofer ba153737 2025-01-31T22:51:59 html: Fix corner case when push-parsing HTML5 comments
Nick Wellnhofer e48fb5e4 2025-01-31T22:08:13 html: Handle incomplete UTF-8 when push-parsing For now, incomplete UTF-8 is always an error in push mode. Eventually, we could pass chunked data to the character handler when push-parsing. Then we'd have to handle incomplete sequences.
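If chunked data were passed on as described, the parser would need to know whether a chunk ends in the middle of a multi-byte sequence. A minimal sketch of such a check, assuming only the UTF-8 byte layout; the function name is hypothetical, and the check does not validate the sequence beyond its lead byte.

```c
#include <stddef.h>

/*
 * Returns the number of bytes at the end of buf that begin a
 * multi-byte UTF-8 sequence which is not yet complete, or 0 if the
 * buffer does not end inside a sequence.
 */
size_t
utf8_incomplete_tail(const unsigned char *buf, size_t len) {
    size_t i;

    /* A sequence has at most 4 bytes, so look back at most 3. */
    for (i = 1; i <= 3 && i <= len; i++) {
        unsigned char c = buf[len - i];
        size_t need;

        if ((c & 0xC0) == 0x80)
            continue;               /* continuation byte, look further back */
        if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return 0;               /* ASCII or invalid lead byte */
        return (i < need) ? i : 0;
    }
    return 0;
}
```

A push parser could hold back that many bytes and prepend them to the next chunk instead of reporting an error.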
Nick Wellnhofer 6bb2ea8e 2025-02-01T14:58:06 html: Adjust xmlDetectEncoding for HTML Don't check for UTF-32 or EBCDIC. We now perform BOM sniffing and the first step of the HTML5 prescan algorithm (detect UTF-16 XML declarations). The rest of the algorithm still has to be implemented.
Nick Wellnhofer 227d8f73 2025-01-31T21:05:22 html: Support encoding auto-detection in push parser Align with pull parser.
Nick Wellnhofer 641fb1ac 2025-01-31T20:41:28 html: Fix state update in push parser
Nick Wellnhofer a86a8ae9 2025-01-31T20:09:54 html: Fix push-parsing of empty documents Also simplify end-of-document handling in push parser. Align with pull parser.
Nick Wellnhofer d2fb68ed 2025-01-31T19:02:33 fuzz: Make large chunk size more likely This now detects issues like 3eced32e in about 30 seconds.
Nick Wellnhofer cdfb54ff 2025-01-31T18:38:40 Fix typos
Nick Wellnhofer 57e4bbd8 2025-01-31T16:45:35 parser: Improve handling of NOCDATA option Don't modify the callback structure. This makes sure that unsetting the option works.
Nick Wellnhofer 1f5b5371 2025-01-31T16:21:20 parser: Improve handling of NOBLANKS option Don't change the SAX handler. Use a helper function to invoke "characters" SAX callback. The old code didn't advance the input pointer consistently before invoking the callback. There was also some inconsistency with respect to ctxt->space handling. I don't understand the ctxt->space thing, but now we always behave like the non-complex case before.
Nick Wellnhofer 7a8722f5 2025-01-31T14:55:29 parser: Document that XML_PARSE_NOBLANKS is broken Long text content can generate multiple "characters" callbacks which can lead to NOBLANKS removing whitespace in non-whitespace text nodes. So the NOBLANKS option doesn't even work reliably with the pull parser. This would be extremely hard to fix. Unfortunately, `xmllint --format` relies on this option which is another reason why this feature never really worked.
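The failure mode can be reproduced without a parser. The sketch below splits a text node into fixed-size chunks and drops every chunk that consists only of whitespace, which is roughly what a per-callback blank filter does. All names are hypothetical, and the chunk size stands in for whatever buffer boundary the parser happens to hit.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the chunk contains only XML whitespace. */
int
chunk_is_blank(const char *s, size_t len) {
    size_t i;

    for (i = 0; i < len; i++) {
        if (s[i] != ' ' && s[i] != '\t' && s[i] != '\n' && s[i] != '\r')
            return 0;
    }
    return 1;
}

/*
 * Copies text to out in chunks of chunk_size bytes, dropping blank
 * chunks. out must have room for strlen(text) + 1 bytes.
 * Returns the length of the result.
 */
size_t
filter_blank_chunks(const char *text, size_t chunk_size, char *out) {
    size_t len = strlen(text);
    size_t pos, n = 0;

    out[0] = '\0';
    if (chunk_size == 0)
        return 0;
    for (pos = 0; pos < len; pos += chunk_size) {
        size_t cur = (len - pos < chunk_size) ? len - pos : chunk_size;

        if (!chunk_is_blank(text + pos, cur)) {
            memcpy(out + n, text + pos, cur);
            n += cur;
        }
    }
    out[n] = '\0';
    return n;
}
```

With the text `abcd    efgh` and a chunk size of 4, the middle chunk is all whitespace and gets dropped, so the words run together. With a chunk size that covers the whole text, nothing is removed.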
Nick Wellnhofer 40e423d6 2025-01-30T19:30:44 fuzz: Improve fuzzing of push parser Also serialize the result of push-parsing and compare whether pull and push parser produce the same result (differential fuzzing). We lose the ability to inject IO errors when serializing for now, but this isn't too important. Use variable chunk size for push parser. Fixes #849.
Nick Wellnhofer 9efe1414 2025-01-31T13:07:35 parser: Fix detection of ']]>' when push-parsing Fixes #850.
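The log does not describe the fix itself. As a general illustration of the problem, a terminator such as `]]>` can straddle chunk boundaries, so a scanner has to carry its match state from one chunk to the next. A sketch with hypothetical names:

```c
#include <stddef.h>

/* Number of leading "]]>" characters matched so far: 0, 1 or 2. */
struct cdend_scanner {
    int matched;
};

/*
 * Feeds one chunk to the scanner. Returns 1 as soon as "]]>" has been
 * seen, even if the three characters arrived in different chunks.
 */
int
cdend_feed(struct cdend_scanner *sc, const char *chunk, size_t len) {
    size_t i;

    for (i = 0; i < len; i++) {
        char c = chunk[i];

        if (c == ']') {
            /* "]]]" still ends in "]]", so the state saturates at 2. */
            if (sc->matched < 2)
                sc->matched++;
        } else if (c == '>' && sc->matched == 2) {
            sc->matched = 0;
            return 1;
        } else {
            sc->matched = 0;
        }
    }
    return 0;
}
```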
Nick Wellnhofer 115b13f9 2025-01-30T23:18:56 parser: Document push parser limitations
Nick Wellnhofer 53a48468 2025-01-30T15:15:30 xmllint: Make --push report parse errors The push parser leaves documents in ctxt->myDoc even if they're invalid. Also fix documentation. Regressed with f8ff4d86.
Nick Wellnhofer 5535721f 2025-01-30T01:27:03 parser: Grow input buffer after lots of whitespace Make sure that the input buffer is grown after consuming large amounts of whitespace. Also move a comment.
Nick Wellnhofer 218264fa 2025-01-30T01:26:01 parser: Always shrink input buffer Shrinking the input buffer is cheap now and should be done as soon as possible.
Nick Wellnhofer 0de90f51 2025-01-30T01:25:31 parser: Define SIZE_MAX
Nick Wellnhofer 3eced32e 2025-01-29T23:49:56 parser: Fix push parser with encoding and single chunk When push-parsing with an encoding handler, we must convert the whole buffer in the initial conversion. Otherwise, parsing a single chunk larger than ~4KB would fail. Regressed with commit 34c9108f.
Nick Wellnhofer 4bd66d45 2025-01-29T13:11:38 Mention contributors in Copyright To clarify that libxml2 is the work of many people, add the following copyright notice to Copyright: Copyright (C) The Libxml2 Contributors.
Nick Wellnhofer fdc73dd0 2025-01-29T12:58:31 README: Fix CMake example options zlib is disabled by default now.
Nick Wellnhofer 64bfe1f7 2025-01-29T12:48:50 README: Add note about security issues
Nick Wellnhofer 93506d41 2025-01-29T00:17:01 parser: Make catalog PIs opt-in This is an obscure feature that shouldn't be enabled by default.
Nick Wellnhofer 1082d813 2025-01-28T23:21:34 parser: Prepare to make decompression opt-in Add a new parser option XML_PARSE_UNZIP that enables decompression. xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set this option currently, but downstream users should start to set the option if they really need it.
Nick Wellnhofer a78843be 2025-01-28T20:13:58 xmllint: Support compressed input from stdin Another regression related to reading from stdin. Making a "-" filename read from stdin was deeply baked into the core IO code but is inherently insecure. I really want to reenable this dangerous feature as sparingly as possible. This now enables compressed input when using the "Fd" API functions which wasn't supported before. But XML_PARSE_NO_UNZIP will be inverted later. Allow compressed stdin in xmlReadFile to support xmlstarlet and older versions of xsltproc. So far, these are the only known command-line tools that rely on "-" meaning stdin.
Nick Wellnhofer a8d8a70c 2025-01-27T13:31:08 uri: Fix handling of Windows drive letters Allow drive letters in URI paths. Technically, these should be treated as URI schemes, but this is not what users expect. This also makes sure that paths with drive letters are resolved as filesystem paths and unescaped, for example when used in libxslt's document() function. Should fix #832.
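The ambiguity is that `C:` parses as a URI scheme. A minimal sketch of a drive-letter check follows. The exact rule libxml2 applies is not stated in the log, so the requirement of a slash or backslash after the colon is an assumption, and the function name is hypothetical.

```c
#include <stddef.h>

/*
 * Returns 1 if path starts with a Windows drive letter followed by a
 * slash or backslash, such as "C:\dir" or "c:/dir".
 */
int
has_drive_letter(const char *path) {
    unsigned char c;

    if (path == NULL)
        return 0;
    /* Fold ASCII upper case to lower case before the range check. */
    c = (unsigned char) path[0] | 0x20;
    /* Short-circuit evaluation never reads past the terminating NUL. */
    return c >= 'a' && c <= 'z' &&
           path[1] == ':' &&
           (path[2] == '/' || path[2] == '\\');
}
```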
Nick Wellnhofer 6904d4c2 2025-01-25T13:54:15 fuzz: Fix OSS-Fuzz build of lint fuzzer
Benjamin Gilbert cd7299a8 2025-01-24T18:59:12 meson: Fix setup with ICU as sibling subproject Meson wrapdb provides a wrap for ICU, so libxml2 and ICU could both be built as subprojects of the same Meson parent project. In this case, with the icu option enabled, setup was failing with: subprojects/libxml2-2.13.5/meson.build:603:22: ERROR: Could not get an internal variable and no default provided for <InternalDependency dep228908115162702543524838879388991448872: True> This is because we can't get a dependency variable from a subproject that hasn't been built yet. Fall back to assuming DEFS is empty, as it is on my system.
Nick Wellnhofer 6ec616ba 2025-01-24T18:26:55 encoding: Don't allow POSIX indicator suffixes in encoding names Suffixes like "//IGNORE" change the behavior of iconv. Also add comment on how we currently rely on GNU libiconv behavior which technically violates the POSIX spec.
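A minimal sketch of such a check, assuming that looking for a `//` separator is enough to reject names like `UTF-8//IGNORE` before they reach `iconv_open`. The function name is hypothetical and the actual check in libxml2 may differ.

```c
#include <string.h>

/*
 * Returns 1 if the encoding name carries an indicator suffix such as
 * "//IGNORE" or "//TRANSLIT", 0 otherwise.
 */
int
encoding_name_has_suffix(const char *name) {
    return name != NULL && strstr(name, "//") != NULL;
}
```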
Nick Wellnhofer 9b1028c9 2025-01-23T20:37:37 fuzz: Fix comments
Nick Wellnhofer e95c4b07 2025-01-22T10:06:39 fuzz: Also test xmllint --repeat option
Nick Wellnhofer dc6270d1 2025-01-22T09:38:43 xmllint: Fix UAF with --push --repeat Short-lived regression. Fixes #841.
Grzegorz Szymaszek 9d7bbf19 2025-01-23T14:36:33 tree: Fix variable name in xmlAddChild documentation
Kjell Ahlstedt f043bf25 2025-01-22T19:25:59 meson: Fix build with MSVC Check compiler options with cc.get_supported_arguments(). Fixes #842
Nick Wellnhofer b524cd7a 2025-01-21T17:35:04 meson: Fix build as subproject Use add_project_arguments instead of add_global_arguments. Should fix #840.
Nick Wellnhofer 1c82bca6 2025-01-17T22:54:51 xmllint: Improve error reports from reader
Nick Wellnhofer 16286dea 2025-01-17T23:03:20 xmllint: Fix memory leak in parseAndPrintFile
Nick Wellnhofer 9cfc723c 2025-01-17T21:42:35 xmllint: Always reuse parser context Also move push parsing into parseXml which makes "--sax --push" work.
Nick Wellnhofer 5f1131dd 2025-01-17T19:54:04 xpath: Don't descend into OP_VALUE in debug dump For some reason, its "ch1" value is invalid.
Nick Wellnhofer 00167cae 2025-01-17T18:50:55 xmllint: Report OOM errors to stderr For the validators, some work still has to be done, but for core features, xmllint should now report OOM errors reliably.
Nick Wellnhofer 67b738d9 2025-01-17T17:59:21 fuzz: Check whether xmllint reports malloc failures correctly This relies on xmllint's "maxmem" option.
Nick Wellnhofer bfe6af2e 2025-01-17T17:09:04 fuzz: Remove hacks to build lint fuzzer Don't include source file directly.
Nick Wellnhofer bf1d8b9c 2025-01-17T18:13:35 xmllint: Report malloc failures from parsing patterns
Nick Wellnhofer 255fd5f3 2025-01-17T16:52:06 xmllint: Store error stream in global state
Nick Wellnhofer e42ded42 2025-01-17T16:00:35 xmllint: Stop using global variables The only exception is "maxmem". The custom malloc functions don't support an extra context.
Nick Wellnhofer e4194110 2025-01-17T16:00:05 schemas: Make ValidateStream take a const SAXHandler
Nick Wellnhofer d39e5714 2025-01-17T13:12:36 xmllint: Fix memory leak in parseFile Short-lived regression.
Nick Wellnhofer 0f4d36e0 2025-01-17T13:04:35 xmllint: Fix memory leak in error case
Nick Wellnhofer fbaacfe2 2025-01-16T15:57:35 encoding: Clean up UCS-4 encodings Use "UCS-*" instead of "ISO-10646-UCS-*". While the XML spec recommends "ISO-10646-UCS-2" and "ISO-10646-UCS-4", GNU iconv doesn't understand these names. Ignore UCS4_2143 and UCS4_3412 which were never supported.
Nick Wellnhofer be579a26 2025-01-15T12:52:53 reader: Fix return value of xmlTextReaderReadString again Make sure to return NULL for node types except elements or text to match the old behavior. Note that CDATA sections are still treated like text nodes and will have their content returned. Fixes #838.
Nick Wellnhofer 86401cc3 2025-01-07T19:01:57 xmllint: Make --shell ignore some other options When the shell should be launched with the --shell option, don't post-validate, stream or dump the document. Ignore the --repeat option.
Nick Wellnhofer c0c69cb8 2025-01-07T18:55:35 xmllint: Always reuse parser context Simplifies "repeat" logic.
Nick Wellnhofer a5be2cc3 2025-01-04T22:52:19 xmllint: Support --xpath --debug Dump compiled expression if --debug was supplied.
Nick Wellnhofer f22707f4 2024-12-30T23:21:56 xmllint: Use xmlXPathOrderDocElems for XPath queries
Nick Wellnhofer ca819160 2025-01-03T20:50:08 include: Use intptr_t to cast between pointers and ints
Nick Wellnhofer 41c10c0c 2025-01-03T19:49:37 io: Don't cast file descriptors to pointers This doesn't work if open() returns 0 which is rare but can happen. Wrap the fd in a context struct. Fixes #835.
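The problem with the cast is that descriptor 0 becomes a null pointer on mainstream platforms, so callback code that treats a NULL context as "nothing open" misbehaves. A sketch of the wrapper approach, with hypothetical type and function names:

```c
#include <stdlib.h>

/* Hypothetical context that carries the descriptor by value. */
typedef struct {
    int fd;
} fd_io_ctxt;

/* Returns a non-NULL context for any fd, including 0. */
void *
fd_ctxt_new(int fd) {
    fd_io_ctxt *ctxt = malloc(sizeof(*ctxt));

    if (ctxt == NULL)
        return NULL;
    ctxt->fd = fd;
    return ctxt;
}

/* Recovers the descriptor inside an I/O callback. */
int
fd_ctxt_get(void *context) {
    return ((fd_io_ctxt *) context)->fd;
}
```

The cost is one small allocation per stream, which the close callback has to free.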
Nick Wellnhofer 71c37a56 2024-12-30T11:41:44 malloc-fail: Fix memory leak in xmlValidateElementContent
Nick Wellnhofer ab62fc27 2024-12-27T14:58:30 gitlab-ci: Add --with-valid to medium config Building --with-valid --without-regexps enables some rarely tested code. There's an additional test failure in runxmlconf without regexps.
Nick Wellnhofer cd220b93 2024-12-27T14:55:43 valid: Remove duplicate error messages when streaming
Nick Wellnhofer bd2a1648 2024-12-27T13:44:10 valid: Fix build --without-regexps
Nick Wellnhofer 41aed089 2024-12-24T23:50:39 automake: Only build testdso when testing
Nick Wellnhofer 0cf25b3d 2024-12-26T20:32:35 Regenerate docs and testapi.c
Nick Wellnhofer 2e3a91a7 2024-12-26T21:05:18 doc: Fix documentation
Nick Wellnhofer 53c131f6 2024-12-26T20:29:58 doc: Make apibuild.py work again
Nick Wellnhofer 260954c5 2024-12-26T18:17:45 autotools: Set AC_CONFIG_AUX_DIR This should make sure that autoreconf doesn't mess with parent directories. Should fix #833.
Nick Wellnhofer b3871dd1 2024-12-21T21:50:13 io: Fix memory leaks of encoding handler in error cases xmlOutputBufferCreate* must always free the encoding handler.
Nick Wellnhofer afeff9c5 2024-12-21T20:47:40 xinclude: Allow build without XPath This disables XPath queries and makes the tests fail, but might be useful.
Nick Wellnhofer c134e8b4 2024-12-19T21:05:49 include: Make INPUT_CHUNK macro private
Nick Wellnhofer 84a6c82f 2024-12-19T20:59:10 include: Make most IS_* macros private Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.
Nick Wellnhofer 0d4a17af 2024-12-18T12:02:36 valid: Fix and check return value of nodeVPush
Nick Wellnhofer 3f0bac48 2024-12-11T16:23:30 malloc-fail: Handle more malloc failures in schema code These issues can only arise after a memory allocation failed. - WXS_ADD_*: Add NULL check and raise error - XML_SCHEMA_*: Make macros safe - xmlSchemaParseUnion: Fix leak, raise error, commit after success to avoid memory corruption - xmlSchemaVAddNodeQName: Restore nbItems after partial success, raise error - xmlSchemaIDCAcquireTargetList: Raise error - xmlSchemaXPathProcessHistory: Handle errors - xmlSchemaIDCFillNodeTables: Fix leak - xmlSchemaCheckCVCIDCKeyRef: Handle errors - xmlSchemaVPushText: Reset flag to avoid memory corruption - xmlSchemaNewValidCtxt: Handle errors - xmlSchemaVDocWalk: Fix leak - xmlSchemaInitBasicType: Handle error - xmlSchemaCleanupTypesInternal: Fix null deref - xmlSchemaWhiteSpaceReplace: Handle error - xmlSchemaParseUInt: Handle error - xmlSchemaValAtomicType: Fix leak, handle error - xmlSchemaDateNormalize: Fix leak
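Two of the patterns named in this list, committing only after success and restoring a counter after partial success, can be sketched in isolation. The code below is not taken from the schema validator. The struct, the function names and the injectable allocator are hypothetical; only the field name `nbItems` echoes the commit message.

```c
#include <stdlib.h>

/* Allocator signature, so that a failing allocator can be injected. */
typedef void *(*realloc_fn)(void *ptr, size_t size);

struct name_list {
    const char **items;
    int nbItems;
    int sizeItems;
    realloc_fn grow;
};

/* Allocator that always reports failure, for exercising error paths. */
void *
always_fail(void *ptr, size_t size) {
    (void) ptr;
    (void) size;
    return NULL;
}

/* Appends one name. The list is only modified after growing succeeded. */
int
name_list_push(struct name_list *list, const char *name) {
    if (list->nbItems >= list->sizeItems) {
        int newSize = list->sizeItems > 0 ? list->sizeItems * 2 : 4;
        const char **tmp = list->grow(list->items,
                                      (size_t) newSize * sizeof(*tmp));

        if (tmp == NULL)
            return -1;
        list->items = tmp;
        list->sizeItems = newSize;
    }
    list->items[list->nbItems++] = name;
    return 0;
}

/* Appends a (local name, namespace) pair as a unit. */
int
name_list_add_pair(struct name_list *list, const char *local,
                   const char *ns) {
    int saved = list->nbItems;

    if (name_list_push(list, local) < 0)
        return -1;
    if (name_list_push(list, ns) < 0) {
        /* Roll back the first push so no half pair stays behind. */
        list->nbItems = saved;
        return -1;
    }
    return 0;
}
```

Without the rollback, a failure on the second push would leave an odd number of entries, and later code that reads the list in pairs would run past the end.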