Log

Author Commit Date CI Message
Nick Wellnhofer eed1a07d 2025-03-04T13:32:52 build: Remove version script
Nick Wellnhofer cdc5cfed 2025-03-04T13:26:51 legacy: Remove legacy symbols
Nick Wellnhofer 3250a01d 2025-03-04T13:15:42 error: Convert initGenericErrorDefaultFunc to macro
Nick Wellnhofer c42b3227 2025-03-04T13:11:18 parser: Convert inputPush and inputPop to macros
Nick Wellnhofer 361f7bff 2025-03-04T13:02:36 parser: Make nodePush, nodePop, namePush, namePop private
Nick Wellnhofer 0b27097a 2025-03-04T12:55:25 encoding: Rename unprefixed public functions
Nick Wellnhofer 66fdf94c 2025-03-03T10:12:18 cmake: Fix WITH_RELAXNG option Dependent options must come after dependencies.
Nick Wellnhofer a0f156ff 2025-03-02T13:21:29 io: Fix `compressed` flag for uncompressed stdin This could cause xmlstarlet to generate compressed output unexpectedly. Regressed with a78843be. Should fix #869.
Nick Wellnhofer 05bd1720 2025-03-01T10:25:29 parser: Fix parsing of DTD content Regressed in 2.11. Fixes #868.
Nick Wellnhofer 552864f1 2025-02-25T23:10:46 Remove os400 port This is based on an ancient version and completely outdated.
Nick Wellnhofer e60f0712 2025-02-25T23:07:55 Update NEWS
Nick Wellnhofer e50d314a 2025-02-25T23:07:19 build: Add separate configuration option for RELAX NG Support for RELAX NG used to be enabled together with XML Schema support (--with-schemas). Now there's a separate option and a new feature macro LIBXML_RELAXNG_ENABLED.
Nick Wellnhofer ce1b704e 2025-02-25T20:09:36 doc: Regenerate libxml2-api.xml
Nick Wellnhofer 6ab430ca 2025-02-22T21:17:42 Remove unnecessary #includes
Nick Wellnhofer 7ae8e8ac 2025-02-22T21:06:34 schemas: Make xmlSchemaDump depend on DEBUG_ENABLED
Nick Wellnhofer 6fc26076 2025-02-22T20:31:45 regexp: Hide debugging code behind DEBUG_REGEXP xmlRegexpPrint is now a deprecated no-op.
Florin Haja 4649f28f 2025-02-22T19:29:07 xmlregexp: add support for compact form of automata in xmlRegexpPrint
Nick Wellnhofer c82270a9 2025-02-22T18:51:38 regexp: Avoid dangling start/stop pointers in atom States could be eliminated later, so set start/stop pointers to NULL after they're used in xmlFAGenerateTransitions.
Nick Wellnhofer 5ed4eafd 2025-02-22T14:51:39 html: Don't invoke SAX callbacks if parser was stopped
Nick Wellnhofer 6dfa68ac 2025-02-22T14:49:51 SAX2: Fix ctxt->nodemem check In some error cases and maybe other situations, nodemem can have a value of -1.
Nick Wellnhofer 73514f2d 2025-02-20T18:50:58 gitlab-ci: Stop downloading and installing CMake for MSVC CMake should already be installed.
Jan Alexander Steffens (heftig) 064a0211 2025-02-20T13:52:40 meson: Fix Python module build
Jan Alexander Steffens (heftig) c2e2d762 2025-02-20T13:51:26 python: Pass destination dir to generator.py Simplify usage across build systems.
Jan Alexander Steffens (heftig) 82fb5cae 2025-02-20T13:49:39 meson: Use project_name instead of 'libxml2'
Nick Wellnhofer e649c972 2024-12-18T12:49:24 fuzz: Add utility scripts Add scripts to minimize a corpus and generate HTML coverage reports.
Nick Wellnhofer 63dfcca6 2024-12-16T01:34:29 fuzz: Reduce initial array size
Nick Wellnhofer 6f903d43 2024-12-13T19:15:38 fuzz: Rework fixed parser options Remove XML_PARSE_XINCLUDE. This is only honored by the XML Reader interface which is now fuzzed in reader.c. Don't validate in XInclude fuzzer. This doesn't increase coverage after moving the Reader fuzzer.
Nick Wellnhofer 44628d45 2024-12-13T15:23:30 fuzz: Harden leak check in lint fuzzer Check for undetected memory leaks from previous iterations. This also makes sure that the maxmem limit is checked deterministically.
Nick Wellnhofer c6c6d8af 2024-12-11T16:24:23 fuzz: Mutate fuzz data chunks separately Implement a custom mutator that takes a list of fixed-size chunks which are mutated with a given probability. This makes sure that values like parser options or failure position are mutated regularly even as the fuzz data grows large. Values can also be adjusted temporarily to make the fuzzer focus on failure injection, for example. Thanks to David Kilzer for the idea.
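The chunk-wise mutation idea in c6c6d8af can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `ExampleChunkDesc`, `exampleRand` and `exampleMutateChunks` are hypothetical, the percentage-based probability is an arbitrary choice, and a single bit flip stands in for whatever mutation the real fuzzer applies to a chunk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one fixed-size chunk of fuzz data. */
typedef struct {
    size_t size;         /* chunk length in bytes */
    unsigned mutateProb; /* chance of mutation in percent, 0..100 */
} ExampleChunkDesc;

/* Small xorshift PRNG so the sketch is deterministic for a given seed. */
static uint32_t
exampleRand(uint32_t *state) {
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Walk the chunks in order and mutate each one with its own
 * probability. Returns the number of chunks that were mutated.
 */
static size_t
exampleMutateChunks(uint8_t *data, size_t dataSize,
                    const ExampleChunkDesc *chunks, size_t numChunks,
                    uint32_t seed) {
    uint32_t state = seed ? seed : 1; /* xorshift must not start at 0 */
    size_t offset = 0;
    size_t mutated = 0;

    for (size_t i = 0; i < numChunks; i++) {
        size_t size = chunks[i].size;

        /* Stop if the chunk list runs past the available data. */
        if (size == 0 || size > dataSize - offset)
            break;

        if (exampleRand(&state) % 100 < chunks[i].mutateProb) {
            /* Stand-in mutation: flip one random bit in the chunk. */
            size_t pos = offset + exampleRand(&state) % size;

            data[pos] ^= (uint8_t) (1u << (exampleRand(&state) % 8));
            mutated++;
        }

        offset += size;
    }

    return mutated;
}
```

Because each chunk carries its own probability, a small header chunk (parser options, failure position) keeps being mutated at a fixed rate no matter how large the trailing document chunk grows.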
Nick Wellnhofer f5257d92 2024-12-11T16:24:43 fuzz: Fix failure injection in schema fuzzer
Nick Wellnhofer 9f86dae9 2024-12-15T14:27:05 test: Add test case for UAF in xmlSchemaIDCFillNodeTables
Nick Wellnhofer fd359a7e 2024-12-10T15:54:12 fuzz: Start to fuzz XML Schema validator
Himanshibansal fe7f835f 2025-02-20T10:24:50 Fix C4296 warning: Resolve comparison of unsigned int with 0
Nick Wellnhofer b8234e8c 2025-02-19T12:53:32 html: Fix check for partial named character references Digits are allowed after the first character.
Nick Wellnhofer f68c70d2 2025-02-19T12:20:57 html: Remove htmlSaveErr This function is useless now.
Nick Wellnhofer 0315ac93 2025-02-19T12:18:50 html: Handle error from htmlFindOutputEncoder
Nick Wellnhofer 22ada0a0 2025-02-18T23:27:40 tests: Look for xmlconf in source directory Add -d option to runxmlconf for automake. Fix extraction of xmlconf.tar.gz on Windows. Make runxmlconf work with Meson CI.
Nick Wellnhofer aedc1f3d 2025-02-18T23:15:20 gitlab-ci: Run meson tests verbosely
Nick Wellnhofer 9037dce9 2025-02-18T19:38:28 fuzz: Add dictionary for lint fuzzer Mostly a combination of xml.dict and xpath.dict. This should help with fuzzing pattern.c.
Nick Wellnhofer 51622c05 2025-02-18T17:27:16 doc: Update release instructions
Nick Wellnhofer 8c8753ad 2025-02-11T17:30:40 [CVE-2025-24928] Fix stack-buffer-overflow in xmlSnprintfElements Fixes #847.
Nick Wellnhofer 5880a9a6 2024-12-10T16:52:05 [CVE-2024-56171] Fix use-after-free after xmlSchemaItemListAdd xmlSchemaItemListAdd can reallocate the items array. Update local variables after adding item in xmlSchemaIDCFillNodeTables and xmlSchemaBubbleIDCNodeTables. Fixes #828.
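The hazard described in 5880a9a6 applies to any growable array: a pointer cached before an append can dangle once the append reallocates. The following is a minimal sketch of the pattern, not the libxml2 code; `ItemList`, `itemListAdd` and `addThenReadFirst` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list, standing in for a schema item list. */
typedef struct {
    int *items;
    size_t nbItems;
    size_t sizeItems;
} ItemList;

/* Append a value. May call realloc(), which can move list->items. */
static int
itemListAdd(ItemList *list, int value) {
    if (list->nbItems == list->sizeItems) {
        size_t newSize = list->sizeItems ? list->sizeItems * 2 : 4;
        int *tmp = realloc(list->items, newSize * sizeof(int));

        if (tmp == NULL)
            return -1;
        list->items = tmp;
        list->sizeItems = newSize;
    }

    list->items[list->nbItems++] = value;
    return 0;
}

/* Append a value, then read the first element safely. */
static int
addThenReadFirst(ItemList *list, int value) {
    int *items = list->items; /* cached pointer, stale after the add */

    if (itemListAdd(list, value) < 0)
        return -1;

    /* The array may have moved: refresh the local copy before use. */
    items = list->items;
    assert(items != NULL);
    return items[0];
}
```

Dropping the `items = list->items;` refresh would reproduce the bug class: the code would keep working until an append happened to trigger a reallocation that moves the array.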
Nick Wellnhofer 06b39650 2025-02-17T12:19:23 fuzz: Stop testing xmllint --memory option The --memory option mmaps files directly, bypassing the resource loader. We'd need a temp file to make it work when fuzzing.
Nick Wellnhofer 25ae533b 2025-02-17T11:27:30 xmllint: Fix SIGBUS with --memory option If the input file size is a multiple of page size, the byte after the file's content is on a new page and accessing it will lead to SIGBUS. Remove XML_INPUT_BUF_ZERO_TERMINATED hint for mmapped files. Regressed with a221cd78. Fixes #864.
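The condition described in 25ae533b can be expressed as a small predicate. This is only an illustration of when the byte after a mapped file falls on a fresh page; `terminatorOnNewPage` is a hypothetical helper, and the actual fix removed the zero-termination hint for mmapped files rather than testing for this case.

```c
#include <stddef.h>

/*
 * Returns 1 if reading the byte at offset fileSize of a file mapping
 * could touch a page that isn't backed by the file, which happens when
 * the file exactly fills its last page.
 */
static int
terminatorOnNewPage(size_t fileSize, size_t pageSize) {
    if (pageSize == 0)
        return 1; /* unknown page size: assume the worst */

    return (fileSize % pageSize) == 0;
}
```

For any other file size, the remainder of the last page is zero-filled by the kernel, which is why the problem only showed up for sizes that are exact multiples of the page size.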
Nick Wellnhofer 7a61c32b 2025-02-13T23:09:28 html: Use enum instead of magic values for insertion modes
Nick Wellnhofer 3793eaad 2025-02-16T13:54:56 fuzz: Fix build
Nick Wellnhofer 69b91da3 2025-02-13T19:45:41 Revert "xpath: Make contextSize and proximityPosition default to 1" This reverts commit afbc0a0405236de4ab8cbac94745e9885db0a198.
Nick Wellnhofer 9c16a153 2025-02-13T18:41:33 Revert "include: Make most IS_* macros private" This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
Nick Wellnhofer 6c716d49 2025-02-13T16:48:53 pattern: Fix compilation of explicit child axis The child axis is the default axis and should generate XML_OP_ELEM like the case without an axis.
Nick Wellnhofer 8cf6129b 2025-02-13T18:20:46 html: Stop implying <p> start tags Only <html>, <head> or <body> should be implied. Opening extra <p> tags has always been a libxml2 quirk.
Nick Wellnhofer 71122421 2025-02-13T14:04:10 html: Make implied <p> tags more deterministic libxml2's HTML parser adds <p> start tags in some situations. This behavior, which doesn't follow any standard, was added in 2000, see here: http://veillard.com/XML/messages/0655.html Text nodes that only contain whitespace don't imply a <p> tag, but the whitespace check cannot work reliably if we're parsing partial text data which can happen with both pull and push parser. The logic in `areBlanks` is hard to follow. The checks involving `CUR` depend on the position of the input pointer and seem dubious. It's also possible that the behavior changed inadvertently with a later commit. As a result, it's hard to come up with good test cases. We now process leading whitespace before creating implied tags. This is more in line with HTML5 and should avoid at least some issues with partial text data. For example, parsing the string "<head> x" used to result in: <html> <head></head> <body><p> x</p></body> </html> And now results in: <html> <head> </head> <body><p>x</p></body> </html> Except for the implied <p> tag, this matches HTML5.
Nick Wellnhofer ebbc31cc 2025-02-13T12:09:58 malloc-fail: Check for malloc failure in xhtmlNodeDumpOutput
Nick Wellnhofer 79ab721c 2025-02-11T11:39:08 tests: Fix error return in testHugeEncodedChunk Fixes #859.
Nick Wellnhofer cfc854b8 2025-02-11T00:21:12 fuzz: Work around glibc iconv() bug
Nick Wellnhofer 3a1526a5 2025-02-10T19:32:32 xpath: Don't raise OOM error on long names Short-lived regression.
Daniel Cheng 3dcde736 2025-02-05T15:18:48 Use __has_attribute to check for __counted_by__ support The initial clang patch to support __counted_by__ was landed and reverted several times. There are some clang toolchains (e.g. the Android toolchain) that report themselves as version 18 but do not support __counted_by__. While it is debatable if Android should be shipping a pre-release clang, using __has_attribute should be a bit simpler overall. Note that this doesn't migrate everything else to use __has_attribute: while clang has always supported __has_attribute, gcc didn't support it until a bit later.
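A feature check along the lines of 3dcde736 could look like the sketch below. The macro name `EXAMPLE_COUNTED_BY` and the `ExampleBuf` type are hypothetical and not taken from the libxml2 headers. The fallback definition of `__has_attribute` covers compilers that don't provide it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Compilers without __has_attribute treat every query as unsupported. */
#ifndef __has_attribute
  #define __has_attribute(x) 0
#endif

/* Ask the compiler directly instead of guessing from version numbers. */
#if __has_attribute(__counted_by__)
  #define EXAMPLE_COUNTED_BY(field) __attribute__((__counted_by__(field)))
#else
  #define EXAMPLE_COUNTED_BY(field)
#endif

/* Hypothetical buffer with a flexible array member bounded by len. */
typedef struct {
    size_t len;
    int data[] EXAMPLE_COUNTED_BY(len);
} ExampleBuf;

static ExampleBuf *
exampleBufNew(size_t len) {
    ExampleBuf *buf = calloc(1, sizeof(*buf) + len * sizeof(int));

    if (buf != NULL)
        buf->len = len; /* set the count before touching data[] */
    return buf;
}
```

On toolchains that lack the attribute, the macro expands to nothing and the struct compiles unchanged, so a compiler that misreports its version can't break the build.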
Nick Wellnhofer 35d8a230 2025-02-06T10:14:56 tests: Fix expected errors in runxmlconf The extra failure if regexps weren't enabled was actually a regression fixed by the previous commit.
Zak Ridouh b466e70a 2025-02-05T14:11:04 Fix early return in vstateVPush in valid.c While looking over the code in the fallback method for `vstateVPush` in valid.c when `LIBXML_REGEXP_ENABLED` is not defined, I noticed that there is an ungated `return(-1)` after attempting to allocate memory. I believe this should be inside a check for malloc failure.
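The shape of the bug described in b466e70a is easy to show in isolation. The sketch below uses a hypothetical `ExampleStack` type, not the validation state code from valid.c. The point is that the error return sits inside the NULL check, so the success path continues.

```c
#include <stdlib.h>

/* Hypothetical stack, standing in for the validation state table. */
typedef struct {
    int *tab;
    size_t nr;
    size_t max;
} ExampleStack;

static int
exampleStackInit(ExampleStack *st, size_t max) {
    if (max == 0)
        return -1;

    st->tab = malloc(max * sizeof(st->tab[0]));
    if (st->tab == NULL) {
        /* Gated: only bail out when the allocation actually failed. */
        return -1;
    }

    /*
     * With an ungated return(-1) right after malloc, the lines below
     * would never run and every call would report failure.
     */
    st->nr = 0;
    st->max = max;
    return 0;
}
```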
Nick Wellnhofer 62d4697d 2025-02-02T16:43:25 gitlab-ci: Disable cmake:mingw for now Executing /mingw64/bin/cmake.exe with any arguments fails without error message and exit code 127 since 2025-01-21. I have no idea why.
Nick Wellnhofer a25dc439 2025-02-02T15:01:50 Debug CI failure
Nick Wellnhofer cd491ac0 2025-02-02T13:13:20 dict: Handle ENOSYS from getentropy gracefully Also add some comments. Should fix #854.
Nick Wellnhofer 8d7e38d5 2025-02-01T22:41:53 fuzz: Ignore encodings when fuzzing on Apple Not long ago, Apple decided to replace GNU libiconv with a patched up version of FreeBSD's iconv implementation in their operating systems. Unfortunately, the quality of both the original implementation as well as Apple's patches is so abysmal that you routinely find issues when fuzzing your own code.
Nick Wellnhofer 68be036f 2025-02-01T22:09:18 fuzz: Disable HTML encoding detection for now This doesn't work with the push parser.
Nick Wellnhofer b4d3d87e 2025-02-01T22:02:33 parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.
Nick Wellnhofer c13fcc19 2025-02-01T19:36:06 html: Chunk text data in push parser Follow the logic of the XML parser and chunk large text nodes.
Nick Wellnhofer 08028572 2025-02-01T18:21:47 html: Make data parsing modes work with push parser This can't be solved with a simple scan for a terminator. Instead, we make htmlParseCharData handle incomplete data if the "partial" flag is set.
Nick Wellnhofer 4be1e8be 2025-02-01T15:00:26 html: Simplify htmlParseTryOrFinish a little
Nick Wellnhofer 12732592 2025-02-01T00:36:12 html: Remove unused epilog state
Nick Wellnhofer 70bf754e 2025-02-01T00:17:01 html: Fix pull-parsing of incomplete end tags Handle this HTML5 quirk in htmlParseEndTag.
Nick Wellnhofer 4a776c78 2025-01-31T23:57:44 html: Use htmlParseElementInternal in push parser
Nick Wellnhofer ba153737 2025-01-31T22:51:59 html: Fix corner case when push-parsing HTML5 comments
Nick Wellnhofer e48fb5e4 2025-01-31T22:08:13 html: Handle incomplete UTF-8 when push-parsing For now, incomplete UTF-8 is always an error in push mode. Eventually, we could pass chunked data to the character handler when push-parsing. Then we'd have to handle incomplete sequences.
Nick Wellnhofer bc437868 2025-01-31T23:11:55 fuzz: Improve HTML fuzzer Verify that pull and push parser produce the same result. Fixes #849.
Nick Wellnhofer c4f760be 2025-02-01T15:29:56 encoding: Handle iconv() returning EOPNOTSUPP on Apple iconv() really shouldn't return undocumented error codes.
Nick Wellnhofer 6bb2ea8e 2025-02-01T14:58:06 html: Adjust xmlDetectEncoding for HTML Don't check for UTF-32 or EBCDIC. We now perform BOM sniffing and the first step of the HTML5 prescan algorithm (detect UTF-16 XML declarations). The rest of the algorithm still has to be implemented.
Nick Wellnhofer 227d8f73 2025-01-31T21:05:22 html: Support encoding auto-detection in push parser Align with pull parser.
Nick Wellnhofer 641fb1ac 2025-01-31T20:41:28 html: Fix state update in push parser
Nick Wellnhofer a86a8ae9 2025-01-31T20:09:54 html: Fix push-parsing of empty documents Also simplify end-of-document handling in push parser. Align with pull parser.
Nick Wellnhofer d2fb68ed 2025-01-31T19:02:33 fuzz: Make large chunk size more likely This now detects issues like 3eced32e in about 30 seconds.
Nick Wellnhofer cdfb54ff 2025-01-31T18:38:40 Fix typos
Nick Wellnhofer 57e4bbd8 2025-01-31T16:45:35 parser: Improve handling of NOCDATA option Don't modify the callback structure. This makes sure that unsetting the option works.
Nick Wellnhofer 1f5b5371 2025-01-31T16:21:20 parser: Improve handling of NOBLANKS option Don't change the SAX handler. Use a helper function to invoke "characters" SAX callback. The old code didn't advance the input pointer consistently before invoking the callback. There was also some inconsistency with respect to ctxt->space handling. I don't understand the ctxt->space thing, but now we always behave like the non-complex case before.
Nick Wellnhofer 7a8722f5 2025-01-31T14:55:29 parser: Document that XML_PARSE_NOBLANKS is broken Long text content can generate multiple "characters" callbacks which can lead to NOBLANKS removing whitespace in non-whitespace text nodes. So the NOBLANKS option doesn't even work reliably with the pull parser. This would be extremely hard to fix. Unfortunately, `xmllint --format` relies on this option which is another reason why this feature never really worked.
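The failure mode described in 7a8722f5 can be reproduced with a toy model: if a blank check runs per "characters" callback rather than per text node, a chunk boundary can isolate whitespace that belongs to a non-blank node. The helpers `isBlankChunk` and `countDroppedChunks` are hypothetical, and the fixed chunk size is an assumption; the parser's real chunking and blank heuristics are more involved.

```c
#include <stddef.h>

/* Returns 1 if the chunk consists only of XML whitespace. */
static int
isBlankChunk(const char *s, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (s[i] != ' ' && s[i] != '\t' && s[i] != '\n' && s[i] != '\r')
            return 0;
    }
    return 1;
}

/*
 * Split one text node into fixed-size chunks and count how many of
 * them a per-callback blank filter would drop.
 */
static size_t
countDroppedChunks(const char *text, size_t len, size_t chunkSize) {
    size_t dropped = 0;

    if (chunkSize == 0)
        return 0;

    for (size_t pos = 0; pos < len; pos += chunkSize) {
        size_t n = (len - pos < chunkSize) ? len - pos : chunkSize;

        if (isBlankChunk(text + pos, n))
            dropped++;
    }

    return dropped;
}
```

With the 12-byte text `"abcd    efgh"` and a chunk size of 4, the middle chunk is pure whitespace and would be dropped, turning the node into `"abcdefgh"`. Delivered as a single chunk, nothing is dropped.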
Nick Wellnhofer 40e423d6 2025-01-30T19:30:44 fuzz: Improve fuzzing of push parser Also serialize the result of push-parsing and compare whether pull and push parser produce the same result (differential fuzzing). We lose the ability to inject IO errors when serializing for now, but this isn't too important. Use variable chunk size for push parser. Fixes #849.
Nick Wellnhofer 9efe1414 2025-01-31T13:07:35 parser: Fix detection of ']]>' when push-parsing Fixes #850.
Nick Wellnhofer 115b13f9 2025-01-30T23:18:56 parser: Document push parser limitations
Nick Wellnhofer 53a48468 2025-01-30T15:15:30 xmllint: Make --push report parse errors The push parser leaves documents in ctxt->myDoc even if they're invalid. Also fix documentation. Regressed with f8ff4d86.
Nick Wellnhofer 5535721f 2025-01-30T01:27:03 parser: Grow input buffer after lots of whitespace Make sure that the input buffer is grown after consuming large amounts of whitespace. Also move a comment.
Nick Wellnhofer 218264fa 2025-01-30T01:26:01 parser: Always shrink input buffer Shrinking the input buffer is cheap now and should be done as soon as possible.
Nick Wellnhofer 0de90f51 2025-01-30T01:25:31 parser: Define SIZE_MAX
Nick Wellnhofer 3eced32e 2025-01-29T23:49:56 parser: Fix push parser with encoding and single chunk When push-parsing with an encoding handler, we must convert the whole buffer in the initial conversion. Otherwise, parsing a single chunk larger than ~4KB would fail. Regressed with commit 34c9108f.
Nick Wellnhofer 4bd66d45 2025-01-29T13:11:38 Mention contributors in Copyright To clarify that libxml2 is the work of many people, add the following copyright notice to Copyright: Copyright (C) The Libxml2 Contributors.
Nick Wellnhofer fdc73dd0 2025-01-29T12:58:31 README: Fix CMake example options zlib is disabled by default now.
Nick Wellnhofer 64bfe1f7 2025-01-29T12:48:50 README: Add note about security issues
Nick Wellnhofer 93506d41 2025-01-29T00:17:01 parser: Make catalog PIs opt-in This is an obscure feature that shouldn't be enabled by default.
Nick Wellnhofer 1082d813 2025-01-28T23:21:34 parser: Prepare to make decompression opt-in Add a new parser option XML_PARSE_UNZIP that enables decompression. xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set this option currently, but downstream users should start to set the option if they really need it.
Nick Wellnhofer a78843be 2025-01-28T20:13:58 xmllint: Support compressed input from stdin Another regression related to reading from stdin. Making a "-" filename read from stdin was deeply baked into the core IO code but is inherently insecure. I really want to reenable this dangerous feature as sparingly as possible. This now enables compressed input when using the "Fd" API functions which wasn't supported before. But XML_PARSE_NO_UNZIP will be inverted later. Allow compressed stdin in xmlReadFile to support xmlstarlet and older versions of xsltproc. So far, these are the only known command-line tools that rely on "-" meaning stdin.
Nick Wellnhofer a8d8a70c 2025-01-27T13:31:08 uri: Fix handling of Windows drive letters Allow drive letters in URI paths. Technically, these should be treated as URI schemes, but this is not what users expect. This also makes sure that paths with drive letters are resolved as filesystem paths and unescaped, for example when used in libxslt's document() function. Should fix #832.
Nick Wellnhofer 6904d4c2 2025-01-25T13:54:15 fuzz: Fix OSS-Fuzz build of lint fuzzer
Benjamin Gilbert cd7299a8 2025-01-24T18:59:12 meson: Fix setup with ICU as sibling subproject Meson wrapdb provides a wrap for ICU, so libxml2 and ICU could both be built as subprojects of the same Meson parent project. In this case, with the icu option enabled, setup was failing with: subprojects/libxml2-2.13.5/meson.build:603:22: ERROR: Could not get an internal variable and no default provided for <InternalDependency dep228908115162702543524838879388991448872: True> This is because we can't get a dependency variable from a subproject that hasn't been built yet. Fall back to assuming DEFS is empty, as it is on my system.