Log

Author Commit Date Message
Nick Wellnhofer edc2dd48 2023-09-04T16:07:23 dict: Update hash function Update hash function from classic Jenkins OAAT (dict.c) and a variant of DJB2 (hash.c) to "GoodOAAT" taken from the SMHasher repo. This hash function passes all SMHasher tests.
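The following is a minimal sketch of the GoodOAAT scheme named above. It is based on our reading of the SMHasher version. libxml2 implements the hash with macros in its own sources, so details such as the finalizer may differ. The function name `good_oaat` is hypothetical. Each input byte is mixed into two 32-bit lanes, and a short finalizer then folds the two lanes into each other.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left / right by n bits (0 < n < 32). */
#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define ROTR32(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Sketch of a GoodOAAT-style one-at-a-time hash (hypothetical name). */
static uint32_t good_oaat(const unsigned char *data, size_t len, uint32_t seed)
{
    uint32_t h1 = seed ^ 0x3b00;
    uint32_t h2 = ROTL32(seed, 15);

    for (size_t i = 0; i < len; i++) {
        h1 += data[i];
        h1 += h1 << 3;          /* h1 *= 9 */
        h2 += h1;
        h2 = ROTL32(h2, 7);
        h2 += h2 << 2;          /* h2 *= 5 */
    }

    /* Finalizer: cross-mix both lanes; the result ends up in h2. */
    h1 ^= h2;
    h1 += ROTL32(h2, 14);
    h2 ^= h1; h2 += ROTR32(h1, 6);
    h1 ^= h2; h1 += ROTL32(h2, 5);
    h2 ^= h1; h2 += ROTR32(h1, 8);
    return h2;
}
```

Because the seed enters both lanes, a per-process random seed is enough to randomize every bucket index.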
James Le Cuirot 93e8bb2a 2023-09-02T17:12:58 build: Generate better pkg-config files for static-only builds pkg-config supports `Requires.private` and `Libs.private` fields for static linking. However, if you're building a dynamic binary, then pkg-config will use the non-private fields, even if just the static libxml2 is available. This will result in libxml2 being underlinked, causing the build to fail. The solution is to fold the private fields into the non-private fields when the shared libxml2 is not being built. This works for Autotools and CMake. Meson also knows how to handle this when it automatically generates pkg-config files.
James Le Cuirot 4640ccac 2023-09-02T16:18:30 build: Generate better pkg-config file for SYSROOT builds The -I and -L flags you use to build should not necessarily be the same ones you bake into installed files. If you are building with dependencies located under a SYSROOT then the installed files should have no knowledge of that SYSROOT. For example, if the build requires `-L/path/to/sysroot/usr/lib/foo` then only `-L/usr/lib/foo` should be baked into the installed files. pkg-config is SYSROOT-aware, so this issue can be sidestepped by using the `Requires` field rather than the `Libs` and `Cflags` fields. This is easily resolved if you rely solely on pkg-config, but this project falls back to standard Autoconf checks, so a little more effort is required. Unfortunately, this issue cannot feasibly be resolved for CMake. `find_package` is used rather than `pkg_check_modules`, so we cannot tell whether a pkg-config file for each dependency is present or not, even if `find_package` uses pkg-config behind the scenes. The CMake build does not record any dependency -I or -L flags into the pkg-config file anyway. This is a problem in itself, although these dependencies are most likely installed to standard locations. Meson is very much better at handling this, as it generates the pkg-config file automatically using the correct logic.
Nick Wellnhofer 54a0b19a 2023-09-01T14:52:14 autoconf: Allow custom --with-icu configure option
Nick Wellnhofer c5989473 2023-09-01T14:52:11 dict: Use thread-local storage for PRNG state
Nick Wellnhofer 57cfd221 2023-09-01T14:52:04 dict: Use xoroshiro64** as PRNG Stop using rand_r. This enables hash randomization on all platforms.
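The following is a minimal sketch of xoroshiro64** as published by Blackman and Vigna. The state is stored in C11 `_Thread_local` variables, in the spirit of the neighboring commit that moved PRNG state to thread-local storage. The names `prng_state`, `xoroshiro64ss_next` and `xoroshiro64ss_seed` are hypothetical and are not libxml2 functions. Unlike `rand_r`, the generator is plain portable C, which is why randomization can be enabled on every platform.

```c
#include <stdint.h>

/* Per-thread generator state; it must never be all zero. */
static _Thread_local uint32_t prng_state[2] = { 0x12345678u, 0x9abcdef0u };

static inline uint32_t rotl32(uint32_t x, int k)
{
    return (x << k) | (x >> (32 - k));
}

/* xoroshiro64**: return the next 32-bit output and advance the state. */
static uint32_t xoroshiro64ss_next(void)
{
    uint32_t s0 = prng_state[0];
    uint32_t s1 = prng_state[1];
    uint32_t result = rotl32(s0 * 0x9E3779BBu, 5) * 5;

    s1 ^= s0;
    prng_state[0] = rotl32(s0, 26) ^ s1 ^ (s1 << 9);
    prng_state[1] = rotl32(s1, 13);
    return result;
}

/* Hypothetical seeding helper: any seed pair except (0, 0) works. */
static void xoroshiro64ss_seed(uint32_t a, uint32_t b)
{
    if (a == 0 && b == 0)
        a = 1;
    prng_state[0] = a;
    prng_state[1] = b;
}
```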
Nick Wellnhofer 6d7aaaa8 2023-09-01T14:51:55 dict: Tune hash table growth Introduce load factor as main trigger and increase MAX_HASH_LEN. This should make growth behavior more predictable. Raise size limit to INT_MAX. This avoids quadratic behavior with larger tables.
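The following sketch shows one way a load-factor trigger with a secondary chain-length limit and an INT_MAX size cap could be combined. The threshold values and the name `next_table_size` are hypothetical and are not libxml2's actual tuning. The table size is assumed to be a power of two.

```c
#include <limits.h>

/* Hypothetical tuning values for this sketch, not libxml2's actual ones. */
#define MAX_LOAD_NUM 9   /* grow when nbElems / size exceeds 9/10 */
#define MAX_LOAD_DEN 10
#define MAX_HASH_LEN 8   /* secondary trigger: longest bucket chain */

/* Decide whether a power-of-two table should double. Returns the new size,
 * or 0 if no growth is needed or doubling would pass INT_MAX. */
static int next_table_size(int size, int nbElems, int longestChain)
{
    /* Compare nbElems/size with the load limit in 64-bit arithmetic. */
    int overloaded = (long long) nbElems * MAX_LOAD_DEN >
                     (long long) size * MAX_LOAD_NUM;

    if (!overloaded && longestChain <= MAX_HASH_LEN)
        return 0;
    if (size > INT_MAX / 2)
        return 0;  /* doubling would exceed INT_MAX */
    return size * 2;
}
```

Making the load factor the main trigger means growth depends on the element count, which is predictable. Chain length, by contrast, depends on how keys happen to collide.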
Nick Wellnhofer 4b8f7cf0 2023-09-01T13:07:27 hash: Fix integer overflow of nbElems
Nick Wellnhofer bfd7d286 2023-08-29T21:16:34 xmllint: Fix more error messages
Nick Wellnhofer 373244bc 2023-08-29T21:05:32 xmllint: Fix error message when push parsing empty documents
Nick Wellnhofer 53050b1d 2023-08-29T20:06:43 parser: More fixes to push parser error handling
Nick Wellnhofer bbd918b2 2023-08-29T15:56:37 parser: Fix detection of null bytes Also suppress misleading extra errors. Fixes #122.
Nick Wellnhofer c6083a32 2023-08-29T16:30:22 parser: Improve error handling in push parser - Report errors earlier. - Align error messages with pull parser.
Nick Wellnhofer 1edae30f 2023-08-29T15:58:22 parser: Don't check inputNr in xmlParseTryOrFinish There's no apparent reason for this check. inputNr should always be 1 here.
Nick Wellnhofer e48f2695 2023-08-29T17:41:18 parser: Remove push parser debugging code
Nick Wellnhofer cde44997 2023-08-27T16:35:23 SAX2: Allow multiple top-level elements When parsing with HTML_PARSE_NOIMPLIED, the result document can contain multiple top-level elements. Rework xmlSAX2StartElement to simply add the element as a child of ctxt->node or ctxt->myDoc. Don't invoke xmlAddSibling for non-element parents. The context node should always be an element node. Fixes #584.
Nick Wellnhofer d39f7806 2023-08-23T20:24:24 tree: Fix copying of DTDs - Don't create multiple DTD nodes. - Fix UAF if malloc fails. - Skip DTD nodes if tree module is disabled. Fixes #583.
Nick Wellnhofer 4e4c89a4 2023-08-21T00:26:01 doc: Improve documentation of configuration options
Nick Wellnhofer 778cca38 2023-08-20T22:50:57 legacy: Add stubs for disabled modules When legacy support is requested, always enable stubs for FTP and XPointer location modules which were removed from the standard configuration. Going forward, the --with-legacy configuration option should be used to provide maximum ABI compatibility. Fixes #433.
Nick Wellnhofer ed3bd052 2023-08-20T20:48:10 parser: Allow setting the maximum amplification factor
Nick Wellnhofer 9d80a2b1 2023-08-16T19:45:34 entities: Don't change doc when encoding entities doc->encoding shouldn't be touched by xmlEncodeEntitiesInternal.
Nick Wellnhofer f1c1f5c6 2023-08-16T19:43:02 parser: Revert change to doc->encoding Fixes #579.
Nick Wellnhofer 61b8e097 2023-08-16T19:20:47 parser: Never use UTF-8 encoding handler
Nick Wellnhofer 507f11ed 2023-08-16T15:43:47 encoding: Remove debugging code
Nick Wellnhofer 138213ac 2023-08-15T12:49:27 python: Fix tests on MinGW Add the directory containing libxml2.dll with os.add_dll_directory to make tests work on MinGW. This changed in Python 3.8, but for some reason the issue only turned up with Python 3.11 on MinGW. Contrary to the documentation, copying libxml2.dll into the directory containing the .pyd file doesn't work.
Nick Wellnhofer e2ab48b9 2023-08-14T15:05:30 malloc-fail: Fix unsigned integer overflow in xmlTextReaderPushData Return immediately if xmlParserInputBufferRead fails. Found by OSS-Fuzz, see #344.
Nick Wellnhofer 0d24fc0a 2023-08-14T12:53:49 html: Remove encoding hack in htmlCreateFileParserCtxt Switch encoding directly instead of calling htmlCheckEncoding with faked content.
Nick Wellnhofer 5db5a704 2023-08-09T18:39:14 html: Fix UAF in htmlCurrentChar Short-lived regression found by OSS-Fuzz.
Nick Wellnhofer b973ceaf 2023-08-09T18:37:20 parser: Fix mistake in xmlDetectEncoding Short-lived regression.
Nick Wellnhofer cb717d7e 2023-08-09T16:52:02 parser: Update line number after coalescing text nodes This should make the line number of text nodes deterministic. Before, it depended on the callback sequence which depends on the size of chunks fed to the parser.
Nick Wellnhofer 855818bd 2023-08-08T15:21:37 parser: Check for truncated multi-byte sequences When decoding input data, check whether the "raw" buffer is empty after parsing the document. Otherwise, the input ends with a truncated multi-byte sequence which shouldn't be silently ignored.
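libxml2 detects the condition by finding undecoded bytes left in the "raw" buffer. The following is a standalone analogue on a plain byte buffer, assuming UTF-8 input. The helper names are hypothetical. The function only inspects the tail and does not validate the whole buffer.

```c
#include <stddef.h>

/* Expected sequence length for a UTF-8 lead byte, or 0 if the byte
 * cannot start a sequence (continuation byte or invalid lead byte). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;
}

/* Number of trailing bytes forming an incomplete (truncated) UTF-8
 * sequence at the end of buf, or 0 if the buffer ends cleanly. */
static size_t utf8_truncated_tail(const unsigned char *buf, size_t len)
{
    size_t back = 0;

    /* Step back over at most 3 continuation bytes to find a lead byte. */
    while (back < 3 && back < len && (buf[len - 1 - back] & 0xC0) == 0x80)
        back++;
    if (back == len)
        return 0;  /* empty, or only continuation bytes: not a truncation */

    size_t need = utf8_seq_len(buf[len - 1 - back]);
    size_t have = back + 1;
    return need > have ? have : 0;
}
```

At end of input, a non-zero result is what should be reported as an error rather than silently dropped.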
Nick Wellnhofer 95e81a36 2023-08-08T15:21:31 parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.
Nick Wellnhofer 834b8123 2023-08-08T15:21:28 parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, it was probably envisioned to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser, which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid temporarily copying (possibly gigabytes of) memory, we now stream in-memory input chunk by chunk, just like content read from files (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data, which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now make use of strnlen, which unfortunately isn't part of the standard C library and has only been mandated since POSIX 2008.
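The following is a minimal sketch of chunked reading from a caller-owned memory buffer. It uses the 250-byte chunk size cited above. The `mem_reader` type and both functions are hypothetical and are not libxml2 API. `bounded_strlen` mimics POSIX `strnlen` so the sketch does not depend on feature-test macros.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 250  /* mirrors the INPUT_CHUNK value cited above */

/* Hypothetical reader over caller-owned memory: the input is never copied whole. */
typedef struct {
    const char *data;
    size_t size;
    size_t pos;
} mem_reader;

/* Copy at most one chunk into out; returns bytes copied, 0 at end of input. */
static size_t mem_reader_read(mem_reader *r, char *out, size_t cap)
{
    size_t n = r->size - r->pos;
    if (n > CHUNK_SIZE) n = CHUNK_SIZE;
    if (n > cap) n = cap;
    memcpy(out, r->data + r->pos, n);
    r->pos += n;
    return n;
}

/* Bounded length like POSIX strnlen: never reads more than maxlen bytes. */
static size_t bounded_strlen(const char *s, size_t maxlen)
{
    size_t i = 0;
    while (i < maxlen && s[i] != '\0')
        i++;
    return i;
}
```

Peak extra memory is one chunk rather than a full copy of the document, at the cost of one `memcpy` per chunk.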
Nick Wellnhofer 5aff27ae 2023-08-08T15:21:25 parser: Optimize xmlLoadEntityContent Load entity content via xmlParserInputBufferGrow, avoiding a copy. This also fixes an entity size accounting error.
Nick Wellnhofer facc2a06 2023-08-08T15:21:21 parser: Don't overwrite EOF parser state
Nick Wellnhofer 59fa0bb3 2023-08-08T15:21:14 parser: Simplify input pointer updates The base member always points to the beginning of the buffer.
Nick Wellnhofer c88ab7e3 2023-08-08T15:19:54 parser: Don't reinitialize parser input members The parser input struct should already be initialized.
Nick Wellnhofer 4ee08155 2023-08-08T15:19:51 encoding: Move rawconsumed accounting to xmlCharEncInput
Nick Wellnhofer a0462e2d 2023-08-08T15:19:49 test: Add push parser test with overridden encoding After recent changes, it should work to call xmlSwitchEncoding to override the encoding for the push parser. This was never properly supported, so Chromium and WebKit added a hack to reset the encoding in the startDocument SAX handler.
Nick Wellnhofer ec7be506 2023-08-08T15:19:46 parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding, which skips over the BOM, so the BOM checks can be removed from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.
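The following is a minimal sketch of BOM detection at the start of an input stream, in the spirit of the xmlDetectEncoding helper described above. It only covers the UTF-8 and UTF-16 BOMs. xmlDetectEncoding also does other auto-detection, which the sketch omits. The enum and `detect_bom` are hypothetical. The caller is expected to advance past `*skip` bytes exactly once, at the very start of the stream.

```c
#include <stddef.h>

/* Hypothetical encoding tags for this sketch only. */
typedef enum { ENC_NONE, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE } bom_encoding;

/* Inspect the first bytes of a stream for a byte order mark. Returns the
 * encoding it implies and stores the BOM length to skip in *skip. */
static bom_encoding detect_bom(const unsigned char *in, size_t len, size_t *skip)
{
    *skip = 0;
    if (len >= 3 && in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF) {
        *skip = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && in[0] == 0xFF && in[1] == 0xFE) {
        *skip = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && in[0] == 0xFE && in[1] == 0xFF) {
        *skip = 2;
        return ENC_UTF16BE;
    }
    return ENC_NONE;
}
```

Doing this in one place at stream start is what lets the other encoding functions drop their own BOM checks.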
Nick Wellnhofer d38e73f9 2023-08-08T15:19:44 parser: Always create UTF-8 in xmlParseReference It seems that this code path could only be triggered after an encoding error in recovery mode. Creating char-ref nodes is unnecessary and typically unexpected.
Nick Wellnhofer 131d0dc0 2023-08-08T15:19:39 parser: Don't use 'standalone' member of xmlParserInput The standalone declaration is only parsed in the main input stream.
Nick Wellnhofer d9ec182b 2023-08-08T15:19:36 parser: Don't detect encoding in xmlCtxtResetPush The encoding will be detected in xmlParseTryOrFinish.
Nick Wellnhofer 3a64f394 2023-08-08T15:19:25 html: Remove some debugging code in htmlParseTryOrFinish
Nick Wellnhofer 58de9d31 2023-08-03T12:00:55 valid: Fix c1->parent pointer in xmlCopyDocElementContent Fixes #572.
Nick Wellnhofer 75693281 2023-07-21T14:50:30 malloc-fail: Fix memory leak in xmlCompileAttributeTest Found by OSS-Fuzz, see #344.
Nick Wellnhofer 90bcbcfc 2023-07-20T21:08:01 parser: Fix potential use-after-free in xmlParseCharDataInternal Return immediately if a SAX handler stops the parser. Fixes #569.
Nick Wellnhofer 88447447 2023-06-23T23:04:30 parser: Fix typo in previous commit
Nick Wellnhofer 9d0541dd 2023-06-22T18:06:53 parser: Make xmlSwitchEncoding always skip the BOM Chromium calls xmlSwitchEncoding from the start document handler and relies on this function to skip the BOM. Commit 98840d40 changed the behavior when switching to UTF-16 since inspecting the input buffer at this point is fragile. Revert part of the commit to also skip a potential (decoded UTF-8) BOM when switching to UTF-16. Make sure that we do this only at the start of an input stream to avoid U+FEFF characters being lost. BOM handling should ultimately be moved to the parsing code to avoid such bugs. See https://bugs.chromium.org/p/chromium/issues/detail?id=1451026
Christoph Reiter 2473b485 2023-06-21T14:15:02 autotools: fix Python module file ext for cygwin/msys2 Both use .dll, not .pyd.
David Kilzer 5f54bac9 2023-06-10T10:50:02 testapi: test_xmlSAXDefaultVersion() leaves xmlSAX2DefaultVersionValue set to 1 with LIBXML_SAX1_ENABLED Add code to save and to restore the default value of xmlSAX2DefaultVersionValue. Fixes #554.
Nick Wellnhofer b236b7a5 2023-06-08T21:53:05 parser: Halt parser when growing buffer results in OOM Fix short-lived regression from previous commit. It might be safer to make xmlBufSetInputBaseCur use the original buffer even in case of errors. Found by OSS-Fuzz.
Nick Wellnhofer 20f5c734 2023-06-07T14:05:34 parser: Recover more input from encoding errors Don't halt the parser in xmlParserGrow to allow more input to be recovered in case of encoding errors. Fixes #543.
Nick Wellnhofer db21cd5d 2023-06-06T14:25:30 malloc-fail: Handle malloc failures in xmlAddEncodingAlias Avoid memory errors if an allocation fails. See #344. Fixes #553.
Nick Wellnhofer 305a75cc 2023-06-06T13:15:46 malloc-fail: Fix null-deref with xmllint --copy See #344. Fixes #552.
Nick Wellnhofer 6273df6c 2023-05-30T12:30:27 xpath: Ignore entity ref nodes when computing node hash XPath queries only work reliably if entities are substituted. Nevertheless, it's possible to query a document with entity reference nodes. xmllint even deletes entities when the `--dropdtd` option is passed, resulting in dangling pointers, so it's best to skip entity reference nodes to avoid a use-after-free. Fixes #550.
Nick Wellnhofer e2f21c22 2023-05-25T13:01:48 win32: Deprecate old Windows build system
Nick Wellnhofer 1e8ab697 2023-05-25T03:03:33 gitlab-ci: Lower _XOPEN_SOURCE value
Nick Wellnhofer cb8ccb10 2023-05-25T03:07:57 testapi: Don't set http_proxy environment variable We already disable network access, so this has no effect.
Nick Wellnhofer 9fd57df8 2023-05-25T02:37:57 autotools: Improve iconv check Use a custom test program which includes iconv.h, so we can check whether the possibly redefined symbols in this header file match the symbols in the iconv library. Should fix #547.
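The following is a rough sketch of what such a link test can exercise. It includes `iconv.h` and calls `iconv_open`, `iconv` and `iconv_close`, so any symbol renaming done by the header (for example, to `libiconv`) must resolve against the library actually being linked. libxml2's real configure test program may differ. In a configure check, this body would sit in `main()`, while here it is wrapped in a hypothetical `iconv_probe` function. The sketch assumes the non-const `char **` signature of `iconv`.

```c
#include <iconv.h>
#include <string.h>

/* Returns 0 if header and library agree and a simple conversion works. */
static int iconv_probe(void)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t) -1)
        return 1;

    char in[] = "caf\xE9";  /* "café" in ISO-8859-1 */
    char out[16];
    char *inp = in, *outp = out;
    size_t inleft = strlen(in), outleft = sizeof out;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    return (rc == (size_t) -1 || inleft != 0) ? 1 : 0;
}
```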
Nick Wellnhofer c3c6cc62 2023-05-24T20:08:33 runtest: Fix compilation without LIBXML_HTML_ENABLED Fixes #545.
Nick Wellnhofer 981093ab 2023-05-18T19:23:58 test: Add push parser tests for split UTF-8 sequences
Nick Wellnhofer e0f3016f 2023-05-18T17:31:44 parser: Fix regression when push parsing UTF-8 sequences Partial UTF-8 sequences are allowed when push parsing. Fixes #542.
Nick Wellnhofer 687a2b71 2023-05-08T17:05:13 xinclude: Lower initial table size when fuzzing We don't have test cases with many documents, so set the initial table size to 1 when fuzzing to give the fuzzer a chance to detect reallocation issues.
Nick Wellnhofer c40cbf07 2023-05-08T17:03:00 malloc-fail: Fix null deref after xmlXIncludeNewRef See #344.
Nick Wellnhofer 105ce73d 2023-05-08T16:45:28 xinclude: Fix false positives in inclusion loop detection xmlXIncludeRecurseDoc can realloc the cache.
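The following is a generic sketch of the hazard named above: a pointer into a table becomes stale when a nested call grows the table with `realloc`. It does not reproduce libxml2's XInclude code. The cache types and functions are hypothetical. The safe pattern is to keep an index across such calls and re-read the entry afterwards.

```c
#include <stdlib.h>

/* Hypothetical cache of included documents for this sketch. */
typedef struct { const char *url; int expanding; } inc_entry;
typedef struct { inc_entry *tab; int num; int max; } inc_cache;

/* Append an entry, growing the table with realloc. Any inc_entry pointer
 * taken before this call may be invalidated by it. */
static int cache_add(inc_cache *c, const char *url)
{
    if (c->num >= c->max) {
        int newMax = c->max ? c->max * 2 : 1;  /* start at 1 to exercise realloc */
        inc_entry *tmp = realloc(c->tab, newMax * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        c->tab = tmp;
        c->max = newMax;
    }
    c->tab[c->num].url = url;
    c->tab[c->num].expanding = 0;
    return c->num++;
}

/* Mark url as being expanded while a nested include is added. */
static int process(inc_cache *c, const char *url, const char *child)
{
    int idx = cache_add(c, url);
    if (idx < 0)
        return -1;
    c->tab[idx].expanding = 1;                      /* loop detection flag */
    if (child != NULL && cache_add(c, child) < 0)   /* may move c->tab */
        return -1;
    c->tab[idx].expanding = 0;  /* re-index; a saved pointer could dangle */
    return idx;
}
```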
Nick Wellnhofer bdb5667a 2023-05-10T18:13:47 autotools: Fix ICU detection Fixes #540.
Nick Wellnhofer 9dae389c 2023-05-09T13:28:06 parser: Fix "huge input lookup" error with push parser Fix parsing of larger documents without XML_PARSE_HUGE. Should fix #538.
Nick Wellnhofer b8961df6 2023-05-09T03:25:24 SAX: Always validate xml:ids The behavior shouldn't depend on mostly random configuration options.
Nick Wellnhofer f24ffddb 2023-05-08T23:33:04 Stop using sprintf Switch remaining users to snprintf.
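The following sketch shows why `snprintf` is the safer replacement. It never writes more than `size` bytes, including the terminating NUL. It also returns the length the full output would have had, so truncation can be detected. The helper `format_location` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Format "name:line" into buf; returns the length, or -1 on error/truncation. */
static int format_location(char *buf, size_t size, const char *name, int line)
{
    int n = snprintf(buf, size, "%s:%d", name, line);
    if (n < 0 || (size_t) n >= size)
        return -1;  /* encoding error or output truncated */
    return n;
}
```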
Nick Wellnhofer 01723fc6 2023-05-08T23:12:33 xpath: Fix build without LIBXML_XPATH_ENABLED Move static function declaration into XPATH block. Also move comparison functions. Fixes #537.
Nick Wellnhofer 235b15a5 2023-05-08T17:58:02 SAX: Always initialize SAX1 element handlers Follow-up to commit d0c3f01e. A parser context will be initialized to SAX version 2, but this can be overridden with XML_PARSE_SAX1 later, so we must initialize the SAX1 element handlers as well. Change the check in xmlDetectSAX2 to only look for XML_SAX2_MAGIC, so we don't switch to SAX1 if the SAX2 element handlers are NULL.
Mike Dalessio 34630630 2023-05-05T17:34:57 autoconf: fix iconv library paths and pass cflags when building executables See 0f77167f for prior related work
Nick Wellnhofer d0c3f01e 2023-05-06T17:47:37 parser: Fix old SAX1 parser with custom callbacks For some reason, xmlCtxtUseOptionsInternal set the start and end element SAX handlers to the internal DOM builder functions when XML_PARSE_SAX1 was specified. This means that custom SAX handlers could never work with that flag because these functions would receive the wrong user data argument and crash immediately. Fixes #535.
Nick Wellnhofer 06a2c251 2023-05-06T15:28:13 hash: Fix possible startup crash with old libxslt versions Call xmlInitParser in xmlHashCreate to make it work if the library wasn't initialized yet. Otherwise, exsltRegisterAll from libxslt 1.1.24 or older might cause a crash. See #534.
Nick Wellnhofer a800b7e0 2023-05-04T12:47:00 regexp: Fix null deref in xmlFAFinishReduceEpsilonTransitions Short-lived regression found by OSS-Fuzz.
Nick Wellnhofer 8d5e33ef 2023-05-03T20:42:10 Fix compiler warning on GCC < 8 -Wcast-function-type is only available since GCC 8.
Nick Wellnhofer d6882f64 2023-05-03T18:33:20 threads: Fix startup crash with weak symbol hack Fix another issue when running with older libc, threads and libpthread not linked in.
Nick Wellnhofer 7f3f3f11 2023-05-03T03:20:14 dict: Raise MAX_DICT_HASH limit This fixes quadratic behavior with large dictionaries. Also rework testdict.c to support tests with larger dictionaries.
Nick Wellnhofer 11a95279 2023-05-02T13:32:24 win32: Don't depend on removed .def file Fixes broken build after 21cec82b. Fixes #532.
Nick Wellnhofer c613ab14 2023-05-02T00:32:50 regexp: Fix mistake in previous commit The `ret = 0` line should have been deleted. Fixes #531.
Nick Wellnhofer a06eaa61 2023-03-09T06:58:24 regexp: Fix determinism checks Swap arguments in initial call to xmlFARecurseDeterminism. Fix the check whether we revisit the initial state in xmlFARecurseDeterminism. If there are transitions with equal atoms and targets but different counters, treat the regex as deterministic but mark the transitions as non-deterministic internally. Don't overwrite zero return value of xmlFAComputesDeterminism with non-zero value from xmlFARecurseDeterminism. Most of these errors lead to non-deterministic regexes not being detected which typically isn't an issue. The improved code may break users who relied on buggy behavior or cause other bugs to become visible. Fixes #469.
Nick Wellnhofer e301865e 2023-03-09T05:34:38 regexp: Fix checks for eliminated transitions 'to' can be set to -1 or -2 when eliminating transitions, so check for all negative values.
Nick Wellnhofer 90759c59 2023-03-09T16:34:11 regexp: Simplify xmlFAReduceEpsilonTransitions
Nick Wellnhofer 9f7b1142 2023-03-09T05:25:09 regexp: Fix cycle check in xmlFAReduceEpsilonTransitions The visited flag must only be reset after the first call to xmlFAReduceEpsilonTransitions has finished. Visiting states multiple times could lead to unnecessary processing of duplicate transitions. Similar to 68eadabd.
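The following is a generic sketch of the rule stated above: visited flags are cleared only after the top-level traversal has finished. Because of this, each state is processed at most once per walk, even when epsilon edges form cycles or several paths reach the same state. It is not libxml2's xmlFAReduceEpsilonTransitions. The automaton type and the functions are hypothetical.

```c
#include <string.h>

#define MAX_STATES 16

/* Hypothetical automaton: eps[i][j] != 0 means an epsilon edge i -> j. */
typedef struct {
    int n;
    unsigned char eps[MAX_STATES][MAX_STATES];
    unsigned char visited[MAX_STATES];
} automaton;

/* Recursive helper: only sets visited flags and never clears them. */
static int eps_walk(automaton *a, int s, int *out, int count)
{
    if (a->visited[s])
        return count;
    a->visited[s] = 1;
    out[count++] = s;
    for (int t = 0; t < a->n; t++)
        if (a->eps[s][t])
            count = eps_walk(a, t, out, count);
    return count;
}

/* Top-level entry: reset the flags only once the whole walk has returned. */
static int eps_closure(automaton *a, int start, int *out)
{
    int count = eps_walk(a, start, out, 0);
    memset(a->visited, 0, sizeof a->visited);
    return count;
}
```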
Nick Wellnhofer 4f49017e 2023-04-30T21:26:55 tests: Test streaming schema validation
Nick Wellnhofer d88763cc 2023-04-30T21:26:03 schemas: Fix filename in xmlSchemaValidateFile Make sure that filename appears in error messages.
Nick Wellnhofer 165f3436 2023-04-30T21:24:50 schemas: Fix line numbers in streaming validation
Nick Wellnhofer 57d88da6 2023-04-30T21:30:21 schemas: Fix memory leak in xmlSchemaValidateStream Regressed in 9a82b94a. Fixes #530.
Nick Wellnhofer 0ffc2d82 2023-04-30T20:28:47 runtest: Skip element name in schema error messages This makes sure that memory and streaming tests will report the same messages.
Nick Wellnhofer 550eaac6 2023-04-30T19:40:43 writer: Add error check in xmlTextWriterEndDocument
Nick Wellnhofer 2f12e3a9 2023-04-30T18:46:05 encoding: Stop calling xmlEncodingErr This invokes the global error handler which should be avoided.
Nick Wellnhofer b230861d 2023-04-30T18:38:16 xmlIO: Remove some calls to xmlIOErr The xmlIOErr functions use the global error handler and should be avoided if possible.
Nick Wellnhofer 320f5084 2023-04-30T18:25:09 parser: Improve handling of encoding and IO errors Make sure that xmlCharEncInput, xmlParserInputBufferPush and xmlParserInputBufferGrow set the correct error code in the xmlParserInputBuffer. Handle errors when calling these functions.
Nick Wellnhofer fc69cf56 2023-04-30T17:51:29 parser: Move xmlFatalErr to parserInternals.c
Nick Wellnhofer 3ff6abbf 2023-02-22T17:11:20 encoding: Rework error codes Use an enum instead of magic numbers. Fix a few error codes. Simplify handling of "space" and "partial" errors. See #506.
Nick Wellnhofer b463b38b 2023-04-30T16:19:28 .gitignore: Split up and rearrange .gitignore files
Nick Wellnhofer 0260de55 2023-04-30T16:00:44 .gitignore: Add runsuite.log
Nick Wellnhofer 886bf4e6 2023-04-30T15:35:47 Stop calling xmlMemoryDump This was used to check for memory leaks but could potentially create a .memdump file. These days, there are better ways to check for memory leaks.
Nick Wellnhofer fc119e32 2023-04-30T15:28:12 examples: Don't call xmlCleanupParser and xmlMemoryDump xmlCleanupParser is dangerous and shouldn't be called in most cases. Being part of the examples led many people to use it incorrectly. xmlMemoryDump is an obsolete way to test for memory leaks.