Log

Author Commit Date Message
Nick Wellnhofer 9fc5090c 2023-09-16T19:58:42 hash: Clean up libxml/hash.h Rename variables, fix subincludes, whitespace.
Nick Wellnhofer de4b270a 2023-09-21T14:31:31 autotools: Make --with-minimum disable lzma support Fix an oversight when handling the --with-minimum option.
Nick Wellnhofer f9d717af 2023-09-21T13:05:49 fuzz: Allow fuzzing without the push, reader or output modules
Nick Wellnhofer fe1bfb34 2023-09-21T12:33:46 gitlab-ci: Add a "medium" config build Also run CI tests with a build where all but a few modules are disabled. This is the minimum configuration required for libxslt: --with-tree --with-xpath --with-output --with-html Also add --with-threads.
Nick Wellnhofer e7f0d88b 2023-09-21T01:38:26 build: Remove some GCC warnings -Wnested-externs produces spurious warnings after implicit declaration of functions. -Winline is useless since we don't use inlines. -Wredundant-decls was already removed for autotools.
Nick Wellnhofer da274bfa 2023-09-21T01:29:40 build: Fix build when certain modules are disabled
Nick Wellnhofer 9b5cce7a 2023-09-21T00:44:50 include: Remove more unnecessary includes
Nick Wellnhofer f0e8358e 2023-09-20T23:07:58 globals: Final fixes
Nick Wellnhofer d6ba4033 2023-09-20T20:49:59 globals: Move remaining declarations to correct places globals.h is now deprecated. Sanity is restored.
Nick Wellnhofer 1117fae0 2023-09-20T19:20:41 include: Remove unneeded includes
Nick Wellnhofer 736327df 2023-09-20T19:09:15 include: Break inclusion cycle between tree.h and xmlregexp.h
Nick Wellnhofer 699299ca 2023-09-20T18:54:39 globals: Stop including globals.h
Nick Wellnhofer d1336fd3 2023-09-20T17:00:50 globals: Move malloc hooks back to xmlmemory.h
Nick Wellnhofer 39a275a5 2023-09-18T21:25:35 globals: Define globals using macros Declare and define globals and helper functions by (ab)using the preprocessor.
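The entry says globals are now declared and defined "by (ab)using the preprocessor". A common way to do that in C is the X-macro pattern, where a single list macro is expanded several times. The sketch below illustrates that pattern only; every name in it (DEMO_GLOBALS, DemoGlobalState, demoGet*) is hypothetical, and it keeps one process-wide struct where libxml2 would look up per-thread state.

```c
/* Hypothetical list of globals: X(type, name, default value). */
#define DEMO_GLOBALS(X) \
    X(int, KeepBlanks, 1) \
    X(int, LineNumbers, 0) \
    X(const char *, TreeIndentString, "  ")

/* Expansion 1: one struct member per global. */
typedef struct {
#define DEMO_MEMBER(type, name, dflt) type name;
    DEMO_GLOBALS(DEMO_MEMBER)
#undef DEMO_MEMBER
} DemoGlobalState;

/* Expansion 2: initialize every member with its default value. */
static DemoGlobalState demoState = {
#define DEMO_DEFAULT(type, name, dflt) dflt,
    DEMO_GLOBALS(DEMO_DEFAULT)
#undef DEMO_DEFAULT
};

/* Expansion 3: one accessor function per global, e.g. demoGetKeepBlanks(). */
#define DEMO_ACCESSOR(type, name, dflt) \
    type *demoGet##name(void) { return &demoState.name; }
DEMO_GLOBALS(DEMO_ACCESSOR)
#undef DEMO_ACCESSOR
```

Adding a global then means adding one `X(...)` line; the member, its default and its accessor are all generated from it.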
Nick Wellnhofer a77f9ab8 2023-09-20T16:57:22 globals: Don't include SAX2.h from globals.h
Nick Wellnhofer 2e6c49a7 2023-09-20T14:43:14 globals: Don't store xmlParserVersion in global state This is a constant.
Nick Wellnhofer 0830fcfa 2023-09-20T14:30:12 globals: Deprecate xmlLastError The last error should be accessed with xmlGetLastError.
Nick Wellnhofer db8b9722 2023-09-20T13:56:16 parser: Deprecate global parser options Note that setting global options has no effect anyway when using any of the modern parser API functions which take an option argument, like xmlReadMemory, or when using xmlCtxtUseOptions. Global options only have an effect when using the old API functions xmlParse* or xmlSAXParse*, or when using an xmlParserCtxt without calling xmlCtxtUseOptions. Unfortunately, many downstream projects still modify global parser options, often without realizing that this has no effect. If necessary, switch to the modern API. Then you can safely remove all code that changes global options. Here's a list of deprecated functions and global variables together with the corresponding parser options: - xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue: parser option XML_PARSE_NOENT - xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue: inverse of parser option XML_PARSE_NOBLANKS - xmlPedanticParserDefault, xmlPedanticParserDefaultValue: parser option XML_PARSE_PEDANTIC - xmlLineNumbersDefault, xmlLineNumbersDefaultValue: always enabled by the new API - xmlDoValidityCheckingDefaultValue: parser option XML_PARSE_DTDVALID - xmlGetWarningsDefaultValue: inverse of parser option XML_PARSE_NOWARNING - xmlLoadExtDtdDefaultValue: parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR
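When migrating, the values legacy code used to set globally can be folded into a single options argument. This sketch assumes a hypothetical LegacyDefaults record and local DEMO_PARSE_* constants whose values mirror libxml2's xmlParserOption enum; real code should include `<libxml/parser.h>` and use the XML_PARSE_* names directly. xmlLoadExtDtdDefaultValue is simplified here into two booleans.

```c
/* Bit values mirroring libxml2's xmlParserOption enum. In real code,
 * include <libxml/parser.h> and use the XML_PARSE_* names instead. */
enum {
    DEMO_PARSE_NOENT     = 1 << 1,  /* XML_PARSE_NOENT */
    DEMO_PARSE_DTDLOAD   = 1 << 2,  /* XML_PARSE_DTDLOAD */
    DEMO_PARSE_DTDATTR   = 1 << 3,  /* XML_PARSE_DTDATTR */
    DEMO_PARSE_DTDVALID  = 1 << 4,  /* XML_PARSE_DTDVALID */
    DEMO_PARSE_NOWARNING = 1 << 6,  /* XML_PARSE_NOWARNING */
    DEMO_PARSE_PEDANTIC  = 1 << 7,  /* XML_PARSE_PEDANTIC */
    DEMO_PARSE_NOBLANKS  = 1 << 8   /* XML_PARSE_NOBLANKS */
};

/* Hypothetical record of what legacy code used to set globally. */
typedef struct {
    int substituteEntities;  /* xmlSubstituteEntitiesDefault */
    int keepBlanks;          /* xmlKeepBlanksDefault */
    int pedantic;            /* xmlPedanticParserDefault */
    int validate;            /* xmlDoValidityCheckingDefaultValue */
    int getWarnings;         /* xmlGetWarningsDefaultValue */
    int loadExtDtd;          /* xmlLoadExtDtdDefaultValue: load external DTD */
    int completeAttrs;       /* xmlLoadExtDtdDefaultValue: add DTD default attrs */
} LegacyDefaults;

static int legacy_to_options(const LegacyDefaults *d) {
    int opts = 0;

    if (d->substituteEntities) opts |= DEMO_PARSE_NOENT;
    if (!d->keepBlanks)        opts |= DEMO_PARSE_NOBLANKS;  /* inverse */
    if (d->pedantic)           opts |= DEMO_PARSE_PEDANTIC;
    if (d->validate)           opts |= DEMO_PARSE_DTDVALID;
    if (!d->getWarnings)       opts |= DEMO_PARSE_NOWARNING; /* inverse */
    if (d->loadExtDtd)         opts |= DEMO_PARSE_DTDLOAD;
    if (d->completeAttrs)      opts |= DEMO_PARSE_DTDATTR;
    /* Line numbers need no flag: the modern API always records them. */
    return opts;
}
```

The resulting mask would then be passed as the options argument of `xmlReadMemory(buffer, size, URL, encoding, options)` or to `xmlCtxtUseOptions`.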
Nick Wellnhofer 209516ac 2023-09-20T15:49:03 tests: Don't use deprecated symbols
Nick Wellnhofer 692a5c40 2023-09-20T13:51:26 xmllint: Don't set deprecated globals
Nick Wellnhofer ea29b951 2023-09-20T13:30:01 globals: Abort if lazy allocation of global state failed There's really nothing we can do in this situation, so it's better to abort with an error message.
Nick Wellnhofer 868b94b8 2023-09-20T13:10:29 globals: Reformat libxml/globals.h
Nick Wellnhofer bbf08608 2023-09-20T13:05:02 globals: Move buffer callback declarations to xmlIO.h
Nick Wellnhofer 11a1839d 2023-09-20T17:54:48 globals: Move remaining globals back to correct header files This undoes a lot of damage.
Nick Wellnhofer dc3382ef 2023-09-20T12:58:03 globals: Move xmlRegisterNodeDefault to tree.c Code in globals.c must not try to access globals itself since the accessor macros aren't defined and we would only see the main variable.
Nick Wellnhofer 75976742 2023-09-20T12:45:14 globals: Add a few comments
Nick Wellnhofer 7909ff08 2023-09-20T17:38:26 include: Remove unnecessary includes - Don't include tree.h from encoding.h - Don't include parser.h from xmlIO.h
Nick Wellnhofer ecbd634c 2023-09-19T17:21:30 threads: Fix double-checked locking in xmlInitParser Hopefully work around the classic problem with double-checked locking: another thread could read xmlParserInitialized == 1 but not yet see the other initialization results due to compiler or hardware reordering. While unlikely, this seems theoretically possible. The solution is to add a memory barrier after initializing the data but before setting xmlParserInitialized. It might be enough to use a second initialization flag, which is only used inside the locked section, and to update xmlParserInitialized after unlocking. But I haven't seen this approach in many articles discussing this issue, so it may be flawed as well.
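The fix described above can be sketched with C11 atomics: a release fence after the initialization work and before the flag is published, paired with an acquire load on the lock-free fast path. This illustrates the pattern and is not libxml2's code; the names are hypothetical, and an atomic_flag spinlock stands in for the library's mutex so the example has no threading-library dependency.

```c
#include <stdatomic.h>

static atomic_int initialized;                   /* published flag */
static atomic_flag initLock = ATOMIC_FLAG_INIT;  /* stand-in for a mutex */
static int sharedTable[4];                       /* data set up by init */
static int initCount;                            /* times the slow path ran */

static void demo_init_parser(void) {
    /* Fast path: this acquire load pairs with the release fence below,
     * so a thread that sees 1 also sees the initialized data. */
    if (atomic_load_explicit(&initialized, memory_order_acquire))
        return;

    while (atomic_flag_test_and_set_explicit(&initLock, memory_order_acquire))
        ; /* spin until we own the lock */

    /* Re-check under the lock: another thread may have finished first. */
    if (!atomic_load_explicit(&initialized, memory_order_relaxed)) {
        for (int i = 0; i < 4; i++)
            sharedTable[i] = i * i;
        initCount++;
        /* Barrier: all writes above become visible before the flag. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&initialized, 1, memory_order_relaxed);
    }

    atomic_flag_clear_explicit(&initLock, memory_order_release);
}
```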
Nick Wellnhofer f7a403c2 2023-09-19T13:52:53 globals: Move xmlIsMainThread to globals.c xmlIsMainThread is mainly needed for global variables.
Nick Wellnhofer eb985d6f 2023-09-20T17:17:49 globals: Move error globals back to xmlerror.c
Nick Wellnhofer b173b724 2023-09-19T13:17:00 globals: Use thread-local storage if available Also use thread-local storage to store globals on POSIX platforms. Most importantly, this makes sure that global variable access can't fail when allocating the global state struct.
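A minimal sketch of the idea, assuming a C11 compiler: with `_Thread_local` storage, each thread's copy of the state is provided by the runtime, so looking it up involves no malloc call that could fail. DemoThreadState and demo_get_state are hypothetical names, not libxml2 API.

```c
#include <stddef.h>

/* Hypothetical per-thread state standing in for libxml2's globals. */
typedef struct {
    int initialized;
    int keepBlanks;
    const char *lastErrorMessage;
} DemoThreadState;

/* Each thread gets its own zero-initialized copy, allocated by the
 * runtime, so obtaining a pointer to it cannot fail. */
static _Thread_local DemoThreadState demoTls;

static DemoThreadState *demo_get_state(void) {
    DemoThreadState *state = &demoTls;

    if (!state->initialized) {   /* apply defaults once per thread */
        state->keepBlanks = 1;
        state->lastErrorMessage = NULL;
        state->initialized = 1;
    }
    return state;
}
```

By contrast, a scheme that allocates the state with malloc on first access (for example behind `pthread_getspecific`) has a failure path on every access, which is what the earlier "Abort if lazy allocation of global state failed" entry deals with.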
Nick Wellnhofer e7b6ca15 2023-09-18T13:25:06 globals: Rework global state destruction on Windows If DllMain is used, rely on it working as expected. The old code seemed to attempt to free global state of other threads if, for some reason, the DllMain mechanism didn't work. In a static build, register a destructor with RegisterWaitForSingleObject. Make public functions xmlGetGlobalState and xmlInitializeGlobalState no-ops. Move initialization and registration of global state objects to xmlInitGlobalState. Look up global state with xmlGetThreadLocalStorage, which can be inlined nicely. Also clean up global state when using TLS. xmlLastError must be reset.
Nick Wellnhofer bf6bd161 2023-09-18T19:53:31 globals: Introduce xmlCheckThreadLocalStorage Checks whether (emulated) thread-local storage could be allocated.
Nick Wellnhofer 89f49767 2023-09-18T18:44:32 globals: Make xmlGlobalState private This removes a public struct but it seems impossible to use its members in a sensible way from external code.
Nick Wellnhofer a07ec7c1 2023-09-18T17:39:13 threads: Move library initialization code to threads.c This allows consolidating the initialization code, since the global init lock was already implemented in threads.c.
Nick Wellnhofer 4e1c13eb 2023-09-18T14:45:10 debug: Remove debugging code This is barely useful these days and only clutters the code base.
Nick Wellnhofer c19771c1 2023-09-18T00:54:39 globals: Move code from threads.c to globals.c Move all code that handles globals to the place where it belongs.
Nick Wellnhofer 2a4b8114 2023-09-17T23:16:49 globals: Rename members of xmlGlobalState This is a deliberate first step to remove some internals from the public API and to avoid issues when redefining tokens.
Nick Wellnhofer d7cfe356 2023-09-14T20:52:24 parser: Avoid undefined behavior in xmlParseStartTag2 Instead of using arithmetic on dangling pointers, store ptrdiff_t values in void pointers which is at least implementation-defined.
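The pattern can be illustrated as follows: instead of keeping raw pointers into a buffer that may be reallocated (and later subtracting the old base from them, which is undefined once the old block is freed), compute the offset while the pointer is still valid and stash it in the `void *` slot. The helper names are hypothetical; converting between ptrdiff_t and `void *` is implementation-defined, as the entry notes.

```c
#include <stddef.h>

/* Save a position as an offset while base is still valid. The ptrdiff_t
 * is packed into a void * slot (implementation-defined, not undefined). */
static void *demo_save_offset(const char *base, const char *p) {
    return (void *)(ptrdiff_t)(p - base);
}

/* Rebuild the pointer against the (possibly moved) new buffer. */
static const char *demo_restore(const char *newBase, void *saved) {
    return newBase + (ptrdiff_t)saved;
}
```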
Nick Wellnhofer 90d5b799 2023-09-14T15:30:38 schemas: Fix memory leak of annotations in notations Found by OSS-Fuzz.
Markus Rickert 99cba4b3 2023-09-09T17:46:34 Handle NOCONFIG case when setting locations from CMake target properties
Nick Wellnhofer 4aa08c80 2023-09-08T14:52:22 xinclude: Fix 'last' pointer in xmlXIncludeCopyNode Also set the 'last' pointer for the root node. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/93
James Le Cuirot f369154f 2023-09-03T22:14:01 cmake: Generate better pkg-config file for SYSROOT builds under CMake I recently fixed this for Autotools but said that fixing this for CMake was not feasible due to it using `find_package` rather than `pkg_check_modules`. I then thought about it and couldn't find any reason why CMake couldn't try `pkg_check_modules` first and then fall back to `find_package`, as that's basically what Autotools does. I had wanted to use the linker flags generated by CMake when it does fall back to `find_package`, but it only returns direct paths to the libraries, as opposed to `-l` flags. Baking these library paths into the pkg-config and xml2-config files would break static linking and cross-compiling, so I've stuck with the `-l` flags we already have. There is no need to set `CMAKE_REQUIRED_LIBRARIES` because we already add the dependencies to the library target.
James Le Cuirot 5a18c505 2023-09-04T09:30:38 autoconf: Include non-pkg-config dependency flags in the pkg-config file These were present before, but I accidentally dropped them in my recent build improvements.
James Le Cuirot 6864d92f 2023-09-04T09:25:44 autoconf: Don't bake build time CFLAGS into pkg-config file Having slept on it, I've realised that baking the dependency CFLAGS into the pkg-config file is pointless when it is only used to link against them. It may even cause problems.
Nick Wellnhofer efcaeadc 2023-09-04T16:00:53 hash: Fix use-of-uninitialized-value Short-lived regression.
Nick Wellnhofer 05c28305 2023-09-04T15:50:22 dict: Stop using uint32_t stdint.h is a C99 header.
Nick Wellnhofer f45abbd3 2023-09-04T15:31:04 dict: Fix integer overflow of string lengths Fixes #546.
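The entry gives no details of the fix, so the following is only a generic sketch of the overflow-safe pattern for combining string lengths, using a hypothetical helper that computes the length of a "prefix:name" string. Each check compares against the remaining headroom instead of adding first, so the sum can never wrap around.

```c
#include <stddef.h>

/* Hypothetical helper: length of "prefix:name", or -1 if it would
 * exceed limit. No addition is performed until it is known to be safe. */
static int demo_qname_len(size_t prefixLen, size_t nameLen, size_t limit,
                          size_t *total) {
    if (prefixLen >= limit)              /* no room left for the ':' */
        return -1;
    if (nameLen > limit - prefixLen - 1) /* would exceed limit (or wrap) */
        return -1;
    *total = prefixLen + 1 + nameLen;    /* cannot overflow now */
    return 0;
}
```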
Nick Wellnhofer edc2dd48 2023-09-04T16:07:23 dict: Update hash function Update hash function from classic Jenkins OAAT (dict.c) and a variant of DJB2 (hash.c) to "GoodOAAT" taken from the SMHasher repo. This hash function passes all SMHasher tests.
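For reference, here is a sketch of the GoodOAAT structure: two 32-bit lanes mixed once per input byte, then folded together in a short finalizer. The constants are reproduced from memory of the SMHasher source and should be checked against it. libxml2's own copy may differ in detail, and the "Stop using uint32_t" entry suggests it avoids that type, which this sketch uses for clarity. The seed would come from the randomized PRNG.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ROL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Sketch of SMHasher's "GoodOAAT" (verify constants before relying on it). */
static uint32_t demo_good_oaat(const void *key, size_t len, uint32_t seed) {
    const unsigned char *str = key;
    uint32_t h1 = seed ^ 0x3b00;
    uint32_t h2 = DEMO_ROL32(seed, 15);

    for (size_t i = 0; i < len; i++) {   /* per-byte mixing */
        h1 += str[i];
        h1 += h1 << 3;                   /* h1 *= 9 */
        h2 += h1;
        h2 = DEMO_ROL32(h2, 7);
        h2 += h2 << 2;                   /* h2 *= 5 */
    }

    /* Finalizer: fold the two lanes into each other. */
    h1 ^= h2;
    h1 += DEMO_ROL32(h2, 19);
    h2 ^= h1;
    h2 += DEMO_ROL32(h1, 26);
    h1 ^= h2;
    h1 += DEMO_ROL32(h2, 5);
    h2 ^= h1;
    h2 += DEMO_ROL32(h1, 8);
    return h2;
}
```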
James Le Cuirot 93e8bb2a 2023-09-02T17:12:58 build: Generate better pkg-config files for static-only builds pkg-config supports `Requires.private` and `Libs.private` fields for static linking. However, if you're building a dynamic binary, then pkg-config will use the non-private fields, even if just the static libxml2 is available. This will result in libxml2 being underlinked, causing the build to fail. The solution is to fold the private fields into the non-private fields when the shared libxml2 is not being built. This works for Autotools and CMake. Meson also knows how to handle this when it automatically generates pkg-config files.
James Le Cuirot 4640ccac 2023-09-02T16:18:30 build: Generate better pkg-config file for SYSROOT builds The -I and -L flags you use to build should not necessarily be the same ones you bake into installed files. If you are building with dependencies located under a SYSROOT then the installed files should have no knowledge of that SYSROOT. For example, if the build requires `-L/path/to/sysroot/usr/lib/foo` then only `-L/usr/lib/foo` should be baked into the installed files. pkg-config is SYSROOT-aware, so this issue can be sidestepped by using the `Requires` field rather than the `Libs` and `Cflags` fields. This is easily resolved if you rely solely on pkg-config, but this project falls back to standard Autoconf checks, so a little more effort is required. Unfortunately, this issue cannot feasibly be resolved for CMake. `find_package` is used rather than `pkg_check_modules`, so we cannot tell whether a pkg-config file for each dependency is present or not, even if `find_package` uses pkg-config behind the scenes. The CMake build does not record any dependency -I or -L flags into the pkg-config file anyway. This is a problem in itself, although these dependencies are most likely installed to standard locations. Meson is very much better at handling this, as it generates the pkg-config file automatically using the correct logic.
Nick Wellnhofer 54a0b19a 2023-09-01T14:52:14 autoconf: Allow custom --with-icu configure option
Nick Wellnhofer c5989473 2023-09-01T14:52:11 dict: Use thread-local storage for PRNG state
Nick Wellnhofer 57cfd221 2023-09-01T14:52:04 dict: Use xoroshiro64** as PRNG Stop using rand_r. This enables hash randomization on all platforms.
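xoroshiro64** keeps 64 bits of state in two 32-bit words and returns a scrambled 32-bit output. The sketch follows Blackman and Vigna's public-domain reference implementation; declaring the state `_Thread_local` reflects the idea of the neighboring "Use thread-local storage for PRNG state" entry. The names and the fixed initial seed are hypothetical; real code would seed from a better entropy source, and the state must never be all zero.

```c
#include <stdint.h>

/* Per-thread generator state (hypothetical fixed seed; must not be 0,0). */
static _Thread_local uint32_t demoRngState[2] = { 0x12345678u, 0x9abcdef0u };

static uint32_t demo_rol32(uint32_t x, int k) {
    return (x << k) | (x >> (32 - k));
}

/* One step of xoroshiro64**: scramble s0 for output, then advance state. */
static uint32_t demo_rand(void) {
    uint32_t s0 = demoRngState[0];
    uint32_t s1 = demoRngState[1];
    uint32_t result = demo_rol32(s0 * 0x9E3779BBu, 5) * 5;

    s1 ^= s0;
    demoRngState[0] = demo_rol32(s0, 26) ^ s1 ^ (s1 << 9); /* a, b */
    demoRngState[1] = demo_rol32(s1, 13);                  /* c */
    return result;
}
```

Unlike `rand_r`, this needs nothing from the platform's C library, which is consistent with the entry's note that hash randomization now works on all platforms.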
Nick Wellnhofer 6d7aaaa8 2023-09-01T14:51:55 dict: Tune hash table growth Introduce load factor as main trigger and increase MAX_HASH_LEN. This should make growth behavior more predictable. Raise size limit to INT_MAX. This avoids quadratic behavior with larger tables.
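Here is a sketch of a load-factor trigger combined with a size cap, assuming power-of-two table sizes of at least 8 and a maximum fill of 7/8. The threshold and names are assumptions for illustration, not values taken from libxml2, and the MAX_HASH_LEN chain-length limit mentioned in the entry is not modeled.

```c
#include <limits.h>
#include <stddef.h>

/* Grow before inserting if the table would be more than 7/8 full
 * (assumed threshold). size must be a power of two >= 8, so size / 8
 * is exact and nothing here can overflow. */
static int demo_needs_grow(size_t size, size_t nbElems) {
    return nbElems >= size - size / 8;
}

/* Double the size; return 0 if that would pass INT_MAX (cannot grow). */
static size_t demo_next_size(size_t size) {
    if (size > (size_t)INT_MAX / 2)
        return 0;
    return size * 2;
}
```

Triggering on the load factor rather than on observed chain lengths makes growth depend only on the element count, which is the predictability the entry is after.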
Nick Wellnhofer 4b8f7cf0 2023-09-01T13:07:27 hash: Fix integer overflow of nbElems
Nick Wellnhofer bfd7d286 2023-08-29T21:16:34 xmllint: Fix more error messages
Nick Wellnhofer 373244bc 2023-08-29T21:05:32 xmllint: Fix error message when push parsing empty documents
Nick Wellnhofer 53050b1d 2023-08-29T20:06:43 parser: More fixes to push parser error handling
Nick Wellnhofer bbd918b2 2023-08-29T15:56:37 parser: Fix detection of null bytes Also suppress misleading extra errors. Fixes #122.
Nick Wellnhofer c6083a32 2023-08-29T16:30:22 parser: Improve error handling in push parser - Report errors earlier - Align error messages with pull parser
Nick Wellnhofer 1edae30f 2023-08-29T15:58:22 parser: Don't check inputNr in xmlParseTryOrFinish There's no apparent reason for this check. inputNr should always be 1 here.
Nick Wellnhofer e48f2695 2023-08-29T17:41:18 parser: Remove push parser debugging code
Nick Wellnhofer cde44997 2023-08-27T16:35:23 SAX2: Allow multiple top-level elements When parsing with HTML_PARSE_NOIMPLIED, the result document can contain multiple top-level elements. Rework xmlSAX2StartElement to simply add the element as a child of ctxt->node or ctxt->myDoc. Don't invoke xmlAddSibling for non-element parents. The context node should always be an element node. Fixes #584.
Nick Wellnhofer d39f7806 2023-08-23T20:24:24 tree: Fix copying of DTDs - Don't create multiple DTD nodes. - Fix UAF if malloc fails. - Skip DTD nodes if tree module is disabled. Fixes #583.
Nick Wellnhofer 4e4c89a4 2023-08-21T00:26:01 doc: Improve documentation of configuration options
Nick Wellnhofer 778cca38 2023-08-20T22:50:57 legacy: Add stubs for disabled modules When legacy support is requested, always enable stubs for FTP and XPointer location modules which were removed from the standard configuration. Going forward, the --with-legacy configuration option should be used to provide maximum ABI compatibility. Fixes #433.
Nick Wellnhofer ed3bd052 2023-08-20T20:48:10 parser: Allow setting the maximum amplification factor
Nick Wellnhofer 9d80a2b1 2023-08-16T19:45:34 entities: Don't change doc when encoding entities doc->encoding shouldn't be touched by xmlEncodeEntitiesInternal.
Nick Wellnhofer f1c1f5c6 2023-08-16T19:43:02 parser: Revert change to doc->encoding Fixes #579.
Nick Wellnhofer 61b8e097 2023-08-16T19:20:47 parser: Never use UTF-8 encoding handler
Nick Wellnhofer 507f11ed 2023-08-16T15:43:47 encoding: Remove debugging code
Nick Wellnhofer 138213ac 2023-08-15T12:49:27 python: Fix tests on MinGW Add the directory containing libxml2.dll with os.add_dll_directory to make tests work on MinGW. DLL lookup for extension modules changed in Python 3.8, but for some reason the issue only turned up with Python 3.11 on MinGW. Contrary to the documentation, copying libxml2.dll into the directory containing the .pyd file doesn't work.
Nick Wellnhofer e2ab48b9 2023-08-14T15:05:30 malloc-fail: Fix unsigned integer overflow in xmlTextReaderPushData Return immediately if xmlParserInputBufferRead fails. Found by OSS-Fuzz, see #344.
Nick Wellnhofer 0d24fc0a 2023-08-14T12:53:49 html: Remove encoding hack in htmlCreateFileParserCtxt Switch encoding directly instead of calling htmlCheckEncoding with faked content.
Nick Wellnhofer 5db5a704 2023-08-09T18:39:14 html: Fix UAF in htmlCurrentChar Short-lived regression found by OSS-Fuzz.
Nick Wellnhofer b973ceaf 2023-08-09T18:37:20 parser: Fix mistake in xmlDetectEncoding Short-lived regression.
Nick Wellnhofer cb717d7e 2023-08-09T16:52:02 parser: Update line number after coalescing text nodes This should make the line number of text nodes deterministic. Before, it depended on the callback sequence which depends on the size of chunks fed to the parser.
Nick Wellnhofer 855818bd 2023-08-08T15:21:37 parser: Check for truncated multi-byte sequences When decoding input data, check whether the "raw" buffer is empty after parsing the document. Otherwise, the input ends with a truncated multi-byte sequence which shouldn't be silently ignored.
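For UTF-8 input, the "truncated multi-byte sequence" condition can be illustrated by scanning backwards from the end of the data. This hypothetical helper is only a sketch; the check described in the entry is encoding-agnostic, since it tests whether undecoded bytes remain in the raw buffer after parsing.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes at the end of buf that form an
 * incomplete UTF-8 sequence, or 0 if the data ends on a boundary (or
 * ends with some other invalid byte, which is a different error). */
static size_t demo_truncated_utf8_tail(const unsigned char *buf, size_t len) {
    size_t i = len;
    size_t cont = 0;   /* trailing continuation bytes (10xxxxxx) */
    unsigned char lead;
    size_t need;

    while (i > 0 && cont < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        cont++;
    }
    if (i == 0)
        return 0;

    lead = buf[i - 1];
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return 0;

    /* Lead byte plus continuation bytes fall short: input was cut off. */
    return cont + 1 < need ? cont + 1 : 0;
}
```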
Nick Wellnhofer 95e81a36 2023-08-08T15:21:31 parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.
Nick Wellnhofer 834b8123 2023-08-08T15:21:28 parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, the intent was probably to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser, which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid temporarily copying (possibly gigabytes of) memory, we now stream in-memory input chunk by chunk, just like content read from files (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data, which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now use strnlen, which unfortunately isn't part of the standard C library and has only been mandated since POSIX 2008.
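Chunked reading from memory can be sketched with a read callback of the same shape as libxml2's xmlInputReadCallback, `int (*)(void *context, char *buffer, int len)`. DemoMemInput and demo_mem_read are hypothetical; the point is that the caller's buffer is referenced rather than copied up front, and each call hands out at most one chunk.

```c
#include <string.h>

#define DEMO_INPUT_CHUNK 250   /* chunk size quoted in the commit message */

/* Hypothetical memory-backed input: references the caller's buffer. */
typedef struct {
    const char *mem;
    size_t size;
    size_t pos;
} DemoMemInput;

/* Copies at most len bytes into buffer; returns 0 at end of input. */
static int demo_mem_read(void *context, char *buffer, int len) {
    DemoMemInput *in = context;
    size_t left = in->size - in->pos;
    size_t n;

    if (len <= 0)
        return 0;
    n = (size_t)len < left ? (size_t)len : left;
    memcpy(buffer, in->mem + in->pos, n);
    in->pos += n;
    return (int)n;
}
```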
Nick Wellnhofer 5aff27ae 2023-08-08T15:21:25 parser: Optimize xmlLoadEntityContent Load entity content via xmlParserInputBufferGrow, avoiding a copy. This also fixes an entity size accounting error.
Nick Wellnhofer facc2a06 2023-08-08T15:21:21 parser: Don't overwrite EOF parser state
Nick Wellnhofer 59fa0bb3 2023-08-08T15:21:14 parser: Simplify input pointer updates The base member always points to the beginning of the buffer.
Nick Wellnhofer c88ab7e3 2023-08-08T15:19:54 parser: Don't reinitialize parser input members The parser input struct should already be initialized.
Nick Wellnhofer 4ee08155 2023-08-08T15:19:51 encoding: Move rawconsumed accounting to xmlCharEncInput
Nick Wellnhofer a0462e2d 2023-08-08T15:19:49 test: Add push parser test with overridden encoding After recent changes, it should work to call xmlSwitchEncoding to override the encoding for the push parser. This was never properly supported, so Chromium and WebKit added a hack to reset the encoding in the startDocument SAX handler.
Nick Wellnhofer ec7be506 2023-08-08T15:19:46 parser: Rework encoding detection Introduce an XML_INPUT_HAS_ENCODING flag for xmlParserInput, which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM, or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings, used by both the XML and HTML parsers: - xmlDetectEncoding, which skips over the BOM, allowing the BOM checks to be removed from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.
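The flag-based logic can be sketched as follows: detection from a BOM (or from "<?" encoded as UTF-16) marks the input as having an encoding, and the encoding declaration is only honored when that mark is absent. DEMO_INPUT_HAS_ENCODING and both functions are hypothetical stand-ins for XML_INPUT_HAS_ENCODING, xmlDetectEncoding and xmlSetDeclaredEncoding; UCS-4 and EBCDIC detection are omitted.

```c
#include <stddef.h>

/* Hypothetical stand-in for the XML_INPUT_HAS_ENCODING input flag. */
#define DEMO_INPUT_HAS_ENCODING (1u << 0)

typedef enum {
    DEMO_ENC_NONE,      /* nothing detected: UTF-8 unless declared otherwise */
    DEMO_ENC_UTF8,
    DEMO_ENC_UTF16LE,
    DEMO_ENC_UTF16BE
} DemoEncoding;

/* Inspect the first raw bytes. A conclusive result sets the flag;
 * *skip receives the number of BOM bytes to drop. */
static DemoEncoding demo_detect_encoding(const unsigned char *in, size_t len,
                                         unsigned *flags, size_t *skip) {
    DemoEncoding enc = DEMO_ENC_NONE;

    *skip = 0;
    if (len >= 3 && in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF) {
        enc = DEMO_ENC_UTF8;
        *skip = 3;
    } else if (len >= 2 && in[0] == 0xFF && in[1] == 0xFE) {
        enc = DEMO_ENC_UTF16LE;
        *skip = 2;
    } else if (len >= 2 && in[0] == 0xFE && in[1] == 0xFF) {
        enc = DEMO_ENC_UTF16BE;
        *skip = 2;
    } else if (len >= 4 && in[0] == '<' && in[1] == 0 && in[2] == '?' && in[3] == 0) {
        enc = DEMO_ENC_UTF16LE;   /* "<?" in UTF-16LE without a BOM */
    } else if (len >= 4 && in[0] == 0 && in[1] == '<' && in[2] == 0 && in[3] == '?') {
        enc = DEMO_ENC_UTF16BE;   /* "<?" in UTF-16BE without a BOM */
    }
    if (enc != DEMO_ENC_NONE)
        *flags |= DEMO_INPUT_HAS_ENCODING;
    return enc;
}

/* encoding="..." only switches the decoder if nothing (user override,
 * BOM or auto-detection) has set the encoding yet. */
static int demo_should_use_declared_encoding(unsigned flags) {
    return (flags & DEMO_INPUT_HAS_ENCODING) == 0;
}
```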
Nick Wellnhofer d38e73f9 2023-08-08T15:19:44 parser: Always create UTF-8 in xmlParseReference It seems that this code path could only be triggered after an encoding error in recovery mode. Creating char-ref nodes is unnecessary and typically unexpected.
Nick Wellnhofer 131d0dc0 2023-08-08T15:19:39 parser: Don't use 'standalone' member of xmlParserInput The standalone declaration is only parsed in the main input stream.
Nick Wellnhofer d9ec182b 2023-08-08T15:19:36 parser: Don't detect encoding in xmlCtxtResetPush The encoding will be detected in xmlParseTryOrFinish.
Nick Wellnhofer 3a64f394 2023-08-08T15:19:25 html: Remove some debugging code in htmlParseTryOrFinish
Nick Wellnhofer 58de9d31 2023-08-03T12:00:55 valid: Fix c1->parent pointer in xmlCopyDocElementContent Fixes #572.
Nick Wellnhofer 75693281 2023-07-21T14:50:30 malloc-fail: Fix memory leak in xmlCompileAttributeTest Found by OSS-Fuzz, see #344.
Nick Wellnhofer 90bcbcfc 2023-07-20T21:08:01 parser: Fix potential use-after-free in xmlParseCharDataInternal Return immediately if a SAX handler stops the parser. Fixes #569.
Nick Wellnhofer 88447447 2023-06-23T23:04:30 parser: Fix typo in previous commit
Nick Wellnhofer 9d0541dd 2023-06-22T18:06:53 parser: Make xmlSwitchEncoding always skip the BOM Chromium calls xmlSwitchEncoding from the start document handler and relies on this function to skip the BOM. Commit 98840d40 changed the behavior when switching to UTF-16 since inspecting the input buffer at this point is fragile. Revert part of the commit to also skip a potential (decoded UTF-8) BOM when switching to UTF-16. Make sure that we do this only at the start of an input stream to avoid U+FEFF characters being lost. BOM handling should ultimately be moved to the parsing code to avoid such bugs. See https://bugs.chromium.org/p/chromium/issues/detail?id=1451026
Christoph Reiter 2473b485 2023-06-21T14:15:02 autotools: fix Python module file ext for cygwin/msys2 Both use .dll, not .pyd.
David Kilzer 5f54bac9 2023-06-10T10:50:02 testapi: test_xmlSAXDefaultVersion() leaves xmlSAX2DefaultVersionValue set to 1 with LIBXML_SAX1_ENABLED Add code to save and to restore the default value of xmlSAX2DefaultVersionValue. Fixes #554.
Nick Wellnhofer b236b7a5 2023-06-08T21:53:05 parser: Halt parser when growing buffer results in OOM Fix short-lived regression from previous commit. It might be safer to make xmlBufSetInputBaseCur use the original buffer even in case of errors. Found by OSS-Fuzz.