Log

Author Commit Date CI Message
Nick Wellnhofer bc4e82ff 2023-09-22T13:37:28 globals: Don't use thread-local storage on Darwin It seems that thread-local storage destructors are run before pthread thread-specific data destructors on Darwin, defeating our scheme to use TSD to clean up TLS. Here's an example program that reports a use-after-free when compiled with `-fsanitize=address` on macOS:

```c
#include <pthread.h>

typedef struct {
    int v;
} my_struct;

static _Thread_local my_struct tls;
pthread_key_t key;

void dtor(void *tsd) {
    my_struct *s = (my_struct *) tsd;

    /*
     * This will crash ASan, apparently because
     * TLS has already been freed.
     */
    s->v = 1;
}

void *thread(void *p) {
    pthread_setspecific(key, &tls);
    return NULL;
}

int main(void) {
    pthread_key_create(&key, dtor);

    pthread_t handle;
    pthread_create(&handle, NULL, thread, NULL);
    pthread_join(handle, NULL);

    return 0;
}
```
Nick Wellnhofer 45470611 2023-09-21T23:52:52 error: Make xmlGetLastError return a const error This is a slight break of the API, but users really shouldn't modify the global error struct. The goal is to make xmlLastError use static buffers for its strings eventually. This should warn people if they're abusing the struct.
Nick Wellnhofer fc26934e 2023-09-21T23:29:18 memory: Fix memory debugging with Windows threads On Windows, malloc hooks can be called after the final call to xmlCleanupParser in various tests. This means that xmlMemMutex can still be accessed if memory debugging is enabled, so the mutex must not be cleaned up. This also means that tests may report spurious memory leaks on Windows. The old implementation avoided the issue by keeping track of all global state objects in a doubly linked list, so they could be cleaned up during xmlCleanupParser. But as far as I can tell, all memory will be freed eventually, so this is mostly an issue with our test suite.
Nick Wellnhofer 6eb2a00d 2023-09-21T22:58:02 tests: Update testapi.c
Nick Wellnhofer 8c084ebd 2023-09-21T22:57:33 doc: Make apibuild.py happy
Nick Wellnhofer e4091bcf 2023-09-21T22:54:57 doc: Allow 'unsigned' without 'int'
Nick Wellnhofer 46d7aaec 2023-09-21T22:54:30 doc: Add ignored tokens to apibuild.py
Nick Wellnhofer 6c4ea468 2023-09-21T21:31:52 python: Fix tests Revert part of commit 138213ac.
Nick Wellnhofer 05135536 2023-09-21T20:40:32 globals: Fix build --with-threads --without-output Fixes #593.
Nick Wellnhofer c5890716 2023-09-21T17:01:35 html: Fix logic in htmlAutoClose Note that the function is never called with a NULL newtag. Fixes #591.
Nick Wellnhofer 81741ea4 2023-09-21T16:29:28 xmlreader: Fix EOF detection in xmlTextReaderPushData
Nick Wellnhofer 89ee0369 2023-09-21T15:13:16 python: Fix potential crash in tests/thread2.py Memory debugging must be initialized.
Nick Wellnhofer 72262030 2023-09-21T14:52:14 parser: Re-add some includes to parser.h and xmlreader.h This restores backward compatibility.
Nick Wellnhofer 9fc5090c 2023-09-16T19:58:42 hash: Clean up libxml/hash.h Rename variables, fix subincludes, whitespace.
Nick Wellnhofer de4b270a 2023-09-21T14:31:31 autotools: Make --with-minimum disable lzma support Fix an oversight when handling the --with-minimum option.
Nick Wellnhofer f9d717af 2023-09-21T13:05:49 fuzz: Allow fuzzing without push, reader or output modules
Nick Wellnhofer fe1bfb34 2023-09-21T12:33:46 gitlab-ci: Add a "medium" config build Also run CI tests with a build where all but a few modules are disabled. This is the minimum configuration required for libxslt: --with-tree --with-xpath --with-output --with-html. Also add --with-threads.
Nick Wellnhofer e7f0d88b 2023-09-21T01:38:26 build: Remove some GCC warnings -Wnested-externs produces spurious warnings after implicit declaration of functions. -Winline is useless since we don't use inlines. -Wredundant-decls was already removed for autotools.
Nick Wellnhofer da274bfa 2023-09-21T01:29:40 build: Fix build when certain modules are disabled
Nick Wellnhofer 9b5cce7a 2023-09-21T00:44:50 include: Remove more unnecessary includes
Nick Wellnhofer f0e8358e 2023-09-20T23:07:58 globals: Final fixes
Nick Wellnhofer d6ba4033 2023-09-20T20:49:59 globals: Move remaining declarations to correct places globals.h is now deprecated. Sanity is restored.
Nick Wellnhofer 1117fae0 2023-09-20T19:20:41 include: Remove unneeded includes
Nick Wellnhofer 736327df 2023-09-20T19:09:15 include: Break inclusion cycle between tree.h and xmlregexp.h
Nick Wellnhofer 699299ca 2023-09-20T18:54:39 globals: Stop including globals.h
Nick Wellnhofer 0830fcfa 2023-09-20T14:30:12 globals: Deprecate xmlLastError The last error should be accessed with xmlGetLastError.
Nick Wellnhofer db8b9722 2023-09-20T13:56:16 parser: Deprecate global parser options Note that setting global options has no effect anyway when using any of the modern parser API functions which take an option argument, like xmlReadMemory, or when using xmlCtxtUseOptions. Global options only have an effect when using the old API functions xmlParse* or xmlSAXParse*, or when using an xmlParserCtxt without calling xmlCtxtUseOptions.

Unfortunately, many downstream projects still modify global parser options, often without realizing that this has no effect. If necessary, switch to the modern API. Then you can safely remove all code that changes global options.

Here's a list of deprecated functions and global variables together with the corresponding parser options:

- xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue
  Parser option XML_PARSE_NOENT
- xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue
  Inverse of parser option XML_PARSE_NOBLANKS
- xmlPedanticParserDefault, xmlPedanticParserDefaultValue
  Parser option XML_PARSE_PEDANTIC
- xmlLineNumbersDefault, xmlLineNumbersDefaultValue
  Always enabled by new API
- xmlDoValidityCheckingDefaultValue
  Parser option XML_PARSE_DTDVALID
- xmlGetWarningsDefaultValue
  Inverse of parser option XML_PARSE_NOWARNING
- xmlLoadExtDtdDefaultValue
  Parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR
Nick Wellnhofer 209516ac 2023-09-20T15:49:03 tests: Don't use deprecated symbols
Nick Wellnhofer 692a5c40 2023-09-20T13:51:26 xmllint: Don't set deprecated globals
Nick Wellnhofer ea29b951 2023-09-20T13:30:01 globals: Abort if lazy allocation of global state failed There's really nothing we can do in this situation, so it's better to abort with an error message.
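As a minimal sketch of lazy per-thread state allocation with POSIX thread-specific data, including giving up when that allocation fails, consider the following. The `thread_state` struct, `get_state`, and the other names are hypothetical and not libxml2's internals; the destructor registered with `pthread_key_create` frees each thread's state when the thread exits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-thread state; not libxml2's actual struct. */
typedef struct {
    int last_error;
    char scratch[64];
} thread_state;

static pthread_key_t state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;
static int destructor_calls;   /* only read after pthread_join */

/* TSD destructor: runs at thread exit for every non-NULL value. */
static void state_destroy(void *p) {
    free(p);
    destructor_calls++;
}

static void state_key_init(void) {
    if (pthread_key_create(&state_key, state_destroy) != 0) {
        fprintf(stderr, "cannot create thread key\n");
        abort();
    }
}

/* Lazily allocate the calling thread's state on first access. */
static thread_state *get_state(void) {
    pthread_once(&state_once, state_key_init);
    thread_state *st = pthread_getspecific(state_key);
    if (st == NULL) {
        st = calloc(1, sizeof(*st));
        if (st == NULL) {
            /* No sensible fallback exists, so abort with a message. */
            fprintf(stderr, "cannot allocate thread state\n");
            abort();
        }
        pthread_setspecific(state_key, st);
    }
    assert(st != NULL);
    return st;
}

static void *worker(void *arg) {
    (void) arg;
    get_state()->last_error = 42;
    /* Repeated access returns the same per-thread object. */
    assert(get_state()->last_error == 42);
    return NULL;
}
```

Thread-local storage avoids the allocation step, and with it the failure path, which is the motivation given in the entry on using TLS when available.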
Nick Wellnhofer 868b94b8 2023-09-20T13:10:29 globals: Reformat libxml/globals.h
Nick Wellnhofer bbf08608 2023-09-20T13:05:02 globals: Move buffer callback declarations to xmlIO.h
Nick Wellnhofer dc3382ef 2023-09-20T12:58:03 globals: Move xmlRegisterNodeDefault to tree.c Code in globals.c must not try to access globals itself since the accessor macros aren't defined and we would only see the main variable.
Nick Wellnhofer 75976742 2023-09-20T12:45:14 globals: Add a few comments
Nick Wellnhofer ecbd634c 2023-09-19T17:21:30 threads: Fix double-checked locking in xmlInitParser Hopefully work around the classic problem with double-checked locking: another thread could read xmlParserInitialized == 1 but not yet see the other initialization results due to compiler or hardware reordering. While unlikely, this seems theoretically possible. The solution is to add a memory barrier after initializing the data but before setting xmlParserInitialized. It might be enough to use a second initialization flag which is only used inside the locked section and to update xmlParserInitialized after unlocking. But I haven't seen this approach in many articles discussing this issue, so it's possibly flawed as well.
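The publication pattern can be sketched with C11 atomics: an acquire load on the fast path pairs with a release store that happens only after all initialization writes. This is an illustration under assumptions: `lib_initialized`, `lib_init`, and the table it fills are hypothetical stand-ins, and the release/acquire pair plays the role of the memory barrier mentioned in the entry, which libxml2 may implement differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical library data set up once by lib_init. */
static int lib_table[16];
static int init_runs;                  /* how often the slow path ran */

static atomic_int lib_initialized;     /* zero: not yet initialized */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;

static void lib_init(void) {
    /* Fast path: a thread that observes 1 here also observes every write
     * made before the matching release store below. */
    if (atomic_load_explicit(&lib_initialized, memory_order_acquire))
        return;

    pthread_mutex_lock(&init_lock);
    /* Re-check under the lock; the mutex already orders this load. */
    if (!atomic_load_explicit(&lib_initialized, memory_order_relaxed)) {
        for (int i = 0; i < 16; i++)
            lib_table[i] = i * i;
        init_runs++;
        assert(init_runs == 1);
        /* Publish only after the data is fully initialized. */
        atomic_store_explicit(&lib_initialized, 1, memory_order_release);
    }
    pthread_mutex_unlock(&init_lock);
}
```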
Nick Wellnhofer f7a403c2 2023-09-19T13:52:53 globals: Move xmlIsMainThread to globals.c xmlIsMainThread is mainly needed for global variables.
Nick Wellnhofer b173b724 2023-09-19T13:17:00 globals: Use thread-local storage if available Also use thread-local storage to store globals on POSIX platforms. Most importantly, this makes sure that global variable access can't fail when allocating the global state struct.
Nick Wellnhofer e7b6ca15 2023-09-18T13:25:06 globals: Rework global state destruction on Windows If DllMain is used, rely on it working as expected. The old code seemed to attempt to free global state of other threads if, for some reason, the DllMain mechanism didn't work. In a static build, register a destructor with RegisterWaitForSingleObject. Make the public functions xmlGetGlobalState and xmlInitializeGlobalState no-ops. Move initialization and registration of global state objects to xmlInitGlobalState. Look up global state with xmlGetThreadLocalStorage, which can be inlined nicely. Also clean up global state when using TLS. xmlLastError must be reset.
Nick Wellnhofer 39a275a5 2023-09-18T21:25:35 globals: Define globals using macros Declare and define globals and helper functions by (ab)using the preprocessor.
Nick Wellnhofer 11a1839d 2023-09-20T17:54:48 globals: Move remaining globals back to correct header files This undoes a lot of damage.
Nick Wellnhofer 7909ff08 2023-09-20T17:38:26 include: Remove unnecessary includes
- Don't include tree.h from encoding.h
- Don't include parser.h from xmlIO.h
Nick Wellnhofer eb985d6f 2023-09-20T17:17:49 globals: Move error globals back to xmlerror.c
Nick Wellnhofer d1336fd3 2023-09-20T17:00:50 globals: Move malloc hooks back to xmlmemory.h
Nick Wellnhofer a77f9ab8 2023-09-20T16:57:22 globals: Don't include SAX2.h from globals.h
Nick Wellnhofer 2e6c49a7 2023-09-20T14:43:14 globals: Don't store xmlParserVersion in global state This is a constant.
Nick Wellnhofer bf6bd161 2023-09-18T19:53:31 globals: Introduce xmlCheckThreadLocalStorage Checks whether (emulated) thread-local storage could be allocated.
Nick Wellnhofer 89f49767 2023-09-18T18:44:32 globals: Make xmlGlobalState private This removes a public struct but it seems impossible to use its members in a sensible way from external code.
Nick Wellnhofer a07ec7c1 2023-09-18T17:39:13 threads: Move library initialization code to threads.c This allows consolidating the initialization code, since the global init lock was already implemented in threads.c.
Nick Wellnhofer 4e1c13eb 2023-09-18T14:45:10 debug: Remove debugging code This is barely useful these days and only clutters the code base.
Nick Wellnhofer c19771c1 2023-09-18T00:54:39 globals: Move code from threads.c to globals.c Move all code that handles globals to the place where it belongs.
Nick Wellnhofer 2a4b8114 2023-09-17T23:16:49 globals: Rename members of xmlGlobalState This is a deliberate first step to remove some internals from the public API and to avoid issues when redefining tokens.
Nick Wellnhofer d7cfe356 2023-09-14T20:52:24 parser: Avoid undefined behavior in xmlParseStartTag2 Instead of using arithmetic on dangling pointers, store ptrdiff_t values in void pointers which is at least implementation-defined.
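A minimal sketch of the idea: when values live in a buffer that `realloc` may move, record each value's offset, stored in a `void *` slot via `ptrdiff_t`, instead of doing arithmetic on pointers into the old buffer. `value_store`, `store_add`, and `store_get` are hypothetical names, not the code in xmlParseStartTag2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value list backed by a growable buffer. */
typedef struct {
    char *buf;
    size_t len, cap;
    void *slots[8];   /* each slot holds a ptrdiff_t offset, not an address */
    int nslots;
} value_store;

static int store_add(value_store *vs, const char *s) {
    size_t n = strlen(s) + 1;

    assert(vs->nslots < 8);
    if (vs->len + n > vs->cap) {
        size_t cap = (vs->len + n) * 2;
        char *tmp = realloc(vs->buf, cap);
        if (tmp == NULL)
            return -1;
        /* Pointers into the old buffer would now dangle; offsets stay valid. */
        vs->buf = tmp;
        vs->cap = cap;
    }
    memcpy(vs->buf + vs->len, s, n);
    /* Integer-to-pointer conversion is implementation-defined, not undefined. */
    vs->slots[vs->nslots++] = (void *) (ptrdiff_t) vs->len;
    vs->len += n;
    return 0;
}

/* Rebuild a pointer from the current buffer base and the stored offset. */
static const char *store_get(const value_store *vs, int i) {
    return vs->buf + (ptrdiff_t) vs->slots[i];
}
```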
Nick Wellnhofer 90d5b799 2023-09-14T15:30:38 schemas: Fix memory leak of annotations in notations Found by OSS-Fuzz.
Markus Rickert 99cba4b3 2023-09-09T17:46:34 Handle NOCONFIG case when setting locations from CMake target properties
Nick Wellnhofer 4aa08c80 2023-09-08T14:52:22 xinclude: Fix 'last' pointer in xmlXIncludeCopyNode Also set the 'last' pointer for the root node. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/93
James Le Cuirot f369154f 2023-09-03T22:14:01 cmake: Generate better pkg-config file for SYSROOT builds under CMake I recently fixed this for Autotools but said that fixing this for CMake was not feasible due to it using `find_package` rather than `pkg_check_modules`. I then thought about it and couldn't find any reason why CMake couldn't try `pkg_check_modules` first and then fall back to `find_package`, as that's basically what Autotools does. I had wanted to use the linker flags generated by CMake when it does fall back to `find_package`, but it only returns direct paths to the libraries, as opposed to `-l` flags. Baking these library paths into the pkg-config and xml2-config files would break static linking and cross-compiling, so I've stuck with the `-l` flags we already have. There is no need to set `CMAKE_REQUIRED_LIBRARIES` because we already add the dependencies to the library target.
James Le Cuirot 5a18c505 2023-09-04T09:30:38 autoconf: Include non-pkg-config dependency flags in the pkg-config file These were present before, but I accidentally dropped them in my recent build improvements.
James Le Cuirot 6864d92f 2023-09-04T09:25:44 autoconf: Don't bake build time CFLAGS into pkg-config file Having slept on it, I've realised that baking the dependency CFLAGS into the pkg-config file is pointless when it is only used to link against them. It may even cause problems.
Nick Wellnhofer efcaeadc 2023-09-04T16:00:53 hash: Fix use-of-uninitialized-value Short-lived regression.
Nick Wellnhofer 05c28305 2023-09-04T15:50:22 dict: Stop using uint32_t stdint.h is a C99 header.
Nick Wellnhofer f45abbd3 2023-09-04T15:31:04 dict: Fix integer overflow of string lengths Fixes #546.
Nick Wellnhofer edc2dd48 2023-09-04T16:07:23 dict: Update hash function Update hash function from classic Jenkins OAAT (dict.c) and a variant of DJB2 (hash.c) to "GoodOAAT" taken from the SMHasher repo. This hash function passes all SMHasher tests.
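For reference, here is a GoodOAAT-style one-at-a-time hash, written from memory of the SMHasher version. Compare it against the upstream source before relying on exact hash values. libxml2's own adaptation may differ; the dict entry about no longer using `uint32_t` suggests it avoids `stdint.h`, which this sketch uses for clarity. The name `good_oaat` and the `ROTL32` macro are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by k bits (0 < k < 32). */
#define ROTL32(x, k) (((x) << (k)) | ((x) >> (32 - (k))))

/* GoodOAAT-style hash: two 32-bit lanes updated once per input byte. */
static uint32_t good_oaat(const void *key, size_t len, uint32_t seed) {
    const unsigned char *p = key;
    uint32_t h1 = seed ^ 0x3b00;
    uint32_t h2 = ROTL32(seed, 15);

    assert(key != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h1 += p[i];
        h1 += h1 << 3;          /* h1 *= 9 */
        h2 += h1;
        h2 = ROTL32(h2, 7);
        h2 += h2 << 2;          /* h2 *= 5 */
    }

    /* Final mixing so every input bit affects every output bit. */
    h1 ^= h2;
    h1 += ROTL32(h2, 14);
    h2 ^= h1;
    h2 += ROTL32(h1, 26);
    h1 ^= h2;
    h1 += ROTL32(h2, 18);
    h2 ^= h1;
    h2 += ROTL32(h1, 25);
    return h2;
}
```

The seed parameter is what makes hash randomization possible: a per-process random seed changes every bucket index without changing the function.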
James Le Cuirot 93e8bb2a 2023-09-02T17:12:58 build: Generate better pkg-config files for static-only builds pkg-config supports `Requires.private` and `Libs.private` fields for static linking. However, if you're building a dynamic binary, then pkg-config will use the non-private fields, even if just the static libxml2 is available. This will result in libxml2 being underlinked, causing the build to fail. The solution is to fold the private fields into the non-private fields when the shared libxml2 is not being built. This works for Autotools and CMake. Meson also knows how to handle this when it automatically generates pkg-config files.
James Le Cuirot 4640ccac 2023-09-02T16:18:30 build: Generate better pkg-config file for SYSROOT builds The -I and -L flags you use to build should not necessarily be the same ones you bake into installed files. If you are building with dependencies located under a SYSROOT then the installed files should have no knowledge of that SYSROOT. For example, if the build requires `-L/path/to/sysroot/usr/lib/foo` then only `-L/usr/lib/foo` should be baked into the installed files. pkg-config is SYSROOT-aware, so this issue can be sidestepped by using the `Requires` field rather than the `Libs` and `Cflags` fields. This is easily resolved if you rely solely on pkg-config, but this project falls back to standard Autoconf checks, so a little more effort is required. Unfortunately, this issue cannot feasibly be resolved for CMake. `find_package` is used rather than `pkg_check_modules`, so we cannot tell whether a pkg-config file for each dependency is present or not, even if `find_package` uses pkg-config behind the scenes. The CMake build does not record any dependency -I or -L flags into the pkg-config file anyway. This is a problem in itself, although these dependencies are most likely installed to standard locations. Meson is very much better at handling this, as it generates the pkg-config file automatically using the correct logic.
Nick Wellnhofer 54a0b19a 2023-09-01T14:52:14 autoconf: Allow custom --with-icu configure option
Nick Wellnhofer c5989473 2023-09-01T14:52:11 dict: Use thread-local storage for PRNG state
Nick Wellnhofer 57cfd221 2023-09-01T14:52:04 dict: Use xoroshiro64** as PRNG Stop using rand_r. This enables hash randomization on all platforms.
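A sketch of one xoroshiro64** step, following Blackman and Vigna's published reference algorithm. Following the thread-local PRNG state entry above, the state is kept in C11 `_Thread_local` storage. `prng_seed` and `prng_next` are hypothetical names, and how libxml2 seeds its generator is not shown here; the only hard requirement is that the state is not all zero.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int k) {
    return (x << k) | (x >> (32 - k));
}

/* Per-thread generator state; must never be all zero. */
static _Thread_local uint32_t prng_state[2];

static void prng_seed(uint32_t a, uint32_t b) {
    assert(a != 0 || b != 0);
    prng_state[0] = a;
    prng_state[1] = b;
}

/* xoroshiro64**: scramble s0 for the output, then advance the state. */
static uint32_t prng_next(void) {
    uint32_t s0 = prng_state[0];
    uint32_t s1 = prng_state[1];
    uint32_t result = rotl32(s0 * 0x9E3779BBu, 5) * 5;

    s1 ^= s0;
    prng_state[0] = rotl32(s0, 26) ^ s1 ^ (s1 << 9);
    prng_state[1] = rotl32(s1, 13);
    return result;
}
```

Unlike `rand_r`, this needs nothing beyond integer arithmetic, so it is available on every platform.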
Nick Wellnhofer 6d7aaaa8 2023-09-01T14:51:55 dict: Tune hash table growth Introduce load factor as main trigger and increase MAX_HASH_LEN. This should make growth behavior more predictable. Raise size limit to INT_MAX. This avoids quadratic behavior with larger tables.
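A sketch of a load-factor growth trigger, together with size doubling that refuses to exceed INT_MAX. The 3/4 load factor and the helper names are assumptions for illustration. The entry does not state the factor libxml2 uses, and the MAX_HASH_LEN chain-length check is not modeled.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical maximum load factor of 3/4. */
#define MAX_LOAD_NUM 3
#define MAX_LOAD_DEN 4

/* Grow once nb_elems / size would exceed MAX_LOAD_NUM / MAX_LOAD_DEN.
 * The comparison is done in long long so the products cannot overflow int. */
static int needs_grow(int nb_elems, int size) {
    assert(nb_elems >= 0 && size > 0);
    return (long long) nb_elems * MAX_LOAD_DEN > (long long) size * MAX_LOAD_NUM;
}

/* Double the table size; return -1 if the result would exceed INT_MAX. */
static int next_size(int size) {
    assert(size > 0);
    if (size > INT_MAX / 2)
        return -1;
    return size * 2;
}
```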
Nick Wellnhofer 4b8f7cf0 2023-09-01T13:07:27 hash: Fix integer overflow of nbElems
Nick Wellnhofer bfd7d286 2023-08-29T21:16:34 xmllint: Fix more error messages
Nick Wellnhofer 373244bc 2023-08-29T21:05:32 xmllint: Fix error message when push parsing empty documents
Nick Wellnhofer 53050b1d 2023-08-29T20:06:43 parser: More fixes to push parser error handling
Nick Wellnhofer bbd918b2 2023-08-29T15:56:37 parser: Fix detection of null bytes Also suppress misleading extra errors. Fixes #122.
Nick Wellnhofer c6083a32 2023-08-29T16:30:22 parser: Improve error handling in push parser
- Report errors earlier
- Align error messages with pull parser
Nick Wellnhofer 1edae30f 2023-08-29T15:58:22 parser: Don't check inputNr in xmlParseTryOrFinish There's no apparent reason for this check. inputNr should always be 1 here.
Nick Wellnhofer e48f2695 2023-08-29T17:41:18 parser: Remove push parser debugging code
Nick Wellnhofer cde44997 2023-08-27T16:35:23 SAX2: Allow multiple top-level elements When parsing with HTML_PARSE_NOIMPLIED, the result document can contain multiple top-level elements. Rework xmlSAX2StartElement to simply add the element as a child of ctxt->node or ctxt->myDoc. Don't invoke xmlAddSibling for non-element parents. The context node should always be an element node. Fixes #584.
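The attachment rule can be sketched with a toy tree. A start tag appends the new element to the currently open element if there is one, and to the document node otherwise, so several top-level elements simply become siblings. The `node` struct and the helper functions are hypothetical simplifications of xmlNode and the SAX2 handlers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal hypothetical tree node. */
typedef struct node {
    const char *name;
    struct node *parent, *children, *last, *next;
} node;

/* Append child as the last child of parent. */
static void add_child(node *parent, node *child) {
    assert(child->parent == NULL);
    child->parent = parent;
    if (parent->last == NULL)
        parent->children = child;
    else
        parent->last->next = child;
    parent->last = child;
}

/* Attach to the open element, or to the document when none is open. */
static void start_element(node *doc, node **current, node *elem) {
    add_child(*current != NULL ? *current : doc, elem);
    *current = elem;
}

/* Pop back to the parent element; the document itself is never current. */
static void end_element(node *doc, node **current) {
    node *p = (*current)->parent;
    *current = (p == doc) ? NULL : p;
}
```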
Nick Wellnhofer d39f7806 2023-08-23T20:24:24 tree: Fix copying of DTDs
- Don't create multiple DTD nodes.
- Fix UAF if malloc fails.
- Skip DTD nodes if tree module is disabled.
Fixes #583.
Nick Wellnhofer 4e4c89a4 2023-08-21T00:26:01 doc: Improve documentation of configuration options
Nick Wellnhofer 778cca38 2023-08-20T22:50:57 legacy: Add stubs for disabled modules When legacy support is requested, always enable stubs for FTP and XPointer location modules which were removed from the standard configuration. Going forward, the --with-legacy configuration option should be used to provide maximum ABI compatibility. Fixes #433.
Nick Wellnhofer ed3bd052 2023-08-20T20:48:10 parser: Allow setting the maximum amplification factor
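A maximum amplification factor bounds how much output entity expansion may produce relative to the input consumed. The check below is a hypothetical sketch of that ratio test; its name and parameters are not libxml2's, and the real parser's accounting is more involved.

```c
#include <assert.h>

/* Hypothetical ratio check: accept while produced / max_ampl <= consumed.
 * Dividing instead of multiplying avoids overflow; because integer division
 * rounds down, output may exceed consumed * max_ampl by less than max_ampl. */
static int amplification_ok(unsigned long long consumed,
                            unsigned long long produced,
                            unsigned max_ampl) {
    assert(max_ampl >= 1);
    return produced / max_ampl <= consumed;
}
```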
Nick Wellnhofer 9d80a2b1 2023-08-16T19:45:34 entities: Don't change doc when encoding entities doc->encoding shouldn't be touched by xmlEncodeEntitiesInternal.
Nick Wellnhofer f1c1f5c6 2023-08-16T19:43:02 parser: Revert change to doc->encoding Fixes #579.
Nick Wellnhofer 61b8e097 2023-08-16T19:20:47 parser: Never use UTF-8 encoding handler
Nick Wellnhofer 507f11ed 2023-08-16T15:43:47 encoding: Remove debugging code
Nick Wellnhofer 138213ac 2023-08-15T12:49:27 python: Fix tests on MinGW Add the directory containing libxml2.dll with os.add_dll_directory to make tests work on MinGW. DLL lookup changed in Python 3.8, but for some reason the issue only turned up with Python 3.11 on MinGW. Contrary to the documentation, copying libxml2.dll into the directory containing the .pyd file doesn't work.
Nick Wellnhofer e2ab48b9 2023-08-14T15:05:30 malloc-fail: Fix unsigned integer overflow in xmlTextReaderPushData Return immediately if xmlParserInputBufferRead fails. Found by OSS-Fuzz, see #344.
Nick Wellnhofer 0d24fc0a 2023-08-14T12:53:49 html: Remove encoding hack in htmlCreateFileParserCtxt Switch encoding directly instead of calling htmlCheckEncoding with faked content.
Nick Wellnhofer 5db5a704 2023-08-09T18:39:14 html: Fix UAF in htmlCurrentChar Short-lived regression found by OSS-Fuzz.
Nick Wellnhofer b973ceaf 2023-08-09T18:37:20 parser: Fix mistake in xmlDetectEncoding Short-lived regression.
Nick Wellnhofer cb717d7e 2023-08-09T16:52:02 parser: Update line number after coalescing text nodes This should make the line number of text nodes deterministic. Before, it depended on the callback sequence which depends on the size of chunks fed to the parser.
Nick Wellnhofer 855818bd 2023-08-08T15:21:37 parser: Check for truncated multi-byte sequences When decoding input data, check whether the "raw" buffer is empty after parsing the document. Otherwise, the input ends with a truncated multi-byte sequence which shouldn't be silently ignored.
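As a simplified standalone analogue, the function below detects whether a UTF-8 buffer ends in the middle of a multi-byte sequence by walking lead bytes. It is a hypothetical helper, not the check in libxml2, which instead looks for bytes left undecoded in the raw buffer, and it does not validate continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the last sequence starting in buf runs past len bytes. */
static int ends_truncated(const unsigned char *buf, size_t len) {
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;

        if (c < 0x80)
            n = 1;                  /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            n = 2;
        else if ((c & 0xF0) == 0xE0)
            n = 3;
        else if ((c & 0xF8) == 0xF0)
            n = 4;
        else
            n = 1;                  /* invalid lead byte: not checked here */
        if (n > len - i)
            return 1;
        i += n;
    }
    return 0;
}
```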
Nick Wellnhofer 95e81a36 2023-08-08T15:21:31 parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput wasn't guaranteed to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.
Nick Wellnhofer 834b8123 2023-08-08T15:21:28 parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, it was probably envisioned to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser, which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid temporarily copying (possibly gigabytes of) memory, we now stream in-memory input chunk by chunk, just like content read from files (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data, which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now make use of strnlen, which unfortunately isn't part of the standard C library and has only been mandated since POSIX 2008.
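A sketch of chunk-by-chunk reading from memory. The read callback has the same shape as libxml2's `xmlInputReadCallback` (`int (*)(void *context, char *buffer, int len)`) and copies at most `len` bytes per call, so the whole input is never duplicated at once. `mem_source`, `mem_read`, and `CHUNK_SIZE` are hypothetical; the 250-byte chunk size mirrors the INPUT_CHUNK value cited in the entry.

```c
#include <assert.h>
#include <string.h>

#define CHUNK_SIZE 250   /* same size as the INPUT_CHUNK value in the log */

/* Hypothetical in-memory source with a read position. */
typedef struct {
    const char *mem;
    size_t size;
    size_t pos;
} mem_source;

/* Copy up to len bytes into buf; return the count copied, 0 at end. */
static int mem_read(void *ctx, char *buf, int len) {
    mem_source *src = ctx;
    size_t avail = src->size - src->pos;
    size_t n;

    assert(len >= 0);
    n = (size_t) len < avail ? (size_t) len : avail;
    memcpy(buf, src->mem + src->pos, n);
    src->pos += n;
    return (int) n;
}
```

A consumer calls `mem_read` repeatedly with a `CHUNK_SIZE` buffer until it returns 0, so peak extra memory is one chunk rather than the full document.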
Nick Wellnhofer 5aff27ae 2023-08-08T15:21:25 parser: Optimize xmlLoadEntityContent Load entity content via xmlParserInputBufferGrow, avoiding a copy. This also fixes an entity size accounting error.
Nick Wellnhofer facc2a06 2023-08-08T15:21:21 parser: Don't overwrite EOF parser state
Nick Wellnhofer 59fa0bb3 2023-08-08T15:21:14 parser: Simplify input pointer updates The base member always points to the beginning of the buffer.
Nick Wellnhofer c88ab7e3 2023-08-08T15:19:54 parser: Don't reinitialize parser input members The parser input struct should already be initialized.
Nick Wellnhofer 4ee08155 2023-08-08T15:19:51 encoding: Move rawconsumed accounting to xmlCharEncInput
Nick Wellnhofer a0462e2d 2023-08-08T15:19:51 test: Add push parser test with overridden encoding After recent changes, it should be possible to call xmlSwitchEncoding to override the encoding for the push parser. This was never properly supported, so Chromium and WebKit added a hack to reset the encoding in the startDocument SAX handler.