include


Log

Author Commit Date Message
Nick Wellnhofer 7d6969d9 2023-11-23T15:48:52 Remove Trio Trio is a rather old cross-platform printf library which was bundled with libxml2. It was needed for ancient pre-C99 systems without snprintf and should be safe to remove these days.
Nick Wellnhofer ff6c3188 2023-11-23T15:22:59 include: Remove useless 'const' from function arguments
makise-homura 6bc86405 2023-11-09T02:04:15 Avoid EDG deprecation warnings for LCC compiler
Nick Wellnhofer aca37d8c 2023-11-20T15:20:37 parser: Only enable SAX2 if there are SAX2 element handlers This reverts part of commit 235b15a5 for backward compatibility and adds some comments trying to clarify the whole mess. Fixes #623.
Nick Wellnhofer 58598494 2023-11-04T23:47:33 parser: Fix combination of hash values This bug resulted in a stuck bit in hash values which can have a severe performance impact.
Nick Wellnhofer 61034116 2023-10-24T15:02:36 error: Make more xmlError structs constant Prepare for future changes, see 45470611.
Nick Wellnhofer c082ef46 2023-08-09T16:59:36 parser: Stop switching to ISO-8859-1 on encoding errors Use U+FFFD Replacement Character if invalid UTF-8 is encountered in recovery mode. Also rewrite xmlNextChar and xmlCurrentChar. Fixes #598.
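To make the recovery-mode behavior concrete, here is a minimal UTF-8 decoder that substitutes U+FFFD for any invalid or truncated sequence. It is a sketch under assumptions: the name `decode_utf8_lossy` is hypothetical, it consumes exactly one byte per invalid sequence, and it is not libxml2's xmlCurrentChar or xmlNextChar.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from s[0..len). Returns the number of bytes
 * consumed (0 only when len == 0). Invalid, overlong, surrogate or
 * truncated sequences yield U+FFFD and consume a single byte, so the
 * caller can resynchronize on the next byte (hypothetical policy). */
size_t decode_utf8_lossy(const unsigned char *s, size_t len, uint32_t *cp)
{
    unsigned char b0;
    uint32_t c, min;
    size_t n, i;

    if (len == 0)
        return 0;
    b0 = s[0];
    if (b0 < 0x80) {
        *cp = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {
        n = 2; c = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
        n = 3; c = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
        n = 4; c = b0 & 0x07; min = 0x10000;
    } else {
        goto invalid;   /* stray continuation byte or 0xF8..0xFF */
    }

    if (len < n)
        goto invalid;   /* sequence cut off by end of input */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            goto invalid;
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates and values above U+10FFFF. */
    if (c < min || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
        goto invalid;
    *cp = c;
    return n;

invalid:
    *cp = 0xFFFD;       /* U+FFFD REPLACEMENT CHARACTER */
    return 1;
}
```

A caller loops until the function returns 0, advancing by the returned byte count, so a single bad byte costs one replacement character instead of switching the whole document to ISO-8859-1.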
Nick Wellnhofer 253f260b 2023-10-18T20:06:35 threads: Fix --with-thread-alloc Fixes #606.
Nick Wellnhofer 713ded60 2023-10-06T10:43:38 entities: Make xmlFreeEntity public
Nick Wellnhofer eb69c1d3 2023-10-02T12:16:05 parser: Fix initialization of namespace data Move initialization to xmlInitSAXParserCtxt. Also add missing XML_HIDDEN to xmlParserNsFree. Fixes #597.
Nick Wellnhofer e0dd330b 2023-09-29T00:18:44 parser: Use hash tables to avoid quadratic behavior Use a hash table to lookup namespaces by prefix. The hash table stores an index into the namespace table. Auxiliary data for namespaces is stored in a separate array alongside the main namespace table. Use a hash table to verify attribute uniqueness. The hash table stores an index into the attribute table. Reuse hash values from the dictionary to avoid computing them twice. See #346.
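The namespace scheme described above (a hash table whose entries are indices into the main namespace table, with auxiliary data in parallel arrays) can be sketched as follows. All names, the fixed capacities, and the shadowing strategy via a `prev` array are assumptions for illustration; prefix strings are assumed to be interned by a dictionary, which also supplies the precomputed `hash`.

```c
#include <stddef.h>
#include <string.h>

#define NS_MAX 64   /* hypothetical capacity of the namespace table */
#define SLOTS 128   /* power of two; one slot is always left free */

struct ns_slot {
    const char *prefix;   /* NULL marks an unused slot */
    unsigned hash;        /* hash reused from the dictionary */
    int index;            /* innermost binding in the ns table, or -1 */
};

struct ns_table {
    const char *uri[NS_MAX];  /* main table: one entry per declaration */
    int slot[NS_MAX];         /* auxiliary: hash slot of each entry */
    int prev[NS_MAX];         /* auxiliary: binding shadowed by each entry */
    int n;                    /* number of declarations in scope */
    int used;                 /* number of occupied hash slots */
    struct ns_slot slots[SLOTS];
};

void ns_init(struct ns_table *t)
{
    memset(t, 0, sizeof(*t));  /* empty table, all slots unused */
}

/* Linear probing: return the slot holding prefix, or the empty slot
 * where it would be inserted. Terminates because a slot stays free. */
static int ns_slot_for(const struct ns_table *t, const char *prefix,
                       unsigned hash)
{
    unsigned i = hash & (SLOTS - 1);

    while (t->slots[i].prefix != NULL &&
           (t->slots[i].hash != hash || strcmp(t->slots[i].prefix, prefix) != 0))
        i = (i + 1) & (SLOTS - 1);
    return (int) i;
}

/* Push a declaration; returns its index in the ns table or -1 if full. */
int ns_push(struct ns_table *t, const char *prefix, unsigned hash,
            const char *uri)
{
    int s;

    if (t->n >= NS_MAX)
        return -1;
    s = ns_slot_for(t, prefix, hash);
    if (t->slots[s].prefix == NULL) {
        if (t->used >= SLOTS - 1)
            return -1;
        t->slots[s].prefix = prefix;
        t->slots[s].hash = hash;
        t->slots[s].index = -1;
        t->used++;
    }
    t->uri[t->n] = uri;
    t->slot[t->n] = s;
    t->prev[t->n] = t->slots[s].index;  /* remember the shadowed binding */
    t->slots[s].index = t->n;
    return t->n++;
}

/* Pop the most recent declaration and restore what it shadowed. */
void ns_pop(struct ns_table *t)
{
    if (t->n > 0) {
        t->n--;
        t->slots[t->slot[t->n]].index = t->prev[t->n];
    }
}

/* O(1) expected lookup of the URI currently bound to prefix. */
const char *ns_lookup(const struct ns_table *t, const char *prefix,
                      unsigned hash)
{
    int s = ns_slot_for(t, prefix, hash);

    if (t->slots[s].prefix == NULL || t->slots[s].index < 0)
        return NULL;
    return t->uri[t->slots[s].index];
}
```

Because each slot records only the innermost binding, a lookup no longer has to scan every in-scope declaration, which is what made many namespaces on deeply nested elements quadratic.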
Nick Wellnhofer 19161bab 2023-09-25T14:00:48 dict: Internal API to look up hash values
Nick Wellnhofer 1425d8f6 2023-09-16T19:08:10 dict: Separate RNG code
Nick Wellnhofer b31813e6 2023-09-28T15:34:08 include: Add more missing stdio.h includes
Nick Wellnhofer 84e1ffc8 2023-09-22T15:44:17 doc: Don't document internal macros in xmlversion.h
Nick Wellnhofer b94283fb 2023-09-22T14:23:27 regexp: Add missing include
Nick Wellnhofer 45470611 2023-09-21T23:52:52 error: Make xmlGetLastError return a const error This is a slight break of the API, but users really shouldn't modify the global error struct. The goal is to make xmlLastError use static buffers for its strings eventually. This should warn people if they're abusing the struct.
Nick Wellnhofer 8c084ebd 2023-09-21T22:57:33 doc: Make apibuild.py happy
Nick Wellnhofer 72262030 2023-09-21T14:52:14 parser: Readd some includes to parser.h and xmlreader.h Fix backward compatibility.
Nick Wellnhofer 9fc5090c 2023-09-16T19:58:42 hash: Clean up libxml/hash.h Rename variables, fix subincludes, whitespace.
Nick Wellnhofer da274bfa 2023-09-21T01:29:40 build: Fix build when certain modules are disabled
Nick Wellnhofer 9b5cce7a 2023-09-21T00:44:50 include: Remove more unnecessary includes
Nick Wellnhofer d6ba4033 2023-09-20T20:49:59 globals: Move remaining declarations to correct places globals.h is now deprecated. Sanity is restored.
Nick Wellnhofer 1117fae0 2023-09-20T19:20:41 include: Remove unneeded includes
Nick Wellnhofer 736327df 2023-09-20T19:09:15 include: Break inclusion cycle between tree.h and xmlregexp.h
Nick Wellnhofer 699299ca 2023-09-20T18:54:39 globals: Stop including globals.h
Nick Wellnhofer 2e6c49a7 2023-09-20T14:43:14 globals: Don't store xmlParserVersion in global state This is a constant.
Nick Wellnhofer 0830fcfa 2023-09-20T14:30:12 globals: Deprecate xmlLastError The last error should be accessed with xmlGetLastError.
Nick Wellnhofer db8b9722 2023-09-20T13:56:16 parser: Deprecate global parser options Note that setting global options has no effect anyway when using any of the modern parser API functions which take an option argument like xmlReadMemory or when using xmlCtxtUseOptions. Global options only have an effect when using old API functions xmlParse* or xmlSAXParse* or when using an xmlParserCtxt without calling xmlCtxtUseOptions. Unfortunately, many downstream projects still modify global parser options, often without realizing that it has no effect. If necessary, switch to the modern API. Then you can safely remove all code that changes global options. Here's a list of deprecated functions and global variables together with the corresponding parser options: - xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue: parser option XML_PARSE_NOENT - xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue: inverse of parser option XML_PARSE_NOBLANKS - xmlPedanticParserDefault, xmlPedanticParserDefaultValue: parser option XML_PARSE_PEDANTIC - xmlLineNumbersDefault, xmlLineNumbersDefaultValue: always enabled by the new API - xmlDoValidityCheckingDefaultValue: parser option XML_PARSE_DTDVALID - xmlGetWarningsDefaultValue: inverse of parser option XML_PARSE_NOWARNING - xmlLoadExtDtdDefaultValue: parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR
Nick Wellnhofer 868b94b8 2023-09-20T13:10:29 globals: Reformat libxml/globals.h
Nick Wellnhofer bbf08608 2023-09-20T13:05:02 globals: Move buffer callback declarations to xmlIO.h
Nick Wellnhofer dc3382ef 2023-09-20T12:58:03 globals: Move xmlRegisterNodeDefault to tree.c Code in globals.c must not try to access globals itself since the accessor macros aren't defined and we would only see the main variable.
Nick Wellnhofer e7b6ca15 2023-09-18T13:25:06 globals: Rework global state destruction on Windows If DllMain is used, rely on it working as expected. The old code seemed to attempt to free global state of other threads if, for some reason, the DllMain mechanism didn't work. In a static build, register a destructor with RegisterWaitForSingleObject. Make public functions xmlGetGlobalState and xmlInitializeGlobalState no-ops. Move initialization and registration of global state objects to xmlInitGlobalState. Look up global state with xmlGetThreadLocalStorage which can be inlined nicely. Also clean up global state when using TLS. xmlLastError must be reset.
Nick Wellnhofer 39a275a5 2023-09-18T21:25:35 globals: Define globals using macros Declare and define globals and helper functions by (ab)using the preprocessor.
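Defining globals "by (ab)using the preprocessor" typically means an X-macro: one list of globals expanded several times to produce the state struct, its initializer and an accessor per variable. The sketch below shows that pattern with hypothetical names (`MY_GLOBALS`, `GlobalState`, the `...Ptr` accessors); it uses a single static instance where a threaded build would use thread-local state, and it does not reproduce libxml2's actual macros.

```c
/* Hypothetical list of globals: type, name, default value. */
#define MY_GLOBALS(X) \
    X(int, indentTreeOutput, 1) \
    X(int, lineNumbersDefault, 0) \
    X(const char *, treeIndentString, "  ")

/* Expansion 1: the struct holding one copy of every global. */
typedef struct {
#define X(type, name, init) type name;
    MY_GLOBALS(X)
#undef X
} GlobalState;

/* Expansion 2: an initializer that applies the default values. */
static void global_state_init(GlobalState *gs)
{
#define X(type, name, init) gs->name = (init);
    MY_GLOBALS(X)
#undef X
}

/* Stand-in for per-thread storage: a lazily initialized static instance. */
static GlobalState main_state;
static int main_state_ready = 0;

static GlobalState *get_state(void)
{
    if (!main_state_ready) {
        global_state_init(&main_state);
        main_state_ready = 1;
    }
    return &main_state;
}

/* Expansion 3: one accessor per global, e.g. indentTreeOutputPtr(). */
#define X(type, name, init) \
    type *name##Ptr(void) { return &get_state()->name; }
MY_GLOBALS(X)
#undef X
```

Adding a global then means adding one line to the list; the struct, the defaults and the accessor can no longer drift apart.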
Nick Wellnhofer 11a1839d 2023-09-20T17:54:48 globals: Move remaining globals back to correct header files This undoes a lot of damage.
Nick Wellnhofer 7909ff08 2023-09-20T17:38:26 include: Remove unnecessary includes - Don't include tree.h from encoding.h - Don't include parser.h from xmlIO.h
Nick Wellnhofer eb985d6f 2023-09-20T17:17:49 globals: Move error globals back to xmlerror.c
Nick Wellnhofer d1336fd3 2023-09-20T17:00:50 globals: Move malloc hooks back to xmlmemory.h
Nick Wellnhofer a77f9ab8 2023-09-20T16:57:22 globals: Don't include SAX2.h from globals.h
Nick Wellnhofer bf6bd161 2023-09-18T19:53:31 globals: Introduce xmlCheckThreadLocalStorage Checks whether (emulated) thread-local storage could be allocated.
Nick Wellnhofer 89f49767 2023-09-18T18:44:32 globals: Make xmlGlobalState private This removes a public struct but it seems impossible to use its members in a sensible way from external code.
Nick Wellnhofer a07ec7c1 2023-09-18T17:39:13 threads: Move library initialization code to threads.c This allows consolidating the initialization code since the global init lock was already implemented in threads.c.
Nick Wellnhofer 4e1c13eb 2023-09-18T14:45:10 debug: Remove debugging code This is barely useful these days and only clutters the code base.
Nick Wellnhofer c19771c1 2023-09-18T00:54:39 globals: Move code from threads.c to globals.c Move all code that handles globals to the place where it belongs.
Nick Wellnhofer 2a4b8114 2023-09-17T23:16:49 globals: Rename members of xmlGlobalState This is a deliberate first step to remove some internals from the public API and to avoid issues when redefining tokens.
Nick Wellnhofer edc2dd48 2023-09-04T16:07:23 dict: Update hash function Update hash function from classic Jenkins OAAT (dict.c) and a variant of DJB2 (hash.c) to "GoodOAAT" taken from the SMHasher repo. This hash function passes all SMHasher tests.
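For reference, here is a sketch of GoodOAAT as published in the SMHasher repository, a one-at-a-time hash with two 32-bit lanes and a final mixing step. It is reproduced from memory, so compare it against the upstream source before relying on exact hash values; libxml2's adaptation (seeding, string handling) may differ, and the name `good_oaat` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ROTL32(x, r) (((x) << (r)) | ((x) >> (32 - (r))))

uint32_t good_oaat(const unsigned char *key, size_t len, uint32_t seed)
{
    uint32_t h1 = seed ^ 0x3b00;
    uint32_t h2 = ROTL32(seed, 15);
    size_t i;

    /* Absorb one byte per round into both lanes. */
    for (i = 0; i < len; i++) {
        h1 += key[i];
        h1 += h1 << 3;          /* h1 *= 9 */
        h2 += h1;
        h2 = ROTL32(h2, 7);
        h2 += h2 << 2;          /* h2 *= 5 */
    }

    /* Final mixing so every input bit affects every output bit. */
    h1 ^= h2;
    h1 += ROTL32(h2, 14);
    h2 ^= h1; h2 += ROTL32(h1, 26);
    h1 ^= h2; h1 += ROTL32(h2, 5);
    h2 ^= h1; h2 += ROTL32(h1, 24);
    return h1;
}
```

The seed parameter is what makes hash randomization possible: with a per-process random seed, an attacker cannot precompute keys that all collide.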
Nick Wellnhofer 57cfd221 2023-09-01T14:52:04 dict: Use xoroshiro64** as PRNG Stop using rand_r. This enables hash randomization on all platforms.
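xoroshiro64** is a small 32-bit generator by Blackman and Vigna with two words of state. A sketch of its step function follows; the struct and function names are hypothetical, and how libxml2 seeds and stores its generator is not shown. The state must never be all zeros.

```c
#include <stdint.h>

typedef struct {
    uint32_t s[2];   /* must not be {0, 0} */
} Xoroshiro64;

static uint32_t rotl32(uint32_t x, int k)
{
    return (x << k) | (x >> (32 - k));
}

/* Return the next output and advance the state. */
uint32_t xoroshiro64ss_next(Xoroshiro64 *rng)
{
    const uint32_t s0 = rng->s[0];
    uint32_t s1 = rng->s[1];
    /* "**" scrambler: multiply, rotate, multiply. */
    const uint32_t result = rotl32(s0 * 0x9E3779BB, 5) * 5;

    /* Linear xor/shift/rotate state update (a = 26, b = 9, c = 13). */
    s1 ^= s0;
    rng->s[0] = rotl32(s0, 26) ^ s1 ^ (s1 << 9);
    rng->s[1] = rotl32(s1, 13);
    return result;
}
```

Unlike rand_r, this needs nothing from the C library, which is why it can enable hash randomization on every platform.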
Nick Wellnhofer 778cca38 2023-08-20T22:50:57 legacy: Add stubs for disabled modules When legacy support is requested, always enable stubs for FTP and XPointer location modules which were removed from the standard configuration. Going forward, the --with-legacy configuration option should be used to provide maximum ABI compatibility. Fixes #433.
Nick Wellnhofer ed3bd052 2023-08-20T20:48:10 parser: Allow to set maximum amplification factor
Nick Wellnhofer f1c1f5c6 2023-08-16T19:43:02 parser: Revert change to doc->encoding Fixes #579.
Nick Wellnhofer 95e81a36 2023-08-08T15:21:31 parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.
Nick Wellnhofer 834b8123 2023-08-08T15:21:28 parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, it was probably envisioned to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid temporarily copying (possibly gigabytes of) memory, we now stream in-memory input just like content read from files in a chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now make use of strnlen which unfortunately isn't part of the standard C library and has only been mandated since POSIX 2008.
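Streaming from memory amounts to a read callback that hands out the caller's buffer in pieces. This sketch uses the same shape as libxml2's `xmlInputReadCallback` (`int (*)(void *context, char *buffer, int len)`), but `MemInput`, `mem_read` and `bounded_strlen` are hypothetical names; `bounded_strlen` stands in for `strnlen` on platforms without POSIX 2008.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical read state for an in-memory document. */
typedef struct {
    const char *data;   /* caller-owned buffer, never copied as a whole */
    size_t size;
    size_t pos;         /* bytes already handed to the parser */
} MemInput;

/* Copy at most len bytes into buffer; return the count, 0 at end. */
int mem_read(void *context, char *buffer, int len)
{
    MemInput *in = context;
    size_t avail = in->size - in->pos;
    size_t n;

    if (len <= 0)
        return 0;
    n = (size_t) len < avail ? (size_t) len : avail;
    memcpy(buffer, in->data + in->pos, n);
    in->pos += n;
    return (int) n;
}

/* Portable equivalent of POSIX strnlen: length of s, capped at max. */
size_t bounded_strlen(const char *s, size_t max)
{
    size_t n = 0;

    while (n < max && s[n] != '\0')
        n++;
    return n;
}
```

The parser calls such a callback repeatedly with a chunk size like INPUT_CHUNK, so at any moment only one chunk of the document is copied into the parser's own buffer.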
Nick Wellnhofer 59fa0bb3 2023-08-08T15:21:14 parser: Simplify input pointer updates The base member always points to the beginning of the buffer.
Nick Wellnhofer ec7be506 2023-08-08T15:19:46 parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.
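The core of the reworked detection is that an explicit encoding (user override or BOM) sets a flag, and a later encoding declaration only switches encodings while the flag is clear. A simplified sketch follows; `Input`, `Encoding`, `INPUT_HAS_ENCODING` and the function names are hypothetical stand-ins for xmlParserInput, XML_INPUT_HAS_ENCODING, xmlDetectEncoding and xmlSetDeclaredEncoding, and only the UTF-8 and UTF-16 BOMs are handled.

```c
#include <stddef.h>

typedef enum { ENC_NONE, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE } Encoding;

#define INPUT_HAS_ENCODING 0x1u   /* hypothetical input flag */

typedef struct {
    const unsigned char *cur;   /* next unread byte */
    size_t avail;               /* bytes remaining */
    unsigned flags;
    Encoding enc;
} Input;

static void switch_encoding(Input *in, Encoding enc)
{
    in->enc = enc;
    in->flags |= INPUT_HAS_ENCODING;   /* later declarations won't override */
}

/* Skip a byte-order mark, if present, and record the encoding it implies. */
void detect_encoding(Input *in)
{
    const unsigned char *p = in->cur;

    if (in->avail >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        in->cur += 3; in->avail -= 3;
        switch_encoding(in, ENC_UTF8);
    } else if (in->avail >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        in->cur += 2; in->avail -= 2;
        switch_encoding(in, ENC_UTF16LE);
    } else if (in->avail >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        in->cur += 2; in->avail -= 2;
        switch_encoding(in, ENC_UTF16BE);
    }
}

/* Apply the encoding from <?xml ... encoding="..."?> unless one is already
 * set. Returns 1 when an earlier encoding wins and differs (worth a
 * warning), 0 otherwise. */
int set_declared_encoding(Input *in, Encoding declared)
{
    if (in->flags & INPUT_HAS_ENCODING)
        return in->enc != declared;
    switch_encoding(in, declared);
    return 0;
}
```

A single flag replaces the need to consult several fields to find out whether an encoding was already chosen.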
Nick Wellnhofer b8961df6 2023-05-09T03:25:24 SAX: Always validate xml:ids The behavior shouldn't depend on mostly random configuration options.
Nick Wellnhofer 8d5e33ef 2023-05-03T20:42:10 Fix compiler warning on GCC < 8 -Wcast-function-type is only available since GCC 8.
Nick Wellnhofer fc69cf56 2023-04-30T17:51:29 parser: Move xmlFatalErr to parserInternals.c
Nick Wellnhofer 3ff6abbf 2023-02-22T17:11:20 encoding: Rework error codes Use an enum instead of magic numbers. Fix a few error codes. Simplify handling of "space" and "partial" errors. See #506.
Nick Wellnhofer fa993130 2023-04-30T12:57:09 xpath: Remove remaining references to valueFrame Fixes #529.
Nick Wellnhofer 3ffcc03b 2023-03-13T19:38:41 parser: Deprecate more internal functions
Nick Wellnhofer 98840d40 2023-03-21T19:07:12 parser: Rework EBCDIC code page detection To detect EBCDIC code pages, we used to switch the encoding twice and had to be very careful not to decode data after the XML declaration before the second switch. This relied on a hard-coded expected size of the XML declaration and was complicated and unreliable. Now we convert the first 200 bytes to EBCDIC-US and parse the encoding declaration manually.
Nick Wellnhofer 04d1bedd 2023-03-21T13:08:44 parser: Rework shrinking of input buffers Don't try to grow the input buffer in xmlParserShrink. This makes sure that no memory allocations are made and the function always succeeds. Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of DTD parsing loops. Shrink before growing.
Nick Wellnhofer b167c731 2023-03-14T14:42:36 parser: Fix short-lived regression causing infinite loops Fix 3eb6bf03. We really have to halt the parser, so the input buffer gets reset.
Nick Wellnhofer f8efa589 2023-03-14T13:55:06 malloc-fail: Handle malloc failures in xmlSchemaInitTypes Note that this changes the return value of public function xmlSchemaInitTypes from void to int. This shouldn't break the ABI on most platforms. Found when investigating #500.
Nick Wellnhofer d7daf9fd 2023-03-14T13:02:36 xmllint: Fix use-after-free with --maxmem Fixes #498.
Nick Wellnhofer e7c3a4ca 2023-03-13T19:19:46 parser: Deprecate some parser input functions
Nick Wellnhofer 2099441f 2023-03-13T17:51:13 parser: Stop calling xmlParserInputShrink Introduce xmlParserShrink which takes a parser context to simplify error handling.
Nick Wellnhofer 48379394 2023-03-13T17:11:27 malloc-fail: Stop using XPath stack frames There's too much code which assumes that if ctxt->value is non-null, a value can be successfully popped off the stack. This assumption can break with stack frames when malloc fails. Instead of trying to fix all call sites, remove the stack frame logic. It only offered very little protection against misbehaving extension functions. We already check the stack size after a function call which should be enough. Found by OSS-Fuzz.
Nick Wellnhofer bd63d730 2023-03-12T17:40:55 html: Impose some length limits Impose length limits on names, attribute values, PIs and comments, similar to the XML parser.
Nick Wellnhofer 3eb6bf03 2023-03-12T16:47:15 parser: Stop calling xmlParserInputGrow Introduce xmlParserGrow which takes a parser context to simplify error handling.
Nick Wellnhofer b51478dc 2023-02-24T16:21:17 Revert "malloc-fail: Avoid use-after-free after unsuccessful valuePush" This reverts commit 6a12be77c6a94c374ab7476087edcee2ba41d9b4. There's too much code reading ctxt->value directly and making the wrong assumptions.
Nick Wellnhofer 4f0a0fb7 2023-02-22T14:24:24 xinclude: Fix include guard
Nick Wellnhofer 905386ec 2023-02-13T11:14:34 autotools: Fix make distcheck - Add private/xinclude.h to EXTRA_DIST - Add runsuite.log to CLEANFILES Fixes #485.
Nick Wellnhofer 6a12be77 2023-01-31T12:46:30 malloc-fail: Avoid use-after-free after unsuccessful valuePush In xpath.c there's a lot of code like: valuePush(ctxt, xmlCacheNewX()); ... valuePop(ctxt); If xmlCacheNewX fails, no value will be pushed on the stack. If there's no error check in between, valuePop will pop an unrelated value which can lead to use-after-free errors. Instead of trying to fix all call sites, we simply stop popping values if an error was signaled. This requires to change the CHECK_TYPE macro which is often used to determine whether a value can be safely popped. Found with libFuzzer, see #344.
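The hazard is that a failed push followed by an unconditional pop removes a value that belongs to someone else. The sketch below models the "stop popping after an error" rule this commit introduced (later reverted by b51478dc) with a fixed-size integer stack; `ValueStack`, `value_push` and `value_pop` are hypothetical, and a full stack stands in for a malloc failure.

```c
#include <stddef.h>

#define STACK_MAX 4   /* tiny capacity so a failed push is easy to trigger */

typedef struct {
    int values[STACK_MAX];
    int nr;       /* number of values on the stack */
    int error;    /* set once any push has failed */
} ValueStack;

/* Returns 0 on success, -1 if the value could not be pushed. */
int value_push(ValueStack *st, int v)
{
    if (st->nr >= STACK_MAX) {
        st->error = 1;
        return -1;
    }
    st->values[st->nr++] = v;
    return 0;
}

/* Once an error was signaled, refuse to pop: the top of the stack may
 * be an unrelated value that a caller would otherwise free or reuse. */
int value_pop(ValueStack *st, int *out)
{
    if (st->error || st->nr == 0)
        return -1;
    *out = st->values[--st->nr];
    return 0;
}
```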
Nick Wellnhofer 59b33661 2022-12-27T14:15:51 error: Limit number of parser errors Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of the time with error handling. Limit the number of errors and warnings to 100.
Nick Wellnhofer a41b09c7 2022-12-23T21:29:28 parser: Improve detection of entity loops Set a flag to detect entity loops at once instead of processing until the depth limit is exceeded.
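Detecting a loop "at once" can be done with a flag that marks entities currently being expanded: meeting a marked entity again means the reference chain is cyclic, so there is no need to recurse until a depth limit trips. A sketch with hypothetical flag names and a simplified entity graph (each entity lists the entities its replacement text references); libxml2's entity structures and flags differ.

```c
#include <stddef.h>

#define ENT_EXPANDING 0x1u   /* hypothetical: on the current expansion path */
#define ENT_CHECKED   0x2u   /* hypothetical: already known to be loop-free */

typedef struct Entity {
    const char *name;
    struct Entity *refs[4];  /* entities referenced from the content */
    int nrefs;
    unsigned flags;
} Entity;

/* Returns 1 if expanding ent would recurse forever, 0 otherwise. */
int entity_has_loop(Entity *ent)
{
    int i;

    if (ent->flags & ENT_CHECKED)
        return 0;
    if (ent->flags & ENT_EXPANDING)
        return 1;            /* reached again while still expanding */
    ent->flags |= ENT_EXPANDING;
    for (i = 0; i < ent->nrefs; i++) {
        if (entity_has_loop(ent->refs[i])) {
            ent->flags &= ~ENT_EXPANDING;
            return 1;
        }
    }
    ent->flags &= ~ENT_EXPANDING;
    ent->flags |= ENT_CHECKED;   /* skip this subtree next time */
    return 0;
}
```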
Nick Wellnhofer b47ebf04 2022-12-21T00:02:47 parser: Deprecate xmlString*DecodeEntities These are internal functions.
Nick Wellnhofer ce76ebfd 2022-12-19T20:56:23 entities: Stop counting entities This was only used in the old version of xmlParserEntityCheck.
Nick Wellnhofer a3c8b180 2022-12-19T20:51:52 entities: Add entity flag for loop check
Nick Wellnhofer 463bbeec 2022-12-19T18:39:45 entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.
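The accounting rule can be sketched with saturating arithmetic on `unsigned long`. The factor of 10 and the 10 MB allowance come from the commit message (10 MB is taken here as 10,000,000 bytes, an assumption); the fixed per-reference cost `ENT_FIXED_COST` and all function names are hypothetical.

```c
#include <limits.h>

#define AMPLIFICATION_FACTOR 10UL
#define ALLOWED_EXPANSION 10000000UL   /* assumed reading of "10 MB" */
#define ENT_FIXED_COST 50UL            /* hypothetical per-reference cost */

/* unsigned long may be 32 bits, so clamp at ULONG_MAX instead of wrapping. */
static void sat_add(unsigned long *dst, unsigned long val)
{
    if (val > ULONG_MAX - *dst)
        *dst = ULONG_MAX;
    else
        *dst += val;
}

static unsigned long sat_mul(unsigned long a, unsigned long b)
{
    if (a != 0 && b > ULONG_MAX / a)
        return ULONG_MAX;
    return a * b;
}

/* Account for one entity reference with repl_len bytes of replacement
 * text. input_size is the number of document bytes consumed so far.
 * Returns 0 if parsing may continue, -1 on excessive amplification. */
int check_amplification(unsigned long *expanded, unsigned long input_size,
                        unsigned long repl_len)
{
    sat_add(expanded, repl_len);
    sat_add(expanded, ENT_FIXED_COST);   /* charge even empty entities */
    if (*expanded <= ALLOWED_EXPANSION)
        return 0;
    return *expanded > sat_mul(input_size, AMPLIFICATION_FACTOR) ? -1 : 0;
}
```

Because the expanded size is bounded by a constant multiple of the input consumed, total work stays linear in the document size regardless of how entities are nested.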
Nick Wellnhofer 7e3f469b 2022-12-19T15:59:49 entities: Use flags to store '<' check results Instead of abusing the LSB of the "checked" member, store the result of testing for occurrence of '<' character in "flags". Also use the flags in xmlParseStringEntityRef instead of rescanning every time.
Nick Wellnhofer 481d79d4 2022-12-19T15:26:46 entities: Add XML_ENT_PARSED flag To check whether an entity was already parsed, the code previously tested whether "checked" was non-zero or "children" was non-null. The "children" check could be unreliable because an empty entity also results in an empty (NULL) node list. Use a separate flag to make this check more reliable.
Nick Wellnhofer f34f184f 2022-12-19T15:24:53 entities: Add "flags" member to struct xmlEntity This will hold various flags and eventually replace the "checked" member.
Nick Wellnhofer 93a01c46 2022-12-08T03:58:41 libxml.h: Add comments and indentation
Nick Wellnhofer a6debffd 2022-12-08T03:37:24 xmlexports.h: Disable docs for internal macro XMLPUBLIC
Nick Wellnhofer 3b6cc47a 2022-12-08T02:51:52 xmlexports.h: Remove LIBXML_FASTCALL optimization This was an experimental and undocumented micro-optimization for Windows which apparently required different calling conventions for variable-argument functions, making it impossible to maintain without domain knowledge.
Nick Wellnhofer ce9baf94 2022-12-08T02:48:27 Remove XMLCALL and XMLCDECL macros from public headers
Nick Wellnhofer ccb6d544 2022-11-27T02:09:27 Hide internal functions These functions were never declared in public headers, so it should be safe to hide them. Fixes #139.
Nick Wellnhofer c16fd705 2022-11-25T14:52:37 xpath: Make init function private
Nick Wellnhofer 53ab3840 2022-11-25T14:26:59 encoding: Make init function private
Nick Wellnhofer c73d464a 2022-11-24T15:00:03 threads: Deprecate some internal functions
Nick Wellnhofer 65d381f3 2022-11-24T20:54:18 threads: Allocate mutexes statically
Nick Wellnhofer ed053c50 2022-11-25T12:27:14 dict: Make init/cleanup functions private
Nick Wellnhofer 7010d877 2022-11-25T12:06:27 threads: Rework initialization Make init/cleanup functions private. Merge xmlOnceInit into xmlInitThreadsInternal.
Nick Wellnhofer 9dbf1374 2022-11-24T20:52:57 parser: Make some module init/cleanup functions private
Chun-wei Fan 707ade22 2022-11-22T14:56:58 Visual Studio builds: Allow silencing deprecation warnings Define XML_IGNORE_DEPRECATION_WARNINGS and the corresponding XML_POP_WARNINGS for Visual Studio, and consequently define XML_IGNORE_FPTR_CAST_WARNINGS so that we do not get a compiler warning on Visual Studio by doing a __pragma(warning(pop)) without a corresponding __pragma(warning(push)). Also correct the documentation a bit for XML_POP_WARNINGS.
Chun-wei Fan b9590d5d 2022-11-18T11:23:23 Visual Studio: Define XML_DEPRECATED We can mark APIs as deprecated using __declspec(deprecated) with Visual Studio 2005 and later, so add a definition of that so that we can help users avoid using deprecated APIs when using Visual Studio as well. For the existing GCC definition, check whether we are on GCC 3.1+ before enabling the definition.
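The two Visual Studio commits come down to a portable deprecation attribute plus a matched pair of warning-suppression macros. Below is a sketch with hypothetical `MY_` names, not libxml2's actual XML_DEPRECATED, XML_IGNORE_DEPRECATION_WARNINGS or XML_POP_WARNINGS definitions; C4996 is the warning number MSVC uses for deprecated declarations.

```c
/* Mark a declaration as deprecated where the compiler supports it. */
#if defined(__GNUC__) && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))
#define MY_DEPRECATED __attribute__((deprecated))
#elif defined(_MSC_VER) && _MSC_VER >= 1400   /* Visual Studio 2005 */
#define MY_DEPRECATED __declspec(deprecated)
#else
#define MY_DEPRECATED
#endif

/* Push/ignore and pop are always defined under the same condition,
 * so a pop is never emitted without its matching push. */
#if defined(__clang__) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
#define MY_IGNORE_DEPRECATION_WARNINGS \
    _Pragma("GCC diagnostic push") \
    _Pragma("GCC diagnostic ignored \"-Wdeprecated-declarations\"")
#define MY_POP_WARNINGS _Pragma("GCC diagnostic pop")
#elif defined(_MSC_VER) && _MSC_VER >= 1400
#define MY_IGNORE_DEPRECATION_WARNINGS \
    __pragma(warning(push)) \
    __pragma(warning(disable: 4996))
#define MY_POP_WARNINGS __pragma(warning(pop))
#else
#define MY_IGNORE_DEPRECATION_WARNINGS
#define MY_POP_WARNINGS
#endif

MY_DEPRECATED int old_api(void);
int old_api(void) { return 42; }

/* Code that must keep calling a deprecated function silences the
 * warning locally instead of for the whole translation unit. */
int call_old_api_quietly(void)
{
    int r;

    MY_IGNORE_DEPRECATION_WARNINGS
    r = old_api();
    MY_POP_WARNINGS
    return r;
}
```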
Nick Wellnhofer 68a6518c 2022-11-15T18:23:33 parser: Rewrite push parser boundary checks Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.
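Keeping track of progress in ctxt->checkIndex means a search for the end of a construct resumes where the previous attempt stopped, instead of rescanning the whole buffer each time the push parser receives more data. A sketch with a hypothetical `find_terminator`; libxml2's actual helpers check several constructs and differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* Search buf[0..len) for term, starting at *check_index. Returns the
 * offset of the match, or -1 if more data is needed. On failure,
 * *check_index is set so the next call rescans only the last
 * strlen(term) - 1 bytes, which may hold the start of a split match. */
long find_terminator(const char *buf, size_t len, const char *term,
                     size_t *check_index)
{
    size_t tlen = strlen(term);
    size_t i;

    for (i = *check_index; i + tlen <= len; i++) {
        if (memcmp(buf + i, term, tlen) == 0) {
            *check_index = 0;   /* reset for the next construct */
            return (long) i;
        }
    }
    *check_index = len >= tlen ? len - tlen + 1 : 0;
    return -1;
}
```

Each byte is examined a bounded number of times across calls, so feeding a large comment in many small chunks stays linear rather than quadratic.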
Nick Wellnhofer 2059df53 2022-11-14T22:27:58 buf: Deprecate static/immutable buffers
Nick Wellnhofer 46cd7d22 2022-11-13T16:30:46 io: Remove xmlInputReadCallbackNop In some cases, for example when using encoders, the read callback was set to NULL, in other cases it was set to xmlInputReadCallbackNop. xmlGROW only tested for xmlInputReadCallbackNop, resulting in errors when parsing large encoded content from memory. Always use a NULL callback for memory buffers to avoid ambiguities. Fixes #262.