include


Log

Author Commit Date CI Message
Nick Wellnhofer ec7be506 2023-08-08T15:19:46 parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.
Nick Wellnhofer b8961df6 2023-05-09T03:25:24 SAX: Always validate xml:ids The behavior shouldn't depend on mostly random configuration options.
Nick Wellnhofer 8d5e33ef 2023-05-03T20:42:10 Fix compiler warning on GCC < 8 -Wcast-function-type is only available since GCC 8.
Nick Wellnhofer fc69cf56 2023-04-30T17:51:29 parser: Move xmlFatalErr to parserInternals.c
Nick Wellnhofer 3ff6abbf 2023-02-22T17:11:20 encoding: Rework error codes Use an enum instead of magic numbers. Fix a few error codes. Simplify handling of "space" and "partial" errors. See #506.
Nick Wellnhofer fa993130 2023-04-30T12:57:09 xpath: Remove remaining references to valueFrame Fixes #529.
Nick Wellnhofer 3ffcc03b 2023-03-13T19:38:41 parser: Deprecate more internal functions
Nick Wellnhofer 98840d40 2023-03-21T19:07:12 parser: Rework EBCDIC code page detection To detect EBCDIC code pages, we used to switch the encoding twice and had to be very careful not to decode data after the XML declaration before the second switch. This relied on a hard-coded expected size of the XML declaration and was complicated and unreliable. Now we convert the first 200 bytes to EBCDIC-US and parse the encoding declaration manually.
Nick Wellnhofer 04d1bedd 2023-03-21T13:08:44 parser: Rework shrinking of input buffers Don't try to grow the input buffer in xmlParserShrink. This makes sure that no memory allocations are made and the function always succeeds. Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of DTD parsing loops. Shrink before growing.
Nick Wellnhofer b167c731 2023-03-14T14:42:36 parser: Fix short-lived regression causing infinite loops Fix 3eb6bf03. We really have to halt the parser, so the input buffer gets reset.
Nick Wellnhofer f8efa589 2023-03-14T13:55:06 malloc-fail: Handle malloc failures in xmlSchemaInitTypes Note that this changes the return value of public function xmlSchemaInitTypes from void to int. This shouldn't break the ABI on most platforms. Found when investigating #500.
Nick Wellnhofer d7daf9fd 2023-03-14T13:02:36 xmllint: Fix use-after-free with --maxmem Fixes #498.
Nick Wellnhofer e7c3a4ca 2023-03-13T19:19:46 parser: Deprecate some parser input functions
Nick Wellnhofer 2099441f 2023-03-13T17:51:13 parser: Stop calling xmlParserInputShrink Introduce xmlParserShrink which takes a parser context to simplify error handling.
Nick Wellnhofer 48379394 2023-03-13T17:11:27 malloc-fail: Stop using XPath stack frames There's too much code which assumes that if ctxt->value is non-null, a value can be successfully popped off the stack. This assumption can break with stack frames when malloc fails. Instead of trying to fix all call sites, remove the stack frame logic. It only offered very little protection against misbehaving extension functions. We already check the stack size after a function call which should be enough. Found by OSS-Fuzz.
Nick Wellnhofer bd63d730 2023-03-12T17:40:55 html: Impose some length limits Impose length limits on names, attribute values, PIs and comments, similar to the XML parser.
Nick Wellnhofer 3eb6bf03 2023-03-12T16:47:15 parser: Stop calling xmlParserInputGrow Introduce xmlParserGrow which takes a parser context to simplify error handling.
Nick Wellnhofer b51478dc 2023-02-24T16:21:17 Revert "malloc-fail: Avoid use-after-free after unsuccessful valuePush" This reverts commit 6a12be77c6a94c374ab7476087edcee2ba41d9b4. There's too much code reading ctxt->value directly and making the wrong assumptions.
Nick Wellnhofer 4f0a0fb7 2023-02-22T14:24:24 xinclude: Fix include guard
Nick Wellnhofer 905386ec 2023-02-13T11:14:34 autotools: Fix make distcheck - Add private/xinclude.h to EXTRA_DIST - Add runsuite.log to CLEANFILES Fixes #485.
Nick Wellnhofer 6a12be77 2023-01-31T12:46:30 malloc-fail: Avoid use-after-free after unsuccessful valuePush In xpath.c there's a lot of code like: valuePush(ctxt, xmlCacheNewX()); ... valuePop(ctxt); If xmlCacheNewX fails, no value will be pushed on the stack. If there's no error check in between, valuePop will pop an unrelated value which can lead to use-after-free errors. Instead of trying to fix all call sites, we simply stop popping values if an error was signaled. This requires to change the CHECK_TYPE macro which is often used to determine whether a value can be safely popped. Found with libFuzzer, see #344.
Nick Wellnhofer 59b33661 2022-12-27T14:15:51 error: Limit number of parser errors Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of the time with error handling. Limit the number of errors and warnings to 100.
Nick Wellnhofer a41b09c7 2022-12-23T21:29:28 parser: Improve detection of entity loops Set a flag to detect entity loops at once instead of processing until the depth limit is exceeded.
Nick Wellnhofer b47ebf04 2022-12-21T00:02:47 parser: Deprecate xmlString*DecodeEntities These are internal functions.
Nick Wellnhofer ce76ebfd 2022-12-19T20:56:23 entities: Stop counting entities This was only used in the old version of xmlParserEntityCheck.
Nick Wellnhofer a3c8b180 2022-12-19T20:51:52 entities: Add entity flag for loop check
Nick Wellnhofer 463bbeec 2022-12-19T18:39:45 entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.
Nick Wellnhofer 7e3f469b 2022-12-19T15:59:49 entities: Use flags to store '<' check results Instead of abusing the LSB of the "checked" member, store the result of testing for occurrence of '<' character in "flags". Also use the flags in xmlParseStringEntityRef instead of rescanning every time.
Nick Wellnhofer 481d79d4 2022-12-19T15:26:46 entities: Add XML_ENT_PARSED flag To check whether an entity was already parsed, the code previously tested whether "checked" was non-zero or "children" was non-null. The "children" check could be unreliable because an empty entity also results in an empty (NULL) node list. Use a separate flag to make this check more reliable.
Nick Wellnhofer f34f184f 2022-12-19T15:24:53 entities: Add "flags" member to struct xmlEntity This will hold various flags and eventually replace the "checked" member.
Nick Wellnhofer 93a01c46 2022-12-08T03:58:41 libxml.h: Add comments and indentation
Nick Wellnhofer a6debffd 2022-12-08T03:37:24 xmlexports.h: Disable docs for internal macro XMLPUBLIC
Nick Wellnhofer 3b6cc47a 2022-12-08T02:51:52 xmlexports.h: Remove LIBXML_FASTCALL optimization This was an experimental and undocumented micro-optimization for Windows which apparently required different calling conventions for variable-argument functions, making it impossible to maintain without domain knowledge.
Nick Wellnhofer ce9baf94 2022-12-08T02:48:27 Remove XMLCALL and XMLCDECL macros from public headers
Nick Wellnhofer ccb6d544 2022-11-27T02:09:27 Hide internal functions These functions were never declared in public headers, so it should be safe to hide them. Fixes #139.
Nick Wellnhofer c16fd705 2022-11-25T14:52:37 xpath: Make init function private
Nick Wellnhofer 53ab3840 2022-11-25T14:26:59 encoding: Make init function private
Nick Wellnhofer c73d464a 2022-11-24T15:00:03 threads: Deprecate some internal functions
Nick Wellnhofer 65d381f3 2022-11-24T20:54:18 threads: Allocate mutexes statically
Nick Wellnhofer ed053c50 2022-11-25T12:27:14 dict: Make init/cleanup functions private
Nick Wellnhofer 7010d877 2022-11-25T12:06:27 threads: Rework initialization Make init/cleanup functions private. Merge xmlOnceInit into xmlInitThreadsInternal.
Nick Wellnhofer 9dbf1374 2022-11-24T20:52:57 parser: Make some module init/cleanup functions private
Chun-wei Fan 707ade22 2022-11-22T14:56:58 Visual Studio builds: Allow silencing deprecation warnings Define XML_IGNORE_DEPRECATION_WARNINGS and the corresponding XML_POP_WARNINGS for Visual Studio, and consequently define XML_IGNORE_FPTR_CAST_WARNINGS so that we do not get a compiler warning on Visual Studio by doing a __pragma(warning(pop)) without a corresponding __pragma(warning(push)). Also correct the documentation a bit for XML_POP_WARNINGS.
Chun-wei Fan b9590d5d 2022-11-18T11:23:23 Visual Studio: Define XML_DEPRECATED We can mark APIs as deprecated using __declspec(deprecated) with Visual Studio 2005 and later, so add a definition of that so that we can help users avoid using deprecated APIs when using Visual Studio as well. For the existing GCC definition, check whether we are on GCC 3.1+ before enabling the definition.
Nick Wellnhofer 68a6518c 2022-11-15T18:23:33 parser: Rewrite push parser boundary checks Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.
Nick Wellnhofer 2059df53 2022-11-14T22:27:58 buf: Deprecate static/immutable buffers
Nick Wellnhofer 46cd7d22 2022-11-13T16:30:46 io: Remove xmlInputReadCallbackNop In some cases, for example when using encoders, the read callback was set to NULL, in other cases it was set to xmlInputReadCallbackNop. xmlGROW only tested for xmlInputReadCallbackNop, resulting in errors when parsing large encoded content from memory. Always use a NULL callback for memory buffers to avoid ambiguities. Fixes #262.
Nick Wellnhofer b693905f 2022-11-04T14:50:39 doc: Remove xmlDllMain from documentation and version script This is a Windows-only symbol.
Nick Wellnhofer eef0a739 2022-10-30T12:21:20 xinclude: Implement "streaming" mode When using xmlreader, XPointer expressions in XIncludes simply cannot work. Expressions can reference nodes which weren't parsed yet or which were already deleted. After fixing nested XIncludes, we reference includes which were parsed previously. When streaming, these nodes could have been deleted, leading to use-after-free errors. Disallow XPointer expressions and truncate the include table in streaming mode.
Nick Wellnhofer 2fc8d123 2022-10-22T19:08:43 xinclude: Make xmlXIncludeCopyNode non-recursive Avoid call stack overflows. Also switch to xmlStaticCopyNode which avoids duplicate namespace definitions.
Nick Wellnhofer 5bfaf230 2022-10-11T13:00:33 win32: Fix build with VS2013 Should fix #420.
Nick Wellnhofer a9669679 2022-09-09T01:44:00 error: Don't use initGenericErrorDefaultFunc The code in xmlInitParser did only set the error handler if it was NULL which should never happen.
Nick Wellnhofer 30c8d9bb 2022-09-05T02:02:54 http: Simplify IPv6 checks This should also enable IPv6 support on Windows. Untested and mostly useless anyway, since we don't support HTTPS.
Nick Wellnhofer 9e5a016e 2022-09-05T01:08:33 autotools: Fix network checks on Windows
Nick Wellnhofer 0d901258 2022-09-04T16:41:43 Fix Windows compiler warnings in python/types.c
Nick Wellnhofer fe02289f 2022-09-04T03:19:01 Remove arg cast configure checks We can simply cast to non-const char * unconditionally.
Nick Wellnhofer 1e60c768 2022-09-04T01:49:41 Remove HAVE_WIN32_THREADS configuration flag Check for LIBXML_THREAD_ENABLED and _WIN32 instead.
Nick Wellnhofer 59f2f60e 2022-09-02T00:27:57 Remove "runtime debugging" This doesn't seem useful as configuration option.
Nick Wellnhofer 65dc8a63 2022-09-01T00:13:19 Make xmlNewSAXParserCtx take a const sax handler Also improve documentation.
Nick Wellnhofer 39dfb048 2022-08-26T02:41:15 Don't forget to install xmlversion.h Fixes 7f7961df.
Nick Wellnhofer 0f568c0b 2022-08-26T01:22:33 Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.
Nick Wellnhofer 48f84ea8 2022-08-25T21:31:08 Remove internal macros from parserInternals.h Replace MOVETO_ENDTAG with code that updates line and column numbers.
Nick Wellnhofer 58fc89e8 2022-08-25T20:57:30 Deprecate internal parser functions
Nick Wellnhofer a308c0cd 2022-08-25T20:18:16 Deprecate old HTML SAX API
Nick Wellnhofer 51035c53 2022-08-25T19:53:04 Generate deprecation warnings for old SAX API
Nick Wellnhofer 7f7961df 2022-08-25T13:47:16 Remove generated files from distribution - libxml2.spec - libxml-2.0.pc - xml2-config - include/libxml/xmlversion.h - python/libxml2.py - python/libxml2-export.c - python/libxml2-py.c - python/libxml2-py.h - python/libxml2class.py - python/libxml2class.txt - python/setup.py
Nick Wellnhofer 34a050cd 2022-08-24T16:35:58 Move some HTML functions to correct header file
Nick Wellnhofer 9a82b94a 2022-08-24T04:21:58 Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt Add API functions to create a parser context with a custom SAX handler without having to mess with ctxt->sax manually.
Nick Wellnhofer 92bb889b 2022-08-24T00:03:44 Don't index anything in DOC_DISABLE sections Somewhat misleadingly, the DOC_DISABLE directive only disabled warnings. Now we really stop the documentation generator from indexing. This results in additional warnings for xmlThrDef* functions. This should be fixed by documenting or deprecating them.
Nick Wellnhofer 75b5bc2b 2022-08-23T20:57:49 Deprecate some global variables Most of these should be completely unused.
Nick Wellnhofer 5264fa11 2022-08-18T20:17:27 Fix warnings from apibuild.py
Nick Wellnhofer e986d09c 2022-07-15T14:02:26 Skip incorrectly opened HTML comments Commit 4fd69f3e fixed handling of '<' characters not followed by an ASCII letter. But a '<!' sequence followed by invalid characters should be treated as bogus comment and skipped. Fixes #380.
Mehltretter Karl e9270ef0 2022-05-06T10:44:03 fix Schematron spelling
Nick Wellnhofer 67070107 2022-04-20T23:17:14 Add configuration flag for XPointer locations support Add a new configuration flag that controls whether the outdated support for XPointer locations (ranges and points) is enabled. --with-xptr-locs # Autotools LIBXML2_WITH_XPTR_LOCS # CMake The latest spec for what it essentially an XPath extension seems to be this working draft from 2002: https://www.w3.org/TR/xptr-xpointer/ The xpointer() scheme is listed as "being reviewed" in the XPointer registry since at least 2006. libxml2 seems to be the only modern software that tries to implement this spec, but the code has many bugs and quality issues. The flag defaults to "off" and support for this extensions has to be requested explicitly. The relevant API functions are deprecated.
Nick Wellnhofer aab584dc 2022-03-06T23:23:43 Clean up encoding switching code - Remove xmlSwitchToEncodingInt which was basically just a wrapper around xmlSwitchInputEncodingInt. - Simplify xmlSwitchEncoding. - Improve error handling in xmlSwitchInputEncodingInt. - Deprecate xmlSwitchInputEncoding.
Nick Wellnhofer 2070ade6 2022-03-06T23:19:04 Undeprecate schema init functions These functions aren't called from xmlInitParser, so it's better to keep them public for now.
Nick Wellnhofer 40483d0c 2022-03-06T13:55:48 Deprecate module init and cleanup functions These functions shouldn't be part of the public API. Most init functions are only thread-safe when called from xmlInitParser. Global variables should only be cleaned up by calling xmlCleanupParser.
Nick Wellnhofer 422176ad 2022-03-05T03:18:58 Don't set HAVE_WIN32_THREADS in win32config.h This flag must be set on the command line.
Nick Wellnhofer 4a8c71eb 2022-03-04T03:35:57 Remove DOCBparser This code has been broken and deprecated since version 2.6.0, released in 2003. Because of a bug in commit 961b535c, DOCBparser.c was never compiled since 2012. I couldn't find a Debian package using any of its symbols, so it seems safe to remove this module.
Nick Wellnhofer ebb17970 2022-03-04T02:31:59 Remove unneeded #includes
Nick Wellnhofer f025d6eb 2022-03-04T01:32:04 Use stdint.h with newer MSVC stdint.h is available since Visual Studio 2010.
Nick Wellnhofer 84085a26 2022-03-04T01:23:14 Remove cruft from win32config.h
Nick Wellnhofer 21ddad52 2022-03-04T01:07:40 Remove ICONV_CONST test We can simply cast the offending pointer to (void *).
Nick Wellnhofer ae3cf483 2022-03-03T23:57:59 Remove isinf/isnan emulation in win32config.h There's already a better fallback in xpath.c.
Mike Dalessio d7b287b9 2021-07-17T14:36:53 htmlParseComment: handle abruptly-closed comments See guidance provided on abrutply-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment
Nick Wellnhofer 776d15d3 2022-03-02T00:29:17 Don't check for standard C89 headers Don't check for - ctype.h - errno.h - float.h - limits.h - math.h - signal.h - stdarg.h - stdlib.h - string.h - time.h Stop including non-standard headers - malloc.h - strings.h
Nick Wellnhofer b66ce0bb 2022-03-01T12:39:02 Don't include ICU headers in public headers There's no need to make these implementation details public.
Nick Wellnhofer b094e814 2022-03-01T00:02:59 Remove broken Windows CE support
Nick Wellnhofer 2489c1d0 2022-02-28T22:42:10 Remove useless __CYGWIN__ checks From what I can tell, some really early Cygwin versions from around 1998-2000 used to erroneously define _WIN32. This was eventually fixed, but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is unnecessary. Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether to use __declspec.
Nick Wellnhofer 004fe9de 2022-02-20T19:02:31 Deprecate IDREF-related functions in valid.h These functions are only needed internally for validation. xmlGetRefs is inherently unsafe because the ref table isn't updated if attributes are removed (unlike the ids table). None of the Ubuntu 20.04 packages depending on libxml2 use any of these functions (except xmlFreeRefTable in libxslt), so it seems perfectly safe to deprecate them. Remove xmlIsRef and xmlRemoveRef from the Python bindings.
Nick Wellnhofer 61de9297 2022-02-20T20:59:14 Deprecate all functions in DOCBparser.h
Nick Wellnhofer cf4893f7 2022-02-20T19:56:41 Deprecate legacy functions
Nick Wellnhofer 9e0ca5a1 2022-02-20T19:29:01 Deprecate all functions in nanoftp.h
Nick Wellnhofer a2fe74c0 2022-02-20T18:19:27 Add XML_DEPRECATED macro __attribute__((deprecated)) is available since at least GCC 3.1, so an exact version check is probably unnecessary.
Nick Wellnhofer d7cb33cf 2022-01-13T17:06:14 Rework validation context flags Use a bitmask instead of magic values to - keep track whether the validation context is part of a parser context - keep track whether xmlValidateDtdFinal was called This allows to add addtional flags later. Note that this deliberately changes the name of a public struct member, assuming that this was always private data never to be used by client code.
Nick Wellnhofer b041d829 2022-02-16T19:55:30 Remove xmlwin32version.h This file was undocumented and never used anywhere. Maybe users were supposed to rename this file to xmlversion.h manually. These days, both CMake and win32/configure.js generate xmlversion.h from xmlversion.h.in, just like the Autotools build.
Nick Wellnhofer 7fe9addc 2022-02-13T23:29:51 Remove CVS and SVN-related code
Markus Rickert 7d4060d2 2021-05-16T18:00:21 Add missing file xmlwin32version.h.in to EXTRA_DIST
Nick Wellnhofer ce00c36e 2021-05-08T21:20:05 Store per-element parser state in a struct Make the parser context's "pushTab" point to an array of structs instead of void pointers. This avoids casting unrelated types to void pointers, improving readability and portability, and allows for more efficient packing. Ultimately, the struct could be extended to include the contents of "nameTab" and "spaceTab", further simplifying the code. Historically, "pushTab" was only used by the push parser (hence the name), so the change to the public headers should be safe. Also remove an unused parameter from xmlParseEndTag2.
Nick Wellnhofer fb08d9fe 2021-03-20T22:02:26 Fix include order in c14n.h - Include xmlversion.h before testing feature flags. - Include libxml headers before extern "C". Fixes #226.