kmx git

Commit	Date	Message
dea91c97	2021-07-27T16:12:54	Fix buffering in xmlOutputBufferWrite Fix a regression introduced with commit a697ed1e which caused xmlOutputBufferWrite to flush internal buffers too late. Fixes #296.
ec6e3efb	2021-07-06T21:56:04	Patch to forbid epsilon-reduction of final states When building the internal representation of a regexp, it is possible that a lot of empty transitions are created. Therefore there is a step to reduce them in the function xmlFAEliminateSimpleEpsilonTransitions. There is an error there for this case: * State 1 has a transition with an atom (in this case "a") to state 2. * State 2 is final and has an epsilon transition to state 1. After reduction it looked like: * State 1 has a transition with an atom (in this case "a") to itself and is final. In other words, the empty string is accepted when it shouldn't be. The attached patch skips the reduction step for final states. An alternative would be to insert or increment counters when reducing a final state, but this seemed error prone and unnecessary, since there aren't that many final states. Fixes #282
22f15211	2021-06-04T09:57:46	Use version in configure.ac for CMake Now CMake script reads version from configure.ac to prevent unsynchronized versions
92d9ab4c	2021-06-07T15:09:53	Fix whitespace when serializing empty HTML documents The old, non-recursive HTML serialization code would always terminate the output with a newline. The new implementation omitted the newline if the document node had no children. Readd the newline when serializing empty documents. Fixes #266.
3e1aad4f	2021-06-02T17:31:49	Fix XPath recursion limit Fix accounting of recursion depth when parsing XPath expressions. This silly bug introduced in commit 804c5297 could lead to spurious errors when parsing larger expressions or XSLT documents. Should fix #264.
13ad8736	2021-05-25T10:55:25	Fix regression in xmlNodeDumpOutputInternal Commit 85b1792e could cause additional whitespace if xmlNodeDump was called with a non-zero starting level.
a46e85f6	2021-05-22T15:20:46	Update CMake project version
a1cac3bb	2021-05-22T14:51:26	Add CMake alias targets for embedded projects
2c0f2f03	2021-05-18T09:52:55	Fix some validation errors in the FAQ Move paragraphs inside li elements.
b92b16f6	2021-05-19T10:15:54	Remove unused variable in xmlCharEncOutFunc Fixes a compiler warning: encoding.c: In function 'xmlCharEncOutFunc__internal_alias': encoding.c:2632:9: warning: unused variable 'output' [-Wunused-variable] 2632 | int output = 0; https://gitlab.gnome.org/GNOME/libxml2/-/issues/254
7d4060d2	2021-05-16T18:00:21	Add missing file xmlwin32version.h.in to EXTRA_DIST
4fc473d7	2021-05-16T17:48:07	Add instructions on how to use CMake to compile libxml
85b1792e	2021-05-18T20:08:28	Work around lxml API abuse Make xmlNodeDumpOutput and htmlNodeDumpFormatOutput work with corrupted parent pointers. This used to work with the old recursive code but the non-recursive rewrite required parent pointers to be set correctly. Unfortunately, lxml relies on the old behavior and passes subtrees with a corrupted structure. Fall back to a recursive function call if an invalid parent pointer is detected. Fixes #255.
a7b9f3eb	2021-05-20T13:38:54	fix: avoid segfault at exit when using custom memory functions This extends the fix introduced by 956534e to Windows processes dynamically loading libxml2. Closes #256.
b48e77cf	2021-05-13T20:56:16	Release of libxml2-2.9.12 Brown paper bag release, some recently added sources were missing from the 2.9.11 tarball: - configure.ac: bump version - fuzz/Makefile.am: add fuzz.h and seed/regexp to EXTRA_DIST
e1bcffea	2021-05-13T15:35:21	Release of libxml2-2.9.11 Prompted by CVE-2021-3541, but this includes an awful lot of serious bug fixes by Nick and others. - configure.ac: bumped to new release - doc/* updated and regenerated
8598060b	2021-05-13T14:55:12	Patch for security issue CVE-2021-3541 This is related to parameter entities expansion and following the line of the billion laugh attack. Somehow in that path the counting of parameters was missed and the normal algorithm based on entities "density" was useless.
bfd2f430	2021-05-09T18:56:57	Fix null deref in legacy SAX1 parser Always call nameNsPush instead of namePush. The latter is unused now and should probably be removed from the public API. I can't see how it could be used reasonably from client code and the unprefixed name has always polluted the global namespace. Fixes a null pointer dereference introduced with de5b624f when parsing in SAX1 mode. Found by OSS-Fuzz.
ce00c36e	2021-05-08T21:20:05	Store per-element parser state in a struct Make the parser context's "pushTab" point to an array of structs instead of void pointers. This avoids casting unrelated types to void pointers, improving readability and portability, and allows for more efficient packing. Ultimately, the struct could be extended to include the contents of "nameTab" and "spaceTab", further simplifying the code. Historically, "pushTab" was only used by the push parser (hence the name), so the change to the public headers should be safe. Also remove an unused parameter from xmlParseEndTag2.
de5b624f	2021-05-08T20:21:29	Fix handling of unexpected EOF in xmlParseContent Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was removed in commit 62150ed2. This commit also introduced a regression for direct users of xmlParseContent. Unclosed tags weren't checked.
3e80560d	2021-05-07T10:51:38	Fix line numbers in error messages for mismatched tags Commit 62150ed2 introduced a small regression in the error messages for mismatched tags. This typically only affected messages after the first mismatch, but with custom SAX handlers all line numbers would be off. This also fixes line numbers in the SAX push parser which were never handled correctly.
7279d236	2021-05-06T10:37:07	Fix htmlTagLookup Fix regression introduced with b25acce8. Some users like libxslt may call the HTML output functions on documents with uppercase tag names, so we must keep case-insensitive string comparison. Fixes #248.
33468d7e	2021-05-03T16:09:44	update for xsd:language type check Fixes #242.
babe7503	2021-05-01T16:53:33	Propagate error in xmlParseElementChildrenContentDeclPriv Check return value of recursive calls to xmlParseElementChildrenContentDeclPriv and return immediately in case of errors. Otherwise, struct xmlElementContent could contain unexpected null pointers, leading to a null deref when post-validating documents which aren't well-formed and parsed in recovery mode. Fixes #243.
5465a8e5	2021-04-25T21:19:59	Update INSTALL.libxml2 Fixes #238.
1098c30a	2021-04-22T19:26:28	Fix use-after-free with `xmllint --xinclude --dropdtd` The --dropdtd option can leave dangling pointers in entity reference nodes. Make sure to skip these nodes when processing XIncludes. This also avoids scanning entity declarations and even modifying them inadvertently during XInclude processing. Move from a block list to an allow list approach to avoid descending into other node types that can't contain elements. Fixes #237.
72b3c067	2021-04-22T19:24:50	Fix dangling pointer with `xmllint --dropdtd` Reset doc->intSubset when dropping the DTD.
bf227135	2020-08-16T17:19:35	Validate UTF8 in xmlEncodeEntities Code is currently assuming UTF-8 without validating. Truncated UTF-8 input can cause out-of-bounds array access. Adds further checks to partial fix in 50f06b3e. Fixes #178
1358d157	2021-04-21T13:23:27	Fix use-after-free with `xmllint --html --push` Call htmlCtxtUseOptions to make sure that names aren't stored in dictionaries. Note that this issue only affects xmllint using the HTML push parser. Fixes #230.
fb08d9fe	2021-03-20T22:02:26	Fix include order in c14n.h - Include xmlversion.h before testing feature flags. - Include libxml headers before extern "C". Fixes #226.
d3a02679	2021-03-15T13:44:34	CMake: Only add postfixes if MSVC Currently, it catches mingw-w64 in there as well, but mingw-w64 follows linux-like naming with no weird postfixes Signed-off-by: Christopher Degawa <ccom@randomderp.com>
868e49cf	2021-03-16T10:36:04	Allow FP division by zero in xmlXPathInit
d25460da	2021-03-13T19:12:00	Fix XPath NaN/Inf for older GCC versions The DBL_MAX approach could lead to errors caused by excess precision. Switch back to the division-by-zero approach with a work-around for MSVC and use the extern globals instead of macro expressions.
e20c9c14	2021-03-13T18:41:47	Fix xmlGetNodePath with invalid node types Make xmlGetNodePath return NULL instead of invalid XPath when hitting unsupported node types like DTD content. Reported here: https://mail.gnome.org/archives/xml/2021-January/msg00012.html Original report: https://bugs.php.net/bug.php?id=80680
c3fd8c42	2021-03-13T17:19:32	Fix exponential behavior with recursive entities Fix another case where only recursion depth was limited, but entities would still be expanded over and over again. The test case discovered by fuzzing only affected parsing in recovery mode with XML_PARSE_RECOVER. Found by OSS-Fuzz.
683de7ef	2021-03-04T19:06:04	Fix duplicate xmlStrEqual calls in htmlParseEndTag
8095365b	2021-03-04T18:46:11	Speed up htmlCheckAutoClose Switch to binary search.
b25acce8	2021-03-04T17:44:45	Speed up htmlTagLookup Switch to binary search. This is the first time bsearch is used in the libxml2 code base. But it's a standard library function since C89 and should be portable.
ad101bb5	2021-03-02T13:32:53	Clarify xmlNewDocProp documentation
a6e6498f	2021-03-02T13:09:06	Stop checking attributes for UTF-8 validity I can't see a reason to check attribute content for UTF-8 validity. Other parts of the API like xmlNewText have always assumed valid UTF-8 as extra checks only slow down processing. Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not freeing the old encoding would cause a memory leak. Note that this was last changed in 2008 with commit 6f8611fd which removed unnecessary encoding/decoding steps. Setting attributes should be even faster now. Found by OSS-Fuzz.
8446d459	2021-03-01T20:56:40	Reduce some fuzzer timeouts OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for several hundred hours without hitting the 20s timeout. It seems that most timeouts resulting from accidentally quadratic behavior in the HTML parser have been fixed. Start to gradually reduce the timeout to find new performance issues.
688b41a0	2021-03-01T14:17:42	Fix quadratic behavior when looking up xml:* attributes Add a special case for the predefined XML namespace when looking up DTD attribute defaults in xmlGetPropNodeInternal to avoid calling xmlGetNsList. This fixes quadratic behavior in - xmlNodeGetBase - xmlNodeGetLang - xmlNodeGetSpacePreserve Found by OSS-Fuzz.
ce2fbaa8	2021-02-22T22:01:57	Only run a few CI tests unless scheduled Only run the following tests by default - gcc - clang:asan - cmake:mingw:w64-x86_64:shared - cmake:msvc:v141:x64:shared
85c817a2	2021-02-22T21:28:21	Improve fuzzer stability - Add more calls to xmlInitializeCatalog. - Call xmlResetLastError after fuzzing each input.
f9ccb3b8	2021-02-22T21:26:13	Check for feature flags in fuzzer tests
88c657d6	2021-02-22T21:11:00	Use CMake PROJECT_VERSION
7a90bdfa	2021-02-22T17:58:06	Another attempt at improving fuzzer stability xmlInitializeCatalog is not called from xmlInitParser.
0fb3ae58	2021-02-22T17:31:05	Revert "Improve HTML fuzzer stability" This reverts commit de1b51eddcc17fd7ed1bbcc6d5d7d529407dfbe2.
0987001c	2021-02-22T12:29:56	Add charset names to fuzzing dictionaries
de1b51ed	2021-02-22T12:25:29	Improve HTML fuzzer stability Call htmlInitAutoClose during fuzzer initialization to fix stability issue. Leave a note concerning problems with this function.
09320f05	2021-02-21T14:26:40	Add CI for MSVC x86
dcb80b92	2021-02-20T20:30:43	Fix slow parsing of HTML with encoding errors Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior. The code htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all. Also leave a note that switching encodings twice seems impossible to implement correctly. Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8. Found by OSS-Fuzz.
02bee4c4	2021-02-02T22:27:52	Add a flag to not output anything when xmllint succeeded
4defa2c2	2021-02-12T09:39:38	Fix warnings in libxml.m4 with autoconf 2.70+. Closes #219.
cbe1212d	2021-02-09T17:07:21	Fix null deref introduced with previous commit Found by OSS-Fuzz.
01411e7c	2021-02-08T20:58:32	Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> But the XML spec clearly states that this is illegal: > If the entities lt or amp are declared, they MUST be declared as > internal entities whose replacement text is a character reference to > the respective character (less-than sign or ampersand) being escaped; > the double escaping is REQUIRED for these entities so that references > to them produce a well-formed result. Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
07920b43	2021-01-26T05:42:48	Add the copy of type from original xmlDoc in xmlCopyDoc() A bug related to php DOMDocument: https://bugs.php.net/bug.php?id=80665 When copy/clone an html document, the xmlDoc->type goes from XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
2065d340	2021-02-05T23:40:18	Add CI for CMake on MSVC
afad3721	2021-01-31T09:53:56	parser.c: shrink the input buffer when appropriate Fixes GNOME/libxml2#200 Also see discussions at: - GNOME/libxml2#192 - https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e - https://github.com/sparklemotion/nokogiri/issues/2132
ec808a44	2021-02-07T13:57:49	Speed up HTML fuzzer htmlDocDumpMemory uses the "HTML" encoding if no other encoding was specified in the source HTML. This encoding can be extremely slow because of an inefficiency in htmlEntityValueLookup. Stop encoding the output for now.
e6495e47	2021-02-07T13:38:01	Remove unused encoding parameter of HTML output functions The encoding string is unused. Encodings are set by way of the output buffer.
954696e7	2021-02-07T13:23:09	Fix infinite loop in HTML parser introduced with recent commits Check for XML_PARSER_EOF to avoid an infinite loop introduced with recent changes to the HTML push parser. Found by OSS-Fuzz.
acb35667	2021-02-03T13:48:40	Fix quadratic runtime when parsing CDATA sections Use optimized concatenation for CDATA sections in addition to normal text. This also affects HTML script content. Found by OSS-Fuzz.
f93ca3e1	2021-01-15T17:53:27	Update minimum required CMake version
00487289	2020-12-31T16:34:25	Add variables for configured options to CMake config files
95519737	2020-12-31T13:41:19	Check if variables exist when defining targets
c26e4525	2020-12-31T13:18:14	Check if target exists when reading target properties
ec119875	2020-12-30T14:40:43	Add xmlcatalog target and definition to config files
2377a312	2020-12-30T14:40:04	Remove include directories for link-only dependencies
26835480	2020-12-30T14:28:24	Fix ICU build in CMake
296ab61e	2020-11-19T22:06:36	Configure pkgconfig, xml2-config, and xml2Conf.sh file
79301d3d	2020-12-18T12:50:21	Fix timeout when handling recursive entities Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.
45da175c	2020-12-18T12:14:52	Fix memory leak in xmlParseElementMixedContentDecl Free parsed content if malloc fails to avoid a memory leak. Found with libFuzzer.
1d73f07d	2020-12-18T00:55:00	Fix null deref in xmlStringGetNodeList Check for malloc failure to avoid null deref. Found with libFuzzer.
e2b975c3	2020-12-18T00:50:34	Handle malloc failures in fuzzing code Avoid misdiagnosis in OOM situations.
a67b63d1	2020-10-11T14:15:37	use new htmlParseLookupCommentEnd to find comment ends Note that the caret in error messages generated during comment parsing may have moved by one byte. See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
29f5d20e	2020-08-03T17:36:05	htmlParseComment: treat `--!>` as if it closed the comment See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
e28d9347	2020-08-04T14:53:19	add test coverage for incorrectly-closed comments this establishes the baseline behavior so that subsequent commits which modify this behavior are clear about what's being changed.
9086988f	2020-12-16T15:41:52	Enforce maximum length of fuzz input Remove the libfuzzer max_len option which doesn't apply to other fuzzing engines. Enforce the maximum length directly in the fuzz targets. For the xml target, lower the maximum when expanding entities to avoid timeout and OOM errors.
1fe38530	2020-12-16T15:27:13	Remove temporary members from struct _xmlXPathContext These values are hardcoded now and the struct members, while public, were recently introduced and never part of an official release.
8ca3a59b	2020-12-15T20:14:28	Fix integer overflow in xmlSchemaGetParticleTotalRangeMin The function is only used once and its return value is only checked for zero. Disable the function like its Max counterpart and add an implementation for the special case. Found by OSS-Fuzz.
649d02ea	2020-12-07T20:19:53	encoding: fix memleak in xmlRegisterCharEncodingHandler() The return type of xmlRegisterCharEncodingHandler() is void. The invoker cannot determine whether xmlRegisterCharEncodingHandler() is executed successfully. when nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the "handler" is not added to the array "handlers". As a result, the memory of "handler" cannot be managed and released: memory leakage. so add "xmlfree(handler)" to fix memory leakage on the failure branch of xmlRegisterCharEncodingHandler(). Reported-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
cb7a572b	2020-12-07T20:17:34	xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val" The xmlSchemaGetFacetValueAsUlong() API is an external API. The validity of external input parameters must be strictly verified. Before accessing "facet->val->value", we need check whether "facet->val" is a null pointer. Signed-off-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
84b76d99	2020-12-06T17:26:23	Update CMake config files
d0ccb3a6	2020-12-06T17:25:52	Add xmlcatalog and xmllint to CMake export
acdc2ff3	2020-06-04T23:02:08	Simplify xmlexports.h All the compiler switches essentially set the same macros. The only exception was MSVC which omitted the "extern" keyword for exported variables. This in turn broke clang-cl. This commit rewrites and simplifies the whole header. Closes #163.
a218ff0e	2020-12-06T17:26:36	Fix null pointer deref in xmlXPtrRangeInsideFunction Found by OSS-Fuzz.
94c2e415	2020-12-06T16:38:00	Fix quadratic runtime in HTML push parser with null bytes Null bytes in the input stream do not necessarily signal an EOF condition. Check the stream pointers for EOF to avoid quadratic rescanning of input data. Note that the CUR_CHAR macro used in functions like htmlParseCharData calls htmlCurrentChar which translates null bytes. Found by OSS-Fuzz.
1c4f9a6d	2020-11-25T18:01:51	Require dependencies based on enabled CMake options
faea2fa9	2020-11-21T01:21:56	Avoid quadratic checking of identity-constraints key/unique/keyref schema attributes currently use quadratic loops to check their various constraints (that keys are unique and that keyrefs refer to existing keys). That becomes extremely slow if there are many elements with keys. This happens in the wild with e.g. the OVAL XML descriptions of security patches. You need the openscap schemata, and then an example xml file: % zypper in openscap-utils % wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml % time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 16m59,857s user 16m55,787s sys 0m1,060s This patch makes libxml use a hash table to avoid the quadratic behaviour. The existing hash table only accepts strings as keys, so we're mostly reusing the canonical representation of key values to derive such strings (with the caveat given in a comment). The alternative would be to rework the hash table code to accept either numbers or free functions as hash workers, but the code is fast enough as is. With the patch we have this then: % time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 0m3,531s user 0m3,427s sys 0m0,103s So, a ~300x speedup. This patch survives 'make check' and 'make tests'.
8272db53	2020-11-28T22:54:40	Use NAMELINK_COMPONENT in CMake install
5c7bdbc9	2020-11-25T18:41:14	Add CMake files to EXTRA_DIST
7a62870a	2020-11-19T22:06:23	Add missing compile definition for static builds to CMake
e028d293	2020-11-19T17:58:46	Add CI for CMake on Linux and MinGW
b516ed18	2020-11-12T12:53:43	Fix building with ICU 68. ICU 68 no longer defines the TRUE macro. Closes #204.
ac5e9991	2020-11-10T15:42:36	Convert python/libxml.c to PY_SSIZE_T_CLEAN Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string length (int len) explicitly to Py_ssize_t when passing a string to a function call using PyObject_CallMethod() with the "s#" format.
f42a0524	2020-11-09T18:19:31	Build the Python extension with PY_SSIZE_T_CLEAN The Python extension module now uses Py_ssize_t rather than int for string lengths. This change makes the extension compatible with Python 3.10. Fixes #203.
0ace6c4d	2020-11-19T17:35:11	Add CI test for Python 3
7c06d99e	2020-10-27T11:29:20	Fix xmlURIEscape memory leaks. Found by running the fuzz/uri.c fuzzer under asan (internal Android bug 171610679). Always free `ret` when exiting on failure. I've moved the definition of NULLCHK down past where ret is always initialized to make it clear that this is safe. This patch also fixes the indentation of two of the NULLCHK call sites to make it more obvious that NULLCHK isn't `if`-like.
31c6ce3b	2020-11-09T17:55:44	Avoid call stack overflow with XML reader and recursive XIncludes Don't process XIncludes in the result of another inclusion to avoid infinite recursion resulting in a call stack overflow. This is something the XInclude engine shouldn't allow but correct handling of intra-document includes would require major changes. Found by OSS-Fuzz.

dea91c97

2021-07-27T16:12:54

Fix buffering in xmlOutputBufferWrite Fix a regression introduced with commit a697ed1e which caused xmlOutputBufferWrite to flush internal buffers too late. Fixes #296.

ec6e3efb

2021-07-06T21:56:04

Patch to forbid epsilon-reduction of final states When building the internal representation of a regexp, it is possible that a lot of empty transitions are created. Therefore there is a step to reduce them in the function xmlFAEliminateSimpleEpsilonTransitions. There is an error there for this case: * State 1 has a transition with an atom (in this case "a") to state 2. * State 2 is final and has an epsilon transition to state 1. After reduction it looked like: * State 1 has a transition with an atom (in this case "a") to itself and is final. In other words, the empty string is accepted when it shouldn't be. The attached patch skips the reduction step for final states. An alternative would be to insert or increment counters when reducing a final state, but this seemed error prone and unnecessary, since there aren't that many final states. Fixes #282

22f15211

2021-06-04T09:57:46

Use version in configure.ac for CMake Now CMake script reads version from configure.ac to prevent unsynchronized versions

92d9ab4c

2021-06-07T15:09:53

Fix whitespace when serializing empty HTML documents The old, non-recursive HTML serialization code would always terminate the output with a newline. The new implementation omitted the newline if the document node had no children. Readd the newline when serializing empty documents. Fixes #266.

3e1aad4f

2021-06-02T17:31:49

Fix XPath recursion limit Fix accounting of recursion depth when parsing XPath expressions. This silly bug introduced in commit 804c5297 could lead to spurious errors when parsing larger expressions or XSLT documents. Should fix #264.

13ad8736

2021-05-25T10:55:25

Fix regression in xmlNodeDumpOutputInternal Commit 85b1792e could cause additional whitespace if xmlNodeDump was called with a non-zero starting level.

a46e85f6

2021-05-22T15:20:46

Update CMake project version

a1cac3bb

2021-05-22T14:51:26

Add CMake alias targets for embedded projects

2c0f2f03

2021-05-18T09:52:55

Fix some validation errors in the FAQ Move paragraphs inside li elements.

b92b16f6

2021-05-19T10:15:54

Remove unused variable in xmlCharEncOutFunc Fixes a compiler warning: encoding.c: In function 'xmlCharEncOutFunc__internal_alias': encoding.c:2632:9: warning: unused variable 'output' [-Wunused-variable] 2632 | int output = 0; https://gitlab.gnome.org/GNOME/libxml2/-/issues/254

7d4060d2

2021-05-16T18:00:21

Add missing file xmlwin32version.h.in to EXTRA_DIST

4fc473d7

2021-05-16T17:48:07

Add instructions on how to use CMake to compile libxml

85b1792e

2021-05-18T20:08:28

Work around lxml API abuse Make xmlNodeDumpOutput and htmlNodeDumpFormatOutput work with corrupted parent pointers. This used to work with the old recursive code but the non-recursive rewrite required parent pointers to be set correctly. Unfortunately, lxml relies on the old behavior and passes subtrees with a corrupted structure. Fall back to a recursive function call if an invalid parent pointer is detected. Fixes #255.

a7b9f3eb

2021-05-20T13:38:54

fix: avoid segfault at exit when using custom memory functions This extends the fix introduced by 956534e to Windows processes dynamically loading libxml2. Closes #256.

b48e77cf

2021-05-13T20:56:16

Release of libxml2-2.9.12 Brown paper bag release, some recently added sources were missing from the 2.9.11 tarball: - configure.ac: bump version - fuzz/Makefile.am: add fuzz.h and seed/regexp to EXTRA_DIST

e1bcffea

2021-05-13T15:35:21

Release of libxml2-2.9.11 Prompted by CVE-2021-3541, but this includes an awful lot of serious bug fixes by Nick and others. - configure.ac: bumped to new release - doc/* updated and regenerated

8598060b

2021-05-13T14:55:12

Patch for security issue CVE-2021-3541 This is relapted to parameter entities expansion and following the line of the billion laugh attack. Somehow in that path the counting of parameters was missed and the normal algorithm based on entities "density" was useless.

bfd2f430

2021-05-09T18:56:57

Fix null deref in legacy SAX1 parser Always call nameNsPush instead of namePush. The latter is unused now and should probably be removed from the public API. I can't see how it could be used reasonably from client code and the unprefixed name has always polluted the global namespace. Fixes a null pointer dereference introduced with de5b624f when parsing in SAX1 mode. Found by OSS-Fuzz.

ce00c36e

2021-05-08T21:20:05

Store per-element parser state in a struct Make the parser context's "pushTab" point to an array of structs instead of void pointers. This avoids casting unrelated types to void pointers, improving readability and portability, and allows for more efficient packing. Ultimately, the struct could be extended to include the contents of "nameTab" and "spaceTab", further simplifying the code. Historically, "pushTab" was only used by the push parser (hence the name), so the change to the public headers should be safe. Also remove an unused parameter from xmlParseEndTag2.

de5b624f

2021-05-08T20:21:29

Fix handling of unexpected EOF in xmlParseContent Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was removed in commit 62150ed2. This commit also introduced a regression for direct users of xmlParseContent. Unclosed tags weren't checked.

3e80560d

2021-05-07T10:51:38

Fix line numbers in error messages for mismatched tags Commit 62150ed2 introduced a small regression in the error messages for mismatched tags. This typically only affected messages after the first mismatch, but with custom SAX handlers all line numbers would be off. This also fixes line numbers in the SAX push parser which were never handled correctly.

7279d236

2021-05-06T10:37:07

Fix htmlTagLookup Fix regression introduced with b25acce8. Some users like libxslt may call the HTML output functions on documents with uppercase tag names, so we must keep case-insensitive string comparison. Fixes #248.

33468d7e

2021-05-03T16:09:44

update for xsd:language type check Fixes #242.

babe7503

2021-05-01T16:53:33

Propagate error in xmlParseElementChildrenContentDeclPriv Check return value of recursive calls to xmlParseElementChildrenContentDeclPriv and return immediately in case of errors. Otherwise, struct xmlElementContent could contain unexpected null pointers, leading to a null deref when post-validating documents which aren't well-formed and parsed in recovery mode. Fixes #243.

5465a8e5

2021-04-25T21:19:59

Update INSTALL.libxml2 Fixes #238.

1098c30a

2021-04-22T19:26:28

Fix user-after-free with `xmllint --xinclude --dropdtd` The --dropdtd option can leave dangling pointers in entity reference nodes. Make sure to skip these nodes when processing XIncludes. This also avoids scanning entity declarations and even modifying them inadvertently during XInclude processing. Move from a block list to an allow list approach to avoid descending into other node types that can't contain elements. Fixes #237.

72b3c067

2021-04-22T19:24:50

Fix dangling pointer with `xmllint --dropdtd` Reset doc->intSubset when dropping the DTD.

bf227135

2020-08-16T17:19:35

Validate UTF8 in xmlEncodeEntities Code is currently assuming UTF-8 without validating. Truncated UTF-8 input can cause out-of-bounds array access. Adds further checks to partial fix in 50f06b3e. Fixes #178

1358d157

2021-04-21T13:23:27

Fix use-after-free with `xmllint --html --push` Call htmlCtxtUseOptions to make sure that names aren't stored in dictionaries. Note that this issue only affects xmllint using the HTML push parser. Fixes #230.

fb08d9fe

2021-03-20T22:02:26

Fix include order in c14n.h - Include xmlversion.h before testing feature flags. - Include libxml headers before extern "C". Fixes #226.

d3a02679

2021-03-15T13:44:34

CMake: Only add postfixes if MSVC Currently, it catches mingw-w64 in there as well, but mingw-w64 follows linux-like naming with no weird postfixes Signed-off-by: Christopher Degawa <ccom@randomderp.com>

868e49cf

2021-03-16T10:36:04

Allow FP division by zero in xmlXPathInit

d25460da

2021-03-13T19:12:00

Fix XPath NaN/Inf for older GCC versions The DBL_MAX approach could lead to errors caused by excess precision. Switch back to the division-by-zero approach with a work-around for MSVC and use the extern globals instead of macro expressions.
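
The division-by-zero approach can be sketched as below. This is an illustration under assumptions, not the library's code: the names `fp_nan`, `fp_pinf`, `fp_ninf`, `fp_zero`, `fp_init`, and `fp_is_special` are hypothetical, and the `volatile` qualifier is an assumption used here to keep the compiler from folding the divisions at compile time. It relies on IEEE 754 floating-point semantics.

```c
#include <math.h>

/* Hypothetical globals standing in for values a library might export. */
double fp_nan;
double fp_pinf;
double fp_ninf;

/* Assumption: reading a volatile zero prevents constant folding. */
static volatile double fp_zero = 0.0;

/* Compute NaN and the infinities at run time via division by zero. */
void fp_init(void) {
    double zero = fp_zero;

    fp_nan = zero / zero;     /* 0/0 yields NaN */
    fp_pinf = 1.0 / zero;     /* 1/0 yields +Inf */
    fp_ninf = -1.0 / zero;    /* -1/0 yields -Inf */
}

/* Return nonzero if v is NaN or infinite. */
int fp_is_special(double v) {
    return isnan(v) || isinf(v);
}
```

Storing the results in globals, rather than expanding a macro at each use, means the division happens once during initialization.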

e20c9c14

2021-03-13T18:41:47

Fix xmlGetNodePath with invalid node types Make xmlGetNodePath return NULL instead of invalid XPath when hitting unsupported node types like DTD content. Reported here: https://mail.gnome.org/archives/xml/2021-January/msg00012.html Original report: https://bugs.php.net/bug.php?id=80680

c3fd8c42

2021-03-13T17:19:32

Fix exponential behavior with recursive entities Fix another case where only recursion depth was limited, but entities would still be expanded over and over again. The test case discovered by fuzzing only affected parsing in recovery mode with XML_PARSE_RECOVER. Found by OSS-Fuzz.

683de7ef

2021-03-04T19:06:04

Fix duplicate xmlStrEqual calls in htmlParseEndTag

8095365b

2021-03-04T18:46:11

Speed up htmlCheckAutoClose Switch to binary search.

b25acce8

2021-03-04T17:44:45

Speed up htmlTagLookup Switch to binary search. This is the first time bsearch is used in the libxml2 code base. But it's a standard library function since C89 and should be portable.
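
A lookup of this shape with the standard `bsearch` function might look like the sketch below. The `TagDesc` struct, the table contents, and the function names are hypothetical, and the comparison is a plain case-sensitive `strcmp`, which may differ from what the library does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical element descriptor; a real table would carry more fields. */
typedef struct {
    const char *name;
    int is_inline;
} TagDesc;

/* bsearch requires the table to be sorted by the comparison key. */
static const TagDesc tag_table[] = {
    { "a", 1 },
    { "body", 0 },
    { "div", 0 },
    { "em", 1 },
    { "p", 0 },
    { "span", 1 },
};

/* Compare the search key (a C string) with a table member. */
static int compare_tag(const void *key, const void *member) {
    const char *name = key;
    const TagDesc *desc = member;

    return strcmp(name, desc->name);
}

/* Return the descriptor for name, or NULL if the tag is unknown. */
const TagDesc *tag_lookup(const char *name) {
    if (name == NULL)
        return NULL;
    return bsearch(name, tag_table,
                   sizeof(tag_table) / sizeof(tag_table[0]),
                   sizeof(tag_table[0]), compare_tag);
}
```

With a sorted table, each lookup takes O(log n) comparisons instead of a linear scan.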

ad101bb5

2021-03-02T13:32:53

Clarify xmlNewDocProp documentation

a6e6498f

2021-03-02T13:09:06

Stop checking attributes for UTF-8 validity I can't see a reason to check attribute content for UTF-8 validity. Other parts of the API like xmlNewText have always assumed valid UTF-8 as extra checks only slow down processing. Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not freeing the old encoding would cause a memory leak. Note that this was last changed in 2008 with commit 6f8611fd which removed unnecessary encoding/decoding steps. Setting attributes should be even faster now. Found by OSS-Fuzz.

8446d459

2021-03-01T20:56:40

Reduce some fuzzer timeouts OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for several hundred hours without hitting the 20s timeout. It seems that most timeouts resulting from accidentally quadratic behavior in the HTML parser have been fixed. Start to gradually reduce the timeout to find new performance issues.

688b41a0

2021-03-01T14:17:42

Fix quadratic behavior when looking up xml:* attributes Add a special case for the predefined XML namespace when looking up DTD attribute defaults in xmlGetPropNodeInternal to avoid calling xmlGetNsList. This fixes quadratic behavior in xmlNodeGetBase, xmlNodeGetLang, and xmlNodeGetSpacePreserve. Found by OSS-Fuzz.

ce2fbaa8

2021-02-22T22:01:57

Only run a few CI tests unless scheduled Only run the following tests by default: gcc, clang:asan, cmake:mingw:w64-x86_64:shared, cmake:msvc:v141:x64:shared.

85c817a2

2021-02-22T21:28:21

Improve fuzzer stability Add more calls to xmlInitializeCatalog. Call xmlResetLastError after fuzzing each input.

f9ccb3b8

2021-02-22T21:26:13

Check for feature flags in fuzzer tests

88c657d6

2021-02-22T21:11:00

Use CMake PROJECT_VERSION

7a90bdfa

2021-02-22T17:58:06

Another attempt at improving fuzzer stability xmlInitializeCatalog is not called from xmlInitParser.

0fb3ae58

2021-02-22T17:31:05

Revert "Improve HTML fuzzer stability" This reverts commit de1b51eddcc17fd7ed1bbcc6d5d7d529407dfbe2.

0987001c

2021-02-22T12:29:56

Add charset names to fuzzing dictionaries

de1b51ed

2021-02-22T12:25:29

Improve HTML fuzzer stability Call htmlInitAutoClose during fuzzer initialization to fix stability issue. Leave a note concerning problems with this function.

09320f05

2021-02-21T14:26:40

Add CI for MSVC x86

dcb80b92

2021-02-20T20:30:43

Fix slow parsing of HTML with encoding errors Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior. The code htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all. Also leave a note that switching encodings twice seems impossible to implement correctly. Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8. Found by OSS-Fuzz.

02bee4c4

2021-02-02T22:27:52

Add a flag to not output anything when xmllint succeeded

4defa2c2

2021-02-12T09:39:38

Fix warnings in libxml.m4 with autoconf 2.70+. Closes #219.

cbe1212d

2021-02-09T17:07:21

Fix null deref introduced with previous commit Found by OSS-Fuzz.

01411e7c

2021-02-08T20:58:32

Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> but the XML spec clearly states that this is illegal: "If the entities lt or amp are declared, they MUST be declared as internal entities whose replacement text is a character reference to the respective character (less-than sign or ampersand) being escaped; the double escaping is REQUIRED for these entities so that references to them produce a well-formed result." Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.

07920b43

2021-01-26T05:42:48

Copy the type from the original xmlDoc in xmlCopyDoc() A bug related to PHP DOMDocument: https://bugs.php.net/bug.php?id=80665 When copying or cloning an HTML document, xmlDoc->type changed from XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.

2065d340

2021-02-05T23:40:18

Add CI for CMake on MSVC

afad3721

2021-01-31T09:53:56

parser.c: shrink the input buffer when appropriate Fixes GNOME/libxml2#200. Also see discussions at GNOME/libxml2#192, https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e and https://github.com/sparklemotion/nokogiri/issues/2132

ec808a44

2021-02-07T13:57:49

Speed up HTML fuzzer htmlDocDumpMemory uses the "HTML" encoding if no other encoding was specified in the source HTML. This encoding can be extremely slow because of an inefficiency in htmlEntityValueLookup. Stop encoding the output for now.

e6495e47

2021-02-07T13:38:01

Remove unused encoding parameter of HTML output functions The encoding string is unused. Encodings are set by way of the output buffer.

954696e7

2021-02-07T13:23:09

Fix infinite loop in HTML parser introduced with recent commits Check for XML_PARSER_EOF to avoid an infinite loop introduced with recent changes to the HTML push parser. Found by OSS-Fuzz.

acb35667

2021-02-03T13:48:40

Fix quadratic runtime when parsing CDATA sections Use optimized concatenation for CDATA sections in addition to normal text. This also affects HTML script content. Found by OSS-Fuzz.

f93ca3e1

2021-01-15T17:53:27

Update minimum required CMake version

00487289

2020-12-31T16:34:25

Add variables for configured options to CMake config files

95519737

2020-12-31T13:41:19

Check if variables exist when defining targets

c26e4525

2020-12-31T13:18:14

Check if target exists when reading target properties

ec119875

2020-12-30T14:40:43

Add xmlcatalog target and definition to config files

2377a312

2020-12-30T14:40:04

Remove include directories for link-only dependencies

26835480

2020-12-30T14:28:24

Fix ICU build in CMake

296ab61e

2020-11-19T22:06:36

Configure pkgconfig, xml2-config, and xml2Conf.sh file

79301d3d

2020-12-18T12:50:21

Fix timeout when handling recursive entities Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.

45da175c

2020-12-18T12:14:52

Fix memory leak in xmlParseElementMixedContentDecl Free parsed content if malloc fails to avoid a memory leak. Found with libFuzzer.

1d73f07d

2020-12-18T00:55:00

Fix null deref in xmlStringGetNodeList Check for malloc failure to avoid null deref. Found with libFuzzer.

e2b975c3

2020-12-18T00:50:34

Handle malloc failures in fuzzing code Avoid misdiagnosis in OOM situations.

a67b63d1

2020-10-11T14:15:37

use new htmlParseLookupCommentEnd to find comment ends Note that the caret in error messages generated during comment parsing may have moved by one byte. See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment

29f5d20e

2020-08-03T17:36:05

htmlParseComment: treat `--!>` as if it closed the comment See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment

e28d9347

2020-08-04T14:53:19

add test coverage for incorrectly-closed comments This establishes the baseline behavior so that subsequent commits which modify this behavior are clear about what's being changed.

9086988f

2020-12-16T15:41:52

Enforce maximum length of fuzz input Remove the libfuzzer max_len option which doesn't apply to other fuzzing engines. Enforce the maximum length directly in the fuzz targets. For the xml target, lower the maximum when expanding entities to avoid timeout and OOM errors.

1fe38530

2020-12-16T15:27:13

Remove temporary members from struct _xmlXPathContext These values are hardcoded now and the struct members, while public, were recently introduced and never part of an official release.

8ca3a59b

2020-12-15T20:14:28

Fix integer overflow in xmlSchemaGetParticleTotalRangeMin The function is only used once and its return value is only checked for zero. Disable the function like its Max counterpart and add an implementation for the special case. Found by OSS-Fuzz.

649d02ea

2020-12-07T20:19:53

encoding: fix memleak in xmlRegisterCharEncodingHandler() The return type of xmlRegisterCharEncodingHandler() is void, so the caller cannot determine whether it executed successfully. When nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the "handler" is not added to the array "handlers". As a result, the memory of "handler" is never released and leaks. Add "xmlFree(handler)" on the failure branch of xmlRegisterCharEncodingHandler() to fix the leak. Reported-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
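
The ownership rule behind this fix can be sketched as follows: a registration function that returns void takes ownership of its argument, so it must free the argument when it cannot store it. All names here (`Handler`, `MAX_HANDLERS`, `handler_new`, `register_handler`, `handler_count`) are hypothetical, and plain `malloc`/`free` stand in for the library's allocator.

```c
#include <stdlib.h>

#define MAX_HANDLERS 2   /* hypothetical limit, kept small for illustration */

typedef struct {
    const char *name;
} Handler;

static Handler *handlers[MAX_HANDLERS];
static int nb_handlers = 0;

/* Allocate a handler; ownership passes to the registry on registration. */
Handler *handler_new(const char *name) {
    Handler *h = malloc(sizeof(*h));

    if (h != NULL)
        h->name = name;
    return h;
}

/*
 * Take ownership of handler. Since there is no return value, the caller
 * cannot learn about failure, so the failure branch frees the handler.
 */
void register_handler(Handler *handler) {
    if (handler == NULL)
        return;
    if (nb_handlers >= MAX_HANDLERS) {
        free(handler);               /* table full: release instead of leaking */
        return;
    }
    handlers[nb_handlers++] = handler;
}

/* Number of handlers currently stored. */
int handler_count(void) {
    return nb_handlers;
}
```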

cb7a572b

2020-12-07T20:17:34

xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val" The xmlSchemaGetFacetValueAsULong() API is an external API. The validity of external input parameters must be strictly verified. Before accessing "facet->val->value", we need to check whether "facet->val" is a null pointer. Signed-off-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
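
A guard of this kind can be sketched as below. The struct layouts and the function name `facet_value_as_ulong` are simplified, hypothetical stand-ins, and returning 0 for invalid input is an assumption made for the illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the schema value and facet types. */
typedef struct {
    unsigned long value;
} FacetValue;

typedef struct {
    FacetValue *val;
} Facet;

/* Return the facet value, or 0 if facet or facet->val is NULL. */
unsigned long facet_value_as_ulong(const Facet *facet) {
    if (facet == NULL || facet->val == NULL)
        return 0;                 /* assumption: 0 signals invalid input */
    return facet->val->value;
}
```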

84b76d99

2020-12-06T17:26:23

Update CMake config files

d0ccb3a6

2020-12-06T17:25:52

Add xmlcatalog and xmllint to CMake export

acdc2ff3

2020-06-04T23:02:08

Simplify xmlexports.h All the compiler switches essentially set the same macros. The only exception was MSVC which omitted the "extern" keyword for exported variables. This in turn broke clang-cl. This commit rewrites and simplifies the whole header. Closes #163.

a218ff0e

2020-12-06T17:26:36

Fix null pointer deref in xmlXPtrRangeInsideFunction Found by OSS-Fuzz.

94c2e415

2020-12-06T16:38:00

Fix quadratic runtime in HTML push parser with null bytes Null bytes in the input stream do not necessarily signal an EOF condition. Check the stream pointers for EOF to avoid quadratic rescanning of input data. Note that the CUR_CHAR macro used in functions like htmlParseCharData calls htmlCurrentChar which translates null bytes. Found by OSS-Fuzz.

1c4f9a6d

2020-11-25T18:01:51

Require dependencies based on enabled CMake options

faea2fa9

2020-11-21T01:21:56

Avoid quadratic checking of identity-constraints key/unique/keyref schema attributes currently use quadratic loops to check their various constraints (that keys are unique and that keyrefs refer to existing keys). That becomes extremely slow if there are many elements with keys. This happens in the wild with e.g. the OVAL XML descriptions of security patches. You need the openscap schemata, and then an example XML file: % zypper in openscap-utils % wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml % time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 16m59,857s user 16m55,787s sys 0m1,060s This patch makes libxml use a hash table to avoid the quadratic behaviour. The existing hash table only accepts strings as keys, so we're mostly reusing the canonical representation of key values to derive such strings (with the caveat given in a comment). The alternative would be to rework the hash table code to accept either numbers or free functions as hash workers, but the code is fast enough as is. With the patch we have this then: % time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 0m3,531s user 0m3,427s sys 0m0,103s So, a ~300x speedup. This patch survives 'make check' and 'make tests'.
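
The idea of replacing a pairwise comparison loop with a hash lookup on string keys can be sketched as follows. This is not the library's hash table: `KeySet` and its functions are hypothetical, the table has a fixed capacity, and it stores borrowed pointers to the key strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity string set using open addressing. */
typedef struct {
    const char **slots;   /* borrowed key pointers, NULL means empty */
    size_t capacity;      /* must be a power of two, at least 2 */
    size_t count;
} KeySet;

static size_t hash_str(const char *s) {
    size_t h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return 0 on success, -1 on bad capacity or allocation failure. */
int keyset_init(KeySet *set, size_t capacity) {
    if (capacity < 2 || (capacity & (capacity - 1)) != 0)
        return -1;
    set->slots = calloc(capacity, sizeof(*set->slots));
    if (set->slots == NULL)
        return -1;
    set->capacity = capacity;
    set->count = 0;
    return 0;
}

/* Return 1 if key was added, 0 if it is a duplicate, -1 if the set is full. */
int keyset_add(KeySet *set, const char *key) {
    size_t mask = set->capacity - 1;
    size_t i = hash_str(key) & mask;

    /* One slot always stays empty, so this probe loop terminates. */
    while (set->slots[i] != NULL) {
        if (strcmp(set->slots[i], key) == 0)
            return 0;
        i = (i + 1) & mask;
    }
    if (set->count + 1 >= set->capacity)
        return -1;
    set->slots[i] = key;
    set->count++;
    return 1;
}

void keyset_free(KeySet *set) {
    free(set->slots);
    set->slots = NULL;
    set->capacity = 0;
    set->count = 0;
}
```

Checking n keys for uniqueness this way costs roughly one probe sequence per key, rather than comparing every key against every other key.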

8272db53

2020-11-28T22:54:40

Use NAMELINK_COMPONENT in CMake install

5c7bdbc9

2020-11-25T18:41:14

Add CMake files to EXTRA_DIST

7a62870a

2020-11-19T22:06:23

Add missing compile definition for static builds to CMake

e028d293

2020-11-19T17:58:46

Add CI for CMake on Linux and MinGW

b516ed18

2020-11-12T12:53:43

Fix building with ICU 68. ICU 68 no longer defines the TRUE macro. Closes #204.

ac5e9991

2020-11-10T15:42:36

Convert python/libxml.c to PY_SSIZE_T_CLEAN Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string length (int len) explicitly to Py_ssize_t when passing a string to a function call using PyObject_CallMethod() with the "s#" format.

f42a0524

2020-11-09T18:19:31

Build the Python extension with PY_SSIZE_T_CLEAN The Python extension module now uses Py_ssize_t rather than int for string lengths. This change makes the extension compatible with Python 3.10. Fixes #203.

0ace6c4d

2020-11-19T17:35:11

Add CI test for Python 3

7c06d99e

2020-10-27T11:29:20

Fix xmlURIEscape memory leaks. Found by running the fuzz/uri.c fuzzer under asan (internal Android bug 171610679). Always free `ret` when exiting on failure. I've moved the definition of NULLCHK down past where ret is always initialized to make it clear that this is safe. This patch also fixes the indentation of two of the NULLCHK call sites to make it more obvious that NULLCHK isn't `if`-like.

31c6ce3b

2020-11-09T17:55:44

Avoid call stack overflow with XML reader and recursive XIncludes Don't process XIncludes in the result of another inclusion to avoid infinite recursion resulting in a call stack overflow. This is something the XInclude engine shouldn't allow but correct handling of intra-document includes would require major changes. Found by OSS-Fuzz.

kc3-lang/libxml2
