kmx git

Commit	Date	Message
de1b51ed	2021-02-22T12:25:29	Improve HTML fuzzer stability Call htmlInitAutoClose during fuzzer initialization to fix stability issue. Leave a note concerning problems with this function.
09320f05	2021-02-21T14:26:40	Add CI for MSVC x86
dcb80b92	2021-02-20T20:30:43	Fix slow parsing of HTML with encoding errors Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior. The code htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all. Also leave a note that switching encodings twice seems impossible to implement correctly. Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8. Found by OSS-Fuzz.
02bee4c4	2021-02-02T22:27:52	Add a flag to not output anything when xmllint succeeded
4defa2c2	2021-02-12T09:39:38	Fix warnings in libxml.m4 with autoconf 2.70+. Closes #219.
cbe1212d	2021-02-09T17:07:21	Fix null deref introduced with previous commit Found by OSS-Fuzz.
01411e7c	2021-02-08T20:58:32	Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> But the XML spec clearly states that this is illegal: > If the entities lt or amp are declared, they MUST be declared as > internal entities whose replacement text is a character reference to > the respective character (less-than sign or ampersand) being escaped; > the double escaping is REQUIRED for these entities so that references > to them produce a well-formed result. Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
07920b43	2021-01-26T05:42:48	Add the copy of type from original xmlDoc in xmlCopyDoc() A bug related to php DOMDocument: https://bugs.php.net/bug.php?id=80665 When copy/clone an html document, the xmlDoc->type goes from XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
2065d340	2021-02-05T23:40:18	Add CI for CMake on MSVC
afad3721	2021-01-31T09:53:56	parser.c: shrink the input buffer when appropriate Fixes GNOME/libxml2#200 Also see discussions at: - GNOME/libxml2#192 - https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e - https://github.com/sparklemotion/nokogiri/issues/2132
ec808a44	2021-02-07T13:57:49	Speed up HTML fuzzer htmlDocDumpMemory uses the "HTML" encoding if no other encoding was specified in the source HTML. This encoding can be extremely slow because of an inefficiency in htmlEntityValueLookup. Stop encoding the output for now.
e6495e47	2021-02-07T13:38:01	Remove unused encoding parameter of HTML output functions The encoding string is unused. Encodings are set by way of the output buffer.
954696e7	2021-02-07T13:23:09	Fix infinite loop in HTML parser introduced with recent commits Check for XML_PARSER_EOF to avoid an infinite loop introduced with recent changes to the HTML push parser. Found by OSS-Fuzz.
acb35667	2021-02-03T13:48:40	Fix quadratic runtime when parsing CDATA sections Use optimized concatenation for CDATA sections in addition to normal text. This also affects HTML script content. Found by OSS-Fuzz.
f93ca3e1	2021-01-15T17:53:27	Update minimum required CMake version
00487289	2020-12-31T16:34:25	Add variables for configured options to CMake config files
95519737	2020-12-31T13:41:19	Check if variables exist when defining targets
c26e4525	2020-12-31T13:18:14	Check if target exists when reading target properties
ec119875	2020-12-30T14:40:43	Add xmlcatalog target and definition to config files
2377a312	2020-12-30T14:40:04	Remove include directories for link-only dependencies
26835480	2020-12-30T14:28:24	Fix ICU build in CMake
296ab61e	2020-11-19T22:06:36	Configure pkgconfig, xml2-config, and xml2Conf.sh file
79301d3d	2020-12-18T12:50:21	Fix timeout when handling recursive entities Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.
45da175c	2020-12-18T12:14:52	Fix memory leak in xmlParseElementMixedContentDecl Free parsed content if malloc fails to avoid a memory leak. Found with libFuzzer.
1d73f07d	2020-12-18T00:55:00	Fix null deref in xmlStringGetNodeList Check for malloc failure to avoid null deref. Found with libFuzzer.
e2b975c3	2020-12-18T00:50:34	Handle malloc failures in fuzzing code Avoid misdiagnosis in OOM situations.
a67b63d1	2020-10-11T14:15:37	use new htmlParseLookupCommentEnd to find comment ends Note that the caret in error messages generated during comment parsing may have moved by one byte. See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
29f5d20e	2020-08-03T17:36:05	htmlParseComment: treat `--!>` as if it closed the comment See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
e28d9347	2020-08-04T14:53:19	add test coverage for incorrectly-closed comments this establishes the baseline behavior so that subsequent commits which modify this behavior are clear about what's being changed.
9086988f	2020-12-16T15:41:52	Enforce maximum length of fuzz input Remove the libfuzzer max_len option which doesn't apply to other fuzzing engines. Enforce the maximum length directly in the fuzz targets. For the xml target, lower the maximum when expanding entities to avoid timeout and OOM errors.
1fe38530	2020-12-16T15:27:13	Remove temporary members from struct _xmlXPathContext These values are hardcoded now and the struct members, while public, were recently introduced and never part of an official release.
8ca3a59b	2020-12-15T20:14:28	Fix integer overflow in xmlSchemaGetParticleTotalRangeMin The function is only used once and its return value is only checked for zero. Disable the function like its Max counterpart and add an implementation for the special case. Found by OSS-Fuzz.
649d02ea	2020-12-07T20:19:53	encoding: fix memleak in xmlRegisterCharEncodingHandler() The return type of xmlRegisterCharEncodingHandler() is void. The invoker cannot determine whether xmlRegisterCharEncodingHandler() is executed successfully. when nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the "handler" is not added to the array "handlers". As a result, the memory of "handler" cannot be managed and released: memory leakage. so add "xmlfree(handler)" to fix memory leakage on the failure branch of xmlRegisterCharEncodingHandler(). Reported-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
cb7a572b	2020-12-07T20:17:34	xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val" The xmlSchemaGetFacetValueAsUlong() API is an external API. The validity of external input parameters must be strictly verified. Before accessing "facet->val->value", we need check whether "facet->val" is a null pointer. Signed-off-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
84b76d99	2020-12-06T17:26:23	Update CMake config files
d0ccb3a6	2020-12-06T17:25:52	Add xmlcatalog and xmllint to CMake export
acdc2ff3	2020-06-04T23:02:08	Simplify xmlexports.h All the compiler switches essentially set the same macros. The only exception was MSVC which omitted the "extern" keyword for exported variables. This in turn broke clang-cl. This commit rewrites and simplifies the whole header. Closes #163.
a218ff0e	2020-12-06T17:26:36	Fix null pointer deref in xmlXPtrRangeInsideFunction Found by OSS-Fuzz.
94c2e415	2020-12-06T16:38:00	Fix quadratic runtime in HTML push parser with null bytes Null bytes in the input stream do not necessarily signal an EOF condition. Check the stream pointers for EOF to avoid quadratic rescanning of input data. Note that the CUR_CHAR macro used in functions like htmlParseCharData calls htmlCurrentChar which translates null bytes. Found by OSS-Fuzz.
1c4f9a6d	2020-11-25T18:01:51	Require dependencies based on enabled CMake options
faea2fa9	2020-11-21T01:21:56	Avoid quadratic checking of identity-constraints key/unique/keyref schema attributes currently use quadratic loops to check their various constraints (that keys are unique and that keyrefs refer to existing keys). That becomes extremely slow if there are many elements with keys. This happens in the wild with e.g. the OVAL XML descriptions of security patches. You need the openscap schemata, and then an example xml file: % zypper in openscap-utils % wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml % time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 16m59,857s user 16m55,787s sys 0m1,060s This patch makes libxml use a hash table to avoid the quadratic behaviour. The existing hash table only accepts strings as keys, so we're mostly reusing the canonical representation of key values to derive such strings (with the caveat given in a comment). The alternative would be to rework the hash table code to accept either numbers or free functions as hash workers, but the code is fast enough as is. With the patch we have this then: % time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 0m3,531s user 0m3,427s sys 0m0,103s So, a ~300x speedup. This patch survives 'make check' and 'make tests'.
8272db53	2020-11-28T22:54:40	Use NAMELINK_COMPONENT in CMake install
5c7bdbc9	2020-11-25T18:41:14	Add CMake files to EXTRA_DIST
7a62870a	2020-11-19T22:06:23	Add missing compile definition for static builds to CMake
e028d293	2020-11-19T17:58:46	Add CI for CMake on Linux and MinGW
b516ed18	2020-11-12T12:53:43	Fix building with ICU 68. ICU 68 no longer defines the TRUE macro. Closes #204.
ac5e9991	2020-11-10T15:42:36	Convert python/libxml.c to PY_SSIZE_T_CLEAN Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string length (int len) explicitly to Py_ssize_t when passing a string to a function call using PyObject_CallMethod() with the "s#" format.
f42a0524	2020-11-09T18:19:31	Build the Python extension with PY_SSIZE_T_CLEAN The Python extension module now uses Py_ssize_t rather than int for string lengths. This change makes the extension compatible with Python 3.10. Fixes #203.
0ace6c4d	2020-11-19T17:35:11	Add CI test for Python 3
7c06d99e	2020-10-27T11:29:20	Fix xmlURIEscape memory leaks. Found by running the fuzz/uri.c fuzzer under asan (internal Android bug 171610679). Always free `ret` when exiting on failure. I've moved the definition of NULLCHK down past where ret is always initialized to make it clear that this is safe. This patch also fixes the indentation of two of the NULLCHK call sites to make it more obvious that NULLCHK isn't `if`-like.
31c6ce3b	2020-11-09T17:55:44	Avoid call stack overflow with XML reader and recursive XIncludes Don't process XIncludes in the result of another inclusion to avoid infinite recursion resulting in a call stack overflow. This is something the XInclude engine shouldn't allow but correct handling of intra-document includes would require major changes. Found by OSS-Fuzz.
7d6837ba	2020-10-25T20:21:43	Fix caret in regexp character group Apply Per Hedeland's patch from https://bugzilla.gnome.org/show_bug.cgi?id=779751 Fixes #188.
8a85263f	2020-10-25T20:08:16	Add fuzzing dictionaries to EXTRA_DIST Also add static seed corpus for the URI fuzzer.
1bde1040	2020-10-25T20:02:23	Add 'fuzz' subdirectory to DIST_SUBDIRS Fixes #191.
c0c26ff2	2020-10-11T16:33:07	parser.c: xmlParseCharData peek behavior fixed wrt newlines Previously, xmlParseCharData and xmlParseComment would consider 0xA to be unhandleable when seen as the first byte of an input chunk, and fall back to xmlParseCharDataComplex and xmlParseCommentComplex, which have different memory and performance characteristics. Fixes GNOME/libxml2#192
b46016b8	2020-10-17T18:03:09	Allow port numbers up to INT_MAX Also return an error on overflow.
46837d47	2020-10-03T01:13:35	Fix memory leaks in XPointer string-range function Found by OSS-Fuzz.
0b3c64d9	2020-09-29T18:08:37	Handle dumps of corrupted documents more gracefully Check parent pointers for NULL after the non-recursive rewrite of the serialization code. This avoids segfaults with corrupted documents which can apparently be seen with lxml, see issue #187.
847a3a11	2020-09-28T12:28:29	Fix use-after-free when XIncluding text from Reader The XML Reader can free text nodes coming from the XInclude engine before parsing has finished. Cache a copy of the text string, not the included node to avoid use after free. Found by OSS-Fuzz.
7929f057	2020-08-30T10:34:01	Fix SEGV in xmlSAXParseFileWithData Fixes #181.
e6ec58ec	2020-09-21T12:49:36	Fix null deref in XPointer expression error path Make sure that the filter functions introduced with commit c2f4da1a return node-sets without NULL pointers also in the error case. Found by OSS-Fuzz.
4e9cc18b	2020-09-21T11:00:23	Fix variable name in win32/configure.js Fix copy/paste error from previous commit.
5614c078	2020-09-21T10:55:45	Fix version parsing in win32/configure.js Adjust to configure.ac changes. Should fix #185.
8b88503a	2020-09-18T19:15:27	Don't call xmlXPathInit directly Call xmlInitParser which uses a lock to avoid race conditions. Fixes #184.
b215c270	2020-09-13T12:19:48	Fix cleanup of attributes in XML reader xml:id creates ID attributes even in documents without a DTD, so the check in xmlTextReaderFreeProp must be changed to avoid use after free. Found by OSS-Fuzz.
f0fd1b67	2020-08-26T00:16:38	Limit size of free lists in XML reader when fuzzing Keeping objects on a free list can hide memory errors. Only allow a single node on free lists used by the XML reader when fuzzing. This should hide fewer errors while still exercising the free list logic.
ba589adc	2020-08-25T23:50:39	Fix double free in XML reader with XIncludes An XInclude with empty fallback could lead to a double free in xmlTextReaderRead. Found by OSS-Fuzz.
6f1470a5	2020-08-25T18:50:45	Hardcode maximum XPath recursion depth Always limit nested functions calls to 5000. This avoids call stack overflows with deeply nested expressions. The expression parser produces about 10 nested function calls when parsing a subexpression in parentheses, so the effective nesting limit is about 500 which should be more than enough. Use a lower limit when fuzzing to account for increased memory usage when using sanitizers.
8c3ef083	2020-08-24T23:17:34	Pass URL of main entity in XML fuzzer
0d5f3710	2020-08-24T16:28:54	Consolidate seed corpus generation Implement file handling in C to speed up corpus generation.
0d9da029	2020-08-24T03:16:25	Test fuzz targets with dummy driver Run fuzz targets with files in seed corpus during test.
3fcf3193	2020-08-22T00:43:18	Fix regression introduced with commit d88df4b Revert the commit and use a different approach. Found by OSS-Fuzz.
87d20b55	2020-08-19T13:52:08	Fix regression introduced with commit 74dcc10b The code wasn't dead after all, but I can see no reason in delaying the XPointer evaluation. This could lead to nodes included earlier appearing in XPointer results.
fbb7fa9a	2020-08-19T13:13:20	Fix memory leak in xmlXIncludeAddNode error paths Found by OSS-Fuzz.
19cae17f	2020-08-19T13:07:28	Revert "Fix quadratic runtime in xi:fallback processing" This reverts commit 27119ec33c9f6b9830efa1e0da0acfa353dfa55a. Not copying fallback children didn't fix up namespaces and could lead to use-after-free errors. Found by OSS-Fuzz.
d63cfeca	2020-08-17T15:40:06	Add TODO comment in xinclude.c Add some thoughts on the major remaining problems with the XInclude implementation.
804c5297	2020-08-17T03:37:18	Stop using maxParserDepth in xpath.c Only use a single maxDepth value.
74dcc10b	2020-08-17T03:24:56	Remove dead code in xinclude.c 'doc' is checked for NULL in xmlXIncludeLoadDoc, so several code paths can be eliminated.
0ff52748	2020-08-17T02:54:28	Fix autotools warnings
2c747129	2020-08-17T00:54:12	Fix error reporting with xi:fallback When reporting errors, don't use href of xi:include if xi:fallback was used. I think this can only be reproduced with "xmllint --postvalid", see the original bug report: https://bugzilla.gnome.org/show_bug.cgi?id=152623
27119ec3	2020-08-17T00:05:19	Fix quadratic runtime in xi:fallback processing Copying the tree would lead to runtime quadratic in nested fallback depth, similar to naive string concatenation.
d88df4bd	2020-08-16T23:38:48	Fix corner case with empty xi:fallback xi:fallback could become empty after recursive expansion. Use a flag to track whether nodes should be skipped.
00a86d41	2020-08-16T23:38:00	Don't add formatting newlines to XInclude nodes
dba82a8c	2020-08-16T23:02:20	Fix XInclude regression introduced with recent commit The change to xmlXIncludeLoadFallback in commit 11b57459 could process already freed nodes if text nodes were merged after deleting nodes with an empty fallback. Found by OSS-Fuzz.
e1c2d0ad	2020-08-16T22:22:57	Fix memory leak in runtest.c
2b4769a6	2020-08-16T22:02:04	Make "xmllint --push --recovery" work
99fc048d	2020-08-14T14:18:50	Don't use SAX1 if all element handlers are NULL Running xmllint with "--sax --noout" installs a SAX2 handler with all callbacks set to NULL. In this case or similar situations, we don't want to switch to SAX1 parsing.
c1ba6f54	2020-08-15T18:32:29	Revert "Do not URI escape in server side includes" This reverts commit 960f0e275616cadc29671a218d7fb9b69eb35588. This commit introduced - an infinite loop, found by OSS-Fuzz, which could be easily fixed. - an algorithm with quadratic runtime - a security issue, see https://bugzilla.gnome.org/show_bug.cgi?id=769760 A better approach is to add an option not to escape URLs at all which libxml2 should have possibly done in the first place.
b82fa3dd	2020-08-09T14:50:46	Fix column number accounting in xmlParse*NameAndCompare Thanks to Frederic Vancraeyveldt for the report.
438e595a	2020-08-09T14:43:53	Stop counting nbChars in parser context The value was inaccurate and never used.
f6a9541f	2020-08-09T14:29:35	Remove unneeded progress checks in HTML parser The HTML parser should now be guaranteed to make progress, so the checks became unnecessary.
9de7b94d	2020-08-08T20:37:30	Use strcmp when fuzzing This should improve data-flow-guided fuzzing.
10a07948	2020-08-08T17:46:11	Fix XPath fuzzer
6c128fd5	2020-06-05T13:43:45	Fuzz XInclude engine
50f06b3e	2020-08-07T21:54:27	Fix out-of-bounds read with 'xmllint --htmlout' Make sure that truncated UTF-8 sequences don't cause an out-of-bounds array access. Thanks to @SuhwanSong and the Agency for Defense Development (ADD) for the report. Fixes #178.
1abf2967	2020-08-06T17:51:57	Fix exponential runtime and memory in xi:fallback processing When creating XML_XINCLUDE_START nodes, the children of the original xi:include node must be freed, otherwise fallback content is copied twice, doubling runtime and memory consumption for each nested xi:fallback/xi:include pair. Found with libFuzzer.
11b57459	2020-08-07T18:39:19	Don't process siblings of root in xmlXIncludeProcess xmlXIncludeDoProcess would follow the siblings of the tree root and also expand these nodes. When using an XML reader, this could lead to siblings of the current node being expanded without having been parsed completely.
0f9817c7	2020-06-10T16:34:52	Don't recurse into xi:include children in xmlXIncludeDoProcess Otherwise, nested xi:include nodes might result in a use-after-free if XML_PARSE_NOXINCNODE is specified. Found with libFuzzer and ASan.
5725c115	2020-06-10T15:11:40	Fix memory leak in xmlXIncludeIncludeNode error paths Found with libFuzzer and ASan.
ad26a60f	2020-08-06T13:20:01	Add XPath and XPointer fuzzer
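
Illustration for commit faea2fa9 (identity-constraint checking): the message describes replacing pairwise key comparison with a hash table keyed by canonical strings. The sketch below shows that general idea only. It is not the libxml2 patch and does not use libxml2's hash table API; every name in it (KeyNode, bucket_of, first_duplicate_key, NBUCKETS) is hypothetical, and the keys are assumed to be already reduced to canonical strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: none of these names exist in libxml2. */

typedef struct KeyNode {
    const char *key;        /* borrowed pointer to a canonical key string */
    struct KeyNode *next;   /* next entry in the same bucket */
} KeyNode;

#define NBUCKETS 1024u

/* FNV-1a string hash, reduced to a bucket index. */
static size_t bucket_of(const char *s) {
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return (size_t) (h % NBUCKETS);
}

/*
 * Returns the index of the first key that repeats an earlier key,
 * -1 if all keys are distinct, or -2 if memory allocation fails.
 * Each key costs one hash lookup instead of a scan over all earlier keys.
 */
int first_duplicate_key(const char *const *keys, int n) {
    KeyNode **buckets;
    int result = -1;
    int i;
    size_t b;

    assert(keys != NULL || n == 0);
    buckets = calloc(NBUCKETS, sizeof(*buckets));
    if (buckets == NULL)
        return -2;

    for (i = 0; i < n; i++) {
        KeyNode *node;
        int seen = 0;

        b = bucket_of(keys[i]);
        for (node = buckets[b]; node != NULL; node = node->next) {
            if (strcmp(node->key, keys[i]) == 0) {
                seen = 1;
                break;
            }
        }
        if (seen) {
            result = i;
            break;
        }
        node = malloc(sizeof(*node));
        if (node == NULL) {
            result = -2;
            break;
        }
        node->key = keys[i];
        node->next = buckets[b];
        buckets[b] = node;
    }

    /* Release every chain, then the bucket array itself. */
    for (b = 0; b < NBUCKETS; b++) {
        while (buckets[b] != NULL) {
            KeyNode *next = buckets[b]->next;
            free(buckets[b]);
            buckets[b] = next;
        }
    }
    free(buckets);
    return result;
}
```

With a pairwise check, n keys need on the order of n*n string comparisons. Here each key is hashed once and compared only against the entries in its own bucket, which is the kind of change the commit message credits for the drop from about 17 minutes to about 3.5 seconds.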

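Illustration for commit b46016b8 (port numbers up to INT_MAX, error on overflow): a minimal sketch of an overflow-checked decimal parse. It is not the code from libxml2's URI parser. The function name parse_port and its return convention are assumptions made for this example, and unlike a real URI parser it rejects an empty string and any trailing characters.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a libxml2 function.
 * Parses a non-empty string of decimal digits into *port.
 * Returns 0 on success, -1 on malformed input or if the value
 * would exceed INT_MAX.
 */
int parse_port(const char *s, int *port) {
    int value = 0;

    assert(s != NULL && port != NULL);
    if (*s < '0' || *s > '9')
        return -1;                      /* empty or not a digit */

    while (*s >= '0' && *s <= '9') {
        int digit = *s - '0';

        /* value * 10 + digit would overflow: report an error instead. */
        if (value > (INT_MAX - digit) / 10)
            return -1;
        value = value * 10 + digit;
        s++;
    }
    if (*s != '\0')
        return -1;                      /* trailing garbage */

    *port = value;
    return 0;
}
```

The overflow test runs before the multiplication, so the signed arithmetic never actually overflows, which would be undefined behavior in C.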

0b3c64d9

2020-09-29T18:08:37

Handle dumps of corrupted documents more gracefully Check parent pointers for NULL after the non-recursive rewrite of the serialization code. This avoids segfaults with corrupted documents which can apparently be seen with lxml, see issue #187.

847a3a11

2020-09-28T12:28:29

Fix use-after-free when XIncluding text from Reader The XML Reader can free text nodes coming from the XInclude engine before parsing has finished. Cache a copy of the text string, not the included node to avoid use after free. Found by OSS-Fuzz.

7929f057

2020-08-30T10:34:01

Fix SEGV in xmlSAXParseFileWithData Fixes #181.

e6ec58ec

2020-09-21T12:49:36

Fix null deref in XPointer expression error path Make sure that the filter functions introduced with commit c2f4da1a return node-sets without NULL pointers also in the error case. Found by OSS-Fuzz.

4e9cc18b

2020-09-21T11:00:23

Fix variable name in win32/configure.js Fix copy/paste error from previous commit.

5614c078

2020-09-21T10:55:45

Fix version parsing in win32/configure.js Adjust to configure.ac changes. Should fix #185.

8b88503a

2020-09-18T19:15:27

Don't call xmlXPathInit directly Call xmlInitParser which uses a lock to avoid race conditions. Fixes #184.

b215c270

2020-09-13T12:19:48

Fix cleanup of attributes in XML reader xml:id creates ID attributes even in documents without a DTD, so the check in xmlTextReaderFreeProp must be changed to avoid use after free. Found by OSS-Fuzz.

f0fd1b67

2020-08-26T00:16:38

Limit size of free lists in XML reader when fuzzing Keeping objects on a free list can hide memory errors. Only allow a single node on free lists used by the XML reader when fuzzing. This should hide fewer errors while still exercising the free list logic.

ba589adc

2020-08-25T23:50:39

Fix double free in XML reader with XIncludes An XInclude with empty fallback could lead to a double free in xmlTextReaderRead. Found by OSS-Fuzz.

6f1470a5

2020-08-25T18:50:45

Hardcode maximum XPath recursion depth Always limit nested function calls to 5000. This avoids call stack overflows with deeply nested expressions. The expression parser produces about 10 nested function calls when parsing a subexpression in parentheses, so the effective nesting limit is about 500 which should be more than enough. Use a lower limit when fuzzing to account for increased memory usage when using sanitizers.

8c3ef083

2020-08-24T23:17:34

Pass URL of main entity in XML fuzzer

0d5f3710

2020-08-24T16:28:54

Consolidate seed corpus generation Implement file handling in C to speed up corpus generation.

0d9da029

2020-08-24T03:16:25

Test fuzz targets with dummy driver Run fuzz targets with files in seed corpus during test.

3fcf3193

2020-08-22T00:43:18

Fix regression introduced with commit d88df4b Revert the commit and use a different approach. Found by OSS-Fuzz.

87d20b55

2020-08-19T13:52:08

Fix regression introduced with commit 74dcc10b The code wasn't dead after all, but I can see no reason for delaying the XPointer evaluation. This could lead to nodes included earlier appearing in XPointer results.

fbb7fa9a

2020-08-19T13:13:20

Fix memory leak in xmlXIncludeAddNode error paths Found by OSS-Fuzz.

19cae17f

2020-08-19T13:07:28

Revert "Fix quadratic runtime in xi:fallback processing" This reverts commit 27119ec33c9f6b9830efa1e0da0acfa353dfa55a. Not copying fallback children didn't fix up namespaces and could lead to use-after-free errors. Found by OSS-Fuzz.

d63cfeca

2020-08-17T15:40:06

Add TODO comment in xinclude.c Add some thoughts on the major remaining problems with the XInclude implementation.

804c5297

2020-08-17T03:37:18

Stop using maxParserDepth in xpath.c Only use a single maxDepth value.

74dcc10b

2020-08-17T03:24:56

Remove dead code in xinclude.c 'doc' is checked for NULL in xmlXIncludeLoadDoc, so several code paths can be eliminated.

0ff52748

2020-08-17T02:54:28

Fix autotools warnings

2c747129

2020-08-17T00:54:12

Fix error reporting with xi:fallback When reporting errors, don't use href of xi:include if xi:fallback was used. I think this can only be reproduced with "xmllint --postvalid", see the original bug report: https://bugzilla.gnome.org/show_bug.cgi?id=152623

27119ec3

2020-08-17T00:05:19

Fix quadratic runtime in xi:fallback processing Copying the tree would lead to runtime quadratic in nested fallback depth, similar to naive string concatenation.

d88df4bd

2020-08-16T23:38:48

Fix corner case with empty xi:fallback xi:fallback could become empty after recursive expansion. Use a flag to track whether nodes should be skipped.

00a86d41

2020-08-16T23:38:00

Don't add formatting newlines to XInclude nodes

dba82a8c

2020-08-16T23:02:20

Fix XInclude regression introduced with recent commit The change to xmlXIncludeLoadFallback in commit 11b57459 could process already freed nodes if text nodes were merged after deleting nodes with an empty fallback. Found by OSS-Fuzz.

e1c2d0ad

2020-08-16T22:22:57

Fix memory leak in runtest.c

2b4769a6

2020-08-16T22:02:04

Make "xmllint --push --recovery" work

99fc048d

2020-08-14T14:18:50

Don't use SAX1 if all element handlers are NULL Running xmllint with "--sax --noout" installs a SAX2 handler with all callbacks set to NULL. In this case or similar situations, we don't want to switch to SAX1 parsing.

c1ba6f54

2020-08-15T18:32:29

Revert "Do not URI escape in server side includes" This reverts commit 960f0e275616cadc29671a218d7fb9b69eb35588. This commit introduced - an infinite loop, found by OSS-Fuzz, which could be easily fixed. - an algorithm with quadratic runtime - a security issue, see https://bugzilla.gnome.org/show_bug.cgi?id=769760 A better approach is to add an option not to escape URLs at all which libxml2 should have possibly done in the first place.

b82fa3dd

2020-08-09T14:50:46

Fix column number accounting in xmlParse*NameAndCompare Thanks to Frederic Vancraeyveldt for the report.

438e595a

2020-08-09T14:43:53

Stop counting nbChars in parser context The value was inaccurate and never used.

f6a9541f

2020-08-09T14:29:35

Remove unneeded progress checks in HTML parser The HTML parser should now be guaranteed to make progress, so the checks became unnecessary.

9de7b94d

2020-08-08T20:37:30

Use strcmp when fuzzing This should improve data-flow-guided fuzzing.

10a07948

2020-08-08T17:46:11

Fix XPath fuzzer

6c128fd5

2020-06-05T13:43:45

Fuzz XInclude engine

50f06b3e

2020-08-07T21:54:27

Fix out-of-bounds read with 'xmllint --htmlout' Make sure that truncated UTF-8 sequences don't cause an out-of-bounds array access. Thanks to @SuhwanSong and the Agency for Defense Development (ADD) for the report. Fixes #178.
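The class of bug described here can be illustrated with a small bounds-checked decoder step. This is a sketch under stated assumptions, not the xmllint fix: `utf8_seq_len` is a hypothetical helper. It only checks the sequence length and continuation bytes and does not reject overlong or surrogate encodings.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns the byte length (1..4) of the UTF-8
 * sequence starting at buf[0], or -1 if the lead byte is invalid,
 * a continuation byte is malformed, or the sequence is truncated
 * (needs more than the `avail` bytes that are readable).
 */
int utf8_seq_len(const unsigned char *buf, size_t avail) {
    int len, i;

    assert(buf != NULL || avail == 0);

    if (avail == 0)
        return -1;

    if (buf[0] < 0x80)
        len = 1;
    else if ((buf[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((buf[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((buf[0] & 0xF8) == 0xF0)
        len = 4;
    else
        return -1;                      /* stray continuation or invalid lead */

    /* Truncated input: bail out before reading past the buffer end. */
    if ((size_t)len > avail)
        return -1;

    for (i = 1; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            return -1;                  /* not a continuation byte */
    }
    return len;
}
```

The key point is that the length implied by the lead byte is compared against the number of available bytes before any continuation byte is read.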

1abf2967

2020-08-06T17:51:57

Fix exponential runtime and memory in xi:fallback processing When creating XML_XINCLUDE_START nodes, the children of the original xi:include node must be freed, otherwise fallback content is copied twice, doubling runtime and memory consumption for each nested xi:fallback/xi:include pair. Found with libFuzzer.

11b57459

2020-08-07T18:39:19

Don't process siblings of root in xmlXIncludeProcess xmlXIncludeDoProcess would follow the siblings of the tree root and also expand these nodes. When using an XML reader, this could lead to siblings of the current node being expanded without having been parsed completely.

0f9817c7

2020-06-10T16:34:52

Don't recurse into xi:include children in xmlXIncludeDoProcess Otherwise, nested xi:include nodes might result in a use-after-free if XML_PARSE_NOXINCNODE is specified. Found with libFuzzer and ASan.

5725c115

2020-06-10T15:11:40

Fix memory leak in xmlXIncludeIncludeNode error paths Found with libFuzzer and ASan.

ad26a60f

2020-08-06T13:20:01

Add XPath and XPointer fuzzer
