kmx git

Commit	Date	Message
19cae17f	2020-08-19T13:07:28	Revert "Fix quadratic runtime in xi:fallback processing" This reverts commit 27119ec33c9f6b9830efa1e0da0acfa353dfa55a. Not copying fallback children didn't fix up namespaces and could lead to use-after-free errors. Found by OSS-Fuzz.
d63cfeca	2020-08-17T15:40:06	Add TODO comment in xinclude.c Add some thoughts on the major remaining problems with the XInclude implementation.
804c5297	2020-08-17T03:37:18	Stop using maxParserDepth in xpath.c Only use a single maxDepth value.
74dcc10b	2020-08-17T03:24:56	Remove dead code in xinclude.c 'doc' is checked for NULL in xmlXIncludeLoadDoc, so several code paths can be eliminated.
0ff52748	2020-08-17T02:54:28	Fix autotools warnings
d88df4bd	2020-08-16T23:38:48	Fix corner case with empty xi:fallback xi:fallback could become empty after recursive expansion. Use a flag to track whether nodes should be skipped.
00a86d41	2020-08-16T23:38:00	Don't add formatting newlines to XInclude nodes
dba82a8c	2020-08-16T23:02:20	Fix XInclude regression introduced with recent commit The change to xmlXIncludeLoadFallback in commit 11b57459 could process already freed nodes if text nodes were merged after deleting nodes with an empty fallback. Found by OSS-Fuzz.
e1c2d0ad	2020-08-16T22:22:57	Fix memory leak in runtest.c
2c747129	2020-08-17T00:54:12	Fix error reporting with xi:fallback When reporting errors, don't use href of xi:include if xi:fallback was used. I think this can only be reproduced with "xmllint --postvalid", see the original bug report: https://bugzilla.gnome.org/show_bug.cgi?id=152623
2b4769a6	2020-08-16T22:02:04	Make "xmllint --push --recovery" work
99fc048d	2020-08-14T14:18:50	Don't use SAX1 if all element handlers are NULL Running xmllint with "--sax --noout" installs a SAX2 handler with all callbacks set to NULL. In this case or similar situations, we don't want to switch to SAX1 parsing.
27119ec3	2020-08-17T00:05:19	Fix quadratic runtime in xi:fallback processing Copying the tree would lead to runtime quadratic in nested fallback depth, similar to naive string concatenation.
c1ba6f54	2020-08-15T18:32:29	Revert "Do not URI escape in server side includes" This reverts commit 960f0e275616cadc29671a218d7fb9b69eb35588. This commit introduced - an infinite loop, found by OSS-Fuzz, which could be easily fixed. - an algorithm with quadratic runtime - a security issue, see https://bugzilla.gnome.org/show_bug.cgi?id=769760 A better approach is to add an option not to escape URLs at all which libxml2 should have possibly done in the first place.
b82fa3dd	2020-08-09T14:50:46	Fix column number accounting in xmlParse*NameAndCompare Thanks to Frederic Vancraeyveldt for the report.
438e595a	2020-08-09T14:43:53	Stop counting nbChars in parser context The value was inaccurate and never used.
f6a9541f	2020-08-09T14:29:35	Remove unneeded progress checks in HTML parser The HTML parser should now be guaranteed to make progress, so the checks became unnecessary.
9de7b94d	2020-08-08T20:37:30	Use strcmp when fuzzing This should improve data-flow-guided fuzzing.
10a07948	2020-08-08T17:46:11	Fix XPath fuzzer
6c128fd5	2020-06-05T13:43:45	Fuzz XInclude engine
50f06b3e	2020-08-07T21:54:27	Fix out-of-bounds read with 'xmllint --htmlout' Make sure that truncated UTF-8 sequences don't cause an out-of-bounds array access. Thanks to @SuhwanSong and the Agency for Defense Development (ADD) for the report. Fixes #178.
1abf2967	2020-08-06T17:51:57	Fix exponential runtime and memory in xi:fallback processing When creating XML_XINCLUDE_START nodes, the children of the original xi:include node must be freed, otherwise fallback content is copied twice, doubling runtime and memory consumption for each nested xi:fallback/xi:include pair. Found with libFuzzer.
11b57459	2020-08-07T18:39:19	Don't process siblings of root in xmlXIncludeProcess xmlXIncludeDoProcess would follow the siblings of the tree root and also expand these nodes. When using an XML reader, this could lead to siblings of the current node being expanded without having been parsed completely.
0f9817c7	2020-06-10T16:34:52	Don't recurse into xi:include children in xmlXIncludeDoProcess Otherwise, nested xi:include nodes might result in a use-after-free if XML_PARSE_NOXINCNODE is specified. Found with libFuzzer and ASan.
5725c115	2020-06-10T15:11:40	Fix memory leak in xmlXIncludeIncludeNode error paths Found with libFuzzer and ASan.
ad26a60f	2020-08-06T13:20:01	Add XPath and XPointer fuzzer
956534e0	2020-08-04T19:27:13	Check for custom free function in global destructor Calling a custom deallocation function in the global destructor could cause all kinds of unexpected problems. See for example https://github.com/sparklemotion/nokogiri/issues/2059 Only clean up if memory is managed with malloc/free.
8e7c20a1	2020-08-03T17:30:41	Fix integer overflow when comparing schema dates Found by OSS-Fuzz.
905820a4	2020-07-12T22:59:39	Update fuzzing code - Shorten timeouts - Align options from Makefile and options files - Add section headers to Makefile - Skip invalid UTF-8 in regexp fuzzer - Update regexp.dict - Generate HTML seed corpus in correct format
68eadabd	2020-07-11T21:32:10	Fix exponential runtime in xmlFARecurseDeterminism In order to prevent visiting a state twice, states must be marked as visited for the whole duration of graph traversal because states might be reached by different paths. Otherwise state graphs like the following can lead to exponential runtime: ->O-->O-->O-->O-->O-> \ / \ / \ / \ / O O O O Reset the "visited" flag only after the graph was traversed. xmlFAComputesDeterminism still has massive performance problems when handling fuzzed input. By design, it has quadratic time complexity in the number of reachable states. Some issues might also stem from redundant epsilon transitions. With this fix, fuzzing regexes with a maximum length of 100 becomes feasible at least. Found with libFuzzer.
1a360c1c	2020-07-29T00:39:15	More *NodeDumpOutput fixes When leaving nodes, restrict more operations to XML_ELEMENT_NODEs.
7b2e5172	2020-07-28T21:52:55	Fix *NodeDumpOutput functions Only output end tag for elements. Should fix serialization of document fragments.
dc6f0092	2020-07-28T19:07:19	Make xmlNodeDumpOutputInternal non-recursive Fixes stack overflow with deeply nested documents.
5330153d	2020-07-28T18:33:50	Make xhtmlNodeDumpOutput non-recursive Fixes stack overflow with deeply nested documents.
b79ab6e6	2020-07-28T02:42:37	Make htmlNodeDumpFormatOutput non-recursive Fixes stack overflow with deeply nested HTML documents. Found by OSS-Fuzz.
21ca8829	2020-07-25T17:57:29	Don't try to handle namespaces when building HTML documents Don't try to resolve namespace in xmlSAX2StartElement when parsing HTML documents. This useless operation could slow down the parser considerably. Found by OSS-Fuzz.
93ce33c2	2020-07-23T17:34:08	Fix several quadratic runtime issues in HTML push parser Fix a few remaining cases where the HTML push parser would scan more content during lookahead than being parsed later. Make sure that htmlParseDocTypeDecl consumes all content up to the final '>' in case of errors. The old comment said "We shouldn't try to resynchronize", but ignoring invalid content is also what the HTML5 spec mandates. Likewise, make htmlParseEndTag skip to the final '>' in invalid end tags even if not in recovery mode. This is probably the most visible change in practice and leads to different output for some tests but is also more in line with HTML5. Make sure that htmlParsePI and htmlParseComment don't abort if invalid characters are encountered but log an error and ignore the character. Change some other end-of-buffer checks to test for a zero byte instead of relying on IS_CHAR. Fix usage of IS_CHAR macro in htmlParseScript.
10d09472	2020-07-23T19:16:21	Fix .gitattributes The files in 'test' and 'result' have mixed line endings, so disable end-of-line conversion.
173a0830	2020-07-22T23:15:35	Fix quadratic runtime when push parsing HTML start tags Make sure that htmlParseStartTag doesn't terminate on characters for which IS_CHAR_CH is false like control chars. In htmlParseTryOrFinish, only switch to START_TAG if the next character starts a valid name. Otherwise, htmlParseStartTag might return without consuming all characters up to the final '>'. Found by OSS-Fuzz.
0e5c4fec	2020-07-13T15:20:45	Reset XML parser input before reporting errors Apply changes to htmlParseChunk() in 13ba5b61 and 3f18e748 to xmlParseChunk().
6995eed0	2020-07-19T13:54:52	Fix quadratic runtime when push parsing HTML entity refs The HTML push parser would look ahead for characters in "; >/" to terminate an entity reference but actual parsing could stop earlier, potentially resulting in quadratic runtime. Parse char data and references alternately in htmlParseTryOrFinish and only look ahead once for a terminating '<' character. Found by OSS-Fuzz.
8e219b15	2020-07-12T21:43:44	Fix HTML push parser lookahead The parsing rules when looking for terminating chars or sequences in the push parser differed from the actual parsing code. This could result in the lookahead to overshoot and data being rescanned, potentially leading to quadratic runtime. Comments must never be handled during lookahead. Attribute values must only be skipped for start tags and doctype declarations, not for end tags, comments, PIs and script content.
e050062c	2020-07-15T14:38:55	Make htmlCurrentChar always translate U+0000 The general assumption is that htmlCurrentChar only returns 0 if the end of the input buffer is reached. The UTF-8 path already logged an error if a zero byte U+0000 was found and returned a space character instead. Make the ASCII code path do the same. htmlParseTryOrFinish skips zero bytes at the beginning of a buffer, so even if 0 was returned from htmlCurrentChar, the push parser would make progress. But rescanning the input could cause performance problems. The pull parser would abort parsing and now handles zero bytes in ASCII mode the same way as the push parser or as in UTF-8 mode. It would be better to return the replacement character U+FFFD instead, but some of the client code assumes that the UTF-8 length of input and output matches.
dfd4e330	2020-07-15T14:22:08	Rework control flow in htmlCurrentChar Don't call xmlCurrentChar after switching encodings. Rearrange code blocks and fall through to normal UTF-8 handling.
922bebcc	2020-07-15T14:20:42	Make 'xmllint --html --push -' read from stdin
1493130e	2020-07-15T12:54:25	Fix UTF-8 decoder in HTML parser Reject sequences starting with a continuation byte as well as overlong sequences like the XML parser. Also fixes an infinite loop in connection with previous commit 50078922 since htmlCurrentChar would return 0 even if not at the end of the buffer. Found by OSS-Fuzz.
beb7d71a	2020-07-13T12:41:19	Remove misleading comments in xpath.c Fixes #169
50078922	2020-07-12T20:28:47	Fix quadratic runtime when parsing HTML script content If htmlParseScript returns upon hitting an invalid character, htmlParseLookupSequence will be called again with checkIndex reset to zero, potentially resulting in quadratic runtime. Make sure that htmlParseScript consumes all input in one go and simply skips over invalid characters similar to htmlParseCharDataInternal. Found by OSS-Fuzz.
d6761e70	2020-07-13T11:59:45	Update to Devhelp index file format version 2 Fixes #89
d514e2bd	2020-07-12T18:42:49	Set project language to C
5ddf02f2	2020-06-07T16:06:17	Update config.h.cmake.in
8bec210d	2020-06-04T17:37:21	Add variable for working directory of XML Conformance Test Suite
270e1655	2020-06-04T14:45:48	Add additional tests and XML Conformance Test Suite
e6ba4bd7	2020-06-04T11:58:04	Add command line option for temp directory in runtest
40e7ceaa	2020-06-04T11:57:28	Ensure LF line endings for test files
9ecf5ad6	2020-06-04T00:16:15	Enable runtests and testThreads
3f18e748	2020-07-11T14:34:57	Reset HTML parser input before reporting error Avoid use-after-free, similar to 13ba5b61. Also make sure that xmlBufSetInputBaseCur sets valid pointers in case of buffer errors. Found by OSS-Fuzz.
3da8d947	2020-07-09T16:08:38	Fix more quadratic runtime issues in HTML push parser Make sure that checkIndex is set when returning without match from inside a comment. Also track parser state in htmlParseLookupChars. Found by OSS-Fuzz.
741b0d0a	2020-07-07T12:54:34	Fix regression introduced with 477c7f6a The 'inSubset' member is actually used by the SAX2 handlers. Store extra parser state in 'hasPErefs'.
fc842f6e	2020-07-06T15:22:12	Limit regexp nesting depth Enforce a maximum nesting depth of 50 for regular expressions. Avoids stack overflows with deeply nested regexes. Found by OSS-Fuzz.
1e41e4fa	2020-06-30T02:43:57	Fix return values and documentation in encoding.c Make xmlEncInputChunk and xmlEncOutputChunk return 0 on success and never a positive value. Make xmlCharEncFirstLineInt, xmlCharEncFirstLineInt and xmlCharEncOutFunc return the number of bytes written.
6b4717d6	2020-07-06T12:36:27	Add regexp regression tests - Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup <https://bugzilla.gnome.org/show_bug.cgi?id=757711> - Bug 783015 - Integer-overflow in xmlFAParseQuantExact <https://bugzilla.gnome.org/show_bug.cgi?id=783015> (Regexptests): Add support for checking stderr output when running regexp tests. This makes it possible to check in test cases that fail and not see false-positive error output when running the tests. Unlike other libxml2 test suites, if there is no stderr output, no *.err file needs to be created.
477c7f6a	2020-06-28T15:54:23	Fix quadratic runtime in HTML parser Commit eeb99329 removed an important optimization avoiding quadratic runtime when repeatedly scanning the input buffer for terminating characters in the HTML push parser. The related bug is https://bugzilla.gnome.org/show_bug.cgi?id=444994 Make sure that ctxt->checkIndex is always written and store additional parser state in ctxt->inSubset which is unused in the HTML parser. Found by OSS-Fuzz.
f8329fdc	2020-07-02T11:51:31	Report error for invalid regexp quantifiers
13ba5b61	2020-06-28T13:16:46	Reset HTML parser input before reporting encoding error If charset conversion fails, reset the input pointers before reporting the error and bailing out. Otherwise, the input pointers are left in an invalid state which could lead to use-after-free and other memory errors. Similar to f9e7997e. Found by OSS-Fuzz.
1e7851b5	2020-06-25T12:17:50	Fix integer overflow in xmlFAParseQuantExact Found by OSS-Fuzz.
84bab955	2020-06-24T20:07:32	Fix return value of xmlC14NDocDumpMemory Make sure to return -1 in case of buffer errors. Fixes #174.
43a8836c	2020-05-31T18:46:21	Fix rebuilding docs, by hiding __attribute__((...)) behind a macro. When enabled via `./configure --enable-rebuild-docs`, `make -C doc libxml2-api.xml` will invoke apibuild.py to rebuild libxml2-api.xml from the sources. But the code added in 9fa3200cb366c726f7c8ef234282603bb9e8816d made it error out with ``` Parsing ../parser.c Parse Error: parsing type : expecting a name ('Got token ', ('sep', '(')) ('Last token: ', ('sep', '(')) ('Token queue: ', [('name', 'destructor'), ('sep', ')'), ('sep', ')')]) ('Line 14689 end: ', '') ```
9f42f6ba	2020-06-24T15:33:38	Don't follow next pointer on documents in xmlXPathRunStreamEval RVTs from libxslt are document nodes which are linked using the 'next' pointer. These pointers must never be used to navigate the document tree. Otherwise, random content from other RVTs could be returned when evaluating XPath expressions. It's interesting that this seemingly long-standing bug wasn't discovered earlier. This issue could also cause severe performance degradation. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/37
c0440868	2020-06-22T13:08:11	Copy xs:duration parser from libexslt The duration parser in libexslt checks for integer overflows.
18425d3a	2020-06-21T19:14:23	Fix integer overflow in _xmlSchemaParseGYear Found with libFuzzer and UBSan.
070d635e	2020-06-21T16:26:38	Fix integer overflow when parsing {min,max}Occurs Clamp value to INT_MAX. Found with libFuzzer and UBSan.
50f18830	2020-06-21T15:21:45	Fix another memory leak in xmlSchemaValAtomicType Don't collapse language IDs twice. Found with libFuzzer and ASan.
eac1c7e2	2020-06-21T14:42:00	Fuzz target for XML Schemas This only tests the schema parser for now.
ffd31dbe	2020-06-21T12:14:19	Move entity recorder to fuzz.c
681f094e	2020-06-15T15:23:05	Fix unsigned integer overflow in htmlParseTryOrFinish Cast to signed type before subtraction to avoid unsigned integer overflow. Also use ptrdiff_t to avoid potential integer truncation. Found with libFuzzer and UBSan.
31ca4a72	2020-06-15T18:47:53	Fix integer overflow in htmlParseCharRef Fixes #115.
2f938203	2020-06-15T15:45:47	Fix undefined behavior in UTF16LEToUTF8 Don't perform arithmetic on null pointer. Found with libFuzzer and UBSan.
536f421d	2020-06-15T12:20:54	Fuzz target for HTML parser
a697ed1e	2020-06-15T14:49:22	Fix return value of xmlCharEncOutput Commit 407b393d introduced a regression caused by xmlCharEncOutput returning 0 in case of success instead of the number of bytes written. Always use its return value for nbchars in xmlOutputBufferWrite. Fixes #166.
af893a58	2020-06-11T16:08:16	Update GitLab CI container
a28f7d87	2020-06-10T13:41:13	Never expand parameter entities in text declaration When parsing the text declaration of external DTDs or entities, make sure that parameter entities are not expanded. This also fixes a memory leak in certain error cases. The change to xmlSkipBlankChars assumes that the parser state is maintained correctly when parsing external DTDs or parameter entities, and might expose bugs in the code that were hidden previously. Found by OSS-Fuzz.
487871b0	2020-06-10T13:23:43	Fix undefined behavior in xmlXPathTryStreamCompile &NULL[0] is undefined behavior.
e98150d4	2020-06-09T13:45:31	Add options file for xml fuzzer This will be picked up by OSS-Fuzz, limiting the maximum input size to 80 KB and hopefully avoiding timeouts. Some of the timeouts seem to be related to our suboptimal handling of excessive entity expansion. The new fuzzers support external entities and make this problem even more prominent.
2af3c2a8	2020-06-08T12:49:51	Fix use-after-free with validating reader Just like IDs, IDREF attributes must be removed from the document's refs table when they're freed by a reader. This bug is often hidden because xmlAttr structs are reused and strings are stored in a dictionary unless XML_PARSE_NODICT is specified. Found by OSS-Fuzz.
00ed736e	2020-06-05T12:49:25	Add a couple of libFuzzer targets - XML fuzzer Currently tests the pull parser, push parser and reader, as well as serialization. Supports splitting fuzz data into multiple documents for things like external DTDs or entities. The seed corpus is built from parts of the test suite. - Regexp fuzzer Seed corpus was statically generated from test suite. - URI fuzzer Tests parsing and most other functions from uri.c.
2e8cc66d	2020-05-30T15:40:08	xmlParseBalancedChunkMemory must not be called with NULL doc There is no way to avoid memory leaks without a document to hold the namespace list.
a0a8059b	2020-05-30T15:33:03	Revert "Fix memory leak in xmlParseBalancedChunkMemoryRecover" This reverts commit 5a02583c7e683896d84878bd90641d8d9b0d0549. Fixes #161.
ff009f99	2020-05-30T15:32:25	Fix memory leak in xmlXIncludeLoadDoc error path Found by OSS-Fuzz.
a230b728	2020-04-10T19:22:07	win32: allow passing *FLAGS on command line nmake is a primitive tool, so this is a primitive implementation: append EXTRA_CFLAGS etc. variables. Command line variables should be appended to allow overriding flags set in the makefile. It doesn't work to pass in CFLAGS like in make because that always overrides the assignments in the makefile.
4f2aee18	2020-05-04T14:03:52	Make schema validation fail with multiple top-level elements Closes #126.
106757e8	2020-04-10T14:52:03	Guard new calls to xmlValidatePopElement in xml_reader.c Closes #154.
386fb276	2020-04-28T17:00:37	Add LIBXML_VALID_ENABLED to xmlreader There are already LIBXML_VALID_ENABLED in this file to guard against "--without-valid" at "./configure" step, but here they were missing.
e7ff2efc	2020-04-21T21:16:07	Configure file xmlwin32version.h.in on MSVC
e2f10494	2020-04-21T21:04:23	List headers individually
2a2c38f3	2020-04-21T00:53:12	Add CMake build files Closes #24.
9fa3200c	2020-03-31T23:18:25	Call xmlCleanupParser on ELF destruction Fixes #153.
e4fb3684	2020-02-28T12:48:14	Parenthesize Py<type>_Check() in ifs In C, if expressions should be parenthesized. PyLong_Check, PyUnicode_Check etc. happened to expand to a parenthesized expression before, but that's not API to rely on. Since Python 3.9.0a4 it needs to be parenthesized explicitly. Fixes https://gitlab.gnome.org/GNOME/libxml2/issues/149
20c60886	2020-03-08T17:19:42	Fix typos Resolves #133.
2a7b6684	2020-03-02T11:52:52	Disable LeakSanitizer The GitLab runner doesn't run in privileged mode anymore [1], at least for projects outside the GNOME group. Disable LeakSanitizer for now as it needs the ptrace capability. [1] https://gitlab.gnome.org/Infrastructure/Infrastructure/issues/251


6b4717d6

2020-07-06T12:36:27

Add regexp regression tests
- Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup <https://bugzilla.gnome.org/show_bug.cgi?id=757711>
- Bug 783015 - Integer-overflow in xmlFAParseQuantExact <https://bugzilla.gnome.org/show_bug.cgi?id=783015>
(Regexptests): Add support for checking stderr output when running regexp tests. This makes it possible to check in test cases that fail and not see false-positive error output when running the tests. Unlike other libxml2 test suites, if there is no stderr output, no *.err file needs to be created.

477c7f6a

2020-06-28T15:54:23

Fix quadratic runtime in HTML parser Commit eeb99329 removed an important optimization avoiding quadratic runtime when repeatedly scanning the input buffer for terminating characters in the HTML push parser. The related bug is https://bugzilla.gnome.org/show_bug.cgi?id=444994 Make sure that ctxt->checkIndex is always written and store additional parser state in ctxt->inSubset which is unused in the HTML parser. Found by OSS-Fuzz.

f8329fdc

2020-07-02T11:51:31

Report error for invalid regexp quantifiers

13ba5b61

2020-06-28T13:16:46

Reset HTML parser input before reporting encoding error If charset conversion fails, reset the input pointers before reporting the error and bailing out. Otherwise, the input pointers are left in an invalid state which could lead to use-after-free and other memory errors. Similar to f9e7997e. Found by OSS-Fuzz.

1e7851b5

2020-06-25T12:17:50

Fix integer overflow in xmlFAParseQuantExact Found by OSS-Fuzz.

84bab955

2020-06-24T20:07:32

Fix return value of xmlC14NDocDumpMemory Make sure to return -1 in case of buffer errors. Fixes #174.

43a8836c

2020-05-31T18:46:21

Fix rebuilding docs, by hiding __attribute__((...)) behind a macro. When enabled via `./configure --enable-rebuild-docs`, `make -C doc libxml2-api.xml` will invoke apibuild.py to rebuild libxml2-api.xml from the sources. But the code added in 9fa3200cb366c726f7c8ef234282603bb9e8816d made it error out with
```
Parsing ../parser.c
Parse Error: parsing type : expecting a name
('Got token ', ('sep', '('))
('Last token: ', ('sep', '('))
('Token queue: ', [('name', 'destructor'), ('sep', ')'), ('sep', ')')])
('Line 14689 end: ', '')
```

9f42f6ba

2020-06-24T15:33:38

Don't follow next pointer on documents in xmlXPathRunStreamEval RVTs from libxslt are document nodes which are linked using the 'next' pointer. These pointers must never be used to navigate the document tree. Otherwise, random content from other RVTs could be returned when evaluating XPath expressions. It's interesting that this seemingly long-standing bug wasn't discovered earlier. This issue could also cause severe performance degradation. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/37

c0440868

2020-06-22T13:08:11

Copy xs:duration parser from libexslt The duration parser in libexslt checks for integer overflows.

18425d3a

2020-06-21T19:14:23

Fix integer overflow in _xmlSchemaParseGYear Found with libFuzzer and UBSan.

070d635e

2020-06-21T16:26:38

Fix integer overflow when parsing {min,max}Occurs Clamp value to INT_MAX. Found with libFuzzer and UBSan.

50f18830

2020-06-21T15:21:45

Fix another memory leak in xmlSchemaValAtomicType Don't collapse language IDs twice. Found with libFuzzer and ASan.

eac1c7e2

2020-06-21T14:42:00

Fuzz target for XML Schemas This only tests the schema parser for now.

ffd31dbe

2020-06-21T12:14:19

Move entity recorder to fuzz.c

681f094e

2020-06-15T15:23:05

Fix unsigned integer overflow in htmlParseTryOrFinish Cast to signed type before subtraction to avoid unsigned integer overflow. Also use ptrdiff_t to avoid potential integer truncation. Found with libFuzzer and UBSan.

31ca4a72

2020-06-15T18:47:53

Fix integer overflow in htmlParseCharRef Fixes #115.

2f938203

2020-06-15T15:45:47

Fix undefined behavior in UTF16LEToUTF8 Don't perform arithmetic on null pointer. Found with libFuzzer and UBSan.

536f421d

2020-06-15T12:20:54

Fuzz target for HTML parser

a697ed1e

2020-06-15T14:49:22

Fix return value of xmlCharEncOutput Commit 407b393d introduced a regression caused by xmlCharEncOutput returning 0 in case of success instead of the number of bytes written. Always use its return value for nbchars in xmlOutputBufferWrite. Fixes #166.

af893a58

2020-06-11T16:08:16

Update GitLab CI container

a28f7d87

2020-06-10T13:41:13

Never expand parameter entities in text declaration When parsing the text declaration of external DTDs or entities, make sure that parameter entities are not expanded. This also fixes a memory leak in certain error cases. The change to xmlSkipBlankChars assumes that the parser state is maintained correctly when parsing external DTDs or parameter entities, and might expose bugs in the code that were hidden previously. Found by OSS-Fuzz.

487871b0

2020-06-10T13:23:43

Fix undefined behavior in xmlXPathTryStreamCompile &NULL[0] is undefined behavior.

e98150d4

2020-06-09T13:45:31

Add options file for xml fuzzer This will be picked up by OSS-Fuzz, limiting the maximum input size to 80 KB and hopefully avoiding timeouts. Some of the timeouts seem to be related to our suboptimal handling of excessive entity expansion. The new fuzzers support external entities and make this problem even more prominent.

2af3c2a8

2020-06-08T12:49:51

Fix use-after-free with validating reader Just like IDs, IDREF attributes must be removed from the document's refs table when they're freed by a reader. This bug is often hidden because xmlAttr structs are reused and strings are stored in a dictionary unless XML_PARSE_NODICT is specified. Found by OSS-Fuzz.

00ed736e

2020-06-05T12:49:25

Add a couple of libFuzzer targets
- XML fuzzer: Currently tests the pull parser, push parser and reader, as well as serialization. Supports splitting fuzz data into multiple documents for things like external DTDs or entities. The seed corpus is built from parts of the test suite.
- Regexp fuzzer: Seed corpus was statically generated from test suite.
- URI fuzzer: Tests parsing and most other functions from uri.c.

2e8cc66d

2020-05-30T15:40:08

xmlParseBalancedChunkMemory must not be called with NULL doc There is no way to avoid memory leaks without a document to hold the namespace list.

a0a8059b

2020-05-30T15:33:03

Revert "Fix memory leak in xmlParseBalancedChunkMemoryRecover" This reverts commit 5a02583c7e683896d84878bd90641d8d9b0d0549. Fixes #161.

ff009f99

2020-05-30T15:32:25

Fix memory leak in xmlXIncludeLoadDoc error path Found by OSS-Fuzz.

a230b728

2020-04-10T19:22:07

win32: allow passing *FLAGS on command line nmake is a primitive tool, so this is a primitive implementation: append EXTRA_CFLAGS etc. variables. Command line variables should be appended to allow overriding flags set in the makefile. It doesn't work to pass in CFLAGS like in make because that always overrides the assignments in the makefile.

4f2aee18

2020-05-04T14:03:52

Make schema validation fail with multiple top-level elements Closes #126.

106757e8

2020-04-10T14:52:03

Guard new calls to xmlValidatePopElement in xml_reader.c Closes #154.

386fb276

2020-04-28T17:00:37

Add LIBXML_VALID_ENABLED to xmlreader This file already contains LIBXML_VALID_ENABLED guards for builds configured with "--without-valid" at the "./configure" step, but they were missing here.

e7ff2efc

2020-04-21T21:16:07

Configure file xmlwin32version.h.in on MSVC

e2f10494

2020-04-21T21:04:23

List headers individually

2a2c38f3

2020-04-21T00:53:12

Add CMake build files Closes #24.

9fa3200c

2020-03-31T23:18:25

Call xmlCleanupParser on ELF destruction Fixes #153.

e4fb3684

2020-02-28T12:48:14

Parenthesize Py<type>_Check() in ifs In C, if expressions should be parenthesized. PyLong_Check, PyUnicode_Check etc. happened to expand to a parenthesized expression before, but that's not API to rely on. Since Python 3.9.0a4 it needs to be parenthesized explicitly. Fixes https://gitlab.gnome.org/GNOME/libxml2/issues/149

20c60886

2020-03-08T17:19:42

Fix typos Resolves #133.

2a7b6684

2020-03-02T11:52:52

Disable LeakSanitizer The GitLab runner doesn't run in privileged mode anymore [1], at least for projects outside the GNOME group. Disable LeakSanitizer for now as it needs the ptrace capability. [1] https://gitlab.gnome.org/Infrastructure/Infrastructure/issues/251
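Several of the HTML push parser fixes in this log (for example 477c7f6a and 3da8d947) depend on remembering how far a lookahead has already scanned, so that a failed search for a terminating character is not restarted from the beginning when more data arrives. The following is a minimal sketch of that idea only, not the libxml2 code: the names `scan_state`, `check_index`, `bytes_scanned` and `lookup_char` are hypothetical, and the sketch assumes an append-only buffer whose already scanned prefix never changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state; plays a role comparable to ctxt->checkIndex. */
typedef struct {
    size_t check_index;   /* bytes already scanned without finding a match */
    size_t bytes_scanned; /* total bytes examined, kept only for illustration */
} scan_state;

/*
 * Look for `term` in buf[0..len). Returns the index of the first match,
 * or -1 if the terminator has not arrived yet. On failure the scan
 * position is saved, so the next call resumes there instead of at 0.
 */
static long lookup_char(scan_state *st, const char *buf, size_t len, char term)
{
    assert(st != NULL && buf != NULL);
    assert(st->check_index <= len);

    for (size_t i = st->check_index; i < len; i++) {
        st->bytes_scanned++;
        if (buf[i] == term) {
            st->check_index = 0; /* match found: reset for the next lookup */
            return (long)i;
        }
    }
    st->check_index = len; /* no match: these bytes are never rescanned */
    return -1;
}
```

With the saved index, each byte is examined once no matter how many chunks the input is split into. If the index were reset to zero on every failed lookup, feeding n bytes one at a time would examine on the order of n² bytes in total.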
