|
de1b51ed
|
2021-02-22T12:25:29
|
|
Improve HTML fuzzer stability
Call htmlInitAutoClose during fuzzer initialization to fix a stability
issue. Leave a note concerning problems with this function.
|
|
09320f05
|
2021-02-21T14:26:40
|
|
Add CI for MSVC x86
|
|
dcb80b92
|
2021-02-20T20:30:43
|
|
Fix slow parsing of HTML with encoding errors
Under certain circumstances, the HTML parser would try to guess and
switch input encodings multiple times, leading to slow processing of
documents with encoding errors. The repeated scanning of the input
buffer when guessing encodings could even lead to quadratic behavior.
The code in htmlCurrentChar probably assumed that if there's an encoding
handler, it is guaranteed to produce valid UTF-8. This holds true in
general, but if the detected encoding was "UTF-8", the UTF8ToUTF8
encoding handler simply invoked memcpy without checking for invalid
UTF-8. This still must be fixed, preferably by not using this handler
at all.
Also leave a note that switching encodings twice seems impossible to
implement correctly. Add a check when handling UTF-8 encoding errors
in htmlCurrentChar to avoid this situation, even if encoders produce
invalid UTF-8.
Found by OSS-Fuzz.
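A minimal sketch of the kind of check a pass-through handler would need
instead of a plain memcpy. The helper name utf8ValidPrefix is hypothetical
and not part of libxml2; it only shows how invalid or truncated sequences
can be detected in a single linear pass.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2: returns the number of leading
 * bytes of buf that form well-formed UTF-8. A result smaller than len
 * means the byte at that offset starts an invalid or truncated sequence.
 */
size_t
utf8ValidPrefix(const unsigned char *buf, size_t len) {
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;
        size_t n, k;

        if (c < 0x80) {                      /* ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 2; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 3; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 4; cp = c & 0x07;
        } else {
            break;                           /* stray continuation or bad lead byte */
        }

        if (len - i < n)
            break;                           /* truncated sequence */
        for (k = 1; k < n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                break;                       /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (k < n)
            break;
        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800) ||
            (n == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            break;
        i += n;
    }
    return i;
}
```

A caller could copy only the validated prefix and report an encoding error
at the returned offset, so the input is scanned once rather than repeatedly.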
|
|
02bee4c4
|
2021-02-02T22:27:52
|
|
Add a flag to not output anything when xmllint succeeded
|
|
4defa2c2
|
2021-02-12T09:39:38
|
|
Fix warnings in libxml.m4 with autoconf 2.70+.
Closes #219.
|
|
cbe1212d
|
2021-02-09T17:07:21
|
|
Fix null deref introduced with previous commit
Found by OSS-Fuzz.
|
|
01411e7c
|
2021-02-08T20:58:32
|
|
Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.
Note that some test cases declared
<!ENTITY lt "<">
But the XML spec clearly states that this is illegal:
> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.
Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.
As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
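The rule quoted above can be sketched as follows. This is an illustration
under stated assumptions, not the code added to libxml2: the function
names are hypothetical, and `content` is assumed to be the replacement
text, that is "&#60;" for <!ENTITY lt "&#38;#60;">.

```c
#include <string.h>

/* Parses "&#NN;" or "&#xHH;". Returns the code point, or -1 if malformed. */
static long
parseCharRef(const char *s) {
    long val = 0;
    int base = 10, digits = 0;

    if (s[0] != '&' || s[1] != '#')
        return -1;
    s += 2;
    if (*s == 'x') {
        base = 16;
        s++;
    }
    for (; *s != ';'; s++) {
        int d;

        if (*s >= '0' && *s <= '9')
            d = *s - '0';
        else if (base == 16 && *s >= 'a' && *s <= 'f')
            d = *s - 'a' + 10;
        else if (base == 16 && *s >= 'A' && *s <= 'F')
            d = *s - 'A' + 10;
        else
            return -1;                       /* also catches a missing ';' */
        val = val * base + d;
        if (val > 0x10FFFF)
            return -1;
        digits++;
    }
    if (digits == 0 || s[1] != '\0')
        return -1;
    return val;
}

/*
 * Returns 1 if the redeclaration is acceptable, 0 if it is invalid and
 * -1 if name is not one of the five predefined entities.
 */
int
checkPredefinedEntity(const char *name, const char *content) {
    static const struct { const char *name; char ch; } predefined[] = {
        { "lt", '<' }, { "gt", '>' }, { "amp", '&' },
        { "apos", '\'' }, { "quot", '"' }
    };
    size_t i;

    for (i = 0; i < sizeof(predefined) / sizeof(predefined[0]); i++) {
        char ch = predefined[i].ch;

        if (strcmp(name, predefined[i].name) != 0)
            continue;
        /* gt, apos and quot may also be declared as the literal character. */
        if (ch != '<' && ch != '&' && content[0] == ch && content[1] == '\0')
            return 1;
        /* lt and amp must be a character reference to the character. */
        return parseCharRef(content) == (long) ch;
    }
    return -1;
}
```

With this check, the test case declaration <!ENTITY lt "<"> is rejected,
while the double-escaped form is accepted.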
|
|
07920b43
|
2021-01-26T05:42:48
|
|
Copy the type from the original xmlDoc in xmlCopyDoc()
A bug related to PHP DOMDocument:
https://bugs.php.net/bug.php?id=80665
When copying or cloning an HTML document, xmlDoc->type changes from
XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
|
|
2065d340
|
2021-02-05T23:40:18
|
|
Add CI for CMake on MSVC
|
|
afad3721
|
2021-01-31T09:53:56
|
|
parser.c: shrink the input buffer when appropriate
Fixes GNOME/libxml2#200
Also see discussions at:
- GNOME/libxml2#192
- https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e
- https://github.com/sparklemotion/nokogiri/issues/2132
|
|
ec808a44
|
2021-02-07T13:57:49
|
|
Speed up HTML fuzzer
htmlDocDumpMemory uses the "HTML" encoding if no other encoding was
specified in the source HTML. This encoding can be extremely slow
because of an inefficiency in htmlEntityValueLookup. Stop encoding
the output for now.
|
|
e6495e47
|
2021-02-07T13:38:01
|
|
Remove unused encoding parameter of HTML output functions
The encoding string is unused. Encodings are set by way of the output
buffer.
|
|
954696e7
|
2021-02-07T13:23:09
|
|
Fix infinite loop in HTML parser introduced with recent commits
Check for XML_PARSER_EOF to avoid an infinite loop introduced with
recent changes to the HTML push parser.
Found by OSS-Fuzz.
|
|
acb35667
|
2021-02-03T13:48:40
|
|
Fix quadratic runtime when parsing CDATA sections
Use optimized concatenation for CDATA sections in addition to normal
text. This also affects HTML script content.
Found by OSS-Fuzz.
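For illustration, a sketch of why an optimized append avoids quadratic
runtime: the buffer grows geometrically, so appending many small chunks
costs amortized constant time per byte. TextBuf and textBufAppend are
hypothetical names; libxml2's node content handling is different.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer, not a libxml2 structure. */
typedef struct {
    char *data;
    size_t len;   /* bytes used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated */
} TextBuf;

/* Appends n bytes. Returns 0 on success, -1 on allocation failure or overflow. */
int
textBufAppend(TextBuf *buf, const char *s, size_t n) {
    if (n >= (size_t) -1 - buf->len)
        return -1;                           /* len + n + 1 would overflow */
    if (buf->len + n + 1 > buf->cap) {
        size_t newCap = buf->cap ? buf->cap : 16;
        char *tmp;

        while (newCap < buf->len + n + 1) {
            if (newCap > (size_t) -1 / 2)
                return -1;
            newCap *= 2;                     /* geometric growth */
        }
        tmp = realloc(buf->data, newCap);
        if (tmp == NULL)
            return -1;
        buf->data = tmp;
        buf->cap = newCap;
    }
    memcpy(buf->data + buf->len, s, n);
    buf->len += n;
    buf->data[buf->len] = '\0';
    return 0;
}
```

Reallocating to the exact size on every chunk would instead copy the whole
buffer each time, which is the quadratic pattern this commit removes.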
|
|
f93ca3e1
|
2021-01-15T17:53:27
|
|
Update minimum required CMake version
|
|
00487289
|
2020-12-31T16:34:25
|
|
Add variables for configured options to CMake config files
|
|
95519737
|
2020-12-31T13:41:19
|
|
Check if variables exist when defining targets
|
|
c26e4525
|
2020-12-31T13:18:14
|
|
Check if target exists when reading target properties
|
|
ec119875
|
2020-12-30T14:40:43
|
|
Add xmlcatalog target and definition to config files
|
|
2377a312
|
2020-12-30T14:40:04
|
|
Remove include directories for link-only dependencies
|
|
26835480
|
2020-12-30T14:28:24
|
|
Fix ICU build in CMake
|
|
296ab61e
|
2020-11-19T22:06:36
|
|
Configure pkgconfig, xml2-config, and xml2Conf.sh file
|
|
79301d3d
|
2020-12-18T12:50:21
|
|
Fix timeout when handling recursive entities
Abort parsing early to avoid an almost infinite loop in certain error
cases involving recursive entities.
Found with libFuzzer.
|
|
45da175c
|
2020-12-18T12:14:52
|
|
Fix memory leak in xmlParseElementMixedContentDecl
Free parsed content if malloc fails to avoid a memory leak.
Found with libFuzzer.
|
|
1d73f07d
|
2020-12-18T00:55:00
|
|
Fix null deref in xmlStringGetNodeList
Check for malloc failure to avoid null deref.
Found with libFuzzer.
|
|
e2b975c3
|
2020-12-18T00:50:34
|
|
Handle malloc failures in fuzzing code
Avoid misdiagnosis in OOM situations.
|
|
a67b63d1
|
2020-10-11T14:15:37
|
|
use new htmlParseLookupCommentEnd to find comment ends
Note that the caret in error messages generated during comment parsing
may have moved by one byte.
See guidance provided on incorrectly-closed comments here:
https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
|
|
29f5d20e
|
2020-08-03T17:36:05
|
|
htmlParseComment: treat `--!>` as if it closed the comment
See guidance provided on incorrectly-closed comments here:
https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
|
|
e28d9347
|
2020-08-04T14:53:19
|
|
add test coverage for incorrectly-closed comments
this establishes the baseline behavior so that subsequent commits
which modify this behavior are clear about what's being changed.
|
|
9086988f
|
2020-12-16T15:41:52
|
|
Enforce maximum length of fuzz input
Remove the libfuzzer max_len option which doesn't apply to other
fuzzing engines. Enforce the maximum length directly in the fuzz
targets. For the xml target, lower the maximum when expanding entities
to avoid timeout and OOM errors.
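A sketch of enforcing the limit inside a fuzz target. The entry point
signature is the one expected by libFuzzer-compatible engines; the limit
value, processInput and processedBytes are assumptions made for this
example and do not reflect the actual targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative limit; the values used by the real fuzz targets may differ. */
#define MAX_FUZZ_INPUT 20000

/* Hypothetical counter standing in for the code under test. */
static size_t processedBytes = 0;

static void
processInput(const uint8_t *data, size_t size) {
    (void) data;
    processedBytes += size;
}

/* Entry point called by the fuzzing engine for every generated input. */
int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size > MAX_FUZZ_INPUT)
        return 0;                /* skip oversized inputs in every engine */
    processInput(data, size);
    return 0;
}
```

Because the check lives in the target, it applies no matter which engine
drives it, unlike an engine-specific command line option.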
|
|
1fe38530
|
2020-12-16T15:27:13
|
|
Remove temporary members from struct _xmlXPathContext
These values are hardcoded now and the struct members, while public,
were recently introduced and never part of an official release.
|
|
8ca3a59b
|
2020-12-15T20:14:28
|
|
Fix integer overflow in xmlSchemaGetParticleTotalRangeMin
The function is only used once and its return value is only checked for
zero. Disable the function like its Max counterpart and add an
implementation for the special case.
Found by OSS-Fuzz.
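A sketch of the special case, assuming a heavily simplified particle model
(the Particle type below is hypothetical and not the structure used in
xmlschemas.c). Since the caller only needs to know whether the minimum is
zero, no occurrence counts are multiplied or added, so nothing can
overflow.

```c
#include <stddef.h>

/* Hypothetical, simplified particle model. */
typedef enum {
    PARTICLE_ELEMENT,
    PARTICLE_SEQUENCE,
    PARTICLE_CHOICE
} ParticleKind;

typedef struct Particle {
    ParticleKind kind;
    unsigned minOccurs;
    const struct Particle *children;  /* first child of a sequence or choice */
    const struct Particle *next;      /* next sibling */
} Particle;

/* Returns 1 if the particle can match zero elements, 0 otherwise. */
int
particleIsEmptiable(const Particle *p) {
    const Particle *child;

    if (p->minOccurs == 0)
        return 1;
    switch (p->kind) {
    case PARTICLE_ELEMENT:
        return 0;
    case PARTICLE_SEQUENCE:           /* every child must be emptiable */
        for (child = p->children; child != NULL; child = child->next)
            if (!particleIsEmptiable(child))
                return 0;
        return 1;
    case PARTICLE_CHOICE:             /* one emptiable child is enough */
        if (p->children == NULL)
            return 1;                 /* a group without particles has minimum 0 */
        for (child = p->children; child != NULL; child = child->next)
            if (particleIsEmptiable(child))
                return 1;
        return 0;
    }
    return 0;
}
```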
|
|
649d02ea
|
2020-12-07T20:19:53
|
|
encoding: fix memleak in xmlRegisterCharEncodingHandler()
The return type of xmlRegisterCharEncodingHandler() is void, so the caller
cannot determine whether the call succeeded. When nbCharEncodingHandler >=
MAX_ENCODING_HANDLERS, "handler" is not added to the "handlers" array. As a
result, its memory can no longer be managed or released and is leaked.
Add "xmlFree(handler)" on the failure branch of
xmlRegisterCharEncodingHandler() to fix the leak.
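The ownership problem can be sketched with a toy registry. Handler,
handlers, nbHandlers and registerHandler are hypothetical stand-ins, and
the table size is kept tiny for the example.

```c
#include <stdlib.h>

#define MAX_HANDLERS 2   /* deliberately small for illustration */

/* Hypothetical handler type and registry, not libxml2's definitions. */
typedef struct {
    char *name;
} Handler;

static Handler *handlers[MAX_HANDLERS];
static int nbHandlers = 0;

/*
 * Takes ownership of handler. Because the function returns void, the
 * caller cannot tell whether registration succeeded, so the failure
 * branch has to release the handler itself.
 */
void
registerHandler(Handler *handler) {
    if (handler == NULL)
        return;
    if (nbHandlers >= MAX_HANDLERS) {
        free(handler->name);
        free(handler);           /* table full: free instead of leaking */
        return;
    }
    handlers[nbHandlers++] = handler;
}
```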
Reported-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
|
|
cb7a572b
|
2020-12-07T20:17:34
|
|
xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val"
The xmlSchemaGetFacetValueAsULong() API is an external API.
The validity of external input parameters must be strictly verified.
Before accessing "facet->val->value", we need to check whether "facet->val"
is a null pointer.
Signed-off-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
|
|
84b76d99
|
2020-12-06T17:26:23
|
|
Update CMake config files
|
|
d0ccb3a6
|
2020-12-06T17:25:52
|
|
Add xmlcatalog and xmllint to CMake export
|
|
acdc2ff3
|
2020-06-04T23:02:08
|
|
Simplify xmlexports.h
All the compiler switches essentially set the same macros. The only
exception was MSVC which omitted the "extern" keyword for exported
variables. This in turn broke clang-cl.
This commit rewrites and simplifies the whole header.
Closes #163.
|
|
a218ff0e
|
2020-12-06T17:26:36
|
|
Fix null pointer deref in xmlXPtrRangeInsideFunction
Found by OSS-Fuzz.
|
|
94c2e415
|
2020-12-06T16:38:00
|
|
Fix quadratic runtime in HTML push parser with null bytes
Null bytes in the input stream do not necessarily signal an EOF
condition. Check the stream pointers for EOF to avoid quadratic
rescanning of input data.
Note that the CUR_CHAR macro used in functions like htmlParseCharData
calls htmlCurrentChar which translates null bytes.
Found by OSS-Fuzz.
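A sketch of the distinction, using a hypothetical Input struct that only
models the two pointers involved: the end of input is decided by comparing
pointers, so an embedded null byte is treated as ordinary data.

```c
#include <stddef.h>

/* Hypothetical input window; the real parser input has more fields. */
typedef struct {
    const unsigned char *cur;  /* next unread byte */
    const unsigned char *end;  /* one past the last buffered byte */
} Input;

/* A null byte is ordinary data; only cur reaching end means no more input. */
int
inputAtEnd(const Input *in) {
    return in->cur >= in->end;
}

/* Consumes the remaining bytes, embedded null bytes included, in one pass. */
size_t
consumeAll(Input *in) {
    size_t n = 0;

    while (!inputAtEnd(in)) {
        in->cur++;
        n++;
    }
    return n;
}
```

A loop that stops at the first zero byte would return early and be
re-entered for the same data on every push, which is where the quadratic
rescanning comes from.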
|
|
1c4f9a6d
|
2020-11-25T18:01:51
|
|
Require dependencies based on enabled CMake options
|
|
faea2fa9
|
2020-11-21T01:21:56
|
|
Avoid quadratic checking of identity-constraints
key/unique/keyref schema attributes currently use quadratic loops
to check their various constraints (that keys are unique and that
keyrefs refer to existing keys). That becomes extremely slow if
there are many elements with keys. This happens in the wild with
e.g. the OVAL XML descriptions of security patches. You need the
openscap schemata, and then an example xml file:
% zypper in openscap-utils
% wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml
% time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates
real 16m59,857s
user 16m55,787s
sys 0m1,060s
This patch makes libxml use a hash table to avoid the quadratic
behaviour. The existing hash table only accepts strings as keys, so
we're mostly reusing the canonical representation of key values to derive
such strings (with the caveat given in a comment). The alternative
would be to rework the hash table code to accept either numbers or free
functions as hash workers, but the code is fast enough as is.
With the patch applied, the timings become:
% time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates
real 0m3,531s
user 0m3,427s
sys 0m0,103s
So, a ~300x speedup. This patch survives 'make check' and 'make tests'.
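The approach can be illustrated with a small string hash set. This sketch
does not use libxml2's hash table; KeySet, keySetAdd and firstDuplicate are
hypothetical names, and the keys stand in for the canonical string
representation of key values.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024

/* Hypothetical chained hash set of strings. */
typedef struct Entry {
    const char *key;          /* borrowed; the caller owns the string */
    struct Entry *next;
} Entry;

typedef struct {
    Entry *buckets[NBUCKETS];
} KeySet;

static unsigned long
hashKey(const char *s) {
    unsigned long h = 5381;   /* djb2 string hash */

    while (*s != '\0')
        h = h * 33 + (unsigned char) *s++;
    return h % NBUCKETS;
}

/* Returns 1 if key was inserted, 0 if already present, -1 on OOM. */
int
keySetAdd(KeySet *set, const char *key) {
    unsigned long h = hashKey(key);
    Entry *e;

    for (e = set->buckets[h]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return 0;
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->key = key;
    e->next = set->buckets[h];
    set->buckets[h] = e;
    return 1;
}

/* Returns the index of the first duplicate, -1 if all unique, -2 on OOM. */
long
firstDuplicate(const char *const *keys, size_t n) {
    KeySet *set = calloc(1, sizeof(*set));
    long dup = -1;
    size_t i;

    if (set == NULL)
        return -2;
    for (i = 0; i < n && dup == -1; i++) {
        int r = keySetAdd(set, keys[i]);

        if (r == 0)
            dup = (long) i;
        else if (r < 0)
            dup = -2;
    }
    for (i = 0; i < NBUCKETS; i++) {
        while (set->buckets[i] != NULL) {
            Entry *e = set->buckets[i];

            set->buckets[i] = e->next;
            free(e);
        }
    }
    free(set);
    return dup;
}
```

Each key is compared only against entries in its own bucket, so checking n
keys takes roughly linear time on average instead of n squared comparisons.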
|
|
8272db53
|
2020-11-28T22:54:40
|
|
Use NAMELINK_COMPONENT in CMake install
|
|
5c7bdbc9
|
2020-11-25T18:41:14
|
|
Add CMake files to EXTRA_DIST
|
|
7a62870a
|
2020-11-19T22:06:23
|
|
Add missing compile definition for static builds to CMake
|
|
e028d293
|
2020-11-19T17:58:46
|
|
Add CI for CMake on Linux and MinGW
|
|
b516ed18
|
2020-11-12T12:53:43
|
|
Fix building with ICU 68.
ICU 68 no longer defines the TRUE macro.
Closes #204.
|
|
ac5e9991
|
2020-11-10T15:42:36
|
|
Convert python/libxml.c to PY_SSIZE_T_CLEAN
Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string
length (int len) explicitly to Py_ssize_t when passing a string to a
function call using PyObject_CallMethod() with the "s#" format.
|
|
f42a0524
|
2020-11-09T18:19:31
|
|
Build the Python extension with PY_SSIZE_T_CLEAN
The Python extension module now uses Py_ssize_t rather than int for
string lengths. This change makes the extension compatible with
Python 3.10.
Fixes #203.
|
|
0ace6c4d
|
2020-11-19T17:35:11
|
|
Add CI test for Python 3
|
|
7c06d99e
|
2020-10-27T11:29:20
|
|
Fix xmlURIEscape memory leaks.
Found by running the fuzz/uri.c fuzzer under asan (internal Android bug
171610679).
Always free `ret` when exiting on failure. I've moved the definition of
NULLCHK down past where ret is always initialized to make it clear that
this is safe.
This patch also fixes the indentation of two of the NULLCHK call sites
to make it more obvious that NULLCHK isn't `if`-like.
|
|
31c6ce3b
|
2020-11-09T17:55:44
|
|
Avoid call stack overflow with XML reader and recursive XIncludes
Don't process XIncludes in the result of another inclusion to avoid
infinite recursion resulting in a call stack overflow.
This is something the XInclude engine shouldn't allow but correct
handling of intra-document includes would require major changes.
Found by OSS-Fuzz.
|
|
7d6837ba
|
2020-10-25T20:21:43
|
|
Fix caret in regexp character group
Apply Per Hedeland's patch from
https://bugzilla.gnome.org/show_bug.cgi?id=779751
Fixes #188.
|
|
8a85263f
|
2020-10-25T20:08:16
|
|
Add fuzzing dictionaries to EXTRA_DIST
Also add static seed corpus for the URI fuzzer.
|
|
1bde1040
|
2020-10-25T20:02:23
|
|
Add 'fuzz' subdirectory to DIST_SUBDIRS
Fixes #191.
|
|
c0c26ff2
|
2020-10-11T16:33:07
|
|
parser.c: xmlParseCharData peek behavior fixed wrt newlines
Previously, xmlParseCharData and xmlParseComment would consider 0xA to
be unhandleable when seen as the first byte of an input chunk, and
fall back to xmlParseCharDataComplex and xmlParseCommentComplex, which
have different memory and performance characteristics.
Fixes GNOME/libxml2#192
|
|
b46016b8
|
2020-10-17T18:03:09
|
|
Allow port numbers up to INT_MAX
Also return an error on overflow.
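A sketch of overflow-safe port parsing. parsePort is a hypothetical helper
and not the code in uri.c; in particular, rejecting an empty string is a
choice made for this example only.

```c
#include <limits.h>

/*
 * Parses a run of decimal digits as a port number. Returns 0 and stores
 * the value on success, -1 if the string is empty, contains a non-digit
 * or exceeds INT_MAX.
 */
int
parsePort(const char *s, int *port) {
    int val = 0;

    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        int d;

        if (*s < '0' || *s > '9')
            return -1;
        d = *s - '0';
        /* Check before multiplying so signed overflow never happens. */
        if (val > (INT_MAX - d) / 10)
            return -1;
        val = val * 10 + d;
    }
    *port = val;
    return 0;
}
```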
|
|
46837d47
|
2020-10-03T01:13:35
|
|
Fix memory leaks in XPointer string-range function
Found by OSS-Fuzz.
|
|
0b3c64d9
|
2020-09-29T18:08:37
|
|
Handle dumps of corrupted documents more gracefully
Check parent pointers for NULL after the non-recursive rewrite of the
serialization code. This avoids segfaults with corrupted documents
which can apparently be seen with lxml, see issue #187.
|
|
847a3a11
|
2020-09-28T12:28:29
|
|
Fix use-after-free when XIncluding text from Reader
The XML Reader can free text nodes coming from the XInclude engine
before parsing has finished. Cache a copy of the text string, not the
included node to avoid use after free.
Found by OSS-Fuzz.
|
|
7929f057
|
2020-08-30T10:34:01
|
|
Fix SEGV in xmlSAXParseFileWithData
Fixes #181.
|
|
e6ec58ec
|
2020-09-21T12:49:36
|
|
Fix null deref in XPointer expression error path
Make sure that the filter functions introduced with commit c2f4da1a
return node-sets without NULL pointers also in the error case.
Found by OSS-Fuzz.
|
|
4e9cc18b
|
2020-09-21T11:00:23
|
|
Fix variable name in win32/configure.js
Fix copy/paste error from previous commit.
|
|
5614c078
|
2020-09-21T10:55:45
|
|
Fix version parsing in win32/configure.js
Adjust to configure.ac changes.
Should fix #185.
|
|
8b88503a
|
2020-09-18T19:15:27
|
|
Don't call xmlXPathInit directly
Call xmlInitParser which uses a lock to avoid race conditions.
Fixes #184.
|
|
b215c270
|
2020-09-13T12:19:48
|
|
Fix cleanup of attributes in XML reader
xml:id creates ID attributes even in documents without a DTD, so the
check in xmlTextReaderFreeProp must be changed to avoid use after free.
Found by OSS-Fuzz.
|
|
f0fd1b67
|
2020-08-26T00:16:38
|
|
Limit size of free lists in XML reader when fuzzing
Keeping objects on a free list can hide memory errors. Only allow a
single node on free lists used by the XML reader when fuzzing. This
should hide fewer errors while still exercising the free list logic.
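A sketch of a size-limited free list. Pool, poolRelease and poolAcquire
are hypothetical, and the non-fuzzing limit shown is illustrative;
FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION is the macro conventionally
defined in fuzzing builds.

```c
#include <stdlib.h>

#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  #define MAX_FREE_NODES 1
#else
  #define MAX_FREE_NODES 100
#endif

typedef struct Node {
    struct Node *next;
} Node;

typedef struct {
    Node *freeList;
    int freeCount;
} Pool;

/* Keeps the node for reuse if the list has room, otherwise frees it. */
void
poolRelease(Pool *pool, Node *node) {
    if (pool->freeCount < MAX_FREE_NODES) {
        node->next = pool->freeList;
        pool->freeList = node;
        pool->freeCount++;
    } else {
        free(node);   /* really freed, so sanitizers can flag later accesses */
    }
}

/* Reuses a cached node if available, otherwise allocates a new one. */
Node *
poolAcquire(Pool *pool) {
    Node *node = pool->freeList;

    if (node != NULL) {
        pool->freeList = node->next;
        pool->freeCount--;
        return node;
    }
    return malloc(sizeof(Node));
}
```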
|
|
ba589adc
|
2020-08-25T23:50:39
|
|
Fix double free in XML reader with XIncludes
An XInclude with empty fallback could lead to a double free in
xmlTextReaderRead.
Found by OSS-Fuzz.
|
|
6f1470a5
|
2020-08-25T18:50:45
|
|
Hardcode maximum XPath recursion depth
Always limit nested function calls to 5000. This avoids call stack
overflows with deeply nested expressions.
The expression parser produces about 10 nested function calls when
parsing a subexpression in parentheses, so the effective nesting limit
is about 500 which should be more than enough.
Use a lower limit when fuzzing to account for increased memory usage
when using sanitizers.
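A sketch of a hardcoded depth limit in a recursive descent parser. The toy
grammar and the Parser struct are assumptions for this example; it uses
one call per nesting level, whereas the XPath parser uses about ten.

```c
#define MAX_DEPTH 5000

/* Hypothetical parser state for a toy grammar: expr := 'x' | '(' expr ')' */
typedef struct {
    const char *cur;
    int depth;
} Parser;

/* Returns 0 on success, -1 on a syntax error or when nesting is too deep. */
int
parseExpr(Parser *p) {
    int ret = -1;

    if (p->depth >= MAX_DEPTH)
        return -1;             /* fail instead of overflowing the call stack */
    p->depth++;
    if (*p->cur == 'x') {
        p->cur++;
        ret = 0;
    } else if (*p->cur == '(') {
        p->cur++;
        if (parseExpr(p) == 0 && *p->cur == ')') {
            p->cur++;
            ret = 0;
        }
    }
    p->depth--;                /* restore the counter on every exit path */
    return ret;
}
```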
|
|
8c3ef083
|
2020-08-24T23:17:34
|
|
Pass URL of main entity in XML fuzzer
|
|
0d5f3710
|
2020-08-24T16:28:54
|
|
Consolidate seed corpus generation
Implement file handling in C to speed up corpus generation.
|
|
0d9da029
|
2020-08-24T03:16:25
|
|
Test fuzz targets with dummy driver
Run fuzz targets with files in seed corpus during test.
|
|
3fcf3193
|
2020-08-22T00:43:18
|
|
Fix regression introduced with commit d88df4b
Revert the commit and use a different approach.
Found by OSS-Fuzz.
|
|
87d20b55
|
2020-08-19T13:52:08
|
|
Fix regression introduced with commit 74dcc10b
The code wasn't dead after all, but I can see no reason for delaying
the XPointer evaluation. This could lead to nodes included earlier
appearing in XPointer results.
|
|
fbb7fa9a
|
2020-08-19T13:13:20
|
|
Fix memory leak in xmlXIncludeAddNode error paths
Found by OSS-Fuzz.
|
|
19cae17f
|
2020-08-19T13:07:28
|
|
Revert "Fix quadratic runtime in xi:fallback processing"
This reverts commit 27119ec33c9f6b9830efa1e0da0acfa353dfa55a.
Not copying fallback children didn't fix up namespaces and could lead
to use-after-free errors.
Found by OSS-Fuzz.
|
|
d63cfeca
|
2020-08-17T15:40:06
|
|
Add TODO comment in xinclude.c
Add some thoughts on the major remaining problems with the XInclude
implementation.
|
|
804c5297
|
2020-08-17T03:37:18
|
|
Stop using maxParserDepth in xpath.c
Only use a single maxDepth value.
|
|
74dcc10b
|
2020-08-17T03:24:56
|
|
Remove dead code in xinclude.c
'doc' is checked for NULL in xmlXIncludeLoadDoc, so several code
paths can be eliminated.
|
|
0ff52748
|
2020-08-17T02:54:28
|
|
Fix autotools warnings
|
|
2c747129
|
2020-08-17T00:54:12
|
|
Fix error reporting with xi:fallback
When reporting errors, don't use href of xi:include if xi:fallback
was used. I think this can only be reproduced with
"xmllint --postvalid", see the original bug report:
https://bugzilla.gnome.org/show_bug.cgi?id=152623
|
|
27119ec3
|
2020-08-17T00:05:19
|
|
Fix quadratic runtime in xi:fallback processing
Copying the tree would lead to runtime quadratic in nested fallback
depth, similar to naive string concatenation.
|
|
d88df4bd
|
2020-08-16T23:38:48
|
|
Fix corner case with empty xi:fallback
xi:fallback could become empty after recursive expansion. Use a flag
to track whether nodes should be skipped.
|
|
00a86d41
|
2020-08-16T23:38:00
|
|
Don't add formatting newlines to XInclude nodes
|
|
dba82a8c
|
2020-08-16T23:02:20
|
|
Fix XInclude regression introduced with recent commit
The change to xmlXIncludeLoadFallback in commit 11b57459 could
process already freed nodes if text nodes were merged after deleting
nodes with an empty fallback.
Found by OSS-Fuzz.
|
|
e1c2d0ad
|
2020-08-16T22:22:57
|
|
Fix memory leak in runtest.c
|
|
2b4769a6
|
2020-08-16T22:02:04
|
|
Make "xmllint --push --recovery" work
|
|
99fc048d
|
2020-08-14T14:18:50
|
|
Don't use SAX1 if all element handlers are NULL
Running xmllint with "--sax --noout" installs a SAX2 handler with all
callbacks set to NULL. In this case or similar situations, we don't want
to switch to SAX1 parsing.
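The decision described above can be sketched as follows. The Handlers
struct, the callback signatures and useSax2 are hypothetical
simplifications, not the real SAX handler structure or parser code.

```c
#include <stddef.h>

/* Hypothetical subset of a SAX handler table. */
typedef struct {
    void (*startElement)(void);     /* SAX1 element callbacks */
    void (*endElement)(void);
    void (*startElementNs)(void);   /* SAX2 element callbacks */
    void (*endElementNs)(void);
} Handlers;

/* Placeholder callback for the usage example. */
void
noopCallback(void) {
}

/*
 * Returns 1 to stay in SAX2 mode. SAX1 is only chosen when no SAX2 element
 * callback is set and at least one SAX1 element callback exists.
 */
int
useSax2(const Handlers *h) {
    if (h->startElementNs != NULL || h->endElementNs != NULL)
        return 1;
    return h->startElement == NULL && h->endElement == NULL;
}
```

With every callback set to NULL, as with "--sax --noout", useSax2 returns
1 and the parser would not fall back to SAX1.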
|
|
c1ba6f54
|
2020-08-15T18:32:29
|
|
Revert "Do not URI escape in server side includes"
This reverts commit 960f0e275616cadc29671a218d7fb9b69eb35588.
This commit introduced
- an infinite loop, found by OSS-Fuzz, which could be easily fixed.
- an algorithm with quadratic runtime
- a security issue, see
https://bugzilla.gnome.org/show_bug.cgi?id=769760
A better approach is to add an option not to escape URLs at all,
which libxml2 should possibly have done in the first place.
|
|
b82fa3dd
|
2020-08-09T14:50:46
|
|
Fix column number accounting in xmlParse*NameAndCompare
Thanks to Frederic Vancraeyveldt for the report.
|
|
438e595a
|
2020-08-09T14:43:53
|
|
Stop counting nbChars in parser context
The value was inaccurate and never used.
|
|
f6a9541f
|
2020-08-09T14:29:35
|
|
Remove unneeded progress checks in HTML parser
The HTML parser should now be guaranteed to make progress, so the
checks became unnecessary.
|
|
9de7b94d
|
2020-08-08T20:37:30
|
|
Use strcmp when fuzzing
This should improve data-flow-guided fuzzing.
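A sketch of the idea, with strEqual as a hypothetical stand-in for a
hand-written comparison function. In fuzzing builds it delegates to
strcmp, whose operands fuzzing engines can observe; otherwise it keeps a
plain loop.

```c
#include <string.h>

/* Returns 1 if both strings are equal (or both NULL), 0 otherwise. */
int
strEqual(const char *a, const char *b) {
    if (a == b)
        return 1;
    if (a == NULL || b == NULL)
        return 0;
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    /* Engines such as libFuzzer hook strcmp and learn from its operands. */
    return strcmp(a, b) == 0;
#else
    while (*a == *b) {
        if (*a == '\0')
            return 1;
        a++;
        b++;
    }
    return 0;
#endif
}
```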
|
|
10a07948
|
2020-08-08T17:46:11
|
|
Fix XPath fuzzer
|
|
6c128fd5
|
2020-06-05T13:43:45
|
|
Fuzz XInclude engine
|
|
50f06b3e
|
2020-08-07T21:54:27
|
|
Fix out-of-bounds read with 'xmllint --htmlout'
Make sure that truncated UTF-8 sequences don't cause an out-of-bounds
array access.
Thanks to @SuhwanSong and the Agency for Defense Development (ADD) for
the report.
Fixes #178.
|
|
1abf2967
|
2020-08-06T17:51:57
|
|
Fix exponential runtime and memory in xi:fallback processing
When creating XML_XINCLUDE_START nodes, the children of the original
xi:include node must be freed, otherwise fallback content is copied
twice, doubling runtime and memory consumption for each nested
xi:fallback/xi:include pair.
Found with libFuzzer.
|
|
11b57459
|
2020-08-07T18:39:19
|
|
Don't process siblings of root in xmlXIncludeProcess
xmlXIncludeDoProcess would follow the siblings of the tree root and
also expand these nodes. When using an XML reader, this could lead to
siblings of the current node being expanded without having been parsed
completely.
|
|
0f9817c7
|
2020-06-10T16:34:52
|
|
Don't recurse into xi:include children in xmlXIncludeDoProcess
Otherwise, nested xi:include nodes might result in a use-after-free
if XML_PARSE_NOXINCNODE is specified.
Found with libFuzzer and ASan.
|
|
5725c115
|
2020-06-10T15:11:40
|
|
Fix memory leak in xmlXIncludeIncludeNode error paths
Found with libFuzzer and ASan.
|
|
ad26a60f
|
2020-08-06T13:20:01
|
|
Add XPath and XPointer fuzzer
|