|
e4c91f74
|
2021-11-03T11:41:11
|
|
Fix null deref in xmlSchemaGetComponentTargetNs
|
|
9277abe2
|
2022-01-16T15:50:56
|
|
Fix libxml2.doap
Add description.
Change category to "infrastructure". Apparently, "platform" isn't
allowed anymore.
Add programming language.
|
|
87a99270
|
2021-08-26T11:50:41
|
|
Added regression tests for xmlReadFd() and htmlReadFd()
|
|
fe6890e2
|
2021-07-27T13:20:20
|
|
Fix htmlReadFd, which was using a mix of xml and html context functions
|
|
67953a9f
|
2022-01-16T15:30:02
|
|
Fix memory leak in xmlXPathCompNodeTest
Found by Coverity.
|
|
1b7d4e2b
|
2021-07-22T14:46:48
|
|
tstmem.py: Try importing from libxmlmods.libxml2mod if needed
Distutils builds place libxml2mod.pyd under the libxmlmods subdir, so try this
directory if 'import libxml2mod' failed.
|
|
6e169c14
|
2021-03-30T16:11:13
|
|
python: Port python 3.x module to Windows
On Windows, we don't have fcntl() which helps us to find out how a file was
opened, so we need to resort to the Windows API NtQueryInformationFile() in
ntdll.dll to help us, and compare the file access modes as appropriate to
deduce the modes we want to pass into fdopen().
As all official Python 3.x releases are built against newer Windows CRTs that
toughen checks on the validity of the file descriptor when we convert the fd to
a native Windows File Handle using _get_osfhandle(), we need to define an empty
handler so that the program does not abort if the fd that was passed in was
invalid; instead, we just return NULL if _get_osfhandle() could not return us a
valid Windows File Handle.
|
|
3cc64a88
|
2021-07-22T15:46:38
|
|
setup.py.in: Try to import setuptools
This way, we can build binary wheels easily if needed
|
|
dbfe6151
|
2021-07-22T15:36:15
|
|
Python distutils: Make DLL packaging more flexible
This updates setup.py.in to pack the DLLs according to the options we specified
to configure.js or CMake (or, even configure, although autotools builds are not
likely to build the libxml2 Python module via distutils).
At this point, we can pack only the DLLs that libxml2 really depends on, and
pack the libxslt DLLs only if we really built the libxslt Python modules.
Also make the DLL filenames more easily configured
|
|
eb4c1bf8
|
2021-11-03T09:48:13
|
|
Fix random dropping of characters on dumping ASCII encoded XML
Fix a bug in xmlCharEncOutput return value which will cause
xmlNodeDumpOutput to drop characters randomly.
xmlCharEncOutput returns zero if the length of the input buffer is
zero but ignores the fact that it may have already encoded the input buffer
and the input's length is zero due to the fact that xmlEncOutputChunk
returned -2 errors and underlying code tries to fix the error by
encoding the input.
xmlCharEncOutput is collecting the number of bytes written to the
output buffer but is returning zero instead of the total number of
bytes in this situation. This commit will fix this issue by returning
the total number of bytes instead. So the xmlNodeDumpOutput will also
continue writing and will not stop due to the fact that it mistakenly
thinks the output buffer is not changed in that iteration.
Fixes #314
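The return-value problem described above can be modelled with a minimal
Python sketch. This is not the libxml2 C code: the function name
encode_ascii_with_charrefs and its signature are hypothetical, and the
fallback to numeric character references is an assumption about how an
unencodable character gets repaired. The point is only that the function
reports every byte it wrote, including bytes emitted by the fallback
path, so a caller that stops on a zero return does not drop output.

```python
def encode_ascii_with_charrefs(text, out):
    """Append an ASCII encoding of text to out (a bytearray).

    Returns the total number of bytes written, including bytes produced
    by the character-reference fallback. Hypothetical helper, for
    illustration only.
    """
    written = 0
    pos = 0
    while pos < len(text):
        # Encode the longest run of plain ASCII characters in one go.
        end = pos
        while end < len(text) and ord(text[end]) < 0x80:
            end += 1
        chunk = text[pos:end].encode("ascii")
        out += chunk
        written += len(chunk)
        pos = end
        if pos < len(text):
            # Unencodable character: emit a numeric character reference
            # and keep counting the bytes it adds.
            ref = ("&#%d;" % ord(text[pos])).encode("ascii")
            out += ref
            written += len(ref)
            pos += 1
    return written
```

If the input ends with a non-ASCII character, the last iteration leaves
no input behind, yet the function still returns a non-zero count because
the fallback bytes were added to the total.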
|
|
66fb340a
|
2021-10-14T15:01:24
|
|
Update URL for libxml++ C++ binding
Fixes #267
|
|
ae728bb8
|
2022-01-16T15:05:41
|
|
Fix null pointer deref in xmlStringGetNodeList
Check for malloc failure to avoid null deref.
|
|
46c658b0
|
2021-08-06T08:48:24
|
|
Move the current position before possibly calling ctxt->sax->characters.
|
|
96753450
|
2021-07-29T12:14:03
|
|
Correctly install the HTML examples into their subdirectory.
Previous to this commit, the examples were installed haphazardly within
all the other html documents, also overwriting index.html, for example.
Signed-off-by: Mattia Rizzolo <mattia@mapreri.org>
|
|
7c0253aa
|
2021-07-29T12:11:08
|
|
Refactor the settings of $docdir
This is a completely noop change for this project, since before this
commit nothing was using $docdir nor PROGRAM_TARNAME.
Setting the fourth parameter of AC_INIT() makes it set PROGRAM_TARNAME,
which is then used as the last path component of the default docdir,
effectively making $docdir be the same as the previous
$BASE_DIR/$DOC_MODULE.
Signed-off-by: Mattia Rizzolo <mattia@mapreri.org>
|
|
51c88c6f
|
2021-07-26T20:12:45
|
|
configure: remove unused checks for functions
Nothing uses the results from these checks, so remove the checks. There
are some "uses" in order to suppress macro shadowing in MSVC's
implementation of `isinf` and `isnan` as macros, but those are
hard-coded and do not require checks to manage.
|
|
1a013ba7
|
2021-07-26T20:11:56
|
|
configure: remove unused checks for libraries
These libraries are queried for, but no code cares about the results, so
remove the checks.
|
|
0aad075c
|
2021-07-26T20:10:52
|
|
cmake: remove unused checks
Even the configured `config.h` did not forward the results of these
checks.
|
|
9669bd68
|
2021-07-26T20:09:32
|
|
configure: remove unused checks for headers
These headers are checked for at configure time, but the code never
cares about the results of these checks, so skip them.
|
|
f8608235
|
2021-07-26T20:06:18
|
|
cmake: fix `ATTRIBUTE_DESTRUCTOR` definition
The code expects it to be set to the attribute for `xmlDestructor`, but
in CMake, it is only ever available as `1` or undefined. Instead, match
the behavior of autoconf.
|
|
3ba59b93
|
2021-07-23T22:34:29
|
|
Generate devhelp2 index file
The devhelp2 format was introduced in 2005, and the devhelp format was
deprecated in 2017.
Fixes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/295
|
|
91b3d3f9
|
2021-07-14T17:12:11
|
|
Remove duplicated code in xmlcatalog
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
d7f11fd0
|
2021-07-14T17:03:46
|
|
Fix leak in __xmlOutputBufferCreateFilename
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
477f6de3
|
2021-07-14T15:35:31
|
|
Fix memory leak in xmlRelaxNGNewDocParserCtxt
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
483de2c2
|
2021-07-14T15:31:55
|
|
Fix memory leak in xmlRelaxNGParseData
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
9a9dd31b
|
2021-07-14T15:28:56
|
|
Fix memory leak in libxml_C14NDocSaveTo
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
d68c1637
|
2021-07-14T15:23:11
|
|
Fix memory leak in libxml_saveNodeTo
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
328456bf
|
2021-07-14T14:43:59
|
|
Fix memory leak in xmlNewInputFromFile
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
fe564967
|
2021-07-14T14:35:17
|
|
Fix memory leak in xmlCreateIOParserCtxt
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
f0904f32
|
2021-07-14T14:14:34
|
|
Fix memory leak in xmlParseSGMLCatalog
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
2510f43c
|
2021-07-14T14:03:44
|
|
Fix memory leak in xmlParseCatalogFile
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
92bce68c
|
2021-07-14T11:37:07
|
|
Fix memory leak in xmlSAX2AttributeDecl
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
e7d1c53a
|
2021-07-14T11:32:57
|
|
Fix memory leak in xmlFreeParserInputBuffer
Found by Coverity.
https://bugzilla.redhat.com/show_bug.cgi?id=1938806
|
|
03bb9293
|
2021-07-07T18:23:18
|
|
Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8().
* encoding.c:
(UTF16LEToUTF8):
- Fix comment to describe what the code does.
(UTF16BEToUTF8):
- Fix undefined behavior which was applied to UTF16LEToUTF8() in
2f9382033e.
- Add bounds check to while() loop which was applied to
UTF16LEToUTF8() in be803967db.
- Do not return -2 when (in >= inend) to fix the bug. This was
applied to UTF16LEToUTF8() in 496a1cf592.
- Inline (<< 8) statements to match UTF16LEToUTF8().
Add the following tests and results:
test/text-4-byte-UTF-16-BE-offset.xml
test/text-4-byte-UTF-16-BE.xml
test/text-4-byte-UTF-16-LE-offset.xml
test/text-4-byte-UTF-16-LE.xml
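A minimal Python sketch of the chunk-boundary case, assuming a decoder
that reports how many input bytes it consumed so the caller can carry
the remainder into the next chunk. The function name and return shape
are hypothetical; this is a model of the idea, not the UTF16BEToUTF8()
implementation. A high surrogate whose low surrogate has not arrived yet
is treated as incomplete input rather than as an error.

```python
def utf16be_to_utf8(chunk):
    """Convert as much of a UTF-16BE chunk as possible to UTF-8.

    Returns (utf8_bytes, consumed). Bytes that belong to an incomplete
    code unit or an incomplete surrogate pair are left unconsumed.
    """
    out = bytearray()
    pos = 0
    while pos + 2 <= len(chunk):
        unit = (chunk[pos] << 8) | chunk[pos + 1]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: the low surrogate must follow.
            if pos + 4 > len(chunk):
                break  # pair is split across chunks; wait for more input
            low = (chunk[pos + 2] << 8) | chunk[pos + 3]
            if not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            code = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            pos += 4
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            code = unit
            pos += 2
        out += chr(code).encode("utf-8")
    return bytes(out), pos
```

A caller would prepend the unconsumed tail of one chunk to the next
chunk before calling the function again.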
|
|
e6adc19f
|
2021-07-05T13:40:54
|
|
man: Mention XML_CATALOG_FILES is space-separated
Fixes: https://bugzilla.gnome.org/show_bug.cgi?id=781274
|
|
bdd482c2
|
2021-07-05T18:48:10
|
|
Add documentation for xmllint exit code 10
Closes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/280
|
|
a0f9211b
|
2021-06-28T02:03:15
|
|
python/Makefile.am: use *_LIBADD, not *_LDFLAGS for LIBS
This fixes over-linking in the built Python modules with various libraries.
*_LIBADD is intended for adding additional libraries for linking, while
*_LDFLAGS is for miscellaneous extra flags (possibly user-supplied).
If using -Wl,-as-needed within user-supplied LDFLAGS, it is passed too
late (after the library link line) and therefore has no effect.
Notes:
* Noticed while working on Gentoo's migration to libxcrypt because
libxml2's Python modules were linking to libcrypt (and other libraries)
unexpectedly.
* It was suggested we could actually stop linking explicitly with all
of Python's libraries / don't copy its LDFLAGS, but this resolves
the original issue downstream and is a separate discussion. I couldn't
find any clear documentation for/against such a change.
Bug: https://bugs.gentoo.org/798942
Signed-off-by: Sam James <sam@gentoo.org>
|
|
ff05c94a
|
2022-01-16T13:56:17
|
|
Fix check for libtool in autogen.sh
libtoolize is named glibtoolize on some macOS systems.
|
|
343bf0d3
|
2022-01-16T13:52:21
|
|
Add myself to maintainers
Fixes #319.
|
|
c35628a2
|
2022-01-15T18:18:22
|
|
Revert "Make schema validation fail with multiple top-level elements"
This reverts commit 4f2aee18f6e2d40e58eb224f4f7935dc2400fe25.
Fixes #305.
|
|
798bdf13
|
2022-01-10T14:50:20
|
|
Different approach to fix quadratic behavior in HTML push parser
The old approach introduced a regression, see issue #312 and the
previous commit. Disable code that tries to recover from invalid start
tags. This only affects "recovery" mode.
Add a comment outlining a better fix in accordance with the HTML5 spec.
|
|
094fc08a
|
2022-01-10T14:02:10
|
|
Fix regression when parsing invalid HTML tags in push mode
Revert part of commit 173a0830 that changed behavior when parsing
malformed start tags with the push parser. This reintroduces quadratic
behavior in recovery mode which will be worked around in the next
commit.
Fixes #312.
|
|
2732b234
|
2022-01-10T13:32:14
|
|
Fix regression parsing public IDs literals in HTML
Fix regression introduced when reworking htmlParsePubidLiteral in
commit 93ce33c2.
Fixes #318.
|
|
dea91c97
|
2021-07-27T16:12:54
|
|
Fix buffering in xmlOutputBufferWrite
Fix a regression introduced with commit a697ed1e which caused
xmlOutputBufferWrite to flush internal buffers too late.
Fixes #296.
|
|
ec6e3efb
|
2021-07-06T21:56:04
|
|
Patch to forbid epsilon-reduction of final states
When building the internal representation of a regexp, it is possible
that a lot of empty transitions are created. Therefore there is a step
to reduce them in the function xmlFAEliminateSimpleEpsilonTransitions.
There is an error there for this case:
* State 1 has a transition with an atom (in this case "a") to state 2.
* State 2 is final and has an epsilon transition to state 1.
After reduction it looked like:
* State 1 has a transition with an atom (in this case "a") to itself
and is final.
In other words, the empty string is accepted when it shouldn't be.
The attached patch skips the reduction step for final states.
An alternative would be to insert or increment counters when reducing a
final state, but this seemed error prone and unnecessary, since there
aren't that many final states.
Fixes #282
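The two-state case above can be reproduced with a small Python model of
an automaton. The data layout (a dict of transition lists, with None as
the epsilon atom) and both function names are assumptions made for this
sketch; it does not mirror the structures used by
xmlFAEliminateSimpleEpsilonTransitions. The skip_final flag switches
between the old behavior and the behavior described by the patch.

```python
def eliminate_simple_epsilons(trans, finals, skip_final=True):
    """Remove states whose only transition is an epsilon transition.

    trans maps a state to a list of (atom, target); atom None is epsilon.
    Returns new (trans, finals) without modifying the arguments.
    """
    trans = {s: list(ts) for s, ts in trans.items()}
    finals = set(finals)
    for state in list(trans):
        ts = trans[state]
        if len(ts) != 1 or ts[0][0] is not None:
            continue
        if skip_final and state in finals:
            continue  # the fix: never reduce a final state
        target = ts[0][1]
        if target == state:
            continue
        # Redirect every transition into `state` to `target` instead.
        for s in trans:
            trans[s] = [(a, target if t == state else t) for a, t in trans[s]]
        if state in finals:
            # Old behavior: finality moves to the target state.
            finals.discard(state)
            finals.add(target)
        trans[state] = []
    return trans, finals


def accepts(trans, finals, start, word):
    """Simulate the automaton on word using epsilon closures."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for atom, t in trans.get(s, []):
                if atom is None and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in word:
        step = {t for s in current for a, t in trans.get(s, []) if a == ch}
        current = closure(step)
    return bool(current & finals)
```

With trans = {1: [("a", 2)], 2: [(None, 1)]} and finals = {2}, reducing
with skip_final=False makes state 1 final, so the empty string is
accepted; with skip_final=True the automaton is left unchanged.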
|
|
22f15211
|
2021-06-04T09:57:46
|
|
Use version in configure.ac for CMake
The CMake script now reads the version from configure.ac so that the two cannot get out of sync.
|
|
92d9ab4c
|
2021-06-07T15:09:53
|
|
Fix whitespace when serializing empty HTML documents
The old, non-recursive HTML serialization code would always terminate
the output with a newline. The new implementation omitted the newline
if the document node had no children. Readd the newline when
serializing empty documents.
Fixes #266.
|
|
3e1aad4f
|
2021-06-02T17:31:49
|
|
Fix XPath recursion limit
Fix accounting of recursion depth when parsing XPath expressions.
This silly bug introduced in commit 804c5297 could lead to spurious
errors when parsing larger expressions or XSLT documents.
Should fix #264.
|
|
13ad8736
|
2021-05-25T10:55:25
|
|
Fix regression in xmlNodeDumpOutputInternal
Commit 85b1792e could cause additional whitespace if xmlNodeDump was
called with a non-zero starting level.
|
|
a46e85f6
|
2021-05-22T15:20:46
|
|
Update CMake project version
|
|
a1cac3bb
|
2021-05-22T14:51:26
|
|
Add CMake alias targets for embedded projects
|
|
2c0f2f03
|
2021-05-18T09:52:55
|
|
Fix some validation errors in the FAQ
Move paragraphs inside li elements.
|
|
b92b16f6
|
2021-05-19T10:15:54
|
|
Remove unused variable in xmlCharEncOutFunc
Fixes a compiler warning:
encoding.c: In function 'xmlCharEncOutFunc__internal_alias':
encoding.c:2632:9: warning: unused variable 'output' [-Wunused-variable]
2632 | int output = 0;
https://gitlab.gnome.org/GNOME/libxml2/-/issues/254
|
|
7d4060d2
|
2021-05-16T18:00:21
|
|
Add missing file xmlwin32version.h.in to EXTRA_DIST
|
|
4fc473d7
|
2021-05-16T17:48:07
|
|
Add instructions on how to use CMake to compile libxml
|
|
85b1792e
|
2021-05-18T20:08:28
|
|
Work around lxml API abuse
Make xmlNodeDumpOutput and htmlNodeDumpFormatOutput work with corrupted
parent pointers. This used to work with the old recursive code but the
non-recursive rewrite required parent pointers to be set correctly.
Unfortunately, lxml relies on the old behavior and passes subtrees with
a corrupted structure. Fall back to a recursive function call if an
invalid parent pointer is detected.
Fixes #255.
|
|
a7b9f3eb
|
2021-05-20T13:38:54
|
|
fix: avoid segfault at exit when using custom memory functions
This extends the fix introduced by 956534e to Windows processes
dynamically loading libxml2.
Closes #256.
|
|
b48e77cf
|
2021-05-13T20:56:16
|
|
Release of libxml2-2.9.12
Brown paper bag release, some recently added sources were missing from
the 2.9.11 tarball:
- configure.ac: bump version
- fuzz/Makefile.am: add fuzz.h and seed/regexp to EXTRA_DIST
|
|
e1bcffea
|
2021-05-13T15:35:21
|
|
Release of libxml2-2.9.11
Prompted by CVE-2021-3541, but this includes an awful lot of serious bug
fixes by Nick and others.
- configure.ac: bumped to new release
- doc/* updated and regenerated
|
|
8598060b
|
2021-05-13T14:55:12
|
|
Patch for security issue CVE-2021-3541
This is related to parameter entity expansion, along the lines of
the billion laughs attack. In that code path the counting of
parameter entities was missed, so the normal algorithm based on
entity "density" was useless.
|
|
bfd2f430
|
2021-05-09T18:56:57
|
|
Fix null deref in legacy SAX1 parser
Always call nameNsPush instead of namePush. The latter is unused now
and should probably be removed from the public API. I can't see how
it could be used reasonably from client code and the unprefixed name
has always polluted the global namespace.
Fixes a null pointer dereference introduced with de5b624f when parsing
in SAX1 mode.
Found by OSS-Fuzz.
|
|
ce00c36e
|
2021-05-08T21:20:05
|
|
Store per-element parser state in a struct
Make the parser context's "pushTab" point to an array of structs
instead of void pointers. This avoids casting unrelated types to void
pointers, improving readability and portability, and allows for more
efficient packing. Ultimately, the struct could be extended to include
the contents of "nameTab" and "spaceTab", further simplifying the code.
Historically, "pushTab" was only used by the push parser (hence the
name), so the change to the public headers should be safe.
Also remove an unused parameter from xmlParseEndTag2.
|
|
de5b624f
|
2021-05-08T20:21:29
|
|
Fix handling of unexpected EOF in xmlParseContent
Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was
removed in commit 62150ed2.
Commit 62150ed2 also introduced a regression for direct users of
xmlParseContent: unclosed tags weren't checked.
|
|
3e80560d
|
2021-05-07T10:51:38
|
|
Fix line numbers in error messages for mismatched tags
Commit 62150ed2 introduced a small regression in the error messages for
mismatched tags. This typically only affected messages after the first
mismatch, but with custom SAX handlers all line numbers would be off.
This also fixes line numbers in the SAX push parser which were never
handled correctly.
|
|
7279d236
|
2021-05-06T10:37:07
|
|
Fix htmlTagLookup
Fix regression introduced with b25acce8. Some users like libxslt may
call the HTML output functions on documents with uppercase tag names,
so we must keep case-insensitive string comparison.
Fixes #248.
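A minimal Python sketch of a binary-search lookup that stays
case-insensitive, assuming a table of lowercase names kept in sorted
order. The table contents and the name tag_lookup are hypothetical and
much shorter than the real HTML element table; the sketch only shows
that the key is case-folded before the search.

```python
import bisect

# Hypothetical, abbreviated tag table: lowercase names in sorted order.
TAGS = sorted(["a", "body", "div", "html", "p", "table", "td", "tr"])


def tag_lookup(name):
    """Return the canonical tag name for name, or None if unknown."""
    key = name.lower()  # fold case so "DIV" and "div" compare equal
    i = bisect.bisect_left(TAGS, key)
    if i < len(TAGS) and TAGS[i] == key:
        return TAGS[i]
    return None
```

Folding the key once keeps the comparison inside the search a plain
string comparison, which is one way to combine binary search with
case-insensitive matching.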
|
|
33468d7e
|
2021-05-03T16:09:44
|
|
update for xsd:language type check
Fixes #242.
|
|
babe7503
|
2021-05-01T16:53:33
|
|
Propagate error in xmlParseElementChildrenContentDeclPriv
Check return value of recursive calls to
xmlParseElementChildrenContentDeclPriv and return immediately in case
of errors. Otherwise, struct xmlElementContent could contain unexpected
null pointers, leading to a null deref when post-validating documents
which aren't well-formed and parsed in recovery mode.
Fixes #243.
|
|
5465a8e5
|
2021-04-25T21:19:59
|
|
Update INSTALL.libxml2
Fixes #238.
|
|
1098c30a
|
2021-04-22T19:26:28
|
|
Fix use-after-free with `xmllint --xinclude --dropdtd`
The --dropdtd option can leave dangling pointers in entity reference
nodes. Make sure to skip these nodes when processing XIncludes.
This also avoids scanning entity declarations and even modifying
them inadvertently during XInclude processing.
Move from a block list to an allow list approach to avoid descending
into other node types that can't contain elements.
Fixes #237.
|
|
72b3c067
|
2021-04-22T19:24:50
|
|
Fix dangling pointer with `xmllint --dropdtd`
Reset doc->intSubset when dropping the DTD.
|
|
bf227135
|
2020-08-16T17:19:35
|
|
Validate UTF8 in xmlEncodeEntities
Code is currently assuming UTF-8 without validating. Truncated UTF-8
input can cause out-of-bounds array access.
Adds further checks to partial fix in 50f06b3e.
Fixes #178
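The kind of bounds check this calls for can be sketched in Python as
follows. The helper name utf8_char_len is hypothetical and the check is
structural only: it verifies the lead byte, that enough bytes remain in
the buffer, and that the trailing bytes are continuation bytes. It does
not reject every overlong three- or four-byte form or encoded
surrogates, and it is not the code used in xmlEncodeEntities.

```python
def utf8_char_len(buf, pos):
    """Return the length of the UTF-8 sequence starting at buf[pos].

    Returns 0 if the lead byte is invalid, the sequence is truncated by
    the end of the buffer, or a continuation byte is malformed.
    """
    first = buf[pos]
    if first < 0x80:
        return 1
    if 0xC2 <= first <= 0xDF:
        need = 2
    elif 0xE0 <= first <= 0xEF:
        need = 3
    elif 0xF0 <= first <= 0xF4:
        need = 4
    else:
        return 0  # stray continuation byte or invalid lead byte
    if pos + need > len(buf):
        return 0  # truncated input: never read past the end
    for i in range(1, need):
        if buf[pos + i] & 0xC0 != 0x80:
            return 0
    return need
```

An escaping routine would call such a check before reading the trailing
bytes of a multi-byte character, and handle a zero result as an encoding
error instead of indexing beyond the buffer.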
|
|
1358d157
|
2021-04-21T13:23:27
|
|
Fix use-after-free with `xmllint --html --push`
Call htmlCtxtUseOptions to make sure that names aren't stored in
dictionaries.
Note that this issue only affects xmllint using the HTML push parser.
Fixes #230.
|
|
fb08d9fe
|
2021-03-20T22:02:26
|
|
Fix include order in c14n.h
- Include xmlversion.h before testing feature flags.
- Include libxml headers before extern "C".
Fixes #226.
|
|
d3a02679
|
2021-03-15T13:44:34
|
|
CMake: Only add postfixes if MSVC
Currently, it catches mingw-w64 in there as well, but mingw-w64 follows
linux-like naming with no weird postfixes
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
|
|
868e49cf
|
2021-03-16T10:36:04
|
|
Allow FP division by zero in xmlXPathInit
|
|
d25460da
|
2021-03-13T19:12:00
|
|
Fix XPath NaN/Inf for older GCC versions
The DBL_MAX approach could lead to errors caused by excess precision.
Switch back to the division-by-zero approach with a work-around for
MSVC and use the extern globals instead of macro expressions.
|
|
e20c9c14
|
2021-03-13T18:41:47
|
|
Fix xmlGetNodePath with invalid node types
Make xmlGetNodePath return NULL instead of invalid XPath when hitting
unsupported node types like DTD content.
Reported here:
https://mail.gnome.org/archives/xml/2021-January/msg00012.html
Original report:
https://bugs.php.net/bug.php?id=80680
|
|
c3fd8c42
|
2021-03-13T17:19:32
|
|
Fix exponential behavior with recursive entities
Fix another case where only recursion depth was limited, but entities
would still be expanded over and over again.
The test case discovered by fuzzing only affected parsing in recovery
mode with XML_PARSE_RECOVER.
Found by OSS-Fuzz.
|
|
683de7ef
|
2021-03-04T19:06:04
|
|
Fix duplicate xmlStrEqual calls in htmlParseEndTag
|
|
8095365b
|
2021-03-04T18:46:11
|
|
Speed up htmlCheckAutoClose
Switch to binary search.
|
|
b25acce8
|
2021-03-04T17:44:45
|
|
Speed up htmlTagLookup
Switch to binary search. This is the first time bsearch is used in the
libxml2 code base. But it's a standard library function since C89 and
should be portable.
|
|
ad101bb5
|
2021-03-02T13:32:53
|
|
Clarify xmlNewDocProp documentation
|
|
a6e6498f
|
2021-03-02T13:09:06
|
|
Stop checking attributes for UTF-8 validity
I can't see a reason to check attribute content for UTF-8 validity.
Other parts of the API like xmlNewText have always assumed valid UTF-8
as extra checks only slow down processing.
Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not
freeing the old encoding would cause a memory leak.
Note that this was last changed in 2008 with commit 6f8611fd which
removed unnecessary encoding/decoding steps. Setting attributes should
be even faster now.
Found by OSS-Fuzz.
|
|
8446d459
|
2021-03-01T20:56:40
|
|
Reduce some fuzzer timeouts
OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for
several hundred hours without hitting the 20s timeout. It seems that
most timeouts resulting from accidentally quadratic behavior in the
HTML parser have been fixed. Start to gradually reduce the timeout to
find new performance issues.
|
|
688b41a0
|
2021-03-01T14:17:42
|
|
Fix quadratic behavior when looking up xml:* attributes
Add a special case for the predefined XML namespace when looking up DTD
attribute defaults in xmlGetPropNodeInternal to avoid calling
xmlGetNsList.
This fixes quadratic behavior in
- xmlNodeGetBase
- xmlNodeGetLang
- xmlNodeGetSpacePreserve
Found by OSS-Fuzz.
|
|
ce2fbaa8
|
2021-02-22T22:01:57
|
|
Only run a few CI tests unless scheduled
Only run the following tests by default
- gcc
- clang:asan
- cmake:mingw:w64-x86_64:shared
- cmake:msvc:v141:x64:shared
|
|
85c817a2
|
2021-02-22T21:28:21
|
|
Improve fuzzer stability
- Add more calls to xmlInitializeCatalog.
- Call xmlResetLastError after fuzzing each input.
|
|
f9ccb3b8
|
2021-02-22T21:26:13
|
|
Check for feature flags in fuzzer tests
|
|
88c657d6
|
2021-02-22T21:11:00
|
|
Use CMake PROJECT_VERSION
|
|
7a90bdfa
|
2021-02-22T17:58:06
|
|
Another attempt at improving fuzzer stability
xmlInitializeCatalog is not called from xmlInitParser.
|
|
0fb3ae58
|
2021-02-22T17:31:05
|
|
Revert "Improve HTML fuzzer stability"
This reverts commit de1b51eddcc17fd7ed1bbcc6d5d7d529407dfbe2.
|
|
0987001c
|
2021-02-22T12:29:56
|
|
Add charset names to fuzzing dictionaries
|
|
de1b51ed
|
2021-02-22T12:25:29
|
|
Improve HTML fuzzer stability
Call htmlInitAutoClose during fuzzer initialization to fix stability
issue. Leave a note concerning problems with this function.
|
|
09320f05
|
2021-02-21T14:26:40
|
|
Add CI for MSVC x86
|
|
dcb80b92
|
2021-02-20T20:30:43
|
|
Fix slow parsing of HTML with encoding errors
Under certain circumstances, the HTML parser would try to guess and
switch input encodings multiple times, leading to slow processing of
documents with encoding errors. The repeated scanning of the input
buffer when guessing encodings could even lead to quadratic behavior.
The code htmlCurrentChar probably assumed that if there's an encoding
handler, it is guaranteed to produce valid UTF-8. This holds true in
general, but if the detected encoding was "UTF-8", the UTF8ToUTF8
encoding handler simply invoked memcpy without checking for invalid
UTF-8. This still must be fixed, preferably by not using this handler
at all.
Also leave a note that switching encodings twice seems impossible to
implement correctly. Add a check when handling UTF-8 encoding errors
in htmlCurrentChar to avoid this situation, even if encoders produce
invalid UTF-8.
Found by OSS-Fuzz.
|
|
02bee4c4
|
2021-02-02T22:27:52
|
|
Add a flag to not output anything when xmllint succeeded
|
|
4defa2c2
|
2021-02-12T09:39:38
|
|
Fix warnings in libxml.m4 with autoconf 2.70+.
Closes #219.
|
|
cbe1212d
|
2021-02-09T17:07:21
|
|
Fix null deref introduced with previous commit
Found by OSS-Fuzz.
|
|
01411e7c
|
2021-02-08T20:58:32
|
|
Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.
Note that some test cases declared
<!ENTITY lt "<">
But the XML spec clearly states that this is illegal:
> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.
Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.
As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
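The rule quoted from the spec can be illustrated with a short Python
sketch. It assumes the check runs on the entity's replacement text, that
is, the literal value after character references have been expanded
once, so a declaration written as "&#38;#60;" arrives here as "&#60;".
The names PREDEFINED and redeclaration_ok are hypothetical, and the
actual check in the parser may be structured differently.

```python
import re

# The five predefined entities and the character each one stands for.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# A replacement text that consists of exactly one character reference.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));\Z")


def redeclaration_ok(name, replacement_text):
    """Check a redeclaration of a predefined entity.

    replacement_text is the entity value after one round of character
    reference expansion.
    """
    char = PREDEFINED[name]
    m = _CHAR_REF.match(replacement_text)
    if m:
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        return code == ord(char)
    if name in ("lt", "amp"):
        # Double escaping is required for these two.
        return False
    # gt, apos and quot may also be declared as the bare character.
    return replacement_text == char
```

Under this model the declaration of lt as "<" mentioned above is
rejected, because its replacement text is the bare character rather than
a character reference to it.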
|
|
07920b43
|
2021-01-26T05:42:48
|
|
Copy the type from the original xmlDoc in xmlCopyDoc()
A bug related to php DOMDocument:
https://bugs.php.net/bug.php?id=80665
When copying or cloning an HTML document, xmlDoc->type changes from
XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
|