|
9fc5090c
|
2023-09-16T19:58:42
|
|
hash: Clean up libxml/hash.h
Rename variables, fix subincludes, whitespace.
|
|
de4b270a
|
2023-09-21T14:31:31
|
|
autotools: Make --with-minimum disable lzma support
Fix an oversight when handling the --with-minimum option.
|
|
f9d717af
|
2023-09-21T13:05:49
|
|
fuzz: Allow to fuzz without push, reader or output modules
|
|
fe1bfb34
|
2023-09-21T12:33:46
|
|
gitlab-ci: Add a "medium" config build
Also run CI tests with a build where most modules except a few are
disabled. This is the minimum configuration required for libxslt:
--with-tree --with-xpath --with-output --with-html
Also add --with-threads.
|
|
e7f0d88b
|
2023-09-21T01:38:26
|
|
build: Remove some GCC warnings
-Wnested-externs produces spurious warnings after implicit
declaration of functions.
-Winline is useless since we don't use inlines.
-Wredundant-decls was already removed for autotools.
|
|
da274bfa
|
2023-09-21T01:29:40
|
|
build: Fix build when certain modules are disabled
|
|
9b5cce7a
|
2023-09-21T00:44:50
|
|
include: Remove more unnecessary includes
|
|
f0e8358e
|
2023-09-20T23:07:58
|
|
globals: Final fixes
|
|
d6ba4033
|
2023-09-20T20:49:59
|
|
globals: Move remaining declarations to correct places
globals.h is now deprecated. Sanity is restored.
|
|
1117fae0
|
2023-09-20T19:20:41
|
|
include: Remove unneeded includes
|
|
736327df
|
2023-09-20T19:09:15
|
|
include: Break inclusion cycle between tree.h and xmlregexp.h
|
|
699299ca
|
2023-09-20T18:54:39
|
|
globals: Stop including globals.h
|
|
d1336fd3
|
2023-09-20T17:00:50
|
|
globals: Move malloc hooks back to xmlmemory.h
|
|
39a275a5
|
2023-09-18T21:25:35
|
|
globals: Define globals using macros
Declare and define globals and helper functions by (ab)using the
preprocessor.
|
|
a77f9ab8
|
2023-09-20T16:57:22
|
|
globals: Don't include SAX2.h from globals.h
|
|
2e6c49a7
|
2023-09-20T14:43:14
|
|
globals: Don't store xmlParserVersion in global state
This is a constant.
|
|
0830fcfa
|
2023-09-20T14:30:12
|
|
globals: Deprecate xmlLastError
The last error should be accessed with xmlGetLastError.
|
|
db8b9722
|
2023-09-20T13:56:16
|
|
parser: Deprecate global parser options
Note that setting global options has no effect anyway when using any of
the modern parser API functions which take an option argument like
xmlReadMemory or when using xmlCtxtUseOptions.
Global options only have an effect when using old API functions
xmlParse* or xmlSAXParse* or when using an xmlParserCtxt without calling
xmlCtxtUseOptions.
Unfortunately, many downstream projects still modify global parser
options often without realizing that it has no effect. If necessary,
switch to the modern API. Then you can safely remove all code that
changes global options.
Here's a list of deprecated functions and global variables together with
the corresponding parser options.
- xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue
Parser option XML_PARSE_NOENT
- xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue
Inverse of parser option XML_PARSE_NOBLANKS
- xmlPedanticParserDefault, xmlPedanticParserDefaultValue
Parser option XML_PARSE_PEDANTIC
- xmlLineNumbersDefault, xmlLineNumbersDefaultValue
Always enabled by new API
- xmlDoValidityCheckingDefaultValue
Parser option XML_PARSE_DTDVALID
- xmlGetWarningsDefaultValue
Inverse of parser option XML_PARSE_NOWARNING
- xmlLoadExtDtdDefaultValue
Parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR
|
|
209516ac
|
2023-09-20T15:49:03
|
|
tests: Don't use deprecated symbols
|
|
692a5c40
|
2023-09-20T13:51:26
|
|
xmllint: Don't set deprecated globals
|
|
ea29b951
|
2023-09-20T13:30:01
|
|
globals: Abort if lazy allocation of global state failed
There's really nothing we can do in this situation, so it's better to
abort with an error message.
|
|
868b94b8
|
2023-09-20T13:10:29
|
|
globals: Reformat libxml/globals.h
|
|
bbf08608
|
2023-09-20T13:05:02
|
|
globals: Move buffer callback declarations to xmlIO.h
|
|
11a1839d
|
2023-09-20T17:54:48
|
|
globals: Move remaining globals back to correct header files
This undoes a lot of damage.
|
|
dc3382ef
|
2023-09-20T12:58:03
|
|
globals: Move xmlRegisterNodeDefault to tree.c
Code in globals.c must not try to access globals itself since the
accessor macros aren't defined and we would only see the main
variable.
|
|
75976742
|
2023-09-20T12:45:14
|
|
globals: Add a few comments
|
|
7909ff08
|
2023-09-20T17:38:26
|
|
include: Remove unnecessary includes
- Don't include tree.h from encoding.h
- Don't include parser.h from xmlIO.h
|
|
ecbd634c
|
2023-09-19T17:21:30
|
|
threads: Fix double-checked locking in xmlInitParser
Hopefully work around the classic problem with double-checked locking:
Another thread could read xmlParserInitialized == 1 but doesn't see
other initialization results yet due to compiler or hardware reordering.
While unlikely, this seems theoretically possible.
The solution is to add a memory barrier after initializing the data but
before setting xmlParserInitialized. It might be enough to use a second
initialization flag which is only used inside the locked section and
update xmlParserInitialized after unlocking. But I haven't seen this
approach in many articles discussing this issue, so it's possibly
flawed as well.
|
|
f7a403c2
|
2023-09-19T13:52:53
|
|
globals: Move xmlIsMainThread to globals.c
xmlIsMainThread is mainly needed for global variables.
|
|
eb985d6f
|
2023-09-20T17:17:49
|
|
globals: Move error globals back to xmlerror.c
|
|
b173b724
|
2023-09-19T13:17:00
|
|
globals: Use thread-local storage if available
Also use thread-local storage to store globals on POSIX platforms.
Most importantly, this makes sure that global variable access can't fail
when allocating the global state struct.
|
|
e7b6ca15
|
2023-09-18T13:25:06
|
|
globals: Rework global state destruction on Windows
If DllMain is used, rely on it working as expected. The old code seemed
to attempt to free global state of other threads if, for some reason,
the DllMain mechanism didn't work.
In a static build, register a destructor with
RegisterWaitForSingleObject.
Make public functions xmlGetGlobalState and xmlInitializeGlobalState
no-ops.
Move initialization and registration of global state objects to
xmlInitGlobalState. Lookup global state with xmlGetThreadLocalStorage
which can be inlined nicely.
Also cleanup global state when using TLS. xmlLastError must be reset.
|
|
bf6bd161
|
2023-09-18T19:53:31
|
|
globals: Introduce xmlCheckThreadLocalStorage
Checks whether (emulated) thread-local storage could be allocated.
|
|
89f49767
|
2023-09-18T18:44:32
|
|
globals: Make xmlGlobalState private
This removes a public struct but it seems impossible to use its members
in a sensible way from external code.
|
|
a07ec7c1
|
2023-09-18T17:39:13
|
|
threads: Move library initialization code to threads.c
This allows to consolidate the initialization code since the global init
lock was already implemented in threads.c.
|
|
4e1c13eb
|
2023-09-18T14:45:10
|
|
debug: Remove debugging code
This is barely useful these days and only clutters the code base.
|
|
c19771c1
|
2023-09-18T00:54:39
|
|
globals: Move code from threads.c to globals.c
Move all code that handles globals to the place where it belongs.
|
|
2a4b8114
|
2023-09-17T23:16:49
|
|
globals: Rename members of xmlGlobalState
This is a deliberate first step to remove some internals from the
public API and to avoid issues when redefining tokens.
|
|
d7cfe356
|
2023-09-14T20:52:24
|
|
parser: Avoid undefined behavior in xmlParseStartTag2
Instead of using arithmetic on dangling pointers, store ptrdiff_t values
in void pointers which is at least implementation-defined.
|
|
90d5b799
|
2023-09-14T15:30:38
|
|
schemas: Fix memory leak of annotations in notations
Found by OSS-Fuzz.
|
|
99cba4b3
|
2023-09-09T17:46:34
|
|
Handle NOCONFIG case when setting locations from CMake target properties
|
|
4aa08c80
|
2023-09-08T14:52:22
|
|
xinclude: Fix 'last' pointer in xmlXIncludeCopyNode
Also set the 'last' pointer for the root node.
Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/93
|
|
f369154f
|
2023-09-03T22:14:01
|
|
cmake: Generate better pkg-config file for SYSROOT builds under CMake
I recently fixed this for Autotools but said that fixing this for CMake
was not feasible due to it using `find_package` rather than
`pkg_check_modules`. I then thought about it and couldn't find any
reason why CMake couldn't try `pkg_check_modules` first and then fall
back to `find_package`, as that's basically what Autotools does.
I had wanted to use the linker flags generated by CMake when it does
fall back to `find_package`, but it only returns direct paths to the
libraries, as opposed to `-l` flags. Baking these library paths into the
pkg-config and xml2-config files would break static linking and
cross-compiling, so I've stuck with the `-l` flags we already have.
There is no need to set `CMAKE_REQUIRED_LIBRARIES` because we already
add the dependencies to the library target.
|
|
5a18c505
|
2023-09-04T09:30:38
|
|
autoconf: Include non-pkg-config dependency flags in the pkg-config file
These were present before, but I accidentally dropped them in my recent
build improvements.
|
|
6864d92f
|
2023-09-04T09:25:44
|
|
autoconf: Don't bake build time CFLAGS into pkg-config file
Having slept on it, I've realised that baking the dependency CFLAGS into
the pkg-config file is pointless when it is only used to link against
them. It may even cause problems.
|
|
efcaeadc
|
2023-09-04T16:00:53
|
|
hash: Fix use-of-uninitialized-value
Short-lived regression.
|
|
05c28305
|
2023-09-04T15:50:22
|
|
dict: Stop using uint32_t
stdint.h is a C99 header.
|
|
f45abbd3
|
2023-09-04T15:31:04
|
|
dict: Fix integer overflow of string lengths
Fixes #546.
|
|
edc2dd48
|
2023-09-04T16:07:23
|
|
dict: Update hash function
Update hash function from classic Jenkins OAAT (dict.c) and a variant of
DJB2 (hash.c) to "GoodOAAT" taken from the SMHasher repo. This hash
function passes all SMHasher tests.
|
|
93e8bb2a
|
2023-09-02T17:12:58
|
|
build: Generate better pkg-config files for static-only builds
pkg-config supports `Requires.private` and `Libs.private` fields for
static linking. However, if you're building a dynamic binary, then
pkg-config will use the non-private fields, even if just the static
libxml2 is available. This will result in libxml2 being underlinked,
causing the build to fail. The solution is to fold the private fields
into the non-private fields when the shared libxml2 is not being built.
This works for Autotools and CMake. Meson also knows how to handle this
when it automatically generates pkg-config files.
|
|
4640ccac
|
2023-09-02T16:18:30
|
|
build: Generate better pkg-config file for SYSROOT builds
The -I and -L flags you use to build should not necessarily be the same
ones you bake into installed files. If you are building with
dependencies located under a SYSROOT then the installed files should
have no knowledge of that SYSROOT. For example, if the build requires
`-L/path/to/sysroot/usr/lib/foo` then only `-L/usr/lib/foo` should be
baked into the installed files.
pkg-config is SYSROOT-aware, so this issue can be sidestepped by using
the `Requires` field rather than the `Libs` and `Cflags` fields. This is
easily resolved if you rely solely on pkg-config, but this project falls
back to standard Autoconf checks, so a little more effort is required.
Unfortunately, this issue cannot feasibly be resolved for CMake.
`find_package` is used rather than `pkg_check_modules`, so we cannot
tell whether a pkg-config file for each dependency is present or not,
even if `find_package` uses pkg-config behind the scenes. The CMake
build does not record any dependency -I or -L flags into the pkg-config
file anyway. This is a problem in itself, although these dependencies
are most likely installed to standard locations.
Meson is very much better at handling this, as it generates the
pkg-config file automatically using the correct logic.
|
|
54a0b19a
|
2023-09-01T14:52:14
|
|
autoconf: Allow custom --with-icu configure option
|
|
c5989473
|
2023-09-01T14:52:11
|
|
dict: Use thread-local storage for PRNG state
|
|
57cfd221
|
2023-09-01T14:52:04
|
|
dict: Use xoroshiro64** as PRNG
Stop using rand_r. This enables hash randomization on all platforms.
|
|
6d7aaaa8
|
2023-09-01T14:51:55
|
|
dict: Tune hash table growth
Introduce load factor as main trigger and increase MAX_HASH_LEN. This
should make growth behavior more predictable.
Raise size limit to INT_MAX. This avoids quadratic behavior with larger
tables.
|
|
4b8f7cf0
|
2023-09-01T13:07:27
|
|
hash: Fix integer overflow of nbElems
|
|
bfd7d286
|
2023-08-29T21:16:34
|
|
xmllint: Fix more error messages
|
|
373244bc
|
2023-08-29T21:05:32
|
|
xmllint: Fix error message when push parsing empty documents
|
|
53050b1d
|
2023-08-29T20:06:43
|
|
parser: More fixes to push parser error handling
|
|
bbd918b2
|
2023-08-29T15:56:37
|
|
parser: Fix detection of null bytes
Also suppress misleading extra errors.
Fixes #122.
|
|
c6083a32
|
2023-08-29T16:30:22
|
|
parser: Improve error handling in push parser
- Report errors earlier
- Align error messages with pull parser
|
|
1edae30f
|
2023-08-29T15:58:22
|
|
parser: Don't check inputNr in xmlParseTryOrFinish
There's no apparent reason for this check. inputNr should always be 1
here.
|
|
e48f2695
|
2023-08-29T17:41:18
|
|
parser: Remove push parser debugging code
|
|
cde44997
|
2023-08-27T16:35:23
|
|
SAX2: Allow multiple top-level elements
When parsing with HTML_PARSE_NOIMPLIED, the result document can contain
multiple top-level elements. Rework xmlSAX2StartElement to simply add
the element as a child of ctxt->node or ctxt->myDoc.
Don't invoke xmlAddSibling for non-element parents. The context node
should always be an element node.
Fixes #584.
|
|
d39f7806
|
2023-08-23T20:24:24
|
|
tree: Fix copying of DTDs
- Don't create multiple DTD nodes.
- Fix UAF if malloc fails.
- Skip DTD nodes if tree module is disabled.
Fixes #583.
|
|
4e4c89a4
|
2023-08-21T00:26:01
|
|
doc: Improve documentation of configuration options
|
|
778cca38
|
2023-08-20T22:50:57
|
|
legacy: Add stubs for disabled modules
When legacy support is requested, always enable stubs for FTP and
XPointer location modules which were removed from the standard
configuration. Going forward, the --with-legacy configuration option
should be used to provide maximum ABI compatibility.
Fixes #433.
|
|
ed3bd052
|
2023-08-20T20:48:10
|
|
parser: Allow to set maximum amplification factor
|
|
9d80a2b1
|
2023-08-16T19:45:34
|
|
entities: Don't change doc when encoding entities
doc->encoding shouldn't be touched by xmlEncodeEntitiesInternal.
|
|
f1c1f5c6
|
2023-08-16T19:43:02
|
|
parser: Revert change to doc->encoding
Fixes #579.
|
|
61b8e097
|
2023-08-16T19:20:47
|
|
parser: Never use UTF-8 encoding handler
|
|
507f11ed
|
2023-08-16T15:43:47
|
|
encoding: Remove debugging code
|
|
138213ac
|
2023-08-15T12:49:27
|
|
python: Fix tests on MinGW
Add the directory containing libxml2.dll with os.add_dll_directory to
make tests work on MinGW.
This has changed in Python 3.8 but for some reason, the issue only
turned up with Python 3.11 on MinGW. Contrary to documentation, copying
libxml2.dll into the directory containing the .pyd file doesn't work.
|
|
e2ab48b9
|
2023-08-14T15:05:30
|
|
malloc-fail: Fix unsigned integer overflow in xmlTextReaderPushData
Return immediately if xmlParserInputBufferRead fails.
Found by OSS-Fuzz, see #344.
|
|
0d24fc0a
|
2023-08-14T12:53:49
|
|
html: Remove encoding hack in htmlCreateFileParserCtxt
Switch encoding directly instead of calling htmlCheckEncoding with faked
content.
|
|
5db5a704
|
2023-08-09T18:39:14
|
|
html: Fix UAF in htmlCurrentChar
Short-lived regression found by OSS-Fuzz.
|
|
b973ceaf
|
2023-08-09T18:37:20
|
|
parser: Fix mistake in xmlDetectEncoding
Short-lived regression.
|
|
cb717d7e
|
2023-08-09T16:52:02
|
|
parser: Update line number after coalescing text nodes
This should make the line number of text nodes deterministic. Before,
it depended on the callback sequence which depends on the size of chunks
fed to the parser.
|
|
855818bd
|
2023-08-08T15:21:37
|
|
parser: Check for truncated multi-byte sequences
When decoding input data, check whether the "raw" buffer is empty after
parsing the document. Otherwise, the input ends with a truncated
multi-byte sequence which shouldn't be silently ignored.
|
|
95e81a36
|
2023-08-08T15:21:31
|
|
parser: Decode all data in xmlCharEncInput
Even with flush set to true, xmlCharEncInput didn't guarantee to decode
all data. This complicated the push parser.
Remove the flush flag and always decode all available data.
Also fix ICU code where the flush flag has a different meaning. Always
set flush to false and retry even with empty input buffers.
|
|
834b8123
|
2023-08-08T15:21:28
|
|
parser: Stream data when reading from memory
Don't create a copy of the whole input buffer. Read the data chunk by
chunk to save memory.
Historically, it was probably envisioned to read data from memory
without additional copying. This doesn't work reliably with the current
design of the XML parser which requires a terminating null byte at the
end of input buffers. This lead to xmlReadMemory interfaces, which
expect pointer and size arguments, being changed to make a
zero-terminated copy of the input buffer. Interfaces based on
xmlReadDoc, which actually expect a zero-terminated string and
would make zero-copy operation work, were then simplified to rely on
xmlReadMemoryi, resulting in an unnecessary copy.
To avoid copying (possibly gigabytes) of memory temporarily, we now
stream in-memory input just like content read from files in a
chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of
250 bytes). As a side effect, we also avoid another copy of the whole
input when handling non-UTF-8 data which was made possible by some
earlier commits.
Interfaces expecting zero-terminated strings now make use of strnlen
which unfortunately isn't part of the standard C library and only
mandated since POSIX 2008.
|
|
5aff27ae
|
2023-08-08T15:21:25
|
|
parser: Optimize xmlLoadEntityContent
Load entity content via xmlParserInputBufferGrow, avoiding a copy.
This also fixes an entity size accounting error.
|
|
facc2a06
|
2023-08-08T15:21:21
|
|
parser: Don't overwrite EOF parser state
|
|
59fa0bb3
|
2023-08-08T15:21:14
|
|
parser: Simplify input pointer updates
The base member always points to the beginning of the buffer.
|
|
c88ab7e3
|
2023-08-08T15:19:54
|
|
parser: Don't reinitialize parser input members
The parser input struct should already be initialized.
|
|
4ee08155
|
2023-08-08T15:19:51
|
|
encoding: Move rawconsumed accounting to xmlCharEncInput
|
|
a0462e2d
|
2023-08-08T15:19:49
|
|
test: Add push parser test with overridden encoding
After recent changes, it should work to call xmlSwitchEncoding to
override the encoding for the push parser. This was never properly
supported, so Chromium and WebKit added a hack to reset the encoding in
the startDocument SAX handler.
|
|
ec7be506
|
2023-08-08T15:19:46
|
|
parser: Rework encoding detection
Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set
when xmlSwitchEncoding is called. The parser can use the flag to
reliably detect whether an encoding was already set via user override,
BOM or other auto-detection. In this case, the encoding declaration
won't be used to switch the encoding.
Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding
and ctxt->input->buf->encoder was used.
Introduce private helper functions to switch encodings used by both the
XML and HTML parser:
- xmlDetectEncoding which skips over the BOM, allowing to remove the
BOM checks from other encoding functions.
- xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns
about encoding mismatches.
If users override the encoding, store the declared instead of the actual
encoding in xmlDoc. In this case, the actual encoding is known and the
raw value from the doc is more useful.
Also use the input flags to store the ISO-8859-1 fallback state.
Restrict the fallback to cases where no encoding was specified. (The
fallback is only useful in recovery mode and these days broken UTF-8 is
probably more likely than ISO-8859-1, so it might eventually be removed
completely.)
The 'charset' member of xmlParserCtxt is now unused. The 'encoding'
member of xmlParserInput is now unused.
The 'standalone' member of xmlParserInput is renamed to 'flags'.
A new parser state XML_PARSER_XML_DECL is added for the push parser.
|
|
d38e73f9
|
2023-08-08T15:19:44
|
|
parser: Always create UTF-8 in xmlParseReference
It seems that this code path could only be triggered after an encoding
error in recovery mode. Creating char-ref nodes is unnecessary and
typically unexpected.
|
|
131d0dc0
|
2023-08-08T15:19:39
|
|
parser: Don't use 'standalone' member of xmlParserInput
The standalone declaration is only parsed in the main input stream.
|
|
d9ec182b
|
2023-08-08T15:19:36
|
|
parser: Don't detect encoding in xmlCtxtResetPush
The encoding will be detected in xmlParseTryOrFinish.
|
|
3a64f394
|
2023-08-08T15:19:25
|
|
html: Remove some debugging code in htmlParseTryOrFinish
|
|
58de9d31
|
2023-08-03T12:00:55
|
|
valid: Fix c1->parent pointer in xmlCopyDocElementContent
Fixes #572.
|
|
75693281
|
2023-07-21T14:50:30
|
|
malloc-fail: Fix memory leak in xmlCompileAttributeTest
Found by OSS-Fuzz, see #344.
|
|
90bcbcfc
|
2023-07-20T21:08:01
|
|
parser: Fix potential use-after-free in xmlParseCharDataInternal
Return immediately if a SAX handler stops the parser.
Fixes #569.
|
|
88447447
|
2023-06-23T23:04:30
|
|
parser: Fix typo in previous commit
|
|
9d0541dd
|
2023-06-22T18:06:53
|
|
parser: Make xmlSwitchEncoding always skip the BOM
Chromium calls xmlSwitchEncoding from the start document handler and
relies on this function to skip the BOM. Commit 98840d40 changed the
behavior when switching to UTF-16 since inspecting the input buffer at
this point is fragile.
Revert part of the commit to also skip a potential (decoded UTF-8) BOM
when switching to UTF-16. Make sure that we do this only at the start of
an input stream to avoid U-FEFF characters being lost.
BOM handling should ultimately be moved to the parsing code to avoid
such bugs.
See https://bugs.chromium.org/p/chromium/issues/detail?id=1451026
|
|
2473b485
|
2023-06-21T14:15:02
|
|
autotools: fix Python module file ext for cygwin/msys2
both use .dll, not .pyd
|
|
5f54bac9
|
2023-06-10T10:50:02
|
|
testapi: test_xmlSAXDefaultVersion() leaves xmlSAX2DefaultVersionValue set to 1 with LIBXML_SAX1_ENABLED
Add code to save and to restore the default value of
xmlSAX2DefaultVersionValue.
Fixes #554.
|
|
b236b7a5
|
2023-06-08T21:53:05
|
|
parser: Halt parser when growing buffer results in OOM
Fix short-lived regression from previous commit.
It might be safer to make xmlBufSetInputBaseCur use the original buffer
even in case of errors.
Found by OSS-Fuzz.
|