|
e0dd330b
|
2023-09-29T00:18:44
|
|
parser: Use hash tables to avoid quadratic behavior
Use a hash table to lookup namespaces by prefix. The hash table stores
an index into the namespace table. Auxiliary data for namespaces is
stored in a separate array along the main namespace table.
Use a hash table to verify attribute uniqueness. The hash table stores
an index into the attribute table.
Reuse hash value from the dictionary to avoid computing them twice.
See #346.
|
|
f1c1f5c6
|
2023-08-16T19:43:02
|
|
parser: Revert change to doc->encoding
Fixes #579.
|
|
ec7be506
|
2023-08-08T15:19:46
|
|
parser: Rework encoding detection
Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set
when xmlSwitchEncoding is called. The parser can use the flag to
reliably detect whether an encoding was already set via user override,
BOM or other auto-detection. In this case, the encoding declaration
won't be used to switch the encoding.
Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding
and ctxt->input->buf->encoder was used.
Introduce private helper functions to switch encodings used by both the
XML and HTML parser:
- xmlDetectEncoding which skips over the BOM, allowing to remove the
BOM checks from other encoding functions.
- xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns
about encoding mismatches.
If users override the encoding, store the declared instead of the actual
encoding in xmlDoc. In this case, the actual encoding is known and the
raw value from the doc is more useful.
Also use the input flags to store the ISO-8859-1 fallback state.
Restrict the fallback to cases where no encoding was specified. (The
fallback is only useful in recovery mode and these days broken UTF-8 is
probably more likely than ISO-8859-1, so it might eventually be removed
completely.)
The 'charset' member of xmlParserCtxt is now unused. The 'encoding'
member of xmlParserInput is now unused.
The 'standalone' member of xmlParserInput is renamed to 'flags'.
A new parser state XML_PARSER_XML_DECL is added for the push parser.
|
|
fc69cf56
|
2023-04-30T17:51:29
|
|
parser: Move xmlFatalErr to parserInternals.c
|
|
04d1bedd
|
2023-03-21T13:08:44
|
|
parser: Rework shrinking of input buffers
Don't try to grow the input buffer in xmlParserShrink. This makes sure
that no memory allocations are made and the function always succeeds.
Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of
DTD parsing loops.
Shrink before growing.
|
|
b167c731
|
2023-03-14T14:42:36
|
|
parser: Fix short-lived regression causing infinite loops
Fix 3eb6bf03. We really have to halt the parser, so the input buffer
gets reset.
|
|
2099441f
|
2023-03-13T17:51:13
|
|
parser: Stop calling xmlParserInputShrink
Introduce xmlParserShrink which takes a parser context to simplify error
handling.
|
|
3eb6bf03
|
2023-03-12T16:47:15
|
|
parser: Stop calling xmlParserInputGrow
Introduce xmlParserGrow which takes a parser context to simplify error
handling.
|
|
ccb6d544
|
2022-11-27T02:09:27
|
|
Hide internal functions
These functions were never declared in public headers, so it should be
safe to hide them.
Fixes #139.
|
|
0f568c0b
|
2022-08-26T01:22:33
|
|
Consolidate private header files
Private functions were previously declared
- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.
Consolidate all private header files in include/private.
|