|
2f3655c9
|
2025-05-20T19:40:06
|
|
parser: Pop PEs that start markup declarations explicitly
We currently only handle "Validity constraint: Proper Declaration/PE
Nesting", but we must detect "Well-formedness constraint: PE Between
Declarations" separately:
> The replacement text of a parameter entity reference in a DeclSep must
> match the production extSubsetDecl.
PEs in DeclSeps are PEs that start with a full markup declaration (or
another PE). These are handled in xmlParse{Internal|External}Subset. We
set a flag on these PEs and don't close them implicitly in
xmlSkipBlankCharsPE. This will make unterminated declarations in such
PEs cause a parser error. The PEs are closed explicitly in
xmlParse{Internal|External}Subset, the only location where they are
allowed to end.
|
|
dd1961e0
|
2025-05-20T16:37:18
|
|
valid: Skip more validity checks if not validating
|
|
fca0860d
|
2025-05-19T21:17:39
|
|
tree: Deprecate public struct members related to DTDs
Let's deprecate these members for now. If these are really used, they
can be undeprecated later.
|
|
7c9b5535
|
2025-05-19T19:10:55
|
|
doc: Document unused error domains
|
|
7008740a
|
2025-05-18T01:52:38
|
|
parser: Consolidate scanning of XML Names
Use new productions by default.
Fixes #194.
Fixes #364.
See #707.
|
|
210f5a37
|
2025-05-16T21:18:16
|
|
chvalid: Mark functions as deprecated
|
|
954aae90
|
2025-05-16T21:13:17
|
|
doc: Improve regexp documentation
|
|
c5b45fbc
|
2025-05-16T16:54:09
|
|
doc: Misc fixes
|
|
c4926b19
|
2025-05-16T02:12:23
|
|
codegen: Merge xmlunicode.c into xmlregexp.c
Include generated parts.
Generate xmlChRangeGroups instead of functions for Unicode blocks.
|
|
4cb767e9
|
2025-05-16T01:52:44
|
|
codegen: Only generate tables for character ranges
The rest can be easily maintained manually.
|
|
6f4b4527
|
2025-05-15T23:43:32
|
|
parser: Stop using ctxt->linenumbers
I think this was used to avoid setting the `line` member before it was
added (20+ years ago).
|
|
a05fa9a9
|
2025-05-15T18:41:35
|
|
codegen: Rerun codegen scripts
|
|
a40f36e7
|
2025-05-14T04:04:28
|
|
include: Stop using *Ptr typedefs in public headers
|
|
2d83a84c
|
2025-05-14T00:29:19
|
|
doc: Misc improvements
|
|
f0983199
|
2025-05-12T13:00:20
|
|
html: Map some encodings according to HTML5
Windows-1252 is a superset of ISO-8859-1 and should be used instead.
Same for ASCII.
Also map UCS-2 and UTF-16 to UTF-16LE.
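The mapping described above could be sketched as a small lookup table. This is only an illustration and not the code from this commit: `exampleMapHtmlEncoding` and `asciiEqualNoCase` are hypothetical names, and the table holds just the four names mentioned in the message.

```c
#include <stddef.h>
#include <ctype.h>

/* Case-insensitive comparison of two ASCII strings; returns 1 on match. */
static int
asciiEqualNoCase(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    /* Both strings must end at the same position. */
    return *a == *b;
}

/*
 * Hypothetical helper: map a declared encoding name to the encoding an
 * HTML parser would use instead. Names that are not listed are returned
 * unchanged.
 */
static const char *
exampleMapHtmlEncoding(const char *name) {
    static const struct { const char *from; const char *to; } map[] = {
        { "ISO-8859-1", "windows-1252" },
        { "ASCII",      "windows-1252" },
        { "UCS-2",      "UTF-16LE" },
        { "UTF-16",     "UTF-16LE" },
    };
    size_t i;

    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (asciiEqualNoCase(name, map[i].from))
            return map[i].to;
    }
    return name;
}
```

A caller would pass the declared name through the helper before opening a converter, so that `exampleMapHtmlEncoding("iso-8859-1")` selects windows-1252.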
|
|
628006f4
|
2025-05-12T11:47:40
|
|
encoding: Add windows-1252
Fixes #915.
|
|
f602c0c1
|
2025-05-12T00:04:22
|
|
html: Rework serialization of meta encoding attributes
Don't allocate memory.
|
|
0674ccb7
|
2025-05-11T20:55:57
|
|
html: Stop omitting end tags when serializing
Align with HTML5.
|
|
05b8fe0a
|
2025-04-12T23:10:40
|
|
html: Don't escape RAWTEXT and PLAINTEXT
Align with HTML5.
|
|
777e2adf
|
2025-05-09T23:53:03
|
|
io: Consolidate escaping code
Use generated table approach of xmlSerializeText for xmlEscapeText.
Move most code to xmlIO.c.
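As a rough illustration of a table-driven escaping loop, the sketch below indexes a replacement table by byte value. It is not the generated table or the code from xmlIO.c: `exampleEscapeTab` and `exampleEscapeText` are hypothetical names, and only three characters are escaped.

```c
#include <stddef.h>
#include <string.h>

/* Replacement strings indexed by ASCII code; NULL means "copy as is". */
static const char *const exampleEscapeTab[128] = {
    ['&'] = "&amp;",
    ['<'] = "&lt;",
    ['>'] = "&gt;",
};

/*
 * Hypothetical helper: escape `in` into `out`, which has room for
 * `outSize` bytes including the terminator. Returns the length of the
 * escaped string, or -1 if the buffer is too small.
 */
static int
exampleEscapeText(const char *in, char *out, size_t outSize) {
    size_t used = 0;

    if (outSize == 0)
        return -1;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;
        /* One table lookup decides whether the byte needs escaping. */
        const char *repl = (c < 128) ? exampleEscapeTab[c] : NULL;
        size_t n = (repl != NULL) ? strlen(repl) : 1;

        /* Keep one byte free for the terminator. */
        if (used + n >= outSize)
            return -1;
        if (repl != NULL)
            memcpy(out + used, repl, n);
        else
            out[used] = (char) c;
        used += n;
    }
    out[used] = '\0';
    return (int) used;
}
```

The point of the table is that the common case, a character that needs no escaping, costs a single array lookup instead of a chain of comparisons.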
|
|
dad11630
|
2025-05-09T22:05:38
|
|
entities: Always replace invalid chars when escaping
The previous refactor painstakingly recreated the different behavior of
separate functions that were merged. It makes more sense to always
replace invalid chars.
Optimize IS_CHAR check for non-ASCII chars.
|
|
971038e5
|
2025-05-09T20:26:33
|
|
html: Call lower-level escaping functions
Removes the need to pass a document around.
|
|
63535d39
|
2025-05-09T20:13:43
|
|
tree: Make xmlNodeListGetStringInternal work with escape flags
|
|
442c1903
|
2025-05-09T18:52:36
|
|
doc: Fix some damage from automated conversions
Add some newlines, fix returns.
|
|
98a61c9d
|
2025-05-09T16:48:09
|
|
doc: Fix briefs in tree docs
|
|
46f05ea4
|
2025-05-09T00:21:47
|
|
html: Rework meta charset handling
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.
Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.
Add full support for <meta charset=""> which is also used when adding
meta tags.
Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.
Only switch encoding once when parsing.
Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading
UTF-8 charset.
Fixes #909.
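The HTML5 "algorithm for extracting a character encoding from a meta element" mentioned above can be sketched roughly as follows. This is a minimal illustration written from the steps in the HTML standard, not the code added by this commit; `exampleExtractCharset` and its helpers are hypothetical names. It returns the position and length of the encoding label inside the original string, which is what makes it possible to rewrite only that substring.

```c
#include <stddef.h>
#include <ctype.h>

/* ASCII whitespace as defined by HTML: tab, LF, FF, CR, space. */
static int
isHtmlSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Returns 1 if the bytes at `s` match "charset" case-insensitively. */
static int
matchCharset(const char *s) {
    static const char word[] = "charset";
    size_t i;

    /* A terminator in `s` never matches, so this cannot overrun. */
    for (i = 0; i < 7; i++) {
        if (tolower((unsigned char) s[i]) != word[i])
            return 0;
    }
    return 1;
}

/*
 * Hypothetical helper: locate the encoding label in the content value
 * of a Content-Type meta tag. On success, returns a pointer into `s`
 * and stores the label length in `*len`. Returns NULL if no label is
 * found.
 */
static const char *
exampleExtractCharset(const char *s, size_t *len) {
    const char *p = s;

    while (*p != '\0') {
        const char *start;

        if (!matchCharset(p)) {
            p++;
            continue;
        }
        p += 7;
        while (isHtmlSpace(*p))
            p++;
        if (*p != '=')
            continue;           /* resume the search at this character */
        p++;
        while (isHtmlSpace(*p))
            p++;

        if (*p == '\0')
            return NULL;
        if (*p == '"' || *p == '\'') {
            char quote = *p++;

            start = p;
            while (*p != '\0' && *p != quote)
                p++;
            if (*p == '\0')
                return NULL;    /* unmatched quote */
        } else {
            start = p;
            while (*p != '\0' && !isHtmlSpace(*p) && *p != ';')
                p++;
        }
        *len = (size_t) (p - start);
        return start;
    }
    return NULL;
}
```

For the input `text/html; charset=utf-8`, the helper points at `utf-8` with a length of 5, leaving the rest of the value untouched.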
|
|
38ea8fa9
|
2025-05-06T18:31:45
|
|
doc: Fix varargs
|
|
9bbffec5
|
2025-05-06T17:42:46
|
|
doc: Move brief to top, params to bottom of doc comments
|
|
ab13fbfd
|
2025-05-06T14:06:43
|
|
doc: Misc fixes to error docs
|
|
b1685459
|
2025-05-06T12:50:52
|
|
doc: Misc fixes to xmlsave docs
|
|
298f70b3
|
2025-05-05T21:36:36
|
|
doc: Misc fixes to HTML tree docs
|
|
80b6429f
|
2025-05-04T19:13:24
|
|
doc: Misc fixes to encoding docs
|
|
81ac2e27
|
2025-05-04T18:41:44
|
|
doc: Misc fixes to valid docs
|
|
714decd6
|
2025-05-04T17:50:26
|
|
doc: Misc fixes to entities docs
|
|
f38f3e7b
|
2025-05-04T16:49:49
|
|
doc: Misc fixes to IO documentation
|
|
e6cfd049
|
2025-05-04T14:52:42
|
|
doc: Misc fixes to tree docs
|
|
1bf44f09
|
2025-05-04T02:15:25
|
|
doc: Misc fixes to parser docs
|
|
b7274fb0
|
2025-05-03T16:34:02
|
|
doc: Misc fixes to HTML parser docs
|
|
411f30ef
|
2025-05-03T16:21:15
|
|
doc: Don't document legacy HTML parser macros
|
|
4a010875
|
2025-05-03T15:38:15
|
|
doc: Move parser option docs to enum
|
|
a449c5fd
|
2025-05-03T01:31:09
|
|
catalog: Deprecate some functions
|
|
075283d4
|
2025-05-03T00:17:39
|
|
xlink: Deprecate remaining public function
This was never finished.
|
|
2c150e62
|
2025-05-02T20:18:34
|
|
doc: Formatting fixes
|
|
08a282f9
|
2025-05-02T20:12:52
|
|
doc: Doxygen fixes for xmlversion.h
|
|
e78e05c9
|
2025-05-02T17:32:51
|
|
doc: Fix autolinks to functions
Unfortunately, autolinks in .c files aren't converted by Doxygen for
some reason.
|
|
f7c41287
|
2025-05-02T15:57:17
|
|
doc: Remove more comment block headers
|
|
0ffa7dd8
|
2025-05-02T14:52:03
|
|
include: Add hyperlink to deprecation warnings
Doxygen creates a nice "deprecated list" for us.
|
|
1eca6e34
|
2025-04-30T00:54:00
|
|
parser: Deprecate xmlClearParserCtxt
|
|
e525564f
|
2025-05-01T19:20:06
|
|
doc: Remove empty lines at start of block
These lines were left over after automatic conversion.
|
|
fd6ab89b
|
2025-04-28T15:58:19
|
|
doc: Adjust documentation of public structs
|
|
8816f267
|
2025-04-28T14:55:47
|
|
doc: Adjust documentation of enums
|
|
e549622b
|
2025-04-28T15:11:24
|
|
doc: Convert documentation to Doxygen
Automated conversion based on a few regexes.
|
|
69879da8
|
2025-04-28T14:04:30
|
|
doc: Remove email addresses from documentation
Also remove authorship information from generated files, hash.c and
globals.c which were rewritten.
|
|
61890e39
|
2025-04-27T21:50:15
|
|
doc: Prepare for conversion to Doxygen
Fix many params in internal functions (not really necessary but Doxygen
warns about that in XML mode).
Fix formatting in a few corner cases that automatic conversion can't
handle.
Rearrange some DOC_DISABLE blocks.
|
|
87b30343
|
2025-04-29T20:00:01
|
|
io: Fix linkage of __xml*BufferCreateFilename functions
Make these functions usable on Windows.
|
|
fc8899d4
|
2025-04-27T12:59:41
|
|
parser: Make xmlCtxtGetValidCtxt depend on VALID_ENABLED
|
|
b85d77d1
|
2025-04-20T14:31:24
|
|
http: Remove built-in HTTP client
Stubs are retained for ABI compatibility.
Fixes #631.
Obsoletes #160.
|
|
4ba1f923
|
2025-04-18T17:28:24
|
|
html: Avoid HTML_PARSE_HTML5 clashing with XML_PARSE_NOENT
There are several users that pass invalid XML parser options to the
HTML parser. Choose a value that is less likely to clash.
|
|
aa4ef773
|
2025-04-17T19:53:14
|
|
parser: Deprecate output-related globals
|
|
fc4adba9
|
2025-04-12T16:26:07
|
|
error: Fix initGenericErrorDefaultFunc compatibility macro
|
|
97ffa77d
|
2025-04-10T17:36:58
|
|
encoding: Deprecate non-thread-safe functions
|
|
2ecc08f6
|
2025-04-09T21:11:47
|
|
html: Deprecate more functions
|
|
b3492259
|
2025-03-14T00:01:11
|
|
include: Change some return types from int to enum
This also affects some new functions from 2.13.
|
|
fd1b9391
|
2025-03-13T23:20:16
|
|
include: Convert some macros to enums
|
|
84c6524e
|
2025-03-13T19:45:35
|
|
encoding: Support input-only and output-only converters
Make it possible to open an encoding handler only for input or output.
This avoids the creation of unnecessary converters.
Should also fix #863.
|
|
69b83bb6
|
2025-03-10T02:18:51
|
|
encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
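The idea behind a `flush` flag can be sketched as follows: leftover bytes of an incomplete sequence are harmless while more input may still arrive, and only become an error once the caller signals the end of input. The example uses UTF-8 for simplicity and is not the libxml2, iconv or ICU code path; `exampleIncompleteTail` and `exampleCheckTruncated` are hypothetical names.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the number of bytes at the end of `buf`
 * that form the start of an incomplete UTF-8 sequence, or 0 if the
 * buffer ends on a sequence boundary. The rest of the input is not
 * validated.
 */
static size_t
exampleIncompleteTail(const unsigned char *buf, size_t len) {
    size_t back;

    /* The lead byte of a truncated sequence is at most 3 bytes back. */
    for (back = 1; back <= 3 && back <= len; back++) {
        unsigned char c = buf[len - back];
        size_t need;

        if ((c & 0xC0) == 0x80)
            continue;           /* continuation byte, keep looking */
        if (c < 0x80)
            return 0;           /* ASCII, nothing pending */
        if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return 0;           /* invalid lead byte, not a truncation */
        return (back < need) ? back : 0;
    }
    return 0;
}

/*
 * Returns -1 if `flush` is set and the input ends in the middle of a
 * multi-byte sequence, 0 otherwise.
 */
static int
exampleCheckTruncated(const unsigned char *buf, size_t len, int flush) {
    if (flush && exampleIncompleteTail(buf, len) > 0)
        return -1;
    return 0;
}
```

Without the flag, a buffer ending in a lone lead byte is accepted because the next chunk may complete the sequence. With the flag set, the same buffer is reported as truncated.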
|
|
03a8f1dd
|
2025-03-11T18:53:24
|
|
doc: Document SAX handlers a little more
|
|
87c9e000
|
2025-03-09T22:20:23
|
|
encoding: Rework custom encoding implementation API
|
|
ba9148d8
|
2025-03-09T20:30:49
|
|
parser: Undeprecate input->consumed
Should be deprecated after fixing #762.
|
|
a0dbf030
|
2025-03-09T20:24:06
|
|
parser: Undeprecate ctxt->loadsubset
Should be deprecated after fixing #873.
|
|
d96911f1
|
2025-03-08T23:00:29
|
|
doc: Documentation fixes
|
|
5f0b1378
|
2025-03-08T22:07:15
|
|
parser: Add more parser context accessors
Fixes #763.
|
|
38f47507
|
2025-03-05T21:06:05
|
|
encoding: Make conversion callbacks more type-safe
|
|
a846d964
|
2025-03-05T16:49:42
|
|
encoding: Remove compatibility struct members
|
|
94d8a3e2
|
2025-03-05T14:56:46
|
|
parser: Convert xmlParserMaxDepth to macro
|
|
69657224
|
2025-03-04T20:32:02
|
|
globals: Remove unused globals
- xmlBufferAllocScheme
- xmlDefaultBufferSize
- xmlParserDebugEntities
|
|
92d7b0cd
|
2025-03-04T20:18:11
|
|
xpath: Rename valuePush and valuePop
|
|
03be993c
|
2025-03-04T18:42:35
|
|
Use memcpy to avoid pointer cast warnings
|
|
f502e9b2
|
2025-03-04T17:23:44
|
|
include: Add more deprecation warnings
|
|
85bd58ef
|
2025-03-04T16:07:40
|
|
globals: Remove functions related to global state handling
- xmlGetGlobalState
- xmlInitializeGlobalState
- xmlGetThreadId
- xmlIsMainThread
|
|
03a8d5f9
|
2025-03-04T16:00:08
|
|
unicode: Make Unicode functions private
|
|
3d37ff84
|
2025-03-04T15:10:09
|
|
globals: Also use global state struct if threads are disabled
|
|
a15ad9b2
|
2025-03-04T14:06:50
|
|
parser: Remove compatibility symbols
|
|
8e871162
|
2025-03-04T13:36:55
|
|
parser: Remove oldXMLWDcompatibility
|
|
cdc5cfed
|
2025-03-04T13:26:51
|
|
legacy: Remove legacy symbols
|
|
3250a01d
|
2025-03-04T13:15:42
|
|
error: Convert initGenericErrorDefaultFunc to macro
|
|
c42b3227
|
2025-03-04T13:11:18
|
|
parser: Convert inputPush and inputPop to macros
|
|
361f7bff
|
2025-03-04T13:02:36
|
|
parser: Make nodePush, nodePop, namePush, namePop private
|
|
0b27097a
|
2025-03-04T12:55:25
|
|
encoding: Rename unprefixed public functions
|
|
e50d314a
|
2025-02-25T23:07:19
|
|
build: Add separate configuration option for RELAX NG
Support for RELAX NG used to be enabled together with XML Schema support
(--with-schemas). Now there's a separate option and a new feature macro
LIBXML_RELAXNG_ENABLED.
|
|
7ae8e8ac
|
2025-02-22T21:06:34
|
|
schemas: Make xmlSchemaDump depend on DEBUG_ENABLED
|
|
6fc26076
|
2025-02-22T20:31:45
|
|
regexp: Hide debugging code behind DEBUG_REGEXP
xmlRegexpPrint is now a deprecated no-op.
|
|
9c16a153
|
2025-02-13T18:41:33
|
|
Revert "include: Make most IS_* macros private"
This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
|
|
93506d41
|
2025-01-29T00:17:01
|
|
parser: Make catalog PIs opt-in
This is an obscure feature that shouldn't be enabled by default.
|
|
1082d813
|
2025-01-28T23:21:34
|
|
parser: Prepare to make decompression opt-in
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
|
|
a78843be
|
2025-01-28T20:13:58
|
|
xmllint: Support compressed input from stdin
Another regression related to reading from stdin.
Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.
This now enables compressed input when using the "Fd" API functions
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.
Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
|
|
bfe6af2e
|
2025-01-17T17:09:04
|
|
fuzz: Remove hacks to build lint fuzzer
Don't include source file directly.
|
|
e4194110
|
2025-01-17T16:00:05
|
|
schemas: Make ValidateStream take a const SAXHandler
|
|
c134e8b4
|
2024-12-19T21:05:49
|
|
include: Make INPUT_CHUNK macro private
|
|
84a6c82f
|
2024-12-19T20:59:10
|
|
include: Make most IS_* macros private
Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.
|