parser.c


Log

Author Commit Date CI Message
Nick Wellnhofer 69b83bb6 2025-03-10T02:18:51 encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi- byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.
Nick Wellnhofer 8696ebe1 2025-03-11T14:32:35 parser: Fix ignorableWhitespace callback If ignorableWhitespace differs from the "characters" callback, we have to check for blanks as well. Regressed with 1f5b537.
Nick Wellnhofer 25490528 2025-03-11T10:54:34 parser: Fix spurious error in SAX mode Short-lived regression from 5f0b1378.
Nick Wellnhofer 5f0b1378 2025-03-08T22:07:15 parser: Add more parser context accessors Fixes #763.
Nick Wellnhofer 94d8a3e2 2025-03-05T14:56:46 parser: Convert xmlParserMaxDepth to macro
Nick Wellnhofer 03a8d5f9 2025-03-04T16:00:08 unicode: Make Unicode functions private
Nick Wellnhofer cdc5cfed 2025-03-04T13:26:51 legacy: Remove legacy symbols
Nick Wellnhofer c42b3227 2025-03-04T13:11:18 parser: Convert inputPush and inputPop to macros
Nick Wellnhofer 361f7bff 2025-03-04T13:02:36 parser: Make nodePush, nodePop, namePush, namePop private
Nick Wellnhofer 05bd1720 2025-03-01T10:25:29 parser: Fix parsing of DTD content Regressed in 2.11. Fixes #868.
Nick Wellnhofer e50d314a 2025-02-25T23:07:19 build: Add separate configuration option for RELAX NG Support for RELAX NG used to be enabled together with XML Schema support (--with-schemas). Now there's a separate option and a new feature macro LIBXML_RELAXNG_ENABLED.
Nick Wellnhofer b4d3d87e 2025-02-01T22:02:33 parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.
Nick Wellnhofer 57e4bbd8 2025-01-31T16:45:35 parser: Improve handling of NOCDATA option Don't modify the callback structure. This makes sure that unsetting the option works.
Nick Wellnhofer 1f5b5371 2025-01-31T16:21:20 parser: Improve handling of NOBLANKS option Don't change the SAX handler. Use a helper function to invoke "characters" SAX callback. The old code didn't advance the input pointer consistently before invoking the callback. There was also some inconsistency wrt to ctxt->space handling. I don't understand the ctxt->space thing, but now we always behave like the non-complex case before.
Nick Wellnhofer 7a8722f5 2025-01-31T14:55:29 parser: Document that XML_PARSE_NOBLANKS is broken Long text content can generate multiple "characters" callbacks which can lead to NOBLANKS removing whitespace in non-whitespace text nodes. So the NOBLANKS option doesn't even work reliably with the pull parser. This would be extremely hard to fix. Unfortunately, `xmllint --format` relies on this option which is another reason why this feature never really worked.
Nick Wellnhofer 9efe1414 2025-01-31T13:07:35 parser: Fix detection of ']]>' when push-parsing Fixes #850.
Nick Wellnhofer 115b13f9 2025-01-30T23:18:56 parser: Document push parser limitations
Nick Wellnhofer 53a48468 2025-01-30T15:15:30 xmllint: Make --push report parse errors The push parser leaves documents in ctxt->myDoc even if they're invalid. Also fix documentation. Regressed with f8ff4d86.
Nick Wellnhofer 5535721f 2025-01-30T01:27:03 parser: Grow input buffer after lots of whitespace Make sure that the input buffer is grown after consuming large amounts of whitespace. Also move a comment.
Nick Wellnhofer 218264fa 2025-01-30T01:26:01 parser: Always shrink input buffer Shrinking the input buffer is cheap now and should be done as soon as possible.
Nick Wellnhofer 93506d41 2025-01-29T00:17:01 parser: Make catalog PIs opt-in This is an obscure feature that shouldn't be enabled by default.
Nick Wellnhofer 1082d813 2025-01-28T23:21:34 parser: Prepare to make decompression opt-in Add a new parser option XML_PARSE_UNZIP that enables decompression. xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set this option currently, but downstream users should start to set the option if they really need it.
Nick Wellnhofer a78843be 2025-01-28T20:13:58 xmllint: Support compressed input from stdin Another regression related to reading from stdin. Making a "-" filename read from stdin was deeply baked into the core IO code but is inherently insecure. I really want to reenable this dangerous feature as sparingly as possible. This now enables compressed input when using the "Fd" API functions which wan't supported before. But XML_PARSE_NO_UNZIP will be inverted later. Allow compressed stdin in xmlReadFile to support xmlstarlet and older versions of xsltproc. So far, these are the only known command-line tools that rely on "-" meaning stdin.
Nick Wellnhofer ca819160 2025-01-03T20:50:08 include: Use intptr_t to cast between pointers and ints
Nick Wellnhofer 2e3a91a7 2024-12-26T21:05:18 doc: Fix documentation
Nick Wellnhofer 8231c036 2024-12-15T23:36:04 parser: Check reallocations for overflow
Nick Wellnhofer 6548ba11 2024-12-13T16:37:40 parser: Fix argument checks in xmlCtxtParse* - Raise invalid argument error. - Free input stream if ctxt is NULL.
Nick Wellnhofer eae9a1bd 2024-11-26T14:18:22 parser: Pop input stream in xmlCtxtValidateDtd
Nick Wellnhofer dafcefb2 2024-11-25T22:22:26 parser: Fail on catastrophic errors in recovery mode
Nick Wellnhofer 0dc26910 2024-11-20T21:04:19 parser: Deprecate more internal functions
Nick Wellnhofer 84a6eece 2024-11-18T20:40:47 parser: Remove unneeded call to xmlDetectEncoding
Nick Wellnhofer 497081ba 2024-11-17T20:25:07 parser: Remove remaining calls to xml{Push|Pop}Input
Nick Wellnhofer 0f4f8900 2024-11-17T20:13:14 parser: Rename inputPush to xmlCtxtPushInput
Nick Wellnhofer e2ad249c 2024-11-17T19:48:44 parser: Deprecate more internal symbols - xmlParseExternalSubset - xmlPushInput - xmlPopInput - xmlCopyCharMultiByte - xmlCreateEntityParserCtxt - xmlStringComment
Nick Wellnhofer 631778f6 2024-11-17T12:11:41 parser: Check for malloc failure in xmlCtxtParseDtd
Nick Wellnhofer 7f8c436c 2024-11-15T16:30:52 parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd This allows to use the context's error handler, options and other settings. Fixes #808.
Ruslan Garipov aaecdc92 2024-11-12T16:42:36 parser: Assign value without if-statement This avoids an if-statement, because effectively it does nothing. And, for example, binary artifact generated by GCC with -O2 optimization settings does not contain that if-statement -- the code just uses the hprefix->name field explicitly. No functional changes intended. Signed-off-by: Ruslan Garipov <ruslanngaripov@gmail.com>
Nick Wellnhofer 869e3fd4 2024-11-01T16:52:31 parser: Fix loading of parameter entities in external DTDs Regressed with commit 12f0bb94. Fixes #816.
Nick Wellnhofer efb57ddb 2024-10-30T14:02:36 parser: Fix downstream code that swaps DTDs Downstream code like the nginx xslt module can change the document's DTD pointers in a SAX callback. If an entity from a separate DTD is parsed lazily, its content must not reference the current document. Regressed with commit d025cfbb. Fixes #815.
Nick Wellnhofer 0ec5687e 2024-10-28T20:41:56 parser: Rework xmlCtxtGrowAttrs Remove unneeded argument. Check for integer overflow. We probably hit the buffer size limit in xmlParserGrow before, but better be safe.
Nick Wellnhofer ffb058f4 2024-10-28T20:12:52 parser: Fix detection of duplicate attributes We really need a second scan if more than one namespace clash was detected.
Nick Wellnhofer b52a3044 2024-10-24T18:18:47 parser: Use counted_by attribute if supported We only have a single struct with a flexible array member.
Nick Wellnhofer 74dfc49b 2024-09-26T21:24:00 parser: Clarify logic in xmlParseStartTag2
Nick Wellnhofer 0bc4608c 2024-09-15T20:28:49 html: Use hash table to check for duplicate attributes
Nick Wellnhofer 0ce7bfe5 2024-09-12T01:44:18 html: Try to avoid passing XML options to HTML parser
Nick Wellnhofer 16de1346 2024-09-11T19:05:38 parser: Make new options actually work
Nick Wellnhofer dde62ae5 2024-08-28T23:58:20 parser: Align push parsing of CDATA sections with pull parser Remove special handling of CDATA sections in push parser. This makes sure that only a single callback is generated for large sections. Fixes #22 and needed for #412.
Nick Wellnhofer 4d10e53a 2024-08-28T22:47:20 parser: Make sure to set and increment input id Revert part of commits 410931e3 and b9d2f3c9.
Nick Wellnhofer 6d365ca0 2024-08-28T22:09:30 doc: XML_PARSE_NO_XXE is available since 2.13.0
makise-homura 103aadbc 2024-08-14T23:15:30 parser: Suppress EDG maybe-uninitialized warning
Nick Wellnhofer 02fcb1ef 2024-07-25T17:07:18 parser: Make xmlParseChunk return an error if parser was stopped This regressed after enhancing the disableSAX member in 2.13. Should fix #777.
Nick Wellnhofer 1a893230 2024-07-06T01:03:46 [CVE-2024-40896] Fix XXE protection in downstream code Some users set an entity's children manually in the getEntity SAX callback to restrict entity expansion. This stopped working after renaming the "checked" member of xmlEntity, making at least one downstream project and its dependants susceptible to XXE attacks. See #761.
Nick Wellnhofer 6a3c0b0d 2024-07-22T12:53:00 parser: Increase XML_MAX_DICTIONARY_LIMIT This limit is somewhat arbitrary and can be reached when fuzzing documents up to 1 MB. Increase limit to 100 MB and disable limit if XML_PARSE_HUGE is set.
Nick Wellnhofer 5d36664f 2024-07-16T00:35:53 memory: Deprecate xmlGcMemSetup
Nick Wellnhofer 7148b778 2024-07-07T16:11:08 parser: Optimize memory buffer I/O Reenable zero-copy IO for zero-terminated static memory buffers. Don't stream zero-terminated dynamic memory buffers on top of creating a copy.
Nick Wellnhofer 34c9108f 2024-07-07T18:38:31 encoding: Add sizeOut argument to xmlCharEncInput When push parsing, we want to convert as much of the input as possible. When pull parsing memory buffers, we want to convert data chunk by chunk to save memory.
Nick Wellnhofer 6be79014 2024-07-15T14:18:26 Remove unused code
Nick Wellnhofer fee0006a 2024-07-15T13:03:55 parser: Fix memory leak after malloc failure in xml*ParseDTD
Nick Wellnhofer 8af55c8d 2024-07-06T22:14:21 parser: Rename new input API functions These weren't made public yet.
Nick Wellnhofer d74ca594 2024-07-06T22:04:06 parser: Rename internal xmlNewInput functions
Nick Wellnhofer 4f329dc5 2024-07-10T03:27:47 parser: Implement xmlCtxtParseContent This implements xmlCtxtParseContent, a better alternative to xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a parser context and a parser input, making it a lot more versatile. xmlParseInNodeContext is now implemented in terms of xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never modifies the target document, improving thread safety. xmlParseInNodeContext is also more lenient now with regard to undeclared entities. Fixes #727.
Nick Wellnhofer f51ad063 2024-07-08T11:23:39 parser: Fix error return of xmlParseBalancedChunkMemory Only return an error code if the chunk is not well-formed to match the 2.12 behavior. Return 0 on non-fatal errors like invalid namespaces. Fixes #765.
Nick Wellnhofer 2e63656e 2024-07-07T19:21:46 parser: Check return value of inputPush inputPush typically doesn't fail because we pre-allocate the input table. The return value should be checked nevertheless.
Nick Wellnhofer 1e5375c1 2024-07-06T15:15:57 SAX2: Check return value of xmlPushInput Fix null deref in case of malloc failure.
Nick Wellnhofer 38195cf5 2024-07-06T14:58:16 parser: Don't produce names with invalid UTF-8 in recovery mode
Nick Wellnhofer fdfeecfe 2024-07-02T21:54:26 parser: Reenable ctxt->directory Unused internally, but used in downstream code. Should fix #753.
Nick Wellnhofer 606f4108 2024-07-02T20:57:15 parser: Allow to disable catalogs with parser options Implement XML_PARSE_NO_SYS_CATALOG and XML_PARSE_NO_CATALOG_PI. Fixes #735.
Nick Wellnhofer 866be54e 2024-07-02T04:27:53 parser: Don't use deprecated xmlSplitQName
Nick Wellnhofer bc793390 2024-06-27T16:23:14 parser: Update documentation
Nick Wellnhofer eca972e6 2024-06-26T02:22:04 parser: Add getters for XML declaration to parser context Access to struct members will be deprecated.
Mike Dalessio bbbbbb46 2024-06-20T03:19:48 parser: implement xmlCtxtGetOptions In 712a31ab, the `options` struct member was deprecated. To allow callers to check the status of options bits, introduce xmlCtxtGetOptions.
Rosen Penev 217e9b7a 2024-06-08T12:27:45 clang-tidy: don't return in void functions Found with readability-redundant-control-flow Signed-off-by: Rosen Penev <rosenp@gmail.com>
Nick Wellnhofer 32cac377 2024-06-17T17:59:49 parser: Selectively reenable reading from "-" Make filename "-" mean stdin for legacy SAX1 functions and xmlReadFile. This should hopefully fix most command line utilities. See #737.
Nick Wellnhofer 33a1f897 2024-06-16T19:16:47 legacy: Merge SAX.c into legacy.c
Nick Wellnhofer 10d60d15 2024-06-16T00:04:46 regexp: Stop using LIBXML_AUTOMATA_ENABLED This macro always equals LIBXML_REGEXP_ENABLED.
Nick Wellnhofer b0fc67aa 2024-06-15T22:53:55 build: Remove --with-tree configuration option This option would allow for a smaller, but mostly useless minimal build. But it complicates the symbol availability logic in an insane way and requires specialized tools like our custom C parser in doc/apibuild.py. See #717.
Nick Wellnhofer 039ce1e8 2024-06-14T16:41:43 parser: Pass global object to sax->setDocumentLocator Revert part of commit c011e760. Fixes #732.
Nick Wellnhofer dba1ed85 2024-06-12T18:19:55 ftp: Remove FTP support Remove the built-in FTP client. If you configure --with-legacy, old symbols are retained for ABI compatibility.
Nick Wellnhofer 52384043 2024-06-11T19:10:41 parser: Pass resource type to resource loader
Nick Wellnhofer 89fcae4d 2024-06-11T16:19:58 parser: Don't report malloc failures when creating context We don't want messages to stderr before an error handler could be set on a parser context.
Nick Wellnhofer 410931e3 2024-06-11T00:55:38 parser: Only set input ID for PE refs Other input streams don't require IDs.
Nick Wellnhofer ff3b0919 2024-06-11T00:00:32 parser: Implement XML_PARSE_NO_UNZIP option
Nick Wellnhofer 47cbb6bb 2024-06-10T14:04:00 doc: Don't mention xmlNewInputURL
Nick Wellnhofer 8318b5a6 2024-06-09T14:22:53 parser: Fix NULL checks for output arguments
Nick Wellnhofer 0cde1b78 2024-06-06T23:50:03 parser: Fix "Truncated multi-byte sequence" error Don't raise the error if decoding failed.
Nick Wellnhofer 122b6130 2024-06-04T16:33:02 parser: Fix performance regression when parsing namespaces The namespace hash table didn't reuse deleted buckets, leading to quadratic behavior. Also ignore deleted buckets when resizing. Fixes #726.
Nick Wellnhofer a7e26707 2024-06-03T14:04:44 parser: Don't overwrite OOM errors in xmlSBuf
Nick Wellnhofer e75e878e 2024-05-20T13:58:22 doc: Update and fix documentation
Nick Wellnhofer 4fefba4c 2024-05-15T17:52:20 parser: Rework handling of undeclared entities Throw an error if entity substitution was requested. Now we only downgrade to a warning if - XML_PARSE_DTDLOAD wasn't specified, and - entity aren't substituted or XML_PARSE_NO_XXE was specified. Should fix #724.
Nick Wellnhofer 4ff2dccf 2024-05-10T02:04:52 SAX2: Warn if URI resolution failed
Nick Wellnhofer 4fe116eb 2024-05-10T00:05:44 parser: Don't report error on invalid URI Only fragment identifiers are an error. This removes the last user of xmlErrMsg*. Now every error reported by the parser should result in one of ctxt->wellFormed, ctxt->nsWellFormed or ctxt->valid being set to zero.
Nick Wellnhofer a4c2b723 2024-05-05T17:26:31 io: Don't set close callback in xmlParserInputBufferCreateFd
Nick Wellnhofer fdc5ff36 2024-05-02T16:23:04 parser: Always throw entity errors if external DTD is loaded When parsing with XML_PARSE_DTDLOAD, missing entities are always an error. Also consolidate behavior when validating. See b717abdd.
Nick Wellnhofer 39e5b35b 2024-05-02T22:06:19 parser: Don't create undeclared entity refs in substitution mode We never want to create entity reference nodes if entity substitution is enabled. This also applies to undeclared entities.
Nick Wellnhofer 1cdfece1 2024-04-28T18:33:40 memory: Remove memory debugging This is useless compared to sanitizers or valgrind and has a considerable performance impact if enabled accidentally.
Nick Wellnhofer 45fe9924 2024-04-22T17:12:54 parser: Don't create reference in xmlLookupGeneralEntity This should only be done in xmlParseReference. The handling of undeclared entities is still somewhat inconsistent. In element content we create references even if entity substitution is enabled. In attribute values undeclared entities are always ignored.
Nick Wellnhofer b717abdd 2024-04-22T15:42:39 parser: Consolidate error handling for undeclared entities Always use XML_WAR_UNDECLARED_ENTITY with warning error level in documents with external subset or parameter entities. Use XML_ERR_UNDECLARED_ENTITY otherwise.
Nick Wellnhofer f506ec66 2024-04-15T11:27:44 parser: Always decode entities in namespace URIs Also decode entities in namespace URIs if entity substitution wasn't requested. This should fix some corner cases when comparing namespace URIs. The Namespaces in XML 1.0 spec says: > In a namespace declaration, the URI reference is the normalized value > of the attribute, so replacement of XML character and entity > references has already been done before any comparison. Make the serialization code escape special characters in namespace URIs like in attribute values. This fixes serialization if entities were substituted when parsing. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
Nick Wellnhofer 2840e33c 2024-03-04T07:34:25 tree: Allocate XML namespace statically
Nick Wellnhofer 186562a1 2024-03-12T19:55:33 parser: Fix detection of duplicate attributes in XML namespace Fixes a regression from commit e0dd330b, resulting in duplicate attributes in the predefined XML namespace not being detected or extraneous default attributes being passed. Fixes #704.