parser.c


Log

Author Commit Date Message
Nick Wellnhofer 93506d41 2025-01-29T00:17:01 parser: Make catalog PIs opt-in This is an obscure feature that shouldn't be enabled by default.
Nick Wellnhofer 1082d813 2025-01-28T23:21:34 parser: Prepare to make decompression opt-in Add a new parser option XML_PARSE_UNZIP that enables decompression. xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set this option currently, but downstream users should start to set the option if they really need it.
Nick Wellnhofer a78843be 2025-01-28T20:13:58 xmllint: Support compressed input from stdin Another regression related to reading from stdin. Making a "-" filename read from stdin was deeply baked into the core IO code but is inherently insecure. I really want to reenable this dangerous feature as sparingly as possible. This now enables compressed input when using the "Fd" API functions which wasn't supported before. But XML_PARSE_NO_UNZIP will be inverted later. Allow compressed stdin in xmlReadFile to support xmlstarlet and older versions of xsltproc. So far, these are the only known command-line tools that rely on "-" meaning stdin.
Nick Wellnhofer ca819160 2025-01-03T20:50:08 include: Use intptr_t to cast between pointers and ints
Nick Wellnhofer 2e3a91a7 2024-12-26T21:05:18 doc: Fix documentation
Nick Wellnhofer 8231c036 2024-12-15T23:36:04 parser: Check reallocations for overflow
Nick Wellnhofer 6548ba11 2024-12-13T16:37:40 parser: Fix argument checks in xmlCtxtParse* - Raise invalid argument error. - Free input stream if ctxt is NULL.
Nick Wellnhofer eae9a1bd 2024-11-26T14:18:22 parser: Pop input stream in xmlCtxtValidateDtd
Nick Wellnhofer dafcefb2 2024-11-25T22:22:26 parser: Fail on catastrophic errors in recovery mode
Nick Wellnhofer 0dc26910 2024-11-20T21:04:19 parser: Deprecate more internal functions
Nick Wellnhofer 84a6eece 2024-11-18T20:40:47 parser: Remove unneeded call to xmlDetectEncoding
Nick Wellnhofer 497081ba 2024-11-17T20:25:07 parser: Remove remaining calls to xml{Push|Pop}Input
Nick Wellnhofer 0f4f8900 2024-11-17T20:13:14 parser: Rename inputPush to xmlCtxtPushInput
Nick Wellnhofer e2ad249c 2024-11-17T19:48:44 parser: Deprecate more internal symbols - xmlParseExternalSubset - xmlPushInput - xmlPopInput - xmlCopyCharMultiByte - xmlCreateEntityParserCtxt - xmlStringComment
Nick Wellnhofer 631778f6 2024-11-17T12:11:41 parser: Check for malloc failure in xmlCtxtParseDtd
Nick Wellnhofer 7f8c436c 2024-11-15T16:30:52 parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd This allows using the context's error handler, options and other settings. Fixes #808.
Ruslan Garipov aaecdc92 2024-11-12T16:42:36 parser: Assign value without if-statement This avoids an if-statement, because effectively it does nothing. And, for example, binary artifact generated by GCC with -O2 optimization settings does not contain that if-statement -- the code just uses the hprefix->name field explicitly. No functional changes intended. Signed-off-by: Ruslan Garipov <ruslanngaripov@gmail.com>
Nick Wellnhofer 869e3fd4 2024-11-01T16:52:31 parser: Fix loading of parameter entities in external DTDs Regressed with commit 12f0bb94. Fixes #816.
Nick Wellnhofer efb57ddb 2024-10-30T14:02:36 parser: Fix downstream code that swaps DTDs Downstream code like the nginx xslt module can change the document's DTD pointers in a SAX callback. If an entity from a separate DTD is parsed lazily, its content must not reference the current document. Regressed with commit d025cfbb. Fixes #815.
Nick Wellnhofer 0ec5687e 2024-10-28T20:41:56 parser: Rework xmlCtxtGrowAttrs Remove unneeded argument. Check for integer overflow. We probably hit the buffer size limit in xmlParserGrow before, but better be safe.
Nick Wellnhofer ffb058f4 2024-10-28T20:12:52 parser: Fix detection of duplicate attributes We really need a second scan if more than one namespace clash was detected.
Nick Wellnhofer b52a3044 2024-10-24T18:18:47 parser: Use counted_by attribute if supported We only have a single struct with a flexible array member.
Nick Wellnhofer 74dfc49b 2024-09-26T21:24:00 parser: Clarify logic in xmlParseStartTag2
Nick Wellnhofer 0bc4608c 2024-09-15T20:28:49 html: Use hash table to check for duplicate attributes
Nick Wellnhofer 0ce7bfe5 2024-09-12T01:44:18 html: Try to avoid passing XML options to HTML parser
Nick Wellnhofer 16de1346 2024-09-11T19:05:38 parser: Make new options actually work
Nick Wellnhofer dde62ae5 2024-08-28T23:58:20 parser: Align push parsing of CDATA sections with pull parser Remove special handling of CDATA sections in push parser. This makes sure that only a single callback is generated for large sections. Fixes #22 and needed for #412.
Nick Wellnhofer 4d10e53a 2024-08-28T22:47:20 parser: Make sure to set and increment input id Revert part of commits 410931e3 and b9d2f3c9.
Nick Wellnhofer 6d365ca0 2024-08-28T22:09:30 doc: XML_PARSE_NO_XXE is available since 2.13.0
makise-homura 103aadbc 2024-08-14T23:15:30 parser: Suppress EDG maybe-uninitialized warning
Nick Wellnhofer 02fcb1ef 2024-07-25T17:07:18 parser: Make xmlParseChunk return an error if parser was stopped This regressed after enhancing the disableSAX member in 2.13. Should fix #777.
Nick Wellnhofer 1a893230 2024-07-06T01:03:46 [CVE-2024-40896] Fix XXE protection in downstream code Some users set an entity's children manually in the getEntity SAX callback to restrict entity expansion. This stopped working after renaming the "checked" member of xmlEntity, making at least one downstream project and its dependants susceptible to XXE attacks. See #761.
Nick Wellnhofer 6a3c0b0d 2024-07-22T12:53:00 parser: Increase XML_MAX_DICTIONARY_LIMIT This limit is somewhat arbitrary and can be reached when fuzzing documents up to 1 MB. Increase limit to 100 MB and disable limit if XML_PARSE_HUGE is set.
Nick Wellnhofer 5d36664f 2024-07-16T00:35:53 memory: Deprecate xmlGcMemSetup
Nick Wellnhofer 7148b778 2024-07-07T16:11:08 parser: Optimize memory buffer I/O Reenable zero-copy IO for zero-terminated static memory buffers. Don't stream zero-terminated dynamic memory buffers on top of creating a copy.
Nick Wellnhofer 34c9108f 2024-07-07T18:38:31 encoding: Add sizeOut argument to xmlCharEncInput When push parsing, we want to convert as much of the input as possible. When pull parsing memory buffers, we want to convert data chunk by chunk to save memory.
Nick Wellnhofer 6be79014 2024-07-15T14:18:26 Remove unused code
Nick Wellnhofer fee0006a 2024-07-15T13:03:55 parser: Fix memory leak after malloc failure in xml*ParseDTD
Nick Wellnhofer 8af55c8d 2024-07-06T22:14:21 parser: Rename new input API functions These weren't made public yet.
Nick Wellnhofer d74ca594 2024-07-06T22:04:06 parser: Rename internal xmlNewInput functions
Nick Wellnhofer 4f329dc5 2024-07-10T03:27:47 parser: Implement xmlCtxtParseContent This implements xmlCtxtParseContent, a better alternative to xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a parser context and a parser input, making it a lot more versatile. xmlParseInNodeContext is now implemented in terms of xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never modifies the target document, improving thread safety. xmlParseInNodeContext is also more lenient now with regard to undeclared entities. Fixes #727.
Nick Wellnhofer f51ad063 2024-07-08T11:23:39 parser: Fix error return of xmlParseBalancedChunkMemory Only return an error code if the chunk is not well-formed to match the 2.12 behavior. Return 0 on non-fatal errors like invalid namespaces. Fixes #765.
Nick Wellnhofer 2e63656e 2024-07-07T19:21:46 parser: Check return value of inputPush inputPush typically doesn't fail because we pre-allocate the input table. The return value should be checked nevertheless.
Nick Wellnhofer 1e5375c1 2024-07-06T15:15:57 SAX2: Check return value of xmlPushInput Fix null deref in case of malloc failure.
Nick Wellnhofer 38195cf5 2024-07-06T14:58:16 parser: Don't produce names with invalid UTF-8 in recovery mode
Nick Wellnhofer fdfeecfe 2024-07-02T21:54:26 parser: Reenable ctxt->directory Unused internally, but used in downstream code. Should fix #753.
Nick Wellnhofer 606f4108 2024-07-02T20:57:15 parser: Allow to disable catalogs with parser options Implement XML_PARSE_NO_SYS_CATALOG and XML_PARSE_NO_CATALOG_PI. Fixes #735.
Nick Wellnhofer 866be54e 2024-07-02T04:27:53 parser: Don't use deprecated xmlSplitQName
Nick Wellnhofer bc793390 2024-06-27T16:23:14 parser: Update documentation
Nick Wellnhofer eca972e6 2024-06-26T02:22:04 parser: Add getters for XML declaration to parser context Access to struct members will be deprecated.
Mike Dalessio bbbbbb46 2024-06-20T03:19:48 parser: implement xmlCtxtGetOptions In 712a31ab, the `options` struct member was deprecated. To allow callers to check the status of options bits, introduce xmlCtxtGetOptions.
Rosen Penev 217e9b7a 2024-06-08T12:27:45 clang-tidy: don't return in void functions Found with readability-redundant-control-flow Signed-off-by: Rosen Penev <rosenp@gmail.com>
Nick Wellnhofer 32cac377 2024-06-17T17:59:49 parser: Selectively reenable reading from "-" Make filename "-" mean stdin for legacy SAX1 functions and xmlReadFile. This should hopefully fix most command line utilities. See #737.
Nick Wellnhofer 33a1f897 2024-06-16T19:16:47 legacy: Merge SAX.c into legacy.c
Nick Wellnhofer 10d60d15 2024-06-16T00:04:46 regexp: Stop using LIBXML_AUTOMATA_ENABLED This macro always equals LIBXML_REGEXP_ENABLED.
Nick Wellnhofer b0fc67aa 2024-06-15T22:53:55 build: Remove --with-tree configuration option This option would allow for a smaller, but mostly useless minimal build. But it complicates the symbol availability logic in an insane way and requires specialized tools like our custom C parser in doc/apibuild.py. See #717.
Nick Wellnhofer 039ce1e8 2024-06-14T16:41:43 parser: Pass global object to sax->setDocumentLocator Revert part of commit c011e760. Fixes #732.
Nick Wellnhofer dba1ed85 2024-06-12T18:19:55 ftp: Remove FTP support Remove the built-in FTP client. If you configure --with-legacy, old symbols are retained for ABI compatibility.
Nick Wellnhofer 52384043 2024-06-11T19:10:41 parser: Pass resource type to resource loader
Nick Wellnhofer 89fcae4d 2024-06-11T16:19:58 parser: Don't report malloc failures when creating context We don't want messages to stderr before an error handler could be set on a parser context.
Nick Wellnhofer 410931e3 2024-06-11T00:55:38 parser: Only set input ID for PE refs Other input streams don't require IDs.
Nick Wellnhofer ff3b0919 2024-06-11T00:00:32 parser: Implement XML_PARSE_NO_UNZIP option
Nick Wellnhofer 47cbb6bb 2024-06-10T14:04:00 doc: Don't mention xmlNewInputURL
Nick Wellnhofer 8318b5a6 2024-06-09T14:22:53 parser: Fix NULL checks for output arguments
Nick Wellnhofer 0cde1b78 2024-06-06T23:50:03 parser: Fix "Truncated multi-byte sequence" error Don't raise the error if decoding failed.
Nick Wellnhofer 122b6130 2024-06-04T16:33:02 parser: Fix performance regression when parsing namespaces The namespace hash table didn't reuse deleted buckets, leading to quadratic behavior. Also ignore deleted buckets when resizing. Fixes #726.
Nick Wellnhofer a7e26707 2024-06-03T14:04:44 parser: Don't overwrite OOM errors in xmlSBuf
Nick Wellnhofer e75e878e 2024-05-20T13:58:22 doc: Update and fix documentation
Nick Wellnhofer 4fefba4c 2024-05-15T17:52:20 parser: Rework handling of undeclared entities Throw an error if entity substitution was requested. Now we only downgrade to a warning if - XML_PARSE_DTDLOAD wasn't specified, and - entities aren't substituted or XML_PARSE_NO_XXE was specified. Should fix #724.
Nick Wellnhofer 4ff2dccf 2024-05-10T02:04:52 SAX2: Warn if URI resolution failed
Nick Wellnhofer 4fe116eb 2024-05-10T00:05:44 parser: Don't report error on invalid URI Only fragment identifiers are an error. This removes the last user of xmlErrMsg*. Now every error reported by the parser should result in one of ctxt->wellFormed, ctxt->nsWellFormed or ctxt->valid being set to zero.
Nick Wellnhofer a4c2b723 2024-05-05T17:26:31 io: Don't set close callback in xmlParserInputBufferCreateFd
Nick Wellnhofer fdc5ff36 2024-05-02T16:23:04 parser: Always throw entity errors if external DTD is loaded When parsing with XML_PARSE_DTDLOAD, missing entities are always an error. Also consolidate behavior when validating. See b717abdd.
Nick Wellnhofer 39e5b35b 2024-05-02T22:06:19 parser: Don't create undeclared entity refs in substitution mode We never want to create entity reference nodes if entity substitution is enabled. This also applies to undeclared entities.
Nick Wellnhofer 1cdfece1 2024-04-28T18:33:40 memory: Remove memory debugging This is useless compared to sanitizers or valgrind and has a considerable performance impact if enabled accidentally.
Nick Wellnhofer 45fe9924 2024-04-22T17:12:54 parser: Don't create reference in xmlLookupGeneralEntity This should only be done in xmlParseReference. The handling of undeclared entities is still somewhat inconsistent. In element content we create references even if entity substitution is enabled. In attribute values undeclared entities are always ignored.
Nick Wellnhofer b717abdd 2024-04-22T15:42:39 parser: Consolidate error handling for undeclared entities Always use XML_WAR_UNDECLARED_ENTITY with warning error level in documents with external subset or parameter entities. Use XML_ERR_UNDECLARED_ENTITY otherwise.
Nick Wellnhofer f506ec66 2024-04-15T11:27:44 parser: Always decode entities in namespace URIs Also decode entities in namespace URIs if entity substitution wasn't requested. This should fix some corner cases when comparing namespace URIs. The Namespaces in XML 1.0 spec says: "In a namespace declaration, the URI reference is the normalized value of the attribute, so replacement of XML character and entity references has already been done before any comparison." Make the serialization code escape special characters in namespace URIs like in attribute values. This fixes serialization if entities were substituted when parsing. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
Nick Wellnhofer 2840e33c 2024-03-04T07:34:25 tree: Allocate XML namespace statically
Nick Wellnhofer 186562a1 2024-03-12T19:55:33 parser: Fix detection of duplicate attributes in XML namespace Fixes a regression from commit e0dd330b, resulting in duplicate attributes in the predefined XML namespace not being detected or extraneous default attributes being passed. Fixes #704.
Nick Wellnhofer 4d774612 2024-02-13T11:35:12 parser: Fix column number in attribute values Short-lived regression from 37c6618b.
Nick Wellnhofer 95f2a174 2024-01-30T13:25:17 parser: Fix crash in xmlParseInNodeContext with HTML documents Ignore namespaces if we have an HTML document with namespaces added manually. Fixes #672.
Nick Wellnhofer 6dc2fdb2 2024-01-07T14:30:57 parser: Account for full size of non-well-formed entities Account for the full size of the entity if parsing stops because of errors. In our cost model, we have to assume that the entity loader processes the whole entity regardless of its content.
Nick Wellnhofer 29beef65 2024-01-02T21:50:38 parser: Pop inputs if parsing DTD failed This should provide some statistics in ctxt->sizeentcopy even in the error or recovery case.
Nick Wellnhofer 02a2038d 2024-01-10T14:17:49 parser: Handle NOCDATA properly when expanding entities Short-lived regression from e1153832.
Nick Wellnhofer e1153832 2024-01-07T01:29:37 parser: Fix quadratic behavior when copying entities Process the first and last text node with the SAX handler to make the text merging optimization kick in. Fixes #657.
Nick Wellnhofer f237e5b9 2024-01-05T15:40:23 parser: Avoid duplicate namespace errors Don't report an extra attribute uniqueness error if a namespace is undeclared. This matches old behavior.
Nick Wellnhofer 02cc5c36 2024-01-05T04:17:14 parser: Add XML_PARSE_NO_XXE parser option
Nick Wellnhofer 12f0bb94 2024-01-05T01:14:28 parser: Synchronize more options
Nick Wellnhofer 3efbe916 2024-01-05T00:11:29 parser: Mark 'token' member as unused in xmlParserCtxt
Nick Wellnhofer b82fd81d 2024-01-04T23:25:06 parser: Rework xmlCtxtParseDocument Make xmlCtxtParseDocument take a parser input which can be popped after parsing.
Nick Wellnhofer d7d300ba 2024-01-04T17:50:11 parser: Remove remnants of runtime debugging feature Apparently, this feature was removed long ago. Fixes #651.
Nick Wellnhofer 8c5848bd 2024-01-04T17:14:31 parser: Make xmlParseContent more useful This is an internal function which isn't really usable without some hacks. See WebKit/Chromium trying to recreate the effects of xmlDetectSAX2 manually, for example. Make xmlParseContent perform late initialization and check whether the content was fully parsed. Also rename xmlDetectSAX2 and document why it's needed.
Nick Wellnhofer a7356dfe 2024-01-03T18:02:46 parser: Clear invalid entity content This was removed in earlier commits, but we really want to make sure that entity content is syntactically valid.
Nick Wellnhofer 30d83977 2024-01-04T15:18:14 fuzz: Disable catalogs The catalogs API doesn't report OOM errors. It's basically impossible to use it safely in its current form.
Nick Wellnhofer 85f99023 2024-01-02T17:52:43 parser: Fix buffer size checks Don't test size of remaining data. This causes false positives with memory buffers. Also impose XML_MAX_HUGE_LENGTH limit when parsing with XML_PARSE_HUGE.
Nick Wellnhofer e8fb3d63 2024-01-02T17:45:54 parser: Convert some "internal errors" to meaningful codes
Nick Wellnhofer 5cb4b05c 2024-01-02T17:16:22 parser: Lower maximum entity nesting depth Limit entity nesting depth to 20 or 40 with XML_PARSE_HUGE. Change error code to XML_ERR_RESOURCE_LIMIT.
Nick Wellnhofer a2cc7f5f 2024-01-02T17:02:21 parser: Set depth limit to 2048 with XML_PARSE_HUGE Deeply nested documents can cause performance problems, so the nesting depth should always be limited to a reasonable value. Also remove the global xmlParserMaxDepth setting which isn't thread-safe and seems unused.
Nick Wellnhofer 875bb084 2023-09-07T03:25:45 parser: Implement xmlCtxtSetOptions Surprisingly, some options can only be enabled with xmlCtxtUseOptions and it's impossible to unset them. Add a new API function xmlCtxtSetOptions which sets or clears all options. Finally document all parser options. Make sure to synchronize option bits and struct members.