parser.c


Log

Author Commit Date CI Message
Nick Wellnhofer 2b6b3945 2025-06-03T16:12:56 Revert "SAX1: Align handling of default attributes with SAX2" This reverts commit db65b2fc51ef0d6e4d2e9dc65ba12fe948da49f3. This didn't check for duplicate default attributes.
Nick Wellnhofer 30375877 2025-06-03T15:50:54 parser: Fix custom SAX parsers without cdataBlock handler Use characters handler if cdataBlock handler is NULL. Regressed with 57e4bbd8. Should fix #934.
Nick Wellnhofer 479f26f9 2025-06-03T00:28:16 regexp: Remove unfinished reimplementation This was never enabled.
Nick Wellnhofer 0f8543e1 2025-06-02T14:19:01 parser: Fix error reporting in xmlSkipBlankCharsPEBalanced Short-lived regression.
Nick Wellnhofer 6a6a46f0 2025-05-28T16:02:41 doc: Fix autolink errors Fix links, remove links to internal functions.
Nick Wellnhofer 7bd8d1d9 2025-05-28T15:53:38 doc: Prefix autolinks with '#' Use `#func` instead of `func()` to ignore parameters and make all autolinks work.
Nick Wellnhofer 8baa5de1 2025-05-27T17:51:50 parser: Avoid integer overflow in xmlParseCharDataInternal `nbchar` could overflow with larger than 2GB memory buffers which some new APIs allow. This shouldn't affect memory safety. Limit maximum amount of bytes passed to character callback to XML_MAX_ITEMS (1e9).
Nick Wellnhofer ab06bfa1 2025-05-26T15:03:07 parser: Fix error return in xmlParseElementContentDecl Avoid internal error later in xmlValidBuildAContentModel after 2a60ca06c. Also avoids some unnecessary error messages.
Nick Wellnhofer db65b2fc 2025-05-20T22:41:08 SAX1: Align handling of default attributes with SAX2 The SAX1 parser is legacy code, but it seems more maintainable to align it with SAX2.
Nick Wellnhofer e4cbc295 2025-05-20T21:57:01 parser: Check attribute normalization standalone constraint To fully implement "VC: Standalone Document Declaration", we have to check for normalization changes caused by non-CDATA attribute types declared externally. Fixes #119.
Nick Wellnhofer 682195c8 2025-05-20T22:00:57 parser: Fix "Proper Declaration/PE Nesting" validity constraint Now that we handle "WFC: PE Between Declarations" correctly, we can turn "Proper Declaration/PE Nesting" from a WFC into a VC as specified. Fixes #118.
Nick Wellnhofer 2f3655c9 2025-05-20T19:40:06 parser: Pop PEs that start markup declarations explicitly We currently only handle "Validity constraint: Proper Declaration/PE Nesting", but we must detect "Well-formedness constraint: PE Between Declarations" separately: "The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl." PEs in DeclSeps are PEs that start with a full markup declaration (or another PE). These are handled in xmlParse{Internal|External}Subset. We set a flag on these PEs and don't close them implicitly in xmlSkipBlankCharsPE. This will make unterminated declarations in such PEs cause a parser error. The PEs are closed explicitly in xmlParse{Internal|External}Subset, the only location where they are allowed to end.
Nick Wellnhofer 2a60ca06 2025-05-20T16:50:32 valid: Don't check enum values Rely on the parser to pass valid arguments.
Nick Wellnhofer dd1961e0 2025-05-20T16:37:18 valid: Skip more validity checks if not validating
Nick Wellnhofer 4dc44c83 2025-05-21T20:21:32 parser: Rework entity boundary check for element content Only use depth of input stack. This makes the input ID unused internally.
Nick Wellnhofer 74ea6b48 2025-05-21T17:44:27 parser: Start using input depth for entity boundary check Now that we make sure that PEs starting markup won't be popped implicitly, it's enough to check that no new entities are on the stack when checking boundaries.
Nick Wellnhofer 47aca2c6 2025-05-19T18:43:14 parser: Only check validity constraints when validating
Nick Wellnhofer 172550d2 2025-05-18T17:45:11 parser: Only validate EnumerationTypes when requested This has quadratic behavior and is only a validity constraint.
Nick Wellnhofer 7008740a 2025-05-18T01:52:38 parser: Consolidate scanning of XML Names Use new productions by default. Fixes #194. Fixes #364. See #707.
Nick Wellnhofer 657254a8 2025-05-18T01:21:43 parser: Factor out xmlIsNameCharNew/Old
Nick Wellnhofer c5b45fbc 2025-05-16T16:54:09 doc: Misc fixes
Nick Wellnhofer 6f4b4527 2025-05-15T23:43:32 parser: Stop using ctxt->linenumbers I think this was used to avoid setting the `line` member before it was added (20+ years ago).
Nick Wellnhofer adfbeb7e 2025-05-14T04:58:21 doc: Stop using *Ptr typedefs in documentation
Nick Wellnhofer a40f36e7 2025-05-14T04:04:28 include: Stop using *Ptr typedefs in public headers
Nick Wellnhofer 442c1903 2025-05-09T18:52:36 doc: Fix some damage from automated conversions Add some newlines, fix returns.
Nick Wellnhofer ad390a5d 2025-05-09T15:34:53 parser: Set doc properties in endDocument SAX handler
Nick Wellnhofer 1bf44f09 2025-05-04T02:15:25 doc: Misc fixes to parser docs
Nick Wellnhofer 4a010875 2025-05-03T15:38:15 doc: Move parser option docs to enum
Nick Wellnhofer 9bbffec5 2025-05-06T17:42:46 doc: Move brief to top, params to bottom of doc comments
Nick Wellnhofer cb1635a6 2025-05-02T19:05:25 doc: Use @since command
Nick Wellnhofer e78e05c9 2025-05-02T17:32:51 doc: Fix autolinks to functions Unfortunately, autolinks in .c files aren't converted by Doxygen for some reason.
Nick Wellnhofer f7c41287 2025-05-02T15:57:17 doc: Remove more comment block headers
Nick Wellnhofer 1eca6e34 2025-04-30T00:54:00 parser: Deprecate xmlClearParserCtxt
Nick Wellnhofer e525564f 2025-05-01T19:20:06 doc: Remove empty lines at start of block These lines were left over after automatic conversion.
Nick Wellnhofer e549622b 2025-04-28T15:11:24 doc: Convert documentation to Doxygen Automated conversion based on a few regexes.
Nick Wellnhofer 69879da8 2025-04-28T14:04:30 doc: Remove email addresses from documentation Also remove authorship information from generated files, hash.c and globals.c which were rewritten.
Nick Wellnhofer 61890e39 2025-04-27T21:50:15 doc: Prepare for conversion to Doxygen Fix many params in internal functions (not really necessary but Doxygen warns about that in XML mode). Fix formatting in a few corner cases that automatic conversion can't handle. Rearrange some DOC_DISABLE blocks.
Nick Wellnhofer 0bac84b1 2025-04-24T18:37:16 Add missing NULL checks to public API functions
Nick Wellnhofer 72906f16 2025-04-25T11:41:50 parser: Make undeclared entities in XML content fatal When parsing XML content with functions like xmlParseBalancedChunk or xmlParseInNodeContext, make undeclared entities always a fatal error to match 2.13 behavior. This was deliberately changed in 4f329dc5, probably to make the tests pass. Should fix #895.
Nick Wellnhofer b85d77d1 2025-04-20T14:31:24 http: Remove built-in HTTP client Stubs are retained for ABI compatibility. Fixes #631. Obsoletes #160.
Nick Wellnhofer a5c4a6ef 2025-03-28T16:31:14 parser: Fix XML_PARSE_NOBLANKS dropping non-whitespace text Regressed with 1f5b5371. Fixes #884.
Nick Wellnhofer 69b83bb6 2025-03-10T02:18:51 encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.
Nick Wellnhofer 8696ebe1 2025-03-11T14:32:35 parser: Fix ignorableWhitespace callback If ignorableWhitespace differs from the "characters" callback, we have to check for blanks as well. Regressed with 1f5b537.
Nick Wellnhofer 25490528 2025-03-11T10:54:34 parser: Fix spurious error in SAX mode Short-lived regression from 5f0b1378.
Nick Wellnhofer 5f0b1378 2025-03-08T22:07:15 parser: Add more parser context accessors Fixes #763.
Nick Wellnhofer 94d8a3e2 2025-03-05T14:56:46 parser: Convert xmlParserMaxDepth to macro
Nick Wellnhofer 03a8d5f9 2025-03-04T16:00:08 unicode: Make Unicode functions private
Nick Wellnhofer cdc5cfed 2025-03-04T13:26:51 legacy: Remove legacy symbols
Nick Wellnhofer c42b3227 2025-03-04T13:11:18 parser: Convert inputPush and inputPop to macros
Nick Wellnhofer 361f7bff 2025-03-04T13:02:36 parser: Make nodePush, nodePop, namePush, namePop private
Nick Wellnhofer 05bd1720 2025-03-01T10:25:29 parser: Fix parsing of DTD content Regressed in 2.11. Fixes #868.
Nick Wellnhofer e50d314a 2025-02-25T23:07:19 build: Add separate configuration option for RELAX NG Support for RELAX NG used to be enabled together with XML Schema support (--with-schemas). Now there's a separate option and a new feature macro LIBXML_RELAXNG_ENABLED.
Nick Wellnhofer b4d3d87e 2025-02-01T22:02:33 parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.
Nick Wellnhofer 57e4bbd8 2025-01-31T16:45:35 parser: Improve handling of NOCDATA option Don't modify the callback structure. This makes sure that unsetting the option works.
Nick Wellnhofer 1f5b5371 2025-01-31T16:21:20 parser: Improve handling of NOBLANKS option Don't change the SAX handler. Use a helper function to invoke "characters" SAX callback. The old code didn't advance the input pointer consistently before invoking the callback. There was also some inconsistency with respect to ctxt->space handling. I don't understand the ctxt->space thing, but now we always behave like the non-complex case before.
Nick Wellnhofer 7a8722f5 2025-01-31T14:55:29 parser: Document that XML_PARSE_NOBLANKS is broken Long text content can generate multiple "characters" callbacks which can lead to NOBLANKS removing whitespace in non-whitespace text nodes. So the NOBLANKS option doesn't even work reliably with the pull parser. This would be extremely hard to fix. Unfortunately, `xmllint --format` relies on this option which is another reason why this feature never really worked.
Nick Wellnhofer 9efe1414 2025-01-31T13:07:35 parser: Fix detection of ']]>' when push-parsing Fixes #850.
Nick Wellnhofer 115b13f9 2025-01-30T23:18:56 parser: Document push parser limitations
Nick Wellnhofer 53a48468 2025-01-30T15:15:30 xmllint: Make --push report parse errors The push parser leaves documents in ctxt->myDoc even if they're invalid. Also fix documentation. Regressed with f8ff4d86.
Nick Wellnhofer 5535721f 2025-01-30T01:27:03 parser: Grow input buffer after lots of whitespace Make sure that the input buffer is grown after consuming large amounts of whitespace. Also move a comment.
Nick Wellnhofer 218264fa 2025-01-30T01:26:01 parser: Always shrink input buffer Shrinking the input buffer is cheap now and should be done as soon as possible.
Nick Wellnhofer 93506d41 2025-01-29T00:17:01 parser: Make catalog PIs opt-in This is an obscure feature that shouldn't be enabled by default.
Nick Wellnhofer 1082d813 2025-01-28T23:21:34 parser: Prepare to make decompression opt-in Add a new parser option XML_PARSE_UNZIP that enables decompression. xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set this option currently, but downstream users should start to set the option if they really need it.
Nick Wellnhofer a78843be 2025-01-28T20:13:58 xmllint: Support compressed input from stdin Another regression related to reading from stdin. Making a "-" filename read from stdin was deeply baked into the core IO code but is inherently insecure. I really want to reenable this dangerous feature as sparingly as possible. This now enables compressed input when using the "Fd" API functions which wasn't supported before. But XML_PARSE_NO_UNZIP will be inverted later. Allow compressed stdin in xmlReadFile to support xmlstarlet and older versions of xsltproc. So far, these are the only known command-line tools that rely on "-" meaning stdin.
Nick Wellnhofer ca819160 2025-01-03T20:50:08 include: Use intptr_t to cast between pointers and ints
Nick Wellnhofer 2e3a91a7 2024-12-26T21:05:18 doc: Fix documentation
Nick Wellnhofer 8231c036 2024-12-15T23:36:04 parser: Check reallocations for overflow
Nick Wellnhofer 6548ba11 2024-12-13T16:37:40 parser: Fix argument checks in xmlCtxtParse* - Raise invalid argument error. - Free input stream if ctxt is NULL.
Nick Wellnhofer eae9a1bd 2024-11-26T14:18:22 parser: Pop input stream in xmlCtxtValidateDtd
Nick Wellnhofer dafcefb2 2024-11-25T22:22:26 parser: Fail on catastrophic errors in recovery mode
Nick Wellnhofer 0dc26910 2024-11-20T21:04:19 parser: Deprecate more internal functions
Nick Wellnhofer 84a6eece 2024-11-18T20:40:47 parser: Remove unneeded call to xmlDetectEncoding
Nick Wellnhofer 497081ba 2024-11-17T20:25:07 parser: Remove remaining calls to xml{Push|Pop}Input
Nick Wellnhofer 0f4f8900 2024-11-17T20:13:14 parser: Rename inputPush to xmlCtxtPushInput
Nick Wellnhofer e2ad249c 2024-11-17T19:48:44 parser: Deprecate more internal symbols - xmlParseExternalSubset - xmlPushInput - xmlPopInput - xmlCopyCharMultiByte - xmlCreateEntityParserCtxt - xmlStringComment
Nick Wellnhofer 631778f6 2024-11-17T12:11:41 parser: Check for malloc failure in xmlCtxtParseDtd
Nick Wellnhofer 7f8c436c 2024-11-15T16:30:52 parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd This allows using the context's error handler, options and other settings. Fixes #808.
Ruslan Garipov aaecdc92 2024-11-12T16:42:36 parser: Assign value without if-statement This avoids an if-statement, because effectively it does nothing. And, for example, binary artifact generated by GCC with -O2 optimization settings does not contain that if-statement -- the code just uses the hprefix->name field explicitly. No functional changes intended. Signed-off-by: Ruslan Garipov <ruslanngaripov@gmail.com>
Nick Wellnhofer 869e3fd4 2024-11-01T16:52:31 parser: Fix loading of parameter entities in external DTDs Regressed with commit 12f0bb94. Fixes #816.
Nick Wellnhofer efb57ddb 2024-10-30T14:02:36 parser: Fix downstream code that swaps DTDs Downstream code like the nginx xslt module can change the document's DTD pointers in a SAX callback. If an entity from a separate DTD is parsed lazily, its content must not reference the current document. Regressed with commit d025cfbb. Fixes #815.
Nick Wellnhofer 0ec5687e 2024-10-28T20:41:56 parser: Rework xmlCtxtGrowAttrs Remove unneeded argument. Check for integer overflow. We probably hit the buffer size limit in xmlParserGrow before, but better be safe.
Nick Wellnhofer ffb058f4 2024-10-28T20:12:52 parser: Fix detection of duplicate attributes We really need a second scan if more than one namespace clash was detected.
Nick Wellnhofer b52a3044 2024-10-24T18:18:47 parser: Use counted_by attribute if supported We only have a single struct with a flexible array member.
Nick Wellnhofer 74dfc49b 2024-09-26T21:24:00 parser: Clarify logic in xmlParseStartTag2
Nick Wellnhofer 0bc4608c 2024-09-15T20:28:49 html: Use hash table to check for duplicate attributes
Nick Wellnhofer 0ce7bfe5 2024-09-12T01:44:18 html: Try to avoid passing XML options to HTML parser
Nick Wellnhofer 16de1346 2024-09-11T19:05:38 parser: Make new options actually work
Nick Wellnhofer dde62ae5 2024-08-28T23:58:20 parser: Align push parsing of CDATA sections with pull parser Remove special handling of CDATA sections in push parser. This makes sure that only a single callback is generated for large sections. Fixes #22 and needed for #412.
Nick Wellnhofer 4d10e53a 2024-08-28T22:47:20 parser: Make sure to set and increment input id Revert part of commits 410931e3 and b9d2f3c9.
Nick Wellnhofer 6d365ca0 2024-08-28T22:09:30 doc: XML_PARSE_NO_XXE is available since 2.13.0
makise-homura 103aadbc 2024-08-14T23:15:30 parser: Suppress EDG maybe-uninitialized warning
Nick Wellnhofer 02fcb1ef 2024-07-25T17:07:18 parser: Make xmlParseChunk return an error if parser was stopped This regressed after enhancing the disableSAX member in 2.13. Should fix #777.
Nick Wellnhofer 1a893230 2024-07-06T01:03:46 [CVE-2024-40896] Fix XXE protection in downstream code Some users set an entity's children manually in the getEntity SAX callback to restrict entity expansion. This stopped working after renaming the "checked" member of xmlEntity, making at least one downstream project and its dependants susceptible to XXE attacks. See #761.
Nick Wellnhofer 6a3c0b0d 2024-07-22T12:53:00 parser: Increase XML_MAX_DICTIONARY_LIMIT This limit is somewhat arbitrary and can be reached when fuzzing documents up to 1 MB. Increase limit to 100 MB and disable limit if XML_PARSE_HUGE is set.
Nick Wellnhofer 5d36664f 2024-07-16T00:35:53 memory: Deprecate xmlGcMemSetup
Nick Wellnhofer 7148b778 2024-07-07T16:11:08 parser: Optimize memory buffer I/O Reenable zero-copy IO for zero-terminated static memory buffers. Don't stream zero-terminated dynamic memory buffers on top of creating a copy.
Nick Wellnhofer 34c9108f 2024-07-07T18:38:31 encoding: Add sizeOut argument to xmlCharEncInput When push parsing, we want to convert as much of the input as possible. When pull parsing memory buffers, we want to convert data chunk by chunk to save memory.
Nick Wellnhofer 6be79014 2024-07-15T14:18:26 Remove unused code
Nick Wellnhofer fee0006a 2024-07-15T13:03:55 parser: Fix memory leak after malloc failure in xml*ParseDTD
Nick Wellnhofer 8af55c8d 2024-07-06T22:14:21 parser: Rename new input API functions These weren't made public yet.