parser.c


Log

Author Commit Date CI Message
Nick Wellnhofer 859899a8 2025-07-26T22:20:58 doc: Document option handling of deprecated "SAX1" functions
Nick Wellnhofer 144ed959 2025-07-22T22:38:05 parser: Move xmlSaturatedAdd to private header
Nick Wellnhofer e3daef5c 2025-07-22T22:31:02 parser: Fix xmlSaturatedAddSizeT argument type This is only used for entity size accounting. The bug only affected platforms where sizeof(long) != sizeof(size_t) and was probably harmless.
Nick Wellnhofer 7a41b18c 2025-07-22T01:08:38 parser: Remove xmlHaltParser Always halt the parser on resource limit and entity loop errors and remove the remaining calls which seem unnecessary.
Nick Wellnhofer bd9d5e39 2025-07-09T13:10:31 parser: Fix handling of invalid char refs in recovery mode Revert to the old behavior which handles invalid char refs more gracefully. Probably regressed with 37c6618b (version 2.13.0).
Nick Wellnhofer 6c796b37 2025-06-22T17:46:13 doc: Misc fixes
Nick Wellnhofer 56a767ed 2025-06-28T02:35:14 doc: Small fix
Nick Wellnhofer 0d52684e 2025-06-28T02:34:33 parser: Don't set dict limit when setting options This is done in xmlCtxtInitializeLate.
Nick Wellnhofer 1b737cc8 2025-06-27T19:52:54 parser: Another fix to ]]> detection in push parser The original fix for issue #850 in commit 9efe1414 was incomplete.
Nick Wellnhofer 7c913850 2025-06-22T20:12:48 parser: Remove unnecessary dict checks when freeing strings The following strings are never allocated from a dict: xmlParserCtxt.version, xmlParserCtxt.encoding, xmlParserCtxt.extSubURI, xmlParserCtxt.extSubSystem, xmlDoc.version, xmlDoc.encoding, xmlDoc.URL, xmlDTD.ExternalID, xmlDTD.SystemID, xmlID.value. Also make the struct members point to non-const chars to avoid casts when freeing.
Nick Wellnhofer e7802738 2025-06-22T14:39:28 parser: Don't load external content if only XML_SKIP_IDS is set At some point, the `loadsubset` member was augmented to also control handling of ID attributes in addition to loading of external DTDs. These two features are unrelated and shouldn't have been mixed. This mistake was probably inspired by the misnamed XML_DETECT_IDS flag. As a side effect, setting XML_SKIP_IDS always enabled loading of external DTDs and parameter entities. This change makes it possible to ignore IDs without loading external content. This is a deliberate API change that improves security and is unlikely to affect users. This also makes sure that the new XML_PARSE_SKIP_IDS option doesn't enable unsafe behavior.
Nick Wellnhofer 1c96d5ef 2025-06-21T15:08:07 parser: Add comment in xmlStopParser
Nick Wellnhofer a4d25b3d 2025-06-18T16:00:57 doc: Small fixes
Michael Mann cf4f9672 2025-06-21T11:16:39 Add XML_PARSE_SKIP_IDS to replace XML_SKIP_IDS. Mark loadsubset member as deprecated. Fixes #873.
Nick Wellnhofer a3992815 2025-06-12T13:51:37 parser: Fix buffer overflow when parsing PublicIds Regressed with 8231c0366 and 30665ae4.
Nick Wellnhofer 30665ae4 2025-06-11T18:09:41 parser: Fix parsing of PublicIds and VersionNums Regressed in 8231c0366. Fixes #940.
Nick Wellnhofer 416da89d 2025-06-04T20:49:16 html: Make htmlCtxtReset call xmlCtxtReset The two implementations shouldn't diverge.
Alex Richardson 7e4247b2 2025-06-05T21:28:31 parser: use XML_INT_TO_PTR when storing integers as pointers This fixes warnings when using a CHERI-aware toolchain.
Nick Wellnhofer 2b6b3945 2025-06-03T16:12:56 Revert "SAX1: Align handling of default attributes with SAX2" This reverts commit db65b2fc51ef0d6e4d2e9dc65ba12fe948da49f3. This didn't check for duplicate default attributes.
Nick Wellnhofer 30375877 2025-06-03T15:50:54 parser: Fix custom SAX parsers without cdataBlock handler Use characters handler if cdataBlock handler is NULL. Regressed with 57e4bbd8. Should fix #934.
Nick Wellnhofer 479f26f9 2025-06-03T00:28:16 regexp: Remove unfinished reimplementation This was never enabled.
Nick Wellnhofer 0f8543e1 2025-06-02T14:19:01 parser: Fix error reporting in xmlSkipBlankCharsPEBalanced Short-lived regression.
Nick Wellnhofer 6a6a46f0 2025-05-28T16:02:41 doc: Fix autolink errors Fix links, remove links to internal functions.
Nick Wellnhofer 7bd8d1d9 2025-05-28T15:53:38 doc: Prefix autolinks with '#' Use `#func` instead of `func()` to ignore parameters and make all autolinks work.
Nick Wellnhofer 8baa5de1 2025-05-27T17:51:50 parser: Avoid integer overflow in xmlParseCharDataInternal `nbchar` could overflow with larger than 2GB memory buffers which some new APIs allow. This shouldn't affect memory safety. Limit maximum amount of bytes passed to character callback to XML_MAX_ITEMS (1e9).
Nick Wellnhofer ab06bfa1 2025-05-26T15:03:07 parser: Fix error return in xmlParseElementContentDecl Avoid internal error later in xmlValidBuildAContentModel after 2a60ca06c. Also avoids some unnecessary error messages.
Nick Wellnhofer 4dc44c83 2025-05-21T20:21:32 parser: Rework entity boundary check for element content Only use depth of input stack. This makes the input ID unused internally.
Nick Wellnhofer 74ea6b48 2025-05-21T17:44:27 parser: Start using input depth for entity boundary check Now that we make sure that PEs starting markup won't be popped implicitly, it's enough to check that no new entities are on the stack when checking boundaries.
Nick Wellnhofer db65b2fc 2025-05-20T22:41:08 SAX1: Align handling of default attributes with SAX2 The SAX1 parser is legacy code, but it seems more maintainable to align it with SAX2.
Nick Wellnhofer e4cbc295 2025-05-20T21:57:01 parser: Check attribute normalization standalone constraint To fully implement "VC: Standalone Document Declaration", we have to check for normalization changes caused by non-CDATA attribute types declared externally. Fixes #119.
Nick Wellnhofer 682195c8 2025-05-20T22:00:57 parser: Fix "Proper Declaration/PE Nesting" validity constraint Now that we handle "WFC: PE Between Declarations" correctly, we can turn "Proper Declaration/PE Nesting" from a WFC into VC as specified. Fixes #118.
Nick Wellnhofer 2f3655c9 2025-05-20T19:40:06 parser: Pop PEs that start markup declarations explicitly We currently only handle "Validity constraint: Proper Declaration/PE Nesting", but we must detect "Well-formedness constraint: PE Between Declarations" separately: "The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl." PEs in DeclSeps are PEs that start with a full markup declaration (or another PE). These are handled in xmlParse{Internal|External}Subset. We set a flag on these PEs and don't close them implicitly in xmlSkipBlankCharsPE. This will make unterminated declarations in such PEs cause a parser error. The PEs are closed explicitly in xmlParse{Internal|External}Subset, the only location where they are allowed to end.
Nick Wellnhofer 2a60ca06 2025-05-20T16:50:32 valid: Don't check enum values Rely on the parser to pass valid arguments.
Nick Wellnhofer dd1961e0 2025-05-20T16:37:18 valid: Skip more validity checks if not validating
Nick Wellnhofer 47aca2c6 2025-05-19T18:43:14 parser: Only check validity constraints when validating
Nick Wellnhofer 172550d2 2025-05-18T17:45:11 parser: Only validate EnumerationTypes when requested This has quadratic behavior and is only a validity constraint.
Nick Wellnhofer 7008740a 2025-05-18T01:52:38 parser: Consolidate scanning of XML Names Use new productions by default. Fixes #194. Fixes #364. See #707.
Nick Wellnhofer 657254a8 2025-05-18T01:21:43 parser: Factor out xmlIsNameCharNew/Old
Nick Wellnhofer c5b45fbc 2025-05-16T16:54:09 doc: Misc fixes
Nick Wellnhofer 6f4b4527 2025-05-15T23:43:32 parser: Stop using ctxt->linenumbers I think this was used to avoid setting the `line` member before it was added (20+ years ago).
Nick Wellnhofer adfbeb7e 2025-05-14T04:58:21 doc: Stop using *Ptr typedefs in documentation
Nick Wellnhofer a40f36e7 2025-05-14T04:04:28 include: Stop using *Ptr typedefs in public headers
Nick Wellnhofer 442c1903 2025-05-09T18:52:36 doc: Fix some damage from automated conversions Add some newlines, fix returns.
Nick Wellnhofer ad390a5d 2025-05-09T15:34:53 parser: Set doc properties in endDocument SAX handler
Nick Wellnhofer 9bbffec5 2025-05-06T17:42:46 doc: Move brief to top, params to bottom of doc comments
Nick Wellnhofer 1bf44f09 2025-05-04T02:15:25 doc: Misc fixes to parser docs
Nick Wellnhofer 4a010875 2025-05-03T15:38:15 doc: Move parser option docs to enum
Nick Wellnhofer cb1635a6 2025-05-02T19:05:25 doc: Use @since command
Nick Wellnhofer e78e05c9 2025-05-02T17:32:51 doc: Fix autolinks to functions Unfortunately, autolinks in .c files aren't converted by Doxygen for some reason.
Nick Wellnhofer f7c41287 2025-05-02T15:57:17 doc: Remove more comment block headers
Nick Wellnhofer 1eca6e34 2025-04-30T00:54:00 parser: Deprecate xmlClearParserCtxt
Nick Wellnhofer e525564f 2025-05-01T19:20:06 doc: Remove empty lines at start of block These lines were left over after automatic conversion.
Nick Wellnhofer e549622b 2025-04-28T15:11:24 doc: Convert documentation to Doxygen Automated conversion based on a few regexes.
Nick Wellnhofer 69879da8 2025-04-28T14:04:30 doc: Remove email addresses from documentation Also remove authorship information from generated files, hash.c and globals.c which were rewritten.
Nick Wellnhofer 61890e39 2025-04-27T21:50:15 doc: Prepare for conversion to Doxygen Fix many params in internal functions (not really necessary but Doxygen warns about that in XML mode). Fix formatting in a few corner cases that automatic conversion can't handle. Rearrange some DOC_DISABLE blocks.
Nick Wellnhofer 0bac84b1 2025-04-24T18:37:16 Add missing NULL checks to public API functions
Nick Wellnhofer 72906f16 2025-04-25T11:41:50 parser: Make undeclared entities in XML content fatal When parsing XML content with functions like xmlParseBalancedChunk or xmlParseInNodeContext, make undeclared entities always a fatal error to match 2.13 behavior. This was deliberately changed in 4f329dc5, probably to make the tests pass. Should fix #895.
Nick Wellnhofer b85d77d1 2025-04-20T14:31:24 http: Remove built-in HTTP client Stubs are retained for ABI compatibility. Fixes #631. Obsoletes #160.
Nick Wellnhofer a5c4a6ef 2025-03-28T16:31:14 parser: Fix XML_PARSE_NOBLANKS dropping non-whitespace text Regressed with 1f5b5371. Fixes #884.
Nick Wellnhofer 69b83bb6 2025-03-10T02:18:51 encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.
Nick Wellnhofer 8696ebe1 2025-03-11T14:32:35 parser: Fix ignorableWhitespace callback If ignorableWhitespace differs from the "characters" callback, we have to check for blanks as well. Regressed with 1f5b537.
Nick Wellnhofer 25490528 2025-03-11T10:54:34 parser: Fix spurious error in SAX mode Short-lived regression from 5f0b1378.
Nick Wellnhofer 5f0b1378 2025-03-08T22:07:15 parser: Add more parser context accessors Fixes #763.
Nick Wellnhofer 94d8a3e2 2025-03-05T14:56:46 parser: Convert xmlParserMaxDepth to macro
Nick Wellnhofer 03a8d5f9 2025-03-04T16:00:08 unicode: Make Unicode functions private
Nick Wellnhofer cdc5cfed 2025-03-04T13:26:51 legacy: Remove legacy symbols
Nick Wellnhofer c42b3227 2025-03-04T13:11:18 parser: Convert inputPush and inputPop to macros
Nick Wellnhofer 361f7bff 2025-03-04T13:02:36 parser: Make nodePush, nodePop, namePush, namePop private
Nick Wellnhofer 05bd1720 2025-03-01T10:25:29 parser: Fix parsing of DTD content Regressed in 2.11. Fixes #868.
Nick Wellnhofer e50d314a 2025-02-25T23:07:19 build: Add separate configuration option for RELAX NG Support for RELAX NG used to be enabled together with XML Schema support (--with-schemas). Now there's a separate option and a new feature macro LIBXML_RELAXNG_ENABLED.
Nick Wellnhofer b4d3d87e 2025-02-01T22:02:33 parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.
Nick Wellnhofer 57e4bbd8 2025-01-31T16:45:35 parser: Improve handling of NOCDATA option Don't modify the callback structure. This makes sure that unsetting the option works.
Nick Wellnhofer 1f5b5371 2025-01-31T16:21:20 parser: Improve handling of NOBLANKS option Don't change the SAX handler. Use a helper function to invoke "characters" SAX callback. The old code didn't advance the input pointer consistently before invoking the callback. There was also some inconsistency with respect to ctxt->space handling. I don't understand the ctxt->space thing, but now we always behave like the non-complex case before.
Nick Wellnhofer 7a8722f5 2025-01-31T14:55:29 parser: Document that XML_PARSE_NOBLANKS is broken Long text content can generate multiple "characters" callbacks which can lead to NOBLANKS removing whitespace in non-whitespace text nodes. So the NOBLANKS option doesn't even work reliably with the pull parser. This would be extremely hard to fix. Unfortunately, `xmllint --format` relies on this option which is another reason why this feature never really worked.
Nick Wellnhofer 9efe1414 2025-01-31T13:07:35 parser: Fix detection of ']]>' when push-parsing Fixes #850.
Nick Wellnhofer 115b13f9 2025-01-30T23:18:56 parser: Document push parser limitations
Nick Wellnhofer 53a48468 2025-01-30T15:15:30 xmllint: Make --push report parse errors The push parser leaves documents in ctxt->myDoc even if they're invalid. Also fix documentation. Regressed with f8ff4d86.
Nick Wellnhofer 5535721f 2025-01-30T01:27:03 parser: Grow input buffer after lots of whitespace Make sure that the input buffer is grown after consuming large amounts of whitespace. Also move a comment.
Nick Wellnhofer 218264fa 2025-01-30T01:26:01 parser: Always shrink input buffer Shrinking the input buffer is cheap now and should be done as soon as possible.
Nick Wellnhofer 93506d41 2025-01-29T00:17:01 parser: Make catalog PIs opt-in This is an obscure feature that shouldn't be enabled by default.
Nick Wellnhofer 1082d813 2025-01-28T23:21:34 parser: Prepare to make decompression opt-in Add a new parser option XML_PARSE_UNZIP that enables decompression. xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set this option currently, but downstream users should start to set the option if they really need it.
Nick Wellnhofer a78843be 2025-01-28T20:13:58 xmllint: Support compressed input from stdin Another regression related to reading from stdin. Making a "-" filename read from stdin was deeply baked into the core IO code but is inherently insecure. I really want to reenable this dangerous feature as sparingly as possible. This now enables compressed input when using the "Fd" API functions which wasn't supported before. But XML_PARSE_NO_UNZIP will be inverted later. Allow compressed stdin in xmlReadFile to support xmlstarlet and older versions of xsltproc. So far, these are the only known command-line tools that rely on "-" meaning stdin.
Nick Wellnhofer ca819160 2025-01-03T20:50:08 include: Use intptr_t to cast between pointers and ints
Nick Wellnhofer 2e3a91a7 2024-12-26T21:05:18 doc: Fix documentation
Nick Wellnhofer 8231c036 2024-12-15T23:36:04 parser: Check reallocations for overflow
Nick Wellnhofer 6548ba11 2024-12-13T16:37:40 parser: Fix argument checks in xmlCtxtParse* Raise invalid argument error. Free input stream if ctxt is NULL.
Nick Wellnhofer eae9a1bd 2024-11-26T14:18:22 parser: Pop input stream in xmlCtxtValidateDtd
Nick Wellnhofer dafcefb2 2024-11-25T22:22:26 parser: Fail on catastrophic errors in recovery mode
Nick Wellnhofer 0dc26910 2024-11-20T21:04:19 parser: Deprecate more internal functions
Nick Wellnhofer 84a6eece 2024-11-18T20:40:47 parser: Remove unneeded call to xmlDetectEncoding
Nick Wellnhofer 497081ba 2024-11-17T20:25:07 parser: Remove remaining calls to xml{Push|Pop}Input
Nick Wellnhofer 0f4f8900 2024-11-17T20:13:14 parser: Rename inputPush to xmlCtxtPushInput
Nick Wellnhofer e2ad249c 2024-11-17T19:48:44 parser: Deprecate more internal symbols: xmlParseExternalSubset, xmlPushInput, xmlPopInput, xmlCopyCharMultiByte, xmlCreateEntityParserCtxt, xmlStringComment
Nick Wellnhofer 631778f6 2024-11-17T12:11:41 parser: Check for malloc failure in xmlCtxtParseDtd
Nick Wellnhofer 7f8c436c 2024-11-15T16:30:52 parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd This allows to use the context's error handler, options and other settings. Fixes #808.
Ruslan Garipov aaecdc92 2024-11-12T16:42:36 parser: Assign value without if-statement This avoids an if-statement, because effectively it does nothing. And, for example, binary artifact generated by GCC with -O2 optimization settings does not contain that if-statement -- the code just uses the hprefix->name field explicitly. No functional changes intended. Signed-off-by: Ruslan Garipov <ruslanngaripov@gmail.com>
Nick Wellnhofer 869e3fd4 2024-11-01T16:52:31 parser: Fix loading of parameter entities in external DTDs Regressed with commit 12f0bb94. Fixes #816.
Nick Wellnhofer efb57ddb 2024-10-30T14:02:36 parser: Fix downstream code that swaps DTDs Downstream code like the nginx xslt module can change the document's DTD pointers in a SAX callback. If an entity from a separate DTD is parsed lazily, its content must not reference the current document. Regressed with commit d025cfbb. Fixes #815.
Nick Wellnhofer 0ec5687e 2024-10-28T20:41:56 parser: Rework xmlCtxtGrowAttrs Remove unneeded argument. Check for integer overflow. We probably hit the buffer size limit in xmlParserGrow before, but better be safe.
Nick Wellnhofer ffb058f4 2024-10-28T20:12:52 parser: Fix detection of duplicate attributes We really need a second scan if more than one namespace clash was detected.