parserInternals.c


Log

Author Commit Date CI Message
Nick Wellnhofer 152fbb60 2025-08-02T21:27:32 parser: Make sure to stop parser before checking max errors Short-lived regression from 7a41b18c.
Nick Wellnhofer 8689523a 2025-07-22T23:57:03 parser: Implement xmlCtxtGetInputWindow See #762.
Nick Wellnhofer 469c847f 2025-07-22T23:44:10 parser: Split out xmlParserInputGetWindow
Nick Wellnhofer 8aaa53d7 2025-07-22T22:38:50 parser: Implement xmlCtxtGetInputPosition See #762.
Nick Wellnhofer 144ed959 2025-07-22T22:38:05 parser: Move xmlSaturatedAdd to private header
Nick Wellnhofer a7fc9e1a 2025-07-22T20:50:13 parser: Add more parser context accessors The only thing remaining is access to parser input, see #762.
Nick Wellnhofer 7a41b18c 2025-07-22T01:08:38 parser: Remove xmlHaltParser Always halt the parser on resource limit and entity loop errors and remove the remaining calls which seem unnecessary.
Nick Wellnhofer cdf4c6f1 2025-07-21T22:43:57 doc: Mention XML_PARSE_NOERROR in more places
Nick Wellnhofer 7c913850 2025-06-22T20:12:48 parser: Remove unnecessary dict checks when freeing strings The following strings are never allocated from a dict: - xmlParserCtxt.version - xmlParserCtxt.encoding - xmlParserCtxt.extSubURI - xmlParserCtxt.extSubSystem - xmlDoc.version - xmlDoc.encoding - xmlDoc.URL - xmlDTD.ExternalID - xmlDTD.SystemID - xmlID.value Also make the struct members point to non-const chars to avoid casts when freeing.
Nick Wellnhofer a4d25b3d 2025-06-18T16:00:57 doc: Small fixes
Nick Wellnhofer 1dcd3df2 2025-06-20T23:46:46 parser: Fix xmlCtxtIsStopped Make xmlCtxtIsStopped check for fatal errors as well. This makes it easier to migrate away from disableSAX.
Nick Wellnhofer 70335c41 2025-06-06T03:29:57 html: Don't stop on unsupported encoding Continue to parse unlike in the XML case.
Nick Wellnhofer 7bd8d1d9 2025-05-28T15:53:38 doc: Prefix autolinks with '#' Use `#func` instead of `func()` to ignore parameters and make all autolinks work.
Nick Wellnhofer 30cf6d09 2025-05-26T01:13:24 parser: Add XML_INPUT_USE_SYS_CATALOG Also clean up catalog resolution and add error handling using the global error. Don't try to look up the resolved URI a second time. Add some comments. Fix documentation.
Nick Wellnhofer 34bafa14 2025-05-25T20:56:40 parser: Use parser context as default in resource loader This allows to access the original context for example when using modules like XInclude or schemas.
Nick Wellnhofer 6f4b4527 2025-05-15T23:43:32 parser: Stop using ctxt->linenumbers I think this was used to avoid setting the `line` member before it was added (20+ years ago).
Nick Wellnhofer adfbeb7e 2025-05-14T04:58:21 doc: Stop using *Ptr typedefs in documentation
Nick Wellnhofer a40f36e7 2025-05-14T04:04:28 include: Stop using *Ptr typedefs in public headers
Nick Wellnhofer 2d83a84c 2025-05-14T00:29:19 doc: Misc improvements
Nick Wellnhofer cdce17c3 2025-05-12T21:21:25 html: Only map HTML encodings from meta tag
Nick Wellnhofer f0983199 2025-05-12T13:00:20 html: Map some encodings according to HTML5 Windows-1252 is a superset of ISO-8859-1 and should be used instead. Same for ASCII. Also map UCS-2 and UTF-16 to UTF-16LE.
Nick Wellnhofer 442c1903 2025-05-09T18:52:36 doc: Fix some damage from automated conversions Add some newlines, fix returns.
Nick Wellnhofer 38ea8fa9 2025-05-06T18:31:45 doc: Fix varargs
Nick Wellnhofer 9bbffec5 2025-05-06T17:42:46 doc: Move brief to top, params to bottom of doc comments
Nick Wellnhofer ab13fbfd 2025-05-06T14:06:43 doc: Misc fixes to error docs
Nick Wellnhofer 1bf44f09 2025-05-04T02:15:25 doc: Misc fixes to parser docs
Nick Wellnhofer cb1635a6 2025-05-02T19:05:25 doc: Use @since command
Nick Wellnhofer e78e05c9 2025-05-02T17:32:51 doc: Fix autolinks to functions Unfortunately, autolinks in .c files aren't converted by Doxygen for some reason.
Nick Wellnhofer 1eca6e34 2025-04-30T00:54:00 parser: Deprecate xmlClearParserCtxt
Nick Wellnhofer e525564f 2025-05-01T19:20:06 doc: Remove empty lines at start of block These lines were left over after automatic conversion.
Nick Wellnhofer e549622b 2025-04-28T15:11:24 doc: Convert documentation to Doxygen Automated conversion based on a few regexes.
Nick Wellnhofer 69879da8 2025-04-28T14:04:30 doc: Remove email addresses from documentation Also remove authorship information from generated files, hash.c and globals.c which were rewritten.
Nick Wellnhofer 61890e39 2025-04-27T21:50:15 doc: Prepare for conversion to Doxygen Fix many params in internal functions (not really necessary but Doxygen warns about that in XML mode). Fix formatting in a few corner cases that automatic conversion can't handle. Rearrange some DOC_DISABLE blocks.
Nick Wellnhofer fc8899d4 2025-04-27T12:59:41 parser: Make xmlCtxtGetValidCtxt depend on VALID_ENABLED
Nick Wellnhofer b85d77d1 2025-04-20T14:31:24 http: Remove built-in HTTP client Stubs are retained for ABI compatibility. Fixes #631. Obsoletes #160.
Nick Wellnhofer 9e3159d0 2025-04-19T14:26:15 parser: Never use XML catalogs when parsing HTML files When loading HTML files we shouldn't try to resolve URIs using the XML catalogs.
Nick Wellnhofer b3492259 2025-03-14T00:01:11 include: Change some return types from int to enum This also affects some new functions from 2.13.
Nick Wellnhofer fd1b9391 2025-03-13T23:20:16 include: Convert some macros to enums
Nick Wellnhofer 84c6524e 2025-03-13T19:45:35 encoding: Support input-only and output-only converters Make it possible to open an encoding handler only for input or output. This avoids the creation of unnecessary converters. Should also fix #863.
Nick Wellnhofer 69b83bb6 2025-03-10T02:18:51 encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.
Nick Wellnhofer 25490528 2025-03-11T10:54:34 parser: Fix spurious error in SAX mode Short-lived regression from 5f0b1378.
Nick Wellnhofer 5f0b1378 2025-03-08T22:07:15 parser: Add more parser context accessors Fixes #763.
Nick Wellnhofer 6bb2ea8e 2025-02-01T14:58:06 html: Adjust xmlDetectEncoding for HTML Don't check for UTF-32 or EBCDIC. We now perform BOM sniffing and the first step of the HTML5 prescan algorithm (detect UTF-16 XML declarations). The rest of the algorithm still has to be implemented.
Nick Wellnhofer 0de90f51 2025-01-30T01:25:31 parser: Define SIZE_MAX
Nick Wellnhofer 3eced32e 2025-01-29T23:49:56 parser: Fix push parser with encoding and single chunk When push-parsing with an encoding handler, we must convert the whole buffer in the initial conversion. Otherwise, parsing a single chunk larger than ~4KB would fail. Regressed with commit 34c9108f.
Nick Wellnhofer 1082d813 2025-01-28T23:21:34 parser: Prepare to make decompression opt-in Add a new parser option XML_PARSE_UNZIP that enables decompression. xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set this option currently, but downstream users should start to set the option if they really need it.
Nick Wellnhofer a78843be 2025-01-28T20:13:58 xmllint: Support compressed input from stdin Another regression related to reading from stdin. Making a "-" filename read from stdin was deeply baked into the core IO code but is inherently insecure. I really want to reenable this dangerous feature as sparingly as possible. This now enables compressed input when using the "Fd" API functions which wasn't supported before. But XML_PARSE_NO_UNZIP will be inverted later. Allow compressed stdin in xmlReadFile to support xmlstarlet and older versions of xsltproc. So far, these are the only known command-line tools that rely on "-" meaning stdin.
Nick Wellnhofer 2e3a91a7 2024-12-26T21:05:18 doc: Fix documentation
Nick Wellnhofer 8231c036 2024-12-15T23:36:04 parser: Check reallocations for overflow
Nick Wellnhofer 0dd910e8 2024-12-18T23:37:35 save: Fix handling of catastrophic errors Don't overwrite catastrophic errors in xmlSaveErr. Overwrite non-catastrophic errors in xmlOutputBufferClose.
Nick Wellnhofer 1e1b4891 2024-12-13T16:45:38 parser: Also raise error if ctxt is NULL Update global error variable even if context is missing because of an invalid (NULL) argument.
Nick Wellnhofer 70cce2ec 2024-11-26T01:20:54 parser: Make XML_ERR_RESOURCE_LIMIT non-catastrophic
Nick Wellnhofer 57087e5f 2024-11-25T20:59:06 parser: Don't overwrite catastrophic errors Stop reporting errors after a catastrophic error. Also make sure that ctxt->errNo matches ctxt->lastError.code.
Nick Wellnhofer 0f4f8900 2024-11-17T20:13:14 parser: Rename inputPush to xmlCtxtPushInput
Nick Wellnhofer e2ad249c 2024-11-17T19:48:44 parser: Deprecate more internal symbols - xmlParseExternalSubset - xmlPushInput - xmlPopInput - xmlCopyCharMultiByte - xmlCreateEntityParserCtxt - xmlStringComment
Nick Wellnhofer bd9eed46 2024-09-02T18:37:41 parser: Make unsupported encodings an error in declarations This was changed in 45157261, but in encoding declarations, unsupported encodings should raise a fatal error. Fixes #794.
Nick Wellnhofer 1d009fe3 2024-08-05T15:14:21 parser: Report at least one fatal error
Nick Wellnhofer bfed6e6a 2024-08-05T14:58:37 parser: Fix error handling after reaching limit Mark document as non-wellformed and stop parser even if error limit was reached. Regressed in abd74186.
Nick Wellnhofer 6a3c0b0d 2024-07-22T12:53:00 parser: Increase XML_MAX_DICTIONARY_LIMIT This limit is somewhat arbitrary and can be reached when fuzzing documents up to 1 MB. Increase limit to 100 MB and disable limit if XML_PARSE_HUGE is set.
Nick Wellnhofer a6f54f05 2024-07-07T18:52:17 io: Fine-tune initial IO buffer size
Nick Wellnhofer 34c9108f 2024-07-07T18:38:31 encoding: Add sizeOut argument to xmlCharEncInput When push parsing, we want to convert as much of the input as possible. When pull parsing memory buffers, we want to convert data chunk by chunk to save memory.
Nick Wellnhofer 92f30711 2024-07-07T03:02:11 parser: Optimize buffer shrinking Remove checks now that we can shrink memory buffers efficiently. Shrink more aggressively.
Nick Wellnhofer a221cd78 2024-07-07T03:01:51 buf: Rework xmlBuf code Always use what the old implementation called the "IO" allocation scheme, allowing to move the content pointer past the initial allocation. This is inexpensive and allows efficient shrinking. Optimize xmlBufGrow, reusing shrunken memory as much as possible. Simplify xmlBufAdd. Make xmlBufBackToBuffer return an error on overflow. Make "size" exclude the terminating NULL byte. Always provide an initial size. Reintroduce static buffers. Remove xmlBufResize and several other functions.
Nick Wellnhofer 72886980 2024-07-15T14:35:47 error: Add helper functions to print errors and abort
Nick Wellnhofer aa6aec19 2024-07-11T12:37:25 parser: Fix xmlInputSetEncodingHandler again Short-lived regression.
Nick Wellnhofer 8af55c8d 2024-07-06T22:14:21 parser: Rename new input API functions These weren't made public yet.
Nick Wellnhofer d74ca594 2024-07-06T22:04:06 parser: Rename internal xmlNewInput functions
Nick Wellnhofer 4f329dc5 2024-07-10T03:27:47 parser: Implement xmlCtxtParseContent This implements xmlCtxtParseContent, a better alternative to xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a parser context and a parser input, making it a lot more versatile. xmlParseInNodeContext is now implemented in terms of xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never modifies the target document, improving thread safety. xmlParseInNodeContext is also more lenient now with regard to undeclared entities. Fixes #727.
Nick Wellnhofer 4fec0889 2024-07-10T22:31:15 parser: Fix memory leak in xmlInputSetEncodingHandler Short-lived regression.
Nick Wellnhofer 59354717 2024-07-09T13:54:07 parser: Fix malloc failure handling in xmlInputSetEncodingHandler Don't set encoder if allocating buffer failed. This could lead to xmlByteConsumed processing invalid UTF-8.
Nick Wellnhofer ea31ac5b 2024-07-07T01:02:11 fuzz: Fix spaceMax
Nick Wellnhofer 29e3ab92 2024-07-06T15:48:43 fuzz: Make reallocs more likely
Nick Wellnhofer 38195cf5 2024-07-06T14:58:16 parser: Don't produce names with invalid UTF-8 in recovery mode
Nick Wellnhofer ec088109 2024-07-04T15:15:17 parser: Upgrade XML_IO_NETWORK_ATTEMPT to error Fixes XML::LibXML test suite.
Nick Wellnhofer fdfeecfe 2024-07-02T21:54:26 parser: Reenable ctxt->directory Unused internally, but used in downstream code. Should fix #753.
Nick Wellnhofer 606f4108 2024-07-02T20:57:15 parser: Allow to disable catalogs with parser options Implement XML_PARSE_NO_SYS_CATALOG and XML_PARSE_NO_CATALOG_PI. Fixes #735.
Nick Wellnhofer 197e09d5 2024-07-02T19:46:51 parser: Fix xmlLoadResource Short-lived regression.
Nick Wellnhofer ede5d99a 2024-07-02T01:42:33 parser: Fix typo
Nick Wellnhofer 30ef7755 2024-07-02T04:02:16 parser: Don't use deprecated xmlCopyChar
Nick Wellnhofer 751ba00e 2024-07-02T03:41:05 parser: Don't use deprecated xmlSwitchInputEncoding
Nick Wellnhofer 9a4770ef 2024-07-02T02:18:03 doc: Improve documentation
Nick Wellnhofer 0b0dd989 2024-06-28T23:13:38 parser: Fix EBCDIC detection
Nick Wellnhofer 221df375 2024-06-28T00:34:52 parser: Support custom charset conversion implementations Implement xmlCtxtSetCharEncConvImpl. I agree that the name is terrible.
Nick Wellnhofer e72eda10 2024-06-28T01:41:36 parser: Add NULL check in xmlNewIOInputStream
Nick Wellnhofer bc793390 2024-06-27T16:23:14 parser: Update documentation
Nick Wellnhofer 193f4653 2024-06-26T19:28:28 parser: Implement xmlCtxtGetStatus This allows access to ctxt->wellFormed, ctxt->nsWellFormed and ctxt->valid. It also detects several fatal non-parser errors which really should be another error level.
Nick Wellnhofer cc0cc2d3 2024-06-26T04:32:49 parser: Add more parser context accessors
Nick Wellnhofer eca972e6 2024-06-26T02:22:04 parser: Add getters for XML declaration to parser context Access to struct members will be deprecated.
Nick Wellnhofer 3ff8a2c4 2024-06-26T01:08:48 parser: Deprecate xmlIsLetter
Nick Wellnhofer fa50be92 2024-06-25T23:19:56 parser: Move implementation of xmlCtxtGetLastError
Rosen Penev 217e9b7a 2024-06-08T12:27:45 clang-tidy: don't return in void functions Found with readability-redundant-control-flow Signed-off-by: Rosen Penev <rosenp@gmail.com>
Nick Wellnhofer c5e9a5b2 2024-06-17T15:29:56 parser: Use catalogs with resource loader
Nick Wellnhofer 6deebe03 2024-06-17T13:09:37 parser: Make xmlInputCreateUrl handle HTTP input
Nick Wellnhofer d2fd9d37 2024-06-17T12:55:44 parser: Fix swapped arguments
Nick Wellnhofer 2608baaf 2024-06-14T19:42:40 parser: Make failure to load main document a warning Revert the change that made failures to load the main document an error. This fixes the --path option of xmllint and xsltproc. Should fix #733.
Nick Wellnhofer dba1ed85 2024-06-12T18:19:55 ftp: Remove FTP support Remove the built-in FTP client. If you configure --with-legacy, old symbols are retained for ABI compatibility.
Nick Wellnhofer 52384043 2024-06-11T19:10:41 parser: Pass resource type to resource loader
Nick Wellnhofer ab5e6deb 2024-06-11T18:11:51 parser: Introduce XML_INPUT_NETWORK input flag This allows to disable network access when creating parser inputs with xmlInputCreateUrl.
Nick Wellnhofer 89fcae4d 2024-06-11T16:19:58 parser: Don't report malloc failures when creating context We don't want messages to stderr before an error handler could be set on a parser context.
Nick Wellnhofer 64ad2725 2024-06-11T03:51:43 parser: Introduce per-context resource loader