include/libxml/parser.h


Log

Author Commit Date Message
Nick Wellnhofer 38ea8fa9 2025-05-06T18:31:45 doc: Fix varargs
Nick Wellnhofer 9bbffec5 2025-05-06T17:42:46 doc: Move brief to top, params to bottom of doc comments
Nick Wellnhofer e6cfd049 2025-05-04T14:52:42 doc: Misc fixes to tree docs
Nick Wellnhofer 1bf44f09 2025-05-04T02:15:25 doc: Misc fixes to parser docs
Nick Wellnhofer 4a010875 2025-05-03T15:38:15 doc: Move parser option docs to enum
Nick Wellnhofer f7c41287 2025-05-02T15:57:17 doc: Remove more comment block headers
Nick Wellnhofer 1eca6e34 2025-04-30T00:54:00 parser: Deprecate xmlClearParserCtxt
Nick Wellnhofer fd6ab89b 2025-04-28T15:58:19 doc: Adjust documentation of public structs
Nick Wellnhofer 8816f267 2025-04-28T14:55:47 doc: Adjust documentation of enums
Nick Wellnhofer e549622b 2025-04-28T15:11:24 doc: Convert documentation to Doxygen Automated conversion based on a few regexes.
Nick Wellnhofer 61890e39 2025-04-27T21:50:15 doc: Prepare for conversion to Doxygen Fix many params in internal functions (not really necessary but Doxygen warns about that in XML mode). Fix formatting in a few corner cases that automatic conversion can't handle. Rearrange some DOC_DISABLE blocks.
Nick Wellnhofer fc8899d4 2025-04-27T12:59:41 parser: Make xmlCtxtGetValidCtxt depend on VALID_ENABLED
Nick Wellnhofer aa4ef773 2025-04-17T19:53:14 parser: Deprecate output-related globals
Nick Wellnhofer b3492259 2025-03-14T00:01:11 include: Change some return types from int to enum This also affects some new functions from 2.13.
Nick Wellnhofer fd1b9391 2025-03-13T23:20:16 include: Convert some macros to enums
Nick Wellnhofer 03a8f1dd 2025-03-11T18:53:24 doc: Document SAX handlers a little more
Nick Wellnhofer ba9148d8 2025-03-09T20:30:49 parser: Undeprecate input->consumed Should be deprecated after fixing #762.
Nick Wellnhofer a0dbf030 2025-03-09T20:24:06 parser: Undeprecate ctxt->loadsubset Should be deprecated after fixing #873.
Nick Wellnhofer d96911f1 2025-03-08T23:00:29 doc: Documentation fixes
Nick Wellnhofer 5f0b1378 2025-03-08T22:07:15 parser: Add more parser context accessors Fixes #763.
Nick Wellnhofer 69657224 2025-03-04T20:32:02 globals: Remove unused globals
  - xmlBufferAllocScheme
  - xmlDefaultBufferSize
  - xmlParserDebugEntities
Nick Wellnhofer 3d37ff84 2025-03-04T15:10:09 globals: Also use global state struct if threads are disabled
Nick Wellnhofer a15ad9b2 2025-03-04T14:06:50 parser: Remove compatibility symbols
Nick Wellnhofer 8e871162 2025-03-04T13:36:55 parser: Remove oldXMLWDcompatibility
Nick Wellnhofer cdc5cfed 2025-03-04T13:26:51 legacy: Remove legacy symbols
Nick Wellnhofer e50d314a 2025-02-25T23:07:19 build: Add separate configuration option for RELAX NG Support for RELAX NG used to be enabled together with XML Schema support (--with-schemas). Now there's a separate option and a new feature macro LIBXML_RELAXNG_ENABLED.
Nick Wellnhofer 93506d41 2025-01-29T00:17:01 parser: Make catalog PIs opt-in This is an obscure feature that shouldn't be enabled by default.
Nick Wellnhofer 1082d813 2025-01-28T23:21:34 parser: Prepare to make decompression opt-in Add a new parser option XML_PARSE_UNZIP that enables decompression. xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set this option currently, but downstream users should start to set the option if they really need it.
Nick Wellnhofer 0dc26910 2024-11-20T21:04:19 parser: Deprecate more internal functions
Nick Wellnhofer 5a51f085 2024-11-17T13:50:15 valid: Implement xmlCtxtValidateDocument This allows using the error handler or resource loader of a parser context.
Nick Wellnhofer 7f8c436c 2024-11-15T16:30:52 parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd This allows using the context's error handler, options and other settings. Fixes #808.
Nick Wellnhofer eb66d03e 2024-07-07T23:15:54 io: Deprecate a few functions
Nick Wellnhofer 69f12d6d 2024-07-13T00:17:18 encoding: Deprecate xmlByteConsumed This was only used by Chromium/WebKit to detect whether xmlParseContent really succeeded. It's a horrible, overcomplicated hack. See 8c5848bd and #767.
Nick Wellnhofer 8af55c8d 2024-07-06T22:14:21 parser: Rename new input API functions These weren't made public yet.
Nick Wellnhofer 4f329dc5 2024-07-10T03:27:47 parser: Implement xmlCtxtParseContent This implements xmlCtxtParseContent, a better alternative to xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a parser context and a parser input, making it a lot more versatile. xmlParseInNodeContext is now implemented in terms of xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never modifies the target document, improving thread safety. xmlParseInNodeContext is also more lenient now with regard to undeclared entities. Fixes #727.
Nick Wellnhofer 82e0455c 2024-07-06T19:48:07 Undeprecate some symbols for now
  - xmlKeepBlanksDefault is needed as a work-around for xmlParseBalancedChunk, see issue #727.
  - ctxt->options already has an accessor and will be deprecated later.
  - input->cur, input->base, input->end: See #762.
Nick Wellnhofer 205e56da 2024-07-02T22:32:43 parser: Undeprecate ctxt->directory
Nick Wellnhofer 606f4108 2024-07-02T20:57:15 parser: Allow to disable catalogs with parser options Implement XML_PARSE_NO_SYS_CATALOG and XML_PARSE_NO_CATALOG_PI. Fixes #735.
Nick Wellnhofer 221df375 2024-06-28T00:34:52 parser: Support custom charset conversion implementations Implement xmlCtxtSetCharEncConvImpl. I agree that the name is terrible.
Nick Wellnhofer 044ddf07 2024-06-28T03:14:12 parser: Undeprecate some parser context members
Nick Wellnhofer 193f4653 2024-06-26T19:28:28 parser: Implement xmlCtxtGetStatus This allows access to ctxt->wellFormed, ctxt->nsWellFormed and ctxt->valid. It also detects several fatal non-parser errors which really should be another error level.
Nick Wellnhofer cc0cc2d3 2024-06-26T04:32:49 parser: Add more parser context accessors
Nick Wellnhofer eca972e6 2024-06-26T02:22:04 parser: Add getters for XML declaration to parser context Access to struct members will be deprecated.
Nick Wellnhofer f9c33a55 2024-06-21T18:25:11 parser: Undeprecate some xmlParserInput members
Nick Wellnhofer 1228b4e0 2024-06-21T18:22:04 parser: Deprecate xmlParserCtxt->lastError We already have xmlCtxtGetLastError().
Nick Wellnhofer f82ca02b 2024-06-21T18:17:11 parser: Undeprecate some xmlParserCtxt members These are essential for SAX parsers.
Mike Dalessio bbbbbb46 2024-06-20T03:19:48 parser: implement xmlCtxtGetOptions In 712a31ab, the `options` struct member was deprecated. To allow callers to check the status of options bits, introduce xmlCtxtGetOptions.
Nick Wellnhofer 1112699c 2024-06-17T02:42:18 legacy: Remove most legacy functions from public headers Also remove warning messages.
Nick Wellnhofer 5fca9498 2024-06-16T19:56:08 doc: Hide internal macro
Nick Wellnhofer 387f0c78 2023-12-06T18:35:30 include: Readd circular dependency between tree.h and parser.h There are dozens of downstream projects that only include tree.h but use declarations from parser.h. This broke after the recent cleanup of circular dependencies. Make tree.h include parser.h again. This is a hack but doesn't change the include directory structure. This commit only made it into the 2.12 branch but wasn't applied to master, so the issue turned up in 2.13.0 again. Should fix #734.
Nick Wellnhofer 712a31ab 2024-06-10T23:06:13 parser: Deprecate most public struct members This will probably cause many warnings in downstream code abusing libxml2 internals, but we can always undeprecate some members later.
Nick Wellnhofer 52384043 2024-06-11T19:10:41 parser: Pass resource type to resource loader
Nick Wellnhofer 64ad2725 2024-06-11T03:51:43 parser: Introduce per-context resource loader
Nick Wellnhofer ff3b0919 2024-06-11T00:00:32 parser: Implement XML_PARSE_NO_UNZIP option
Nick Wellnhofer 5b1d7ff0 2024-05-20T22:51:44 parser: Remove redefinitions for legacy globals
Nick Wellnhofer 8961056f 2024-01-23T00:47:44 parser: Make experimental input API private This needs to be reworked.
Nick Wellnhofer 02cc5c36 2024-01-05T04:17:14 parser: Add XML_PARSE_NO_XXE parser option
Nick Wellnhofer 12f0bb94 2024-01-05T01:14:28 parser: Synchronize more options
Nick Wellnhofer 3efbe916 2024-01-05T00:11:29 parser: Mark 'token' member as unused in xmlParserCtxt
Nick Wellnhofer b82fd81d 2024-01-04T23:25:06 parser: Rework xmlCtxtParseDocument Make xmlCtxtParseDocument take a parser input which can be popped after parsing.
Nick Wellnhofer d7d300ba 2024-01-04T17:50:11 parser: Remove remnants of runtime debugging feature Apparently, this feature was removed long ago. Fixes #651.
Nick Wellnhofer 875bb084 2023-09-07T03:25:45 parser: Implement xmlCtxtSetOptions Surprisingly, some options can only be enabled with xmlCtxtUseOptions and it's impossible to unset them. Add a new API function xmlCtxtSetOptions which sets or clears all options. Finally document all parser options. Make sure to synchronize option bits and struct members.
Nick Wellnhofer 2b79f106 2023-12-29T21:07:04 parser: Simplify entity size accounting
Nick Wellnhofer 7e0bbbc1 2023-12-27T18:33:30 parser: New input API Provide a new set of functions to create xmlParserInputs. These can be used for the document entity or from external entity loaders.
  - Don't require xmlParserInputBuffer.
  - All functions take a base URI.
  - All functions take an encoding as string.
  - xmlNewInputURL also takes a public ID.
  - xmlNewInputMemory takes a size_t.
  - Optimization hints for memory buffers.
  Improve documentation. Only call xmlInitParser before allocating a new parser context. Call xmlCtxtUseOptions as early as possible.
Nick Wellnhofer a5dcf0f4 2023-12-26T03:27:23 parser: Mark more parser context members as unused
Nick Wellnhofer 6a9a88a1 2023-12-26T03:13:05 parser: Move progressive flag into input struct
Nick Wellnhofer d944a415 2023-12-26T02:10:35 parser: Fix in-parameter-entity and in-external-dtd checks Use ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.
Nick Wellnhofer c1bddd4c 2023-12-23T01:09:17 parser: Mark 'length' member of xmlParserInput as unused
Nick Wellnhofer 955c177f 2023-12-23T00:58:36 parser: Stop using 'directory' struct member This was only used as a pointless fallback for URI resolution.
Nick Wellnhofer 54c70ed5 2023-12-18T19:31:29 parser: Improve error handling Introduce xmlCtxtSetErrorHandler allowing to set a structured error for a parser context. There already was the "serror" SAX handler but this always receives the parser context as argument. Start to use xmlRaiseMemoryError. Remove useless arguments from memory error functions. Rename xmlErrMemory to xmlCtxtErrMemory. Remove a few calls to xmlGenericError. Remove support for runtime entity debugging.
Nick Wellnhofer 5d2dbe79 2023-12-14T13:37:25 parser: Fix build --without-output Fixes #647
Nick Wellnhofer df0b540b 2023-12-07T14:40:13 include: Rename XML_EMPTY helper macro Avoid name clash with downstream projects.
Nick Wellnhofer a9738e31 2023-12-07T14:15:29 include: Move declaration of xmlInitGlobals Fix downstream build issues after reworking globals.h.
Nick Wellnhofer 9122ad0c 2023-12-06T19:56:50 include: Move globals from xmlsave.h to parser.h Fix downstream build issues after reworking globals.h.
Nick Wellnhofer c011e760 2023-12-06T01:09:31 globals: Remove unused globals from thread storage Setting these deprecated globals hasn't had an effect for a long time. Make them constants. This reduces the size of per-thread storage from ~700 to ~250 bytes.
Nick Wellnhofer ff6c3188 2023-11-23T15:22:59 include: Remove useless 'const' from function arguments
Nick Wellnhofer aca37d8c 2023-11-20T15:20:37 parser: Only enable SAX2 if there are SAX2 element handlers This reverts part of commit 235b15a5 for backward compatibility and adds some comments trying to clarify the whole mess. Fixes #623.
Nick Wellnhofer e0dd330b 2023-09-29T00:18:44 parser: Use hash tables to avoid quadratic behavior Use a hash table to lookup namespaces by prefix. The hash table stores an index into the namespace table. Auxiliary data for namespaces is stored in a separate array along the main namespace table. Use a hash table to verify attribute uniqueness. The hash table stores an index into the attribute table. Reuse hash value from the dictionary to avoid computing them twice. See #346.
Nick Wellnhofer 8c084ebd 2023-09-21T22:57:33 doc: Make apibuild.py happy
Nick Wellnhofer 72262030 2023-09-21T14:52:14 parser: Readd some includes to parser.h and xmlreader.h Fix backward compatibility.
Nick Wellnhofer da274bfa 2023-09-21T01:29:40 build: Fix build when certain modules are disabled
Nick Wellnhofer d6ba4033 2023-09-20T20:49:59 globals: Move remaining declarations to correct places globals.h is now deprecated. Sanity is restored.
Nick Wellnhofer 11a1839d 2023-09-20T17:54:48 globals: Move remaining globals back to correct header files This undoes a lot of damage.
Nick Wellnhofer d1336fd3 2023-09-20T17:00:50 globals: Move malloc hooks back to xmlmemory.h
Nick Wellnhofer 2e6c49a7 2023-09-20T14:43:14 globals: Don't store xmlParserVersion in global state This is a constant.
Nick Wellnhofer db8b9722 2023-09-20T13:56:16 parser: Deprecate global parser options Note that setting global options has no effect anyway when using any of the modern parser API functions which take an option argument like xmlReadMemory or when using xmlCtxtUseOptions. Global options only have an effect when using old API functions xmlParse* or xmlSAXParse* or when using an xmlParserCtxt without calling xmlCtxtUseOptions. Unfortunately, many downstream projects still modify global parser options often without realizing that it has no effect. If necessary, switch to the modern API. Then you can safely remove all code that changes global options. Here's a list of deprecated functions and global variables together with the corresponding parser options.
  - xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue: Parser option XML_PARSE_NOENT
  - xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue: Inverse of parser option XML_PARSE_NOBLANKS
  - xmlPedanticParserDefault, xmlPedanticParserDefaultValue: Parser option XML_PARSE_PEDANTIC
  - xmlLineNumbersDefault, xmlLineNumbersDefaultValue: Always enabled by new API
  - xmlDoValidityCheckingDefaultValue: Parser option XML_PARSE_DTDVALID
  - xmlGetWarningsDefaultValue: Inverse of parser option XML_PARSE_NOWARNING
  - xmlLoadExtDtdDefaultValue: Parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR
Nick Wellnhofer ed3bd052 2023-08-20T20:48:10 parser: Allow to set maximum amplification factor
Nick Wellnhofer ec7be506 2023-08-08T15:19:46 parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser:
  - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions.
  - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches.
  If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.
Nick Wellnhofer e7c3a4ca 2023-03-13T19:19:46 parser: Deprecate some parser input functions
Nick Wellnhofer 59b33661 2022-12-27T14:15:51 error: Limit number of parser errors Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of the time with error handling. Limit the number of errors and warnings to 100.
Nick Wellnhofer ce76ebfd 2022-12-19T20:56:23 entities: Stop counting entities This was only used in the old version of xmlParserEntityCheck.
Nick Wellnhofer 463bbeec 2022-12-19T18:39:45 entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.
Nick Wellnhofer ce9baf94 2022-12-08T02:48:27 Remove XMLCALL and XMLCDECL macros from public headers
Nick Wellnhofer 68a6518c 2022-11-15T18:23:33 parser: Rewrite push parser boundary checks Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.
Nick Wellnhofer 65dc8a63 2022-09-01T00:13:19 Make xmlNewSAXParserCtxt take a const sax handler Also improve documentation.
Nick Wellnhofer 51035c53 2022-08-25T19:53:04 Generate deprecation warnings for old SAX API
Nick Wellnhofer 9a82b94a 2022-08-24T04:21:58 Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt Add API functions to create a parser context with a custom SAX handler without having to mess with ctxt->sax manually.
Nick Wellnhofer 4a8c71eb 2022-03-04T03:35:57 Remove DOCBparser This code has been broken and deprecated since version 2.6.0, released in 2003. Because of a bug in commit 961b535c, DOCBparser.c was never compiled since 2012. I couldn't find a Debian package using any of its symbols, so it seems safe to remove this module.
Nick Wellnhofer ebb17970 2022-03-04T02:31:59 Remove unneeded #includes
Nick Wellnhofer cf4893f7 2022-02-20T19:56:41 Deprecate legacy functions
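
The entity amplification check described in commit 463bbeec can be illustrated with a small, self-contained C sketch. This is not libxml2's actual code. All function and macro names are hypothetical, the reading of "10 MB" as 10,000,000 bytes is an assumption, and the per-reference cost of 20 is an invented placeholder. Only the factor of 10, the always-allowed 10 MB, the fixed cost per entity reference, and the use of saturating arithmetic on "unsigned long" come from the commit message.

```c
#include <limits.h>

/* Maximum amplification factor, as stated in the commit message. */
#define SKETCH_MAX_AMPLIFICATION 10UL
/* "10 MB" read as 10 * 1000 * 1000 bytes (assumption). */
#define SKETCH_ALWAYS_ALLOWED 10000000UL
/* Hypothetical fixed cost charged per entity reference. */
#define SKETCH_ENTITY_FIXED_COST 20UL

/* Add without wrapping around: clamp to ULONG_MAX on overflow. */
static unsigned long
sketch_sat_add(unsigned long a, unsigned long b) {
    return (a > ULONG_MAX - b) ? ULONG_MAX : a + b;
}

/* Multiply without wrapping around: clamp to ULONG_MAX on overflow. */
static unsigned long
sketch_sat_mul(unsigned long a, unsigned long b) {
    if (a != 0 && b > ULONG_MAX / a)
        return ULONG_MAX;
    return a * b;
}

/*
 * Account for one entity reference whose replacement text is `size`
 * bytes: charge the fixed cost plus the size, saturating on overflow.
 */
static unsigned long
sketch_account_entity(unsigned long expanded, unsigned long size) {
    expanded = sketch_sat_add(expanded, SKETCH_ENTITY_FIXED_COST);
    return sketch_sat_add(expanded, size);
}

/*
 * Return 1 if the accumulated expansion is excessive, 0 otherwise.
 * Expansion up to SKETCH_ALWAYS_ALLOWED is always accepted; beyond
 * that, it must not exceed the input size times the maximum factor.
 */
static int
sketch_amplification_exceeded(unsigned long input_size,
                              unsigned long expanded) {
    if (expanded <= SKETCH_ALWAYS_ALLOWED)
        return 0;
    return expanded > sketch_sat_mul(input_size, SKETCH_MAX_AMPLIFICATION);
}
```

Under these assumptions, a 100-byte document whose entities expand to 20 MB would be rejected, while the same 20 MB of expansion from a 5 MB document would pass, since 5 MB times 10 is above 20 MB.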