include/private/parser.h


Log

Author Commit Date CI Message
Nick Wellnhofer 0bc4608c 2024-09-15T20:28:49 html: Use hash table to check for duplicate attributes
Nick Wellnhofer 8af55c8d 2024-07-06T22:14:21 parser: Rename new input API functions These weren't made public yet.
Nick Wellnhofer d74ca594 2024-07-06T22:04:06 parser: Rename internal xmlNewInput functions
Nick Wellnhofer 38195cf5 2024-07-06T14:58:16 parser: Don't produce names with invalid UTF-8 in recovery mode
Nick Wellnhofer 16e7ecd4 2024-07-01T16:01:24 xinclude: Check URI length Don't report long URIs as OOM errors.
Nick Wellnhofer 52384043 2024-06-11T19:10:41 parser: Pass resource type to resource loader
Nick Wellnhofer 89fcae4d 2024-06-11T16:19:58 parser: Don't report malloc failures when creating context We don't want messages to stderr before an error handler could be set on a parser context.
Nick Wellnhofer b9d2f3c9 2024-06-11T02:15:18 parser: Introduce new input API - xmlInputCreateUrl - xmlInputCreateMemory - xmlInputCreateString - xmlInputCreateFd - xmlInputCreateIO - xmlInputSetEncoding These functions don't take a parser context and work on xmlParserInputs, replacing functions working on xmlParserInputBuffers. xmlInputCreateUrl and xmlInputSetEncoding offer fine-grained error handling. Several XML_INPUT_* flags offer additional control.
Nick Wellnhofer ff3b0919 2024-06-11T00:00:32 parser: Implement XML_PARSE_NO_UNZIP option
Nick Wellnhofer b47a95fe 2024-05-20T13:10:41 parser: Don't make xmlCtxtErrIO public
Nick Wellnhofer 84a71860 2024-02-26T15:14:28 xmlreader: Fix xmlTextReaderConstEncoding Regression from commit f1c1f5c6. Fixes #697.
Nick Wellnhofer 8961056f 2024-01-23T00:47:44 parser: Make experimental input API private This needs to be reworked.
Nick Wellnhofer 37c6618b 2023-12-30T02:50:34 parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.
Nick Wellnhofer 7e0bbbc1 2023-12-27T18:33:30 parser: New input API Provide a new set of functions to create xmlParserInputs. These can be used for the document entity or from external entity loaders. - Don't require xmlParserInputBuffer. - All functions take a base URI. - All functions take an encoding as string. - xmlNewInputURL also takes a public ID. - xmlNewInputMemory takes a size_t. - Optimization hints for memory buffers. Improve documentation. Only call xmlInitParser before allocating a new parser context. Call xmlCtxtUseOptions as early as possible.
Nick Wellnhofer 6a9a88a1 2023-12-26T03:13:05 parser: Move progressive flag into input struct
Nick Wellnhofer d944a415 2023-12-26T02:10:35 parser: Fix in-parameter-entity and in-external-dtd checks Use in ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.
Nick Wellnhofer 13043691 2023-12-20T00:33:34 parser: Rename xmlErrParser to xmlCtxtErr
Nick Wellnhofer 8d0aaf4b 2023-12-19T20:47:36 parser: Remove xmlErrEncoding Use xmlFatalErr or xmlCtxtErrIO.
Nick Wellnhofer 54c70ed5 2023-12-18T19:31:29 parser: Improve error handling Introduce xmlCtxtSetErrorHandler allowing to set a structured error for a parser context. There already was the "serror" SAX handler but this always receives the parser context as argument. Start to use xmlRaiseMemoryError. Remove useless arguments from memory error functions. Rename xmlErrMemory to xmlCtxtErrMemory. Remove a few calls to xmlGenericError. Remove support for runtime entity debugging.
Nick Wellnhofer f19a9510 2023-12-10T17:50:22 parser: Report malloc failures Fix many places where malloc failures aren't reported. Make xmlErrMemory public. This is useful for custom external entity loaders. Introduce new API function xmlSwitchEncodingName. Change the way how we store whether the the parser is stopped. This used to be signaled by setting ctxt->instate to XML_PARSER_EOF which was misdesigned and error-prone. Set ctxt->disableSAX to 2 instead and introduce a macro PARSER_STOPPED. Also stop to remove parser inputs in xmlHaltParser. This allows to remove many checks of ctxt->instate. Introduce xmlErrParser to handle errors if a parser context is available.
Nick Wellnhofer c082ef46 2023-08-09T16:59:36 parser: Stop switching to ISO-8859-1 on encoding errors Use U+FFFD Replacement Character if invalid UTF-8 is encountered in recovery mode. Also rewrite xmlNextChar and xmlCurrentChar. Fixes #598.
Nick Wellnhofer eb69c1d3 2023-10-02T12:16:05 parser: Fix initialization of namespace data Move initialization to xmlInitSAXParserCtxt. Also add missing XML_HIDDEN to xmlParserNsFree. Fixes #597.
Nick Wellnhofer e0dd330b 2023-09-29T00:18:44 parser: Use hash tables to avoid quadratic behavior Use a hash table to lookup namespaces by prefix. The hash table stores an index into the namespace table. Auxiliary data for namespaces is stored in a separate array along the main namespace table. Use a hash table to verify attribute uniqueness. The hash table stores an index into the attribute table. Reuse hash value from the dictionary to avoid computing them twice. See #346.
Nick Wellnhofer f1c1f5c6 2023-08-16T19:43:02 parser: Revert change to doc->encoding Fixes #579.
Nick Wellnhofer ec7be506 2023-08-08T15:19:46 parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.
Nick Wellnhofer fc69cf56 2023-04-30T17:51:29 parser: Move xmlFatalErr to parserInternals.c
Nick Wellnhofer 04d1bedd 2023-03-21T13:08:44 parser: Rework shrinking of input buffers Don't try to grow the input buffer in xmlParserShrink. This makes sure that no memory allocations are made and the function always succeeds. Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of DTD parsing loops. Shrink before growing.
Nick Wellnhofer b167c731 2023-03-14T14:42:36 parser: Fix short-lived regression causing infinite loops Fix 3eb6bf03. We really have to halt the parser, so the input buffer gets reset.
Nick Wellnhofer 2099441f 2023-03-13T17:51:13 parser: Stop calling xmlParserInputShrink Introduce xmlParserShrink which takes a parser context to simplify error handling.
Nick Wellnhofer 3eb6bf03 2023-03-12T16:47:15 parser: Stop calling xmlParserInputGrow Introduce xmlParserGrow which takes a parser context to simplify error handling.
Nick Wellnhofer ccb6d544 2022-11-27T02:09:27 Hide internal functions These functions were never declared in public headers, so it should be safe to hide them. Fixes #139.
Nick Wellnhofer 0f568c0b 2022-08-26T01:22:33 Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.