parser.c


Log

Author Commit Date CI Message
Nick Wellnhofer fdfeecfe 2024-07-02T21:54:26 parser: Reenable ctxt->directory Unused internally, but used in downstream code. Should fix #753.
Nick Wellnhofer 606f4108 2024-07-02T20:57:15 parser: Allow to disable catalogs with parser options Implement XML_PARSE_NO_SYS_CATALOG and XML_PARSE_NO_CATALOG_PI. Fixes #735.
Nick Wellnhofer 866be54e 2024-07-02T04:27:53 parser: Don't use deprecated xmlSplitQName
Nick Wellnhofer bc793390 2024-06-27T16:23:14 parser: Update documentation
Nick Wellnhofer eca972e6 2024-06-26T02:22:04 parser: Add getters for XML declaration to parser context Access to struct members will be deprecated.
Mike Dalessio bbbbbb46 2024-06-20T03:19:48 parser: implement xmlCtxtGetOptions In 712a31ab, the `options` struct member was deprecated. To allow callers to check the status of options bits, introduce xmlCtxtGetOptions.
Rosen Penev 217e9b7a 2024-06-08T12:27:45 clang-tidy: don't return in void functions Found with readability-redundant-control-flow Signed-off-by: Rosen Penev <rosenp@gmail.com>
Nick Wellnhofer 32cac377 2024-06-17T17:59:49 parser: Selectively reenable reading from "-" Make filename "-" mean stdin for legacy SAX1 functions and xmlReadFile. This should hopefully fix most command line utilities. See #737.
Nick Wellnhofer 33a1f897 2024-06-16T19:16:47 legacy: Merge SAX.c into legacy.c
Nick Wellnhofer 10d60d15 2024-06-16T00:04:46 regexp: Stop using LIBXML_AUTOMATA_ENABLED This macro always equals LIBXML_REGEXP_ENABLED.
Nick Wellnhofer b0fc67aa 2024-06-15T22:53:55 build: Remove --with-tree configuration option This option would allow for a smaller, but mostly useless minimal build. But it complicates the symbol availability logic in an insane way and requires specialized tools like our custom C parser in doc/apibuild.py. See #717.
Nick Wellnhofer 039ce1e8 2024-06-14T16:41:43 parser: Pass global object to sax->setDocumentLocator Revert part of commit c011e760. Fixes #732.
Nick Wellnhofer dba1ed85 2024-06-12T18:19:55 ftp: Remove FTP support Remove the built-in FTP client. If you configure --with-legacy, old symbols are retained for ABI compatibility.
Nick Wellnhofer 52384043 2024-06-11T19:10:41 parser: Pass resource type to resource loader
Nick Wellnhofer 89fcae4d 2024-06-11T16:19:58 parser: Don't report malloc failures when creating context We don't want messages to stderr before an error handler could be set on a parser context.
Nick Wellnhofer 410931e3 2024-06-11T00:55:38 parser: Only set input ID for PE refs Other input streams don't require IDs.
Nick Wellnhofer ff3b0919 2024-06-11T00:00:32 parser: Implement XML_PARSE_NO_UNZIP option
Nick Wellnhofer 47cbb6bb 2024-06-10T14:04:00 doc: Don't mention xmlNewInputURL
Nick Wellnhofer 8318b5a6 2024-06-09T14:22:53 parser: Fix NULL checks for output arguments
Nick Wellnhofer 0cde1b78 2024-06-06T23:50:03 parser: Fix "Truncated multi-byte sequence" error Don't raise the error if decoding failed.
Nick Wellnhofer 122b6130 2024-06-04T16:33:02 parser: Fix performance regression when parsing namespaces The namespace hash table didn't reuse deleted buckets, leading to quadratic behavior. Also ignore deleted buckets when resizing. Fixes #726.
Nick Wellnhofer a7e26707 2024-06-03T14:04:44 parser: Don't overwrite OOM errors in xmlSBuf
Nick Wellnhofer e75e878e 2024-05-20T13:58:22 doc: Update and fix documentation
Nick Wellnhofer 4fefba4c 2024-05-15T17:52:20 parser: Rework handling of undeclared entities Throw an error if entity substitution was requested. Now we only downgrade to a warning if - XML_PARSE_DTDLOAD wasn't specified, and - entity aren't substituted or XML_PARSE_NO_XXE was specified. Should fix #724.
Nick Wellnhofer 4ff2dccf 2024-05-10T02:04:52 SAX2: Warn if URI resolution failed
Nick Wellnhofer 4fe116eb 2024-05-10T00:05:44 parser: Don't report error on invalid URI Only fragment identifiers are an error. This removes the last user of xmlErrMsg*. Now every error reported by the parser should result in one of ctxt->wellFormed, ctxt->nsWellFormed or ctxt->valid being set to zero.
Nick Wellnhofer a4c2b723 2024-05-05T17:26:31 io: Don't set close callback in xmlParserInputBufferCreateFd
Nick Wellnhofer fdc5ff36 2024-05-02T16:23:04 parser: Always throw entity errors if external DTD is loaded When parsing with XML_PARSE_DTDLOAD, missing entities are always an error. Also consolidate behavior when validating. See b717abdd.
Nick Wellnhofer 39e5b35b 2024-05-02T22:06:19 parser: Don't create undeclared entity refs in substitution mode We never want to create entity reference nodes if entity substitution is enabled. This also applies to undeclared entities.
Nick Wellnhofer 1cdfece1 2024-04-28T18:33:40 memory: Remove memory debugging This is useless compared to sanitizers or valgrind and has a considerable performance impact if enabled accidentally.
Nick Wellnhofer 45fe9924 2024-04-22T17:12:54 parser: Don't create reference in xmlLookupGeneralEntity This should only be done in xmlParseReference. The handling of undeclared entities is still somewhat inconsistent. In element content we create references even if entity substitution is enabled. In attribute values undeclared entities are always ignored.
Nick Wellnhofer b717abdd 2024-04-22T15:42:39 parser: Consolidate error handling for undeclared entities Always use XML_WAR_UNDECLARED_ENTITY with warning error level in documents with external subset or parameter entities. Use XML_ERR_UNDECLARED_ENTITY otherwise.
Nick Wellnhofer f506ec66 2024-04-15T11:27:44 parser: Always decode entities in namespace URIs Also decode entities in namespace URIs if entity substitution wasn't requested. This should fix some corner cases when comparing namespace URIs. The Namespaces in XML 1.0 spec says: > In a namespace declaration, the URI reference is the normalized value > of the attribute, so replacement of XML character and entity > references has already been done before any comparison. Make the serialization code escape special characters in namespace URIs like in attribute values. This fixes serialization if entities were substituted when parsing. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
Nick Wellnhofer 2840e33c 2024-03-04T07:34:25 tree: Allocate XML namespace statically
Nick Wellnhofer 186562a1 2024-03-12T19:55:33 parser: Fix detection of duplicate attributes in XML namespace Fixes a regression from commit e0dd330b, resulting in duplicate attributes in the predefined XML namespace not being detected or extraneous default attributes being passed. Fixes #704.
Nick Wellnhofer 4d774612 2024-02-13T11:35:12 parser: Fix column number in attribute values Short-lived regression from 37c6618b.
Nick Wellnhofer 95f2a174 2024-01-30T13:25:17 parser: Fix crash in xmlParseInNodeContext with HTML documents Ignore namespaces if we have an HTML document with namespaces added manually. Fixes #672.
Nick Wellnhofer 6dc2fdb2 2024-01-07T14:30:57 parser: Account for full size of non-well-formed entities Account for the full size of the entity if parsing stops because of errors. In our cost model, we have to assume that the entity loader processes the whole entity regardless of its content.
Nick Wellnhofer 29beef65 2024-01-02T21:50:38 parser: Pop inputs if parsing DTD failed This should provide some statistics in ctxt->sizeentcopy even in the error or recovery case.
Nick Wellnhofer 02a2038d 2024-01-10T14:17:49 parser: Handle NOCDATA properly when expanding entities Short-lived regression from e1153832.
Nick Wellnhofer e1153832 2024-01-07T01:29:37 parser: Fix quadratic behavior when copying entities Process the first and last text node with the SAX handler to make the text merging optimization kick in. Fixes #657.
Nick Wellnhofer f237e5b9 2024-01-05T15:40:23 parser: Avoid duplicate namespace errors Don't report an extra attribute uniqueness error if a namespace is undeclared. This matches old behavior.
Nick Wellnhofer 02cc5c36 2024-01-05T04:17:14 parser: Add XML_PARSE_NO_XXE parser option
Nick Wellnhofer 12f0bb94 2024-01-05T01:14:28 parser: Synchronize more options
Nick Wellnhofer 3efbe916 2024-01-05T00:11:29 parser: Mark 'token' member as unused in xmlParserCtxt
Nick Wellnhofer b82fd81d 2024-01-04T23:25:06 parser: Rework xmlCtxtParseDocument Make xmlCtxtParseDocument take a parser input which can be popped after parsing.
Nick Wellnhofer d7d300ba 2024-01-04T17:50:11 parser: Remove remnants of runtime debugging feature Apparently, this feature was remove long ago. Fixes #651.
Nick Wellnhofer 8c5848bd 2024-01-04T17:14:31 parser: Make xmlParseContent more useful This is an internal function which isn't really usable without some hacks. See WebKit/Chromium trying to recreate the effects of xmlDetectSAX2 manually, for example. Make xmlParseContent perform late initialization and check whether the content was fully parsed. Also rename xmlDetectSAX2 and document why it's needed.
Nick Wellnhofer a7356dfe 2024-01-03T18:02:46 parser: Clear invalid entity content This was removed in earlier commits, but we really want to make sure that entity content is syntactically valid.
Nick Wellnhofer 30d83977 2024-01-04T15:18:14 fuzz: Disable catalogs The catalogs API doesn't report OOM errors. It's basically impossible to use it safely in its current form.
Nick Wellnhofer 85f99023 2024-01-02T17:52:43 parser: Fix buffer size checks Don't test size of remaining data. This causes false positives with memory buffers. Also impose XML_MAX_HUGE_LENGTH limit when parsing with XML_PARSE_HUGE.
Nick Wellnhofer e8fb3d63 2024-01-02T17:45:54 parser: Convert some "internal errors" to meaningful codes
Nick Wellnhofer 5cb4b05c 2024-01-02T17:16:22 parser: Lower maximum entity nesting depth Limit entity nesting depth to 20 or 40 with XML_PARSE_HUGE. Change error code to XML_ERR_RESOURCE_LIMIT.
Nick Wellnhofer a2cc7f5f 2024-01-02T17:02:21 parser: Set depth limit to 2048 with XML_PARSE_HUGE Deeply nested documents can cause performance problems, so the nesting depth should always be limited to a reasonable value. Also remove the global xmlParserMaxDepth setting which isn't thread-safe and seems unused.
Nick Wellnhofer 875bb084 2023-09-07T03:25:45 parser: Implement xmlCtxtSetOptions Surprisingly, some options can only be enabled with xmlCtxtUseOptions and it's impossible to unset them. Add a new API function xmlCtxtSetOptions which sets or clears all options. Finally document all parser options. Make sure to synchronize option bits and struct members.
Nick Wellnhofer 33ec407a 2023-09-07T03:33:09 parser: Always prefer option members over bitmask If an option has an extra member in xmlParserCtxt, it takes precedence over the value from the options bitmask. Fix a few places where this was ignored.
Nick Wellnhofer 22fd571f 2023-09-06T22:15:20 parser: Don't modify SAX2 handler if XML_PARSE_SAX1 is set It's a bad idea to modify members of the SAX handler struct for option state management. Ideally, ctxt->options should be the preferred source of truth.
Nick Wellnhofer 37c6618b 2023-12-30T02:50:34 parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.
Nick Wellnhofer 2b79f106 2023-12-29T21:07:04 parser: Simplify entity size accounting
Nick Wellnhofer 08d9b258 2023-12-29T15:20:56 parser: Support namespace scope in NsData struct The previous approach of recreating the NsData struct was flawed.
Nick Wellnhofer 5de48d12 2023-12-29T14:41:40 parser: Simplify error handling when parsing entities
Nick Wellnhofer f0dc52d0 2023-12-29T06:00:20 parser: Move cleanup of element stacks to xmlParseContent
Nick Wellnhofer a1ed589b 2023-12-29T23:12:06 parser: Avoid unwanted expansion of parameter entities Remove PE handling from xmlSkipBlankChars and add a separate version that handles PEs. Only call xmlSkipBlankCharsPE when parsing DTD constructs. This should make sure that PEs don't get expanded accidentally, for example in text declarations.
Nick Wellnhofer a73483ed 2023-12-29T00:22:02 parser: Remove extraneous error message This is not an "internal error" but some other error reported elsewhere.
Nick Wellnhofer 7e0bbbc1 2023-12-27T18:33:30 parser: New input API Provide a new set of functions to create xmlParserInputs. These can be used for the document entity or from external entity loaders. - Don't require xmlParserInputBuffer. - All functions take a base URI. - All functions take an encoding as string. - xmlNewInputURL also takes a public ID. - xmlNewInputMemory takes a size_t. - Optimization hints for memory buffers. Improve documentation. Only call xmlInitParser before allocating a new parser context. Call xmlCtxtUseOptions as early as possible.
Nick Wellnhofer 45157261 2023-12-27T21:30:13 parser: Downgrade XML_ERR_UNSUPPORTED_ENCODING to warning If the actual encoding is UTF-8 or ASCII, we don't want to fail.
Nick Wellnhofer 24b7144f 2023-12-27T15:50:58 parser: More refactoring of entity parsing Remove xmlCreateEntityParserCtxtInternal. Rework xmlNewEntityInputStream.
Nick Wellnhofer d3ceea0b 2023-12-27T15:18:09 parser: Fix encoding handling in xmlParserInputBufferCreateIO Don't pass encoding to xmlParserInputBufferCreateIO but use xmlSwitchEncoding to make sure that the encoding sticks.
Nick Wellnhofer d025cfbb 2023-12-27T03:53:24 parser: Always copy content from entity to target. Make sure that references from IDs are updated. Note that if there are IDs with the same value in a document, the last one will now be returned. IDs should be unique, but maybe this should be addressed.
Nick Wellnhofer 6337ff79 2023-12-27T03:29:13 parser: Simplify control flow in xmlParseReference
Nick Wellnhofer 579186f2 2023-12-27T03:03:26 parser: Remove xmlSetEntityReferenceFunc feature This has been deprecated for a long time.
Nick Wellnhofer b848338c 2023-12-27T01:46:40 parser: More refactoring of entity loading This sets input->entity also for general entities.
Nick Wellnhofer 4ecc85d2 2023-12-27T00:44:16 parser: Push general entity input streams on the stack This allows the error handler to give more context.
Nick Wellnhofer 6a9a88a1 2023-12-26T03:13:05 parser: Move progressive flag into input struct
Nick Wellnhofer 4f14fe9c 2023-12-26T02:44:38 parser: Remove remaining ctxt->instate checks Now ctxt->instate is only used for push parser states.
Nick Wellnhofer d944a415 2023-12-26T02:10:35 parser: Fix in-parameter-entity and in-external-dtd checks Use in ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.
Nick Wellnhofer f3fa34dc 2023-12-26T22:37:26 parser: Fix general entity parsing Clear namespace database. Ignore non-fatal errors.
Nick Wellnhofer ecfbcc8a 2023-12-25T04:33:00 parser: Rework general entity parsing Don't create a new parser context but reuse the existing one. This exposes bug #601 in a more obvious way.
Nick Wellnhofer 955c177f 2023-12-23T00:58:36 parser: Stop using 'directory' struct member This was only used as a pointless fallback for URI resolution.
Nick Wellnhofer e8de3401 2023-12-22T02:57:19 parser: Also set document properties when push parsing Add new function xmlFinishDocument which invokes the endDocument SAX handler and sets the document's properties.
Nick Wellnhofer 13043691 2023-12-20T00:33:34 parser: Rename xmlErrParser to xmlCtxtErr
Nick Wellnhofer 8d0aaf4b 2023-12-19T20:47:36 parser: Remove xmlErrEncoding Use xmlFatalErr or xmlCtxtErrIO.
Nick Wellnhofer 23345a1c 2023-12-19T19:52:28 io: Report IO errors through xmlCtxtErrIO This is also a new public API function to be used in external entity loaders.
Nick Wellnhofer 531d06ad 2023-12-18T22:48:24 error: Stop printing some errors by default Unfortunately, it's long-standing behavior for libxml2 to print all reported errors to stderr by default. This default behavior is now partially disabled. If no error handler is set, only parser and validation errors are passed to a generic error handler or printed to stderr. Other errors are still available via xmlGetLastError and can be captured with a structured error handler.
Nick Wellnhofer 54c70ed5 2023-12-18T19:31:29 parser: Improve error handling Introduce xmlCtxtSetErrorHandler allowing to set a structured error for a parser context. There already was the "serror" SAX handler but this always receives the parser context as argument. Start to use xmlRaiseMemoryError. Remove useless arguments from memory error functions. Rename xmlErrMemory to xmlCtxtErrMemory. Remove a few calls to xmlGenericError. Remove support for runtime entity debugging.
Nick Wellnhofer 1c106edf 2023-12-13T23:56:19 parser: Allow recovery in xmlParseInNodeContext Should fix #645.
Nick Wellnhofer 862e9ce0 2023-12-13T14:53:44 malloc-fail: Fix use-of-uninitialized-value in xmlParseConditionalSections Short-lived regression.
Nick Wellnhofer c2bbeed1 2023-12-12T23:51:32 io: Fix memory lifetime issue with input buffers xmlParserInputBufferCreateMem must make a copy of the buffer. This fixes a regression from 2.11 which could cause reads from freed memory depending on the use case. Undeprecate xmlParserInputBufferCreateStatic which can avoid copying the whole buffer.
Nick Wellnhofer f19a9510 2023-12-10T17:50:22 parser: Report malloc failures Fix many places where malloc failures aren't reported. Make xmlErrMemory public. This is useful for custom external entity loaders. Introduce new API function xmlSwitchEncodingName. Change the way how we store whether the the parser is stopped. This used to be signaled by setting ctxt->instate to XML_PARSER_EOF which was misdesigned and error-prone. Set ctxt->disableSAX to 2 instead and introduce a macro PARSER_STOPPED. Also stop to remove parser inputs in xmlHaltParser. This allows to remove many checks of ctxt->instate. Introduce xmlErrParser to handle errors if a parser context is available.
Nick Wellnhofer 7d446e97 2023-12-08T12:13:49 parser: Fix namespaces redefined from default attributes This regressed in commit e0dd330b. Also fixes a long-standing issue where namespaces from default attributes weren't added if they match an existing namespace. Fixes #643.
Nick Wellnhofer c011e760 2023-12-06T01:09:31 globals: Remove unused globals from thread storage Setting these deprecated globals hasn't had an effect for a long time. Make them constants. This reduces the size of per-thread storage from ~700 to ~250 bytes.
Nick Wellnhofer 7f00273c 2023-12-01T19:21:17 parser: Fix invalid free in xmlParseBalancedChunkMemoryRecover Set the dictionary for newDoc in xmlParseBalancedChunkMemoryRecover. This is a long-standing bug which was masked by - xmlParseBalancedChunkMemoryRecover changing the document of the root node. This is a really bad idea, resulting in a mismatch between ctxt->myDoc and ctxt->node->doc. - SAX2.c preferring ctxt->node->doc over ctxt->myDoc until commit a31e1b06. Fixes #641.
Nick Wellnhofer c7629c9e 2023-11-30T16:52:34 parser: Clarify documentation regarding xmlReadMemory buffer size Fixes #638.
Nick Wellnhofer 43b511fa 2023-11-26T14:31:39 parser: Make CRLF increment line number Partial revert of cb927e85 fixing CRLFs not incrementing the line number. This requires to rework xmlParseQNameHashed. The original implementation prompted the change to xmlCurrentChar which really shouldn't modify the 'cur' pointer as side effect. But the NEXTL macro relies on this behavior. Ultimately, we should reintroduce the change to xmlCurrentChar and fix the NEXTL macro. This will lead to single CRs incrementing the line number as well which seems more consistent. Fixes #628.
Nick Wellnhofer aca37d8c 2023-11-20T15:20:37 parser: Only enable SAX2 if there are SAX2 element handlers This reverts part of commit 235b15a5 for backward compatibility and adds some comments trying to clarify the whole mess. Fixes #623.
Nick Wellnhofer 529df196 2023-11-15T12:10:25 parser: Don't overwrite error state in xmlParseTextDecl Fixes a null deref in xmlLoadEntityContent found by OSS-Fuzz.
Nick Wellnhofer 70cc45b8 2023-11-05T00:49:40 parser: Improve attribute hash table There's no need to grow the hash table dynamically. The size is known which simplifies the implementation.
Nick Wellnhofer 58598494 2023-11-04T23:47:33 parser: Fix combination of hash values This bug resulted in a stuck bit in hash values which can have a severe performance impact.
Nick Wellnhofer 7a2d412f 2023-10-31T20:15:38 parser: Copy default namespace in xmlParseBalancedChunkMemory
Nick Wellnhofer e0c2f14d 2023-10-31T13:53:15 parser: Copy namespaces in xmlParseBalancedChunkMemory Reenable copying of namespaces but don't set SAX data. This should match the old behavior.