Log

Author Commit Date CI Message
Nick Wellnhofer 29beef65 2024-01-02T21:50:38 parser: Pop inputs if parsing DTD failed This should provide some statistics in ctxt->sizeentcopy even in the error or recovery case.
Nick Wellnhofer 02a2038d 2024-01-10T14:17:49 parser: Handle NOCDATA properly when expanding entities Short-lived regression from e1153832.
Nick Wellnhofer fd801845 2024-01-07T15:19:58 fuzz: Cap URL size Cap URL size to avoid quadratic behavior when generating error messages.
Nick Wellnhofer 83c1ae13 2024-01-07T15:40:23 fuzz: Add missing include Fix build failure.
Nick Wellnhofer e1153832 2024-01-07T01:29:37 parser: Fix quadratic behavior when copying entities Process the first and last text node with the SAX handler to make the text merging optimization kick in. Fixes #657.
Nick Wellnhofer d2b55a7a 2024-01-05T20:31:10 writer: Implement xmlTextWriterClose This function can be used to make sure that closing the output stream succeeded. Fixes #513.
Nick Wellnhofer 02cc5c36 2024-01-05T04:17:14 parser: Add XML_PARSE_NO_XXE parser option
Nick Wellnhofer 12f0bb94 2024-01-05T01:14:28 parser: Synchronize more options
Nick Wellnhofer 3efbe916 2024-01-05T00:11:29 parser: Mark 'token' member as unused in xmlParserCtxt
Nick Wellnhofer b82fd81d 2024-01-04T23:25:06 parser: Rework xmlCtxtParseDocument Make xmlCtxtParseDocument take a parser input which can be popped after parsing.
Nick Wellnhofer f237e5b9 2024-01-05T15:40:23 parser: Avoid duplicate namespace errors Don't report an extra attribute uniqueness error if a namespace is undeclared. This matches old behavior.
Nick Wellnhofer c2b3294f 2024-01-04T21:20:51 fuzz: Abort on invalid UTF-8 The parser should never generate invalid UTF-8 these days even in recovery mode.
Michele Bianchi df098e3b 2023-12-22T12:02:08 Set LIBXML2_FOUND if it has been properly configured
Nick Wellnhofer d7d300ba 2024-01-04T17:50:11 parser: Remove remnants of runtime debugging feature Apparently, this feature was removed long ago. Fixes #651.
Nick Wellnhofer 8c5848bd 2024-01-04T17:14:31 parser: Make xmlParseContent more useful This is an internal function which isn't really usable without some hacks. See WebKit/Chromium trying to recreate the effects of xmlDetectSAX2 manually, for example. Make xmlParseContent perform late initialization and check whether the content was fully parsed. Also rename xmlDetectSAX2 and document why it's needed.
Nick Wellnhofer 65c65b65 2024-01-04T13:59:23 tests: Move away from global error handlers
Nick Wellnhofer 07c05546 2024-01-04T02:48:02 error: Make xmlFormatError public This is a useful function to get a verbose error report. Allows to remove duplicated code from runtest.c. Also reactivate check for schema parser failures.
Nick Wellnhofer d0eb5a7e 2024-01-03T18:12:29 parser: Remove xmlErrEncodingInt Convert the last user to xmlFatalErr.
Nick Wellnhofer f30b9b23 2024-01-03T18:11:44 fuzz: Add assertion in xmlCopyCharMultibyte This is an internal function that should never receive out-of-range codepoints.
Nick Wellnhofer a7356dfe 2024-01-03T18:02:46 parser: Clear invalid entity content This was removed in earlier commits, but we really want to make sure that entity content is syntactically valid.
Nick Wellnhofer 30d83977 2024-01-04T15:18:14 fuzz: Disable catalogs The catalogs API doesn't report OOM errors. It's basically impossible to use it safely in its current form.
Nick Wellnhofer ca5965d5 2024-01-02T21:49:43 save: Report more malloc failures
Nick Wellnhofer 2c9cd0b6 2024-01-02T18:51:24 fuzz: Abort on internal errors
Nick Wellnhofer 661ef936 2024-01-02T18:50:59 valid: Fix some error codes
Nick Wellnhofer 0821efc8 2024-01-02T18:33:57 encoding: Check whether encoding handlers support input/output The "HTML" encoding handler doesn't support input which could lead to a wrong error report.
Nick Wellnhofer 85f99023 2024-01-02T17:52:43 parser: Fix buffer size checks Don't test size of remaining data. This causes false positives with memory buffers. Also impose XML_MAX_HUGE_LENGTH limit when parsing with XML_PARSE_HUGE.
Nick Wellnhofer e8fb3d63 2024-01-02T17:45:54 parser: Convert some "internal errors" to meaningful codes
Nick Wellnhofer 9912c369 2024-01-02T17:23:59 SAX2: Enforce size limit in xmlSAX2Text with XML_PARSE_HUGE
Nick Wellnhofer 5cb4b05c 2024-01-02T17:16:22 parser: Lower maximum entity nesting depth Limit entity nesting depth to 20 or 40 with XML_PARSE_HUGE. Change error code to XML_ERR_RESOURCE_LIMIT.
Nick Wellnhofer a2cc7f5f 2024-01-02T17:02:21 parser: Set depth limit to 2048 with XML_PARSE_HUGE Deeply nested documents can cause performance problems, so the nesting depth should always be limited to a reasonable value. Also remove the global xmlParserMaxDepth setting which isn't thread-safe and seems unused.
Nick Wellnhofer 875bb084 2023-09-07T03:25:45 parser: Implement xmlCtxtSetOptions Surprisingly, some options can only be enabled with xmlCtxtUseOptions and it's impossible to unset them. Add a new API function xmlCtxtSetOptions which sets or clears all options. Finally document all parser options. Make sure to synchronize option bits and struct members.
Nick Wellnhofer 33ec407a 2023-09-07T03:33:09 parser: Always prefer option members over bitmask If an option has an extra member in xmlParserCtxt, it takes precedence over the value from the options bitmask. Fix a few places where this was ignored.
Nick Wellnhofer 22fd571f 2023-09-06T22:15:20 parser: Don't modify SAX2 handler if XML_PARSE_SAX1 is set It's a bad idea to modify members of the SAX handler struct for option state management. Ideally, ctxt->options should be the preferred source of truth.
Nick Wellnhofer 37c6618b 2023-12-30T02:50:34 parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.
Nick Wellnhofer 4dcc2d74 2024-01-02T14:04:44 save: Output U+FFFD replacement characters This degrades more gracefully and helps to diagnose errors. We stop raising errors for now, since there's no way to report malloc failures during error handling yet.
Nick Wellnhofer 2b79f106 2023-12-29T21:07:04 parser: Simplify entity size accounting
Nick Wellnhofer 08d9b258 2023-12-29T15:20:56 parser: Support namespace scope in NsData struct The previous approach of recreating the NsData struct was flawed.
Nick Wellnhofer 5de48d12 2023-12-29T14:41:40 parser: Simplify error handling when parsing entities
Nick Wellnhofer f0dc52d0 2023-12-29T06:00:20 parser: Move cleanup of element stacks to xmlParseContent
Nick Wellnhofer a1ed589b 2023-12-29T23:12:06 parser: Avoid unwanted expansion of parameter entities Remove PE handling from xmlSkipBlankChars and add a separate version that handles PEs. Only call xmlSkipBlankCharsPE when parsing DTD constructs. This should make sure that PEs don't get expanded accidentally, for example in text declarations.
Nick Wellnhofer 16b0dbc1 2023-12-29T18:47:30 parser: Fix XML_ERR_UNSUPPORTED_ENCODING errors Commit 45157261 added the check in the wrong place. Also allow unsupported encoding in xmlNewInputInternal. Fixes #654.
Nick Wellnhofer e45a4d71 2023-12-29T00:00:21 io: Always forward IO errors to global handler The HTTP module raises errors without context. This won't be fixed, so send them to the global error handler.
Nick Wellnhofer a73483ed 2023-12-29T00:22:02 parser: Remove extraneous error message This is not an "internal error" but some other error reported elsewhere.
Nick Wellnhofer 7e0bbbc1 2023-12-27T18:33:30 parser: New input API Provide a new set of functions to create xmlParserInputs. These can be used for the document entity or from external entity loaders. - Don't require xmlParserInputBuffer. - All functions take a base URI. - All functions take an encoding as string. - xmlNewInputURL also takes a public ID. - xmlNewInputMemory takes a size_t. - Optimization hints for memory buffers. Improve documentation. Only call xmlInitParser before allocating a new parser context. Call xmlCtxtUseOptions as early as possible.
Nick Wellnhofer 45157261 2023-12-27T21:30:13 parser: Downgrade XML_ERR_UNSUPPORTED_ENCODING to warning If the actual encoding is UTF-8 or ASCII, we don't want to fail.
Nick Wellnhofer 24b7144f 2023-12-27T15:50:58 parser: More refactoring of entity parsing Remove xmlCreateEntityParserCtxtInternal. Rework xmlNewEntityInputStream.
Nick Wellnhofer d3ceea0b 2023-12-27T15:18:09 parser: Fix encoding handling in xmlParserInputBufferCreateIO Don't pass encoding to xmlParserInputBufferCreateIO but use xmlSwitchEncoding to make sure that the encoding sticks.
Nick Wellnhofer d025cfbb 2023-12-27T03:53:24 parser: Always copy content from entity to target. Make sure that references from IDs are updated. Note that if there are IDs with the same value in a document, the last one will now be returned. IDs should be unique, but maybe this should be addressed.
Nick Wellnhofer 6337ff79 2023-12-27T03:29:13 parser: Simplify control flow in xmlParseReference
Nick Wellnhofer 579186f2 2023-12-27T03:03:26 parser: Remove xmlSetEntityReferenceFunc feature This has been deprecated for a long time.
Nick Wellnhofer b848338c 2023-12-27T01:46:40 parser: More refactoring of entity loading This sets input->entity also for general entities.
Nick Wellnhofer 4ecc85d2 2023-12-27T00:44:16 parser: Push general entity input streams on the stack This allows the error handler to give more context.
Nick Wellnhofer a5dcf0f4 2023-12-26T03:27:23 parser: Mark more parser context members as unused
Nick Wellnhofer 6a9a88a1 2023-12-26T03:13:05 parser: Move progressive flag into input struct
Nick Wellnhofer 4f14fe9c 2023-12-26T02:44:38 parser: Remove remaining ctxt->instate checks Now ctxt->instate is only used for push parser states.
Nick Wellnhofer d944a415 2023-12-26T02:10:35 parser: Fix in-parameter-entity and in-external-dtd checks Use ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.
Nick Wellnhofer 477a7ed8 2023-12-28T19:06:32 html: Abort earlier on fatal errors
Nick Wellnhofer 5f319304 2023-12-28T19:05:51 SAX2: Fix error code Today I learned that the TSCII character encoding [1] can blow up the size of text 12 times when converted to UTF-8: $ printf '\x82' |iconv -f TSCII -t UTF-8 |hexdump -C 00000000 e0 ae b8 e0 af 8d e0 ae b0 e0 af 80 0000000c [1] https://en.wikipedia.org/wiki/Tamil_Script_Code_for_Information_Interchange
Nick Wellnhofer ab631971 2023-12-28T17:07:03 uri: Keep fragment intact when resolving filesystem paths
Nick Wellnhofer b8313b58 2023-12-26T21:59:08 xpath: Rewrite substring-before and substring-after Don't use buffers. Check malloc failures.
Nick Wellnhofer 3874e5d0 2023-12-26T01:42:23 tests: Remove unneeded error formatting code
Nick Wellnhofer 2a2fbe1e 2023-12-28T16:42:03 xinclude: Only set xml:base if necessary
Nick Wellnhofer 8a685a3d 2023-12-26T00:42:22 xinclude: Allow empty nodesets There's no reason to treat an empty nodeset as error.
Nick Wellnhofer f3fa34dc 2023-12-26T22:37:26 parser: Fix general entity parsing Clear namespace database. Ignore non-fatal errors.
Nick Wellnhofer ecfbcc8a 2023-12-25T04:33:00 parser: Rework general entity parsing Don't create a new parser context but reuse the existing one. This exposes bug #601 in a more obvious way.
Nick Wellnhofer c2ef78f7 2023-12-24T23:56:57 io: Fix close error handling There's no way to report error codes from closing an output buffer yet.
Nick Wellnhofer 6d27c549 2023-12-24T17:59:02 io: Fix read/write error handling Handle short reads/writes from fd. Fix stdio error handling.
Nick Wellnhofer 0bef93bf 2023-12-23T04:03:41 io: More refactoring and unescaping fixes Merge Windows wrappers into relevant functions. Remove more unnecessary unescaping. Merge *OpenW into *Open functions. Use unbuffered IO for output.
Nick Wellnhofer 331dcd62 2023-12-23T01:40:54 error: Reenable full error reports to default handler This should make console output include some information about nodes again. Note that this extra information must be disabled if a custom generic error handler was set. Many downstream test suites rely on this behavior.
Nick Wellnhofer c1bddd4c 2023-12-23T01:09:17 parser: Mark 'length' member of xmlParserInput as unused
Nick Wellnhofer 955c177f 2023-12-23T00:58:36 parser: Stop using 'directory' struct member This was only used as a pointless fallback for URI resolution.
Nick Wellnhofer 60841beb 2023-12-25T18:31:22 parser: Make XML_IO_NETWORK_ATTEMPT behave as before Always reported to generic error, not to parser context for backward compatibility. Several downstream test suites rely on this behavior.
Nick Wellnhofer a2693410 2023-12-23T00:35:30 io: Move some code from xmlIO.c to parserInternals.c Move everything related to parser contexts to parserInternals.c.
Nick Wellnhofer 8ab1b122 2023-12-23T00:00:15 Fix filename and URI handling Many strings are passed to the library that could be either URIs or filesystem paths. We now assume that strings are a URI if they contain the substring "://". This means that they have a scheme and an authority. Otherwise, URI resolution wouldn't make much sense. Fix xmlBuildURI to work with filesystem paths. If the base URI doesn't contain "://" it is treated as filename. The resolved URI is unescaped, appended and the result is normalized. Rewrite xmlNormalizePath to handle Windows quirks. All special handling for Windows paths is removed in xmlCanonicPath. If the path looks like a URI, only escape characters allowed in Legacy Extended IRIs. Make xmlPathToURI only call xmlCanonicPath. The additional round-trip through URI parser and serializer seems useless. Add a helper function xmlConvertUriToPath in xmlIO.c which checks for file URIs and unescapes them. Always process strings with xmlCanonicPath in xmlLoadExternalEntity. This should be harmless now. Should help with #334, #387, #611.
Nick Wellnhofer 28913232 2023-12-22T23:58:43 uri: Clean up special parsing modes Add function to handle unreserved check. Give flags meaningful names. Add support to allow ucschars from Legacy Extended IRIs.
Nick Wellnhofer 6e3a2ac6 2023-12-22T21:38:50 xinclude: Rework xml:base fixup The xml:base fixup was broken in more complex cases. Also avoid parsing and building the included URI multiple times.
Nick Wellnhofer 35a4bc50 2023-12-22T15:14:19 xinclude: Report to xmlGenericError
Nick Wellnhofer e8de3401 2023-12-22T02:57:19 parser: Also set document properties when push parsing Add new function xmlFinishDocument which invokes the endDocument SAX handler and sets the document's properties.
Nick Wellnhofer c73de050 2023-12-23T04:50:47 include: Move non-generated parts from xmlversion.h.in xmlexports.h originally only included symbol visibility macros but it's a good place for other macros as well.
Nick Wellnhofer a18d9416 2023-12-21T18:39:44 Update NEWS
Nick Wellnhofer 229e5ff7 2023-12-21T18:09:42 io: Remove support for HTTP POST This feature is unlikely to be used these days.
Nick Wellnhofer 9c2c87b5 2023-12-24T15:33:12 dict: Move local RNG state to global state Don't use TLS variables directly.
Nick Wellnhofer 2e9e758d 2023-12-24T14:27:46 dict: Get random seed from system PRNG
Nick Wellnhofer c49572e5 2023-12-23T15:03:22 malloc-fail: Fix erroneous report in xmlStringGetNodeList The parser can produce invalid attribute content in recovery mode. Unless this is fixed, xmlStringGetNodeList should ignore such errors silently.
Nick Wellnhofer c8f1f4a2 2023-12-21T17:30:38 doc: Improve documentation of error handlers
Nick Wellnhofer 882b3a80 2023-12-21T15:34:24 runtest: Fix return code in rngTest
Nick Wellnhofer f0df3e6d 2023-12-21T14:35:18 tests: Try to fix RelaxNG test cases These were added recently in ea695ac0 and 8074b881 but were a total mess of symbolic links and apparently mixed up files. Symbolic links don't work on Windows. Try to salvage one of the tests.
Nick Wellnhofer 8cd56317 2023-12-21T02:32:01 html: Don't close fd in htmlCtxtReadFd Long-standing bug. The XML fix from 2003 was never ported to the HTML parser. htmlReadFd was fixed with fe6890e2.
Nick Wellnhofer 0a658c0f 2023-12-20T23:53:19 io: Don't use "-" to read from stdin To implement this feature on such a low level is a disaster waiting to happen. Remove these checks from the IO code and move them to xmllint. Note that the serialization API will still treat "-" as stdout.
Nick Wellnhofer c9a46a91 2023-12-20T20:11:09 io: Rework initialization
Nick Wellnhofer b75fc1ab 2023-12-20T20:01:19 io: Rearrange code
Nick Wellnhofer 13043691 2023-12-20T00:33:34 parser: Rename xmlErrParser to xmlCtxtErr
Nick Wellnhofer 8d0aaf4b 2023-12-19T20:47:36 parser: Remove xmlErrEncoding Use xmlFatalErr or xmlCtxtErrIO.
Nick Wellnhofer 9fbe46ba 2023-12-19T20:10:10 io: Consolidate error messages
Nick Wellnhofer 23345a1c 2023-12-19T19:52:28 io: Report IO errors through xmlCtxtErrIO This is also a new public API function to be used in external entity loaders.
Nick Wellnhofer e62b0dbd 2023-12-19T19:47:07 xzlib: Fix harmless unsigned integer overflow
Nick Wellnhofer 1ef35663 2023-12-19T19:36:35 io: Always use unbuffered input Before, we often used unbuffered input via the lzma or gzip handlers, more or less inadvertently. Change the default file handlers from buffered (stdc FILE) to unbuffered (POSIX fds).
Nick Wellnhofer 7e14c05d 2023-12-19T17:05:08 io: Fix detection of compressed streams Make sure that we don't try to open uncompressed streams with a compression handler in copying mode.
Nick Wellnhofer 7e511f35 2023-12-19T15:41:37 io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile This allows reporting to the parser context why opening a file failed, which improves error messages. Now we can also remove the stat call before opening a file.
Nick Wellnhofer b2dbcc43 2023-12-19T13:33:59 io: Rework default callbacks Register a dummy callback struct for default callbacks. Handle them in a separate function which will later allow to return meaningful error codes.