Log

Author Commit Date Message
Nick Wellnhofer a7356dfe 2024-01-03T18:02:46 parser: Clear invalid entity content This was removed in earlier commits, but we really want to make sure that entity content is syntactically valid.
Nick Wellnhofer 30d83977 2024-01-04T15:18:14 fuzz: Disable catalogs The catalogs API doesn't report OOM errors. It's basically impossible to use it safely in its current form.
Nick Wellnhofer ca5965d5 2024-01-02T21:49:43 save: Report more malloc failures
Nick Wellnhofer 2c9cd0b6 2024-01-02T18:51:24 fuzz: Abort on internal errors
Nick Wellnhofer e8fb3d63 2024-01-02T17:45:54 parser: Convert some "internal errors" to meaningful codes
Nick Wellnhofer 9912c369 2024-01-02T17:23:59 SAX2: Enforce size limit in xmlSAX2Text with XML_PARSE_HUGE
Nick Wellnhofer 661ef936 2024-01-02T18:50:59 valid: Fix some error codes
Nick Wellnhofer 5cb4b05c 2024-01-02T17:16:22 parser: Lower maximum entity nesting depth Limit entity nesting depth to 20 or 40 with XML_PARSE_HUGE. Change error code to XML_ERR_RESOURCE_LIMIT.
Nick Wellnhofer 0821efc8 2024-01-02T18:33:57 encoding: Check whether encoding handlers support input/output The "HTML" encoding handler doesn't support input which could lead to a wrong error report.
Nick Wellnhofer 85f99023 2024-01-02T17:52:43 parser: Fix buffer size checks Don't test size of remaining data. This causes false positives with memory buffers. Also impose XML_MAX_HUGE_LENGTH limit when parsing with XML_PARSE_HUGE.
Nick Wellnhofer a2cc7f5f 2024-01-02T17:02:21 parser: Set depth limit to 2048 with XML_PARSE_HUGE Deeply nested documents can cause performance problems, so the nesting depth should always be limited to a reasonable value. Also remove the global xmlParserMaxDepth setting which isn't thread-safe and seems unused.
Nick Wellnhofer 875bb084 2023-09-07T03:25:45 parser: Implement xmlCtxtSetOptions Surprisingly, some options can only be enabled with xmlCtxtUseOptions and it's impossible to unset them. Add a new API function xmlCtxtSetOptions which sets or clears all options. Finally document all parser options. Make sure to synchronize option bits and struct members.
Nick Wellnhofer 33ec407a 2023-09-07T03:33:09 parser: Always prefer option members over bitmask If an option has an extra member in xmlParserCtxt, it takes precedence over the value from the options bitmask. Fix a few places where this was ignored.
Nick Wellnhofer 22fd571f 2023-09-06T22:15:20 parser: Don't modify SAX2 handler if XML_PARSE_SAX1 is set It's a bad idea to modify members of the SAX handler struct for option state management. Ideally, ctxt->options should be the preferred source of truth.
Nick Wellnhofer 37c6618b 2023-12-30T02:50:34 parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.
Nick Wellnhofer 4dcc2d74 2024-01-02T14:04:44 save: Output U+FFFD replacement characters This degrades more gracefully and helps to diagnose errors. We stop raising errors for now, since there's no way to report malloc failures during error handling yet.
Nick Wellnhofer 2b79f106 2023-12-29T21:07:04 parser: Simplify entity size accounting
Nick Wellnhofer 08d9b258 2023-12-29T15:20:56 parser: Support namespace scope in NsData struct The previous approach of recreating the NsData struct was flawed.
Nick Wellnhofer 5de48d12 2023-12-29T14:41:40 parser: Simplify error handling when parsing entities
Nick Wellnhofer f0dc52d0 2023-12-29T06:00:20 parser: Move cleanup of element stacks to xmlParseContent
Nick Wellnhofer a1ed589b 2023-12-29T23:12:06 parser: Avoid unwanted expansion of parameter entities Remove PE handling from xmlSkipBlankChars and add a separate version that handles PEs. Only call xmlSkipBlankCharsPE when parsing DTD constructs. This should make sure that PEs don't get expanded accidentally, for example in text declarations.
Nick Wellnhofer 16b0dbc1 2023-12-29T18:47:30 parser: Fix XML_ERR_UNSUPPORTED_ENCODING errors Commit 45157261 added the check in the wrong place. Also allow unsupported encoding in xmlNewInputInternal. Fixes #654.
Nick Wellnhofer e45a4d71 2023-12-29T00:00:21 io: Always forward IO errors to global handler The HTTP module raises errors without context. This won't be fixed, so send them to the global error handler.
Nick Wellnhofer a73483ed 2023-12-29T00:22:02 parser: Remove extraneous error message This is not an "internal error" but some other error reported elsewhere.
Nick Wellnhofer 7e0bbbc1 2023-12-27T18:33:30 parser: New input API Provide a new set of functions to create xmlParserInputs. These can be used for the document entity or from external entity loaders. - Don't require xmlParserInputBuffer. - All functions take a base URI. - All functions take an encoding as string. - xmlNewInputURL also takes a public ID. - xmlNewInputMemory takes a size_t. - Optimization hints for memory buffers. Improve documentation. Only call xmlInitParser before allocating a new parser context. Call xmlCtxtUseOptions as early as possible.
Nick Wellnhofer 45157261 2023-12-27T21:30:13 parser: Downgrade XML_ERR_UNSUPPORTED_ENCODING to warning If the actual encoding is UTF-8 or ASCII, we don't want to fail.
Nick Wellnhofer 24b7144f 2023-12-27T15:50:58 parser: More refactoring of entity parsing Remove xmlCreateEntityParserCtxtInternal. Rework xmlNewEntityInputStream.
Nick Wellnhofer d3ceea0b 2023-12-27T15:18:09 parser: Fix encoding handling in xmlParserInputBufferCreateIO Don't pass encoding to xmlParserInputBufferCreateIO but use xmlSwitchEncoding to make sure that the encoding sticks.
Nick Wellnhofer d025cfbb 2023-12-27T03:53:24 parser: Always copy content from entity to target. Make sure that references from IDs are updated. Note that if there are IDs with the same value in a document, the last one will now be returned. IDs should be unique, but maybe this should be addressed.
Nick Wellnhofer 6337ff79 2023-12-27T03:29:13 parser: Simplify control flow in xmlParseReference
Nick Wellnhofer 579186f2 2023-12-27T03:03:26 parser: Remove xmlSetEntityReferenceFunc feature This has been deprecated for a long time.
Nick Wellnhofer b848338c 2023-12-27T01:46:40 parser: More refactoring of entity loading This sets input->entity also for general entities.
Nick Wellnhofer 4ecc85d2 2023-12-27T00:44:16 parser: Push general entity input streams on the stack This allows the error handler to give more context.
Nick Wellnhofer a5dcf0f4 2023-12-26T03:27:23 parser: Mark more parser context members as unused
Nick Wellnhofer 6a9a88a1 2023-12-26T03:13:05 parser: Move progressive flag into input struct
Nick Wellnhofer 4f14fe9c 2023-12-26T02:44:38 parser: Remove remaining ctxt->instate checks Now ctxt->instate is only used for push parser states.
Nick Wellnhofer d944a415 2023-12-26T02:10:35 parser: Fix in-parameter-entity and in-external-dtd checks Use ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.
Nick Wellnhofer 477a7ed8 2023-12-28T19:06:32 html: Abort earlier on fatal errors
Nick Wellnhofer 5f319304 2023-12-28T19:05:51 SAX2: Fix error code Today I learned that the TSCII character encoding [1] can blow up the size of text 12 times when converted to UTF-8: $ printf '\x82' |iconv -f TSCII -t UTF-8 |hexdump -C 00000000 e0 ae b8 e0 af 8d e0 ae b0 e0 af 80 0000000c [1] https://en.wikipedia.org/wiki/Tamil_Script_Code_for_Information_Interchange
Nick Wellnhofer ab631971 2023-12-28T17:07:03 uri: Keep fragment intact when resolving filesystem paths
Nick Wellnhofer b8313b58 2023-12-26T21:59:08 xpath: Rewrite substring-before and substring-after Don't use buffers. Check malloc failures.
Nick Wellnhofer 3874e5d0 2023-12-26T01:42:23 tests: Remove unneeded error formatting code
Nick Wellnhofer 2a2fbe1e 2023-12-28T16:42:03 xinclude: Only set xml:base if necessary
Nick Wellnhofer 8a685a3d 2023-12-26T00:42:22 xinclude: Allow empty nodesets There's no reason to treat an empty nodeset as error.
Nick Wellnhofer f3fa34dc 2023-12-26T22:37:26 parser: Fix general entity parsing Clear namespace database. Ignore non-fatal errors.
Nick Wellnhofer ecfbcc8a 2023-12-25T04:33:00 parser: Rework general entity parsing Don't create a new parser context but reuse the existing one. This exposes bug #601 in a more obvious way.
Nick Wellnhofer c2ef78f7 2023-12-24T23:56:57 io: Fix close error handling There's no way to report error codes from closing an output buffer yet.
Nick Wellnhofer 6d27c549 2023-12-24T17:59:02 io: Fix read/write error handling Handle short reads/writes from fd. Fix stdio error handling.
Nick Wellnhofer 0bef93bf 2023-12-23T04:03:41 io: More refactoring and unescaping fixes Merge Windows wrappers into relevant functions. Remove more unnecessary unescaping. Merge *OpenW into *Open functions. Use unbuffered IO for output.
Nick Wellnhofer 331dcd62 2023-12-23T01:40:54 error: Reenable full error reports to default handler This should make console output include some information about nodes again. Note that this extra information must be disabled if a custom generic error handler was set. Many downstream test suites rely on this behavior.
Nick Wellnhofer c1bddd4c 2023-12-23T01:09:17 parser: Mark 'length' member of xmlParserInput as unused
Nick Wellnhofer 955c177f 2023-12-23T00:58:36 parser: Stop using 'directory' struct member This was only used as a pointless fallback for URI resolution.
Nick Wellnhofer 60841beb 2023-12-25T18:31:22 parser: Make XML_IO_NETWORK_ATTEMPT behave as before Always reported to generic error, not to parser context for backward compatibility. Several downstream test suites rely on this behavior.
Nick Wellnhofer a2693410 2023-12-23T00:35:30 io: Move some code from xmlIO.c to parserInternals.c Move everything related to parser contexts to parserInternals.c.
Nick Wellnhofer 8ab1b122 2023-12-23T00:00:15 Fix filename and URI handling Many strings are passed to the library that could be either URIs or filesystem paths. We now assume that strings are a URI if they contain the substring "://". This means that they have a scheme and an authority. Otherwise, URI resolution wouldn't make much sense. Fix xmlBuildURI to work with filesystem paths. If the base URI doesn't contain "://" it is treated as filename. The resolved URI is unescaped, appended and the result is normalized. Rewrite xmlNormalizePath to handle Windows quirks. All special handling for Windows paths is removed in xmlCanonicPath. If the path looks like a URI, only escape characters allowed in Legacy Extended IRIs. Make xmlPathToURI only call xmlCanonicPath. The additional round-trip through URI parser and serializer seems useless. Add a helper function xmlConvertUriToPath in xmlIO.c which checks for file URIs and unescapes them. Always process strings with xmlCanonicPath in xmlLoadExternalEntity. This should be harmless now. Should help with #334, #387, #611.
Nick Wellnhofer 28913232 2023-12-22T23:58:43 uri: Clean up special parsing modes Add function to handle unreserved check. Give flags meaningful names. Add support to allow ucschars from Legacy Extended IRIs.
Nick Wellnhofer 6e3a2ac6 2023-12-22T21:38:50 xinclude: Rework xml:base fixup The xml:base fixup was broken in more complex cases. Also avoid parsing and building the included URI multiple times.
Nick Wellnhofer 35a4bc50 2023-12-22T15:14:19 xinclude: Report to xmlGenericError
Nick Wellnhofer e8de3401 2023-12-22T02:57:19 parser: Also set document properties when push parsing Add new function xmlFinishDocument which invokes the endDocument SAX handler and sets the document's properties.
Nick Wellnhofer c73de050 2023-12-23T04:50:47 include: Move non-generated parts from xmlversion.h.in xmlexports.h originally only included symbol visibility macros but it's a good place for other macros as well.
Nick Wellnhofer a18d9416 2023-12-21T18:39:44 Update NEWS
Nick Wellnhofer 229e5ff7 2023-12-21T18:09:42 io: Remove support for HTTP POST This feature is unlikely to be used these days.
Nick Wellnhofer 2e9e758d 2023-12-24T14:27:46 dict: Get random seed from system PRNG
Nick Wellnhofer 9c2c87b5 2023-12-24T15:33:12 dict: Move local RNG state to global state Don't use TLS variables directly.
Nick Wellnhofer c49572e5 2023-12-23T15:03:22 malloc-fail: Fix erroneous report in xmlStringGetNodeList The parser can produce invalid attribute content in recovery mode. Unless this is fixed, xmlStringGetNodeList should ignore such errors silently.
Nick Wellnhofer c8f1f4a2 2023-12-21T17:30:38 doc: Improve documentation of error handlers
Nick Wellnhofer 882b3a80 2023-12-21T15:34:24 runtest: Fix return code in rngTest
Nick Wellnhofer f0df3e6d 2023-12-21T14:35:18 tests: Try to fix RelaxNG test cases These were added recently in ea695ac0 and 8074b881 but were a total mess of symbolic links and apparently mixed up files. Symbolic links don't work on Windows. Try to salvage one of the tests.
Nick Wellnhofer 8cd56317 2023-12-21T02:32:01 html: Don't close fd in htmlCtxtReadFd Long-standing bug. The XML fix from 2003 was never ported to the HTML parser. htmlReadFd was fixed with fe6890e2.
Nick Wellnhofer 0a658c0f 2023-12-20T23:53:19 io: Don't use "-" to read from stdin To implement this feature on such a low level is a disaster waiting to happen. Remove these checks from the IO code and move them to xmllint. Note that the serialization API will still treat "-" as stdout.
Nick Wellnhofer c9a46a91 2023-12-20T20:11:09 io: Rework initialization
Nick Wellnhofer b75fc1ab 2023-12-20T20:01:19 io: Rearrange code
Nick Wellnhofer 13043691 2023-12-20T00:33:34 parser: Rename xmlErrParser to xmlCtxtErr
Nick Wellnhofer 8d0aaf4b 2023-12-19T20:47:36 parser: Remove xmlErrEncoding Use xmlFatalErr or xmlCtxtErrIO.
Nick Wellnhofer 9fbe46ba 2023-12-19T20:10:10 io: Consolidate error messages
Nick Wellnhofer 23345a1c 2023-12-19T19:52:28 io: Report IO errors through xmlCtxtErrIO This is also a new public API function to be used in external entity loaders.
Nick Wellnhofer e62b0dbd 2023-12-19T19:47:07 xzlib: Fix harmless unsigned integer overflow
Nick Wellnhofer 1ef35663 2023-12-19T19:36:35 io: Always use unbuffered input Before, we often used unbuffered input via the lzma or gzip handlers, more or less inadvertently. Change the default file handlers from buffered (stdc FILE) to unbuffered (POSIX fds).
Nick Wellnhofer 7e14c05d 2023-12-19T17:05:08 io: Fix detection of compressed streams Make sure that we don't try to open uncompressed streams with a compression handler in copying mode.
Nick Wellnhofer 7e511f35 2023-12-19T15:41:37 io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile This makes it possible to report to the parser context why opening a file failed and to improve error messages. Now we can also remove the stat call before opening a file.
Nick Wellnhofer b2dbcc43 2023-12-19T13:33:59 io: Rework default callbacks Register a dummy callback struct for default callbacks. Handle them in a separate function which will later make it possible to return meaningful error codes.
Nick Wellnhofer 531d06ad 2023-12-18T22:48:24 error: Stop printing some errors by default Unfortunately, it's long-standing behavior for libxml2 to print all reported errors to stderr by default. This default behavior is now partially disabled. If no error handler is set, only parser and validation errors are passed to a generic error handler or printed to stderr. Other errors are still available via xmlGetLastError and can be captured with a structured error handler.
Nick Wellnhofer 0c7a364f 2023-12-18T21:55:50 error: Remove xmlSimpleError
Nick Wellnhofer f9f5c2d8 2023-12-18T21:44:06 xmllint: Don't use xmlGenericError
Nick Wellnhofer e9c01c30 2023-12-18T21:41:30 runtest: Test with per-context error handlers Only set the global error handler where needed. Don't use xmlGenericError.
Nick Wellnhofer 05d9bacd 2023-12-18T21:39:51 regexp: Improve error handling Handle malloc failure from xmlRaiseError. Use xmlRaiseMemoryError. Remove argument from memory error handler. Remove TODO macro.
Nick Wellnhofer ecb4c9fb 2023-12-18T21:32:49 misc: Improve error handling Remove calls to generic error handler or use stderr for - legacy deprecation warnings - nanohttp, nanoftp in standalone mode - memory debug messages Use xmlRaiseMemoryError. Remove TODO macro. Don't raise errors in xmlmodule.c.
Nick Wellnhofer bc1e0306 2023-12-18T21:30:22 save: Improve error handling Handle malloc failure from xmlRaiseError. Use xmlRaiseMemoryError. Stop using xmlGenericError. Remove argument from memory error handler. Remove TODO macro.
Nick Wellnhofer 664db89e 2023-12-18T21:25:28 schematron: Improve error handling Implement xmlSchematronVErr. Handle malloc failure from xmlRaiseError. Stop using xmlGenericError. Remove argument from memory error handler. Use xmlRaiseMemoryError. Remove TODO macro.
Nick Wellnhofer 83c6aeef 2023-12-18T21:12:29 relaxng: Improve error handling Pass RelaxNG structured error handler to XML parser. Handle malloc failure from xmlRaiseError. Remove argument from memory error handler. Use xmlRaiseMemoryError. Don't use xmlGenericError. Remove TODO macro.
Nick Wellnhofer be76b7df 2023-12-18T21:09:39 debug: Improve error handling Print to debug output instead of global error handler.
Nick Wellnhofer 25e22011 2023-12-18T20:58:42 c14n: Improve error handling Handle malloc failure from xmlRaiseError. Add context argument to error functions. Remove argument from memory error handler. Use xmlRaiseMemoryError.
Nick Wellnhofer 6cb8420a 2023-12-18T20:54:26 catalog: Improve error handling Handle malloc failures from xmlRaiseError. Remove arguments from memory error handler. Remove TODO macro. Make debugging code print to stderr instead of xmlGenericError.
Nick Wellnhofer 06c00f65 2023-12-18T19:51:32 schemas: Improve error handling Introduce xmlSchema*ErrFull which checks for memory allocation failures during error reporting. Remove arguments from memory error handlers. Use xmlRaiseMemoryError. Remove TODO macro.
Nick Wellnhofer ed6596a4 2023-12-18T19:47:47 reader: Simplify error handling Only use structured error handlers for parser, Schemas and RelaxNG contexts. Also use structured error handler for XInclude context. Remove TODO macro.
Nick Wellnhofer 2829a21a 2023-12-18T19:43:55 xinclude: Improve error handling Introduce xmlXIncludeSetErrorHandler, which allows setting a structured error handler for an XInclude context and forwards errors from the parser. Remove arguments from memory error handlers. Use xmlRaiseMemoryError.
Nick Wellnhofer 954b8984 2023-12-18T19:39:38 xpath: Improve error handling Introduce xmlXPathSetErrorHandler, which allows setting a structured error handler for an XPath context. Remove arguments from memory error handlers. Use xmlRaiseMemoryError. Remove TODO, STRANGE and CHECK_CTXT macros. Remove remaining uses of xmlGenericError.
Nick Wellnhofer 54c70ed5 2023-12-18T19:31:29 parser: Improve error handling Introduce xmlCtxtSetErrorHandler, which allows setting a structured error handler for a parser context. There already was the "serror" SAX handler but this always receives the parser context as argument. Start to use xmlRaiseMemoryError. Remove useless arguments from memory error functions. Rename xmlErrMemory to xmlCtxtErrMemory. Remove a few calls to xmlGenericError. Remove support for runtime entity debugging.
Nick Wellnhofer c5a8aef2 2023-12-18T19:12:08 error: Refactor error reporting Introduce xmlStrVASPrintf, trying to handle buggy snprintf implementations. Introduce xmlSetError to set errors atomically. Introduce xmlUpdateError to set an error, fixing up node, file and line. Introduce helper function xmlRaiseMemoryError. Make legacy error handlers call xmlReportError, avoiding checks in xmlVRaiseError. Remove fragile support for getting file and line info from XInclude nodes.
Mike Dalessio ed3ad3e1 2023-12-20T11:42:08 Makefile.am: omit $(top_builddir) from DEPS and LDADDS BSD make is less liberal than GNU make in matching targets, and so it tries to resolve the dependency `./libxml2.la` and fails to match the target `libxml2.la`.
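Commit 37c6618b says the attribute and entity value parser now validates UTF-8 byte sequences without decoding them. The following is a minimal sketch of that idea, not libxml2's actual code: it assumes the well-formed byte ranges from the Unicode standard (Table 3-7) and uses the hypothetical function names utf8_seq_len and utf8_is_valid. The lead byte selects the sequence length and the allowed range of the second byte, so overlong forms, surrogates and values above U+10FFFF are rejected by range checks alone, without ever assembling a code point.

```c
#include <stddef.h>

/* Hypothetical helper: length (1-4) of the well-formed UTF-8 sequence
 * starting at s, or 0 if the bytes do not form one. Only byte ranges
 * are compared; no code point is ever computed. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char c = s[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
    size_t len, i;

    if (c < 0x80)
        return 1;                      /* ASCII */
    if (c >= 0xC2 && c <= 0xDF) {
        len = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
        len = 3;
        if (c == 0xE0)
            lo = 0xA0;                 /* reject overlong 3-byte forms */
        else if (c == 0xED)
            hi = 0x9F;                 /* reject UTF-16 surrogates */
    } else if (c >= 0xF0 && c <= 0xF4) {
        len = 4;
        if (c == 0xF0)
            lo = 0x90;                 /* reject overlong 4-byte forms */
        else if (c == 0xF4)
            hi = 0x8F;                 /* reject values above U+10FFFF */
    } else {
        return 0; /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
    }

    if (avail < len)
        return 0;                      /* truncated sequence */
    if (s[1] < lo || s[1] > hi)
        return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;                  /* not a continuation byte */
    return len;
}

/* Hypothetical helper: 1 if buf[0..size) is entirely well-formed UTF-8. */
int utf8_is_valid(const unsigned char *buf, size_t size)
{
    size_t pos = 0;

    while (pos < size) {
        size_t n = utf8_seq_len(buf + pos, size - pos);
        if (n == 0)
            return 0;
        pos += n;
    }
    return 1;
}
```

For example, utf8_is_valid applied to the two bytes C3 A9 (U+00E9) returns 1, while the overlong pair C0 AF returns 0. A real XML parser would additionally have to check the XML Char production, which this sketch leaves out.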
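Commit 8ab1b122 describes a simple heuristic: a string is treated as a URI only if it contains "://", and a helper converts file URIs back to paths by unescaping them. Below is a hypothetical sketch of both steps in plain C. looks_like_uri and file_uri_to_path are illustrative names, not libxml2 functions, and the sketch only accepts "file://" URIs with an empty authority (for example file:///tmp/a.xml), which is an assumption rather than documented libxml2 behavior.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: a string counts as a URI only if it contains "://",
 * i.e. it has a scheme and an authority. Anything else is a path. */
int looks_like_uri(const char *str)
{
    return strstr(str, "://") != NULL;
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical: convert "file:///..." to a filesystem path by
 * percent-decoding everything after "file://". Returns a malloc'ed
 * string the caller must free, or NULL if str is not a file URI with
 * an empty authority, contains a malformed escape or %00, or on OOM.
 * The scheme comparison is case-sensitive for brevity. */
char *file_uri_to_path(const char *str)
{
    static const char prefix[] = "file://";
    const size_t plen = sizeof(prefix) - 1;
    const char *src;
    char *out, *dst;

    if (strncmp(str, prefix, plen) != 0 || str[plen] != '/')
        return NULL; /* not a file URI, or it names a host */

    src = str + plen;              /* keep the leading '/' of the path */
    out = malloc(strlen(src) + 1); /* decoding never makes it longer */
    if (out == NULL)
        return NULL;

    for (dst = out; *src != '\0'; src++) {
        if (*src == '%') {
            int hi = hex_value(src[1]);
            int lo = (hi < 0) ? -1 : hex_value(src[2]);
            if (lo < 0 || (hi == 0 && lo == 0)) {
                free(out);
                return NULL;
            }
            *dst++ = (char)(hi * 16 + lo);
            src += 2; /* skip the two hex digits */
        } else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
    return out;
}
```

Under these assumptions, file_uri_to_path("file:///tmp/a%20b.xml") returns "/tmp/a b.xml", while a Windows path such as C:\docs\a.xml contains no "://" and is left to be handled as a filename.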