include/private


Log

Author Commit Date CI Message
Nick Wellnhofer 469c847f 2025-07-22T23:44:10 parser: Split out xmlParserInputGetWindow
Nick Wellnhofer 144ed959 2025-07-22T22:38:05 parser: Move xmlSaturatedAdd to private header
Nick Wellnhofer 7a41b18c 2025-07-22T01:08:38 parser: Remove xmlHaltParser Always halt the parser on resource limit and entity loop errors and remove the remaining calls which seem unnecessary.
Nick Wellnhofer c5e7ff09 2025-07-21T12:26:36 tree: More xmlNodeParseContent cleanup - Rename to xmlNodeParseAttValue - Rework argument types - Remove wrapper function
Daniel P. Berrangé ac5fcb0e 2025-06-25T15:24:24 relaxng: ensure thread safe global initialization Relying on a plain integer flag, with no synchronization primitives does not give thread-safe initialization. All reads & writes of the xmlSchemaTypesInitialized flag need to be protected by a mutex to ensure suitable memory barriers & thus correct ordering wrt any speculative execution. A separate internal initializer tied to xmlParserInit is used to create the mutex used for synchronization, similarly to how catalog.c works. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Daniel P. Berrangé 80798c40 2025-06-25T15:24:24 xmlschemastypes: ensure thread safe global initialization Relying on a plain integer flag, with no synchronization primitives does not give thread-safe initialization. All reads & writes of the xmlSchemaTypesInitialized flag need to be protected by a mutex to ensure suitable memory barriers & thus correct ordering wrt any speculative execution. A separate internal initializer tied to xmlParserInit is used to create the mutex used for synchronization, similarly to how catalog.c works. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
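The two commits above replace an unsynchronized integer flag with mutex-protected initialization. A minimal sketch of that pattern with POSIX threads follows. It is not the libxml2 code: every `sketch_*` name is hypothetical, and the mutex is statically initialized here for brevity, whereas the commits create it in a separate initializer tied to xmlParserInit.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the library's mutex and "initialized" flag. */
static pthread_mutex_t sketch_types_mutex = PTHREAD_MUTEX_INITIALIZER;
static int sketch_types_initialized = 0;
int sketch_init_runs = 0;   /* counts how often the real work ran */

/*
 * Initialize the global type tables exactly once.
 * Every read and write of the flag happens while holding the mutex,
 * so concurrent callers observe a consistent value.
 * Returns 0 on success, -1 if the mutex could not be taken.
 */
int
sketch_init_types(void) {
    if (pthread_mutex_lock(&sketch_types_mutex) != 0)
        return -1;

    if (!sketch_types_initialized) {
        /* The guarded body must run at most once. */
        assert(sketch_init_runs == 0);
        /* ... build the global tables here ... */
        sketch_init_runs += 1;
        sketch_types_initialized = 1;
    }

    pthread_mutex_unlock(&sketch_types_mutex);
    return 0;
}
```

Calling `sketch_init_types()` from any number of threads should run the guarded body once; later calls only pay for the lock and the flag check.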
Nick Wellnhofer 2b6b3945 2025-06-03T16:12:56 Revert "SAX1: Align handling of default attributes with SAX2" This reverts commit db65b2fc51ef0d6e4d2e9dc65ba12fe948da49f3. This didn't check for duplicate default attributes.
Nick Wellnhofer db65b2fc 2025-05-20T22:41:08 SAX1: Align handling of default attributes with SAX2 The SAX1 parser is legacy code, but it seems more maintainable to align it with SAX2.
Nick Wellnhofer 2f3655c9 2025-05-20T19:40:06 parser: Pop PEs that start markup declarations explicitly We currently only handle "Validity constraint: Proper Declaration/PE Nesting", but we must detect "Well-formedness constraint: PE Between Declarations" separately: > The replacement text of a parameter entity reference in a DeclSep must > match the production extSubsetDecl. PEs in DeclSeps are PEs that start with a full markup declaration (or another PE). These are handled in xmlParse{Internal|External}Subset. We set a flag on these PEs and don't close them implicitly in xmlSkipBlankCharsPE. This will make unterminated declarations in such PEs cause a parser error. The PEs are closed explicitly in xmlParse{Internal|External}Subset, the only location where they are allowed to end.
Nick Wellnhofer dd1961e0 2025-05-20T16:37:18 valid: Skip more validity checks if not validating
Nick Wellnhofer 7008740a 2025-05-18T01:52:38 parser: Consolidate scanning of XML Names Use new productions by default. Fixes #194. Fixes #364. See #707.
Nick Wellnhofer c4926b19 2025-05-16T02:12:23 codegen: Merge xmlunicode.c into xmlregexp.c Include generated parts. Generate xmlChRangeGroups instead of functions for Unicode blocks.
Nick Wellnhofer a40f36e7 2025-05-14T04:04:28 include: Stop using *Ptr typedefs in public headers
Nick Wellnhofer f602c0c1 2025-05-12T00:04:22 html: Rework serialization of meta encoding attributes Don't allocate memory.
Nick Wellnhofer 05b8fe0a 2025-04-12T23:10:40 html: Don't escape RAWTEXT and PLAINTEXT Align with HTML5.
Nick Wellnhofer 777e2adf 2025-05-09T23:53:03 io: Consolidate escaping code Use generated table approach of xmlSerializeText for xmlEscapeText. Move most code to xmlIO.c.
Nick Wellnhofer dad11630 2025-05-09T22:05:38 entities: Always replace invalid chars when escaping The previous refactor painstakingly recreated the different behavior of separate functions that were merged. It makes Optimize IS_CHAR check for non-ASCII chars.
Nick Wellnhofer 971038e5 2025-05-09T20:26:33 html: Call lower-level escaping functions Removes the need to pass a document around.
Nick Wellnhofer 63535d39 2025-05-09T20:13:43 tree: Make xmlNodeListGetStringInternal work with escape flags
Nick Wellnhofer 46f05ea4 2025-05-09T00:21:47 html: Rework meta charset handling Don't use encoding from meta tags when serializing. Only use the value in `doc->encoding`, matching the XML serializer. This is the actual encoding used when parsing. Stop modifying the input document by setting meta tags before serializing. Meta tags are now injected during serialization. Add full support for <meta charset=""> which is also used when adding meta tags. Align with HTML5 and implement the "algorithm for extracting a character encoding from a meta element". Only modify the encoding substring in Content-Type meta tags. Only switch encoding once when parsing. Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading UTF-8 charset. Fixes #909.
Nick Wellnhofer f7c41287 2025-05-02T15:57:17 doc: Remove more comment block headers
Nick Wellnhofer 69879da8 2025-04-28T14:04:30 doc: Remove email addresses from documentation Also remove authorship information from generated files, hash.c and globals.c which were rewritten.
Nick Wellnhofer b3492259 2025-03-14T00:01:11 include: Change some return types from int to enum This also affects some new functions from 2.13.
Nick Wellnhofer fd1b9391 2025-03-13T23:20:16 include: Convert some macros to enums
Nick Wellnhofer 69b83bb6 2025-03-10T02:18:51 encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.
Nick Wellnhofer 03a8d5f9 2025-03-04T16:00:08 unicode: Make Unicode functions private
Nick Wellnhofer 3d37ff84 2025-03-04T15:10:09 globals: Also use global state struct if threads are disabled
Nick Wellnhofer 361f7bff 2025-03-04T13:02:36 parser: Make nodePush, nodePop, namePush, namePop private
Nick Wellnhofer 9c16a153 2025-02-13T18:41:33 Revert "include: Make most IS_* macros private" This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
Nick Wellnhofer a78843be 2025-01-28T20:13:58 xmllint: Support compressed input from stdin Another regression related to reading from stdin. Making a "-" filename read from stdin was deeply baked into the core IO code but is inherently insecure. I really want to reenable this dangerous feature as sparingly as possible. This now enables compressed input when using the "Fd" API functions which wasn't supported before. But XML_PARSE_NO_UNZIP will be inverted later. Allow compressed stdin in xmlReadFile to support xmlstarlet and older versions of xsltproc. So far, these are the only known command-line tools that rely on "-" meaning stdin.
Nick Wellnhofer bfe6af2e 2025-01-17T17:09:04 fuzz: Remove hacks to build lint fuzzer Don't include source file directly.
Nick Wellnhofer c134e8b4 2024-12-19T21:05:49 include: Make INPUT_CHUNK macro private
Nick Wellnhofer 84a6c82f 2024-12-19T20:59:10 include: Make most IS_* macros private Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.
Nick Wellnhofer 2e18e5dc 2024-12-16T18:54:36 memory: Grow dynamic arrays by 50% Growing by a factor lower than the golden ratio increases the chances of reusing memory freed from earlier allocations. Set growth rate to 1.5 which also reduces internal fragmentation.
Nick Wellnhofer 5320a4aa 2024-12-15T23:35:28 memory: Implement xmlGrowCapacity to safely grow arrays xmlGrowCapacity makes sure that dynamic arrays don't grow beyond an explicit maximum size. size_t considerations are also taken into account. A macro XML_MAX_ITEMS is provided as default maximum with value 1 billion. When fuzzing, the initial size is set to 1 to cause more reallocations. This can require adjustments if callers really need larger arrays.
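The two memory commits above describe a growth policy: grow by about 50%, never exceed an explicit maximum, and keep the byte size representable in `size_t`. The sketch below illustrates that policy only. It is not the actual xmlGrowCapacity; the name `sketch_grow_capacity`, its signature, and `SKETCH_MAX_ITEMS` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical default cap, mirroring the "1 billion" mentioned in the log. */
#define SKETCH_MAX_ITEMS 1000000000

/*
 * Compute the next capacity of a dynamic array.
 *   capacity:  current number of slots (>= 0)
 *   item_size: size of one element in bytes (> 0)
 *   initial:   capacity to use for an empty array (> 0)
 *   max:       upper bound on the number of slots (> 0)
 * Returns the new capacity, or -1 if the array cannot grow.
 */
int
sketch_grow_capacity(int capacity, size_t item_size, int initial, int max) {
    int extra;

    assert(capacity >= 0 && item_size > 0 && initial > 0 && max > 0);

    /* Empty array: start with the initial size, clamped to the maximum. */
    if (capacity == 0)
        return initial < max ? initial : max;

    /* Already at the limit. */
    if (capacity >= max)
        return -1;

    /* The result is at most 2 * capacity; make sure that many
     * items still fit into size_t when measured in bytes. */
    if ((size_t) capacity > SIZE_MAX / 2 / item_size)
        return -1;

    /* Grow by roughly 50%, but always by at least one slot. */
    extra = (capacity + 1) / 2;

    /* Clamp to the maximum without overflowing the addition. */
    if (capacity > max - extra)
        return max;

    return capacity + extra;
}
```

A caller would pass the result to its reallocation routine and treat -1 as a resource-limit error rather than as an out-of-memory condition.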
Nick Wellnhofer 0dd910e8 2024-12-18T23:37:35 save: Fix handling of catastrophic errors Don't overwrite catastrophic errors in xmlSaveErr. Overwrite non-catastrophic errors in xmlOutputBufferClose.
Nick Wellnhofer 57087e5f 2024-11-25T20:59:06 parser: Don't overwrite catastrophic errors Stop reporting errors after a catastrophic error. Also make sure that ctxt->errNo matches ctxt->lastError.code.
Nick Wellnhofer 0bc4608c 2024-09-15T20:28:49 html: Use hash table to check for duplicate attributes
makise-homura a3043b47 2024-08-14T23:40:16 threads: define _WIN32_WINNT as 0x0600 to use InitOnceExecuteOnce()
Nick Wellnhofer a530ff12 2024-07-29T14:18:57 io: Always consume encoding handler when creating output buffers Also free encoding handler in error case. Remove xmlAllocOutputBufferInternal which was identical to xmlAllocOutputBuffer.
Nick Wellnhofer 4e93425a 2024-07-16T20:02:13 threads: Prefer Win32 over pthreads
Nick Wellnhofer 769e5a4a 2024-07-16T01:12:21 threads: Allocate global RMutexes statically Avoid memory allocations during initialization.
Nick Wellnhofer 79e11995 2024-07-15T19:43:28 error: Make xmlLastError const
Nick Wellnhofer a6f54f05 2024-07-07T18:52:17 io: Fine-tune initial IO buffer size
Nick Wellnhofer 34c9108f 2024-07-07T18:38:31 encoding: Add sizeOut argument to xmlCharEncInput When push parsing, we want to convert as much of the input as possible. When pull parsing memory buffers, we want to convert data chunk by chunk to save memory.
Nick Wellnhofer a221cd78 2024-07-07T03:01:51 buf: Rework xmlBuf code Always use what the old implementation called the "IO" allocation scheme, allowing to move the content pointer past the initial allocation. This is inexpensive and allows efficient shrinking. Optimize xmlBufGrow, reusing shrunken memory as much as possible. Simplify xmlBufAdd. Make xmlBufBackToBuffer return an error on overflow. Make "size" exclude the terminating NULL byte. Always provide an initial size. Reintroduce static buffers. Remove xmlBufResize and several other functions.
Nick Wellnhofer 1cfc5b80 2024-07-12T03:07:57 entities: Rework serialization of numeric character references
Nick Wellnhofer 8d160626 2024-07-12T02:01:06 entities: Rework text escaping
Nick Wellnhofer 72886980 2024-07-15T14:35:47 error: Add helper functions to print errors and abort
Nick Wellnhofer 8af55c8d 2024-07-06T22:14:21 parser: Rename new input API functions These weren't made public yet.
Nick Wellnhofer d74ca594 2024-07-06T22:04:06 parser: Rename internal xmlNewInput functions
Nick Wellnhofer 4f329dc5 2024-07-10T03:27:47 parser: Implement xmlCtxtParseContent This implements xmlCtxtParseContent, a better alternative to xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a parser context and a parser input, making it a lot more versatile. xmlParseInNodeContext is now implemented in terms of xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never modifies the target document, improving thread safety. xmlParseInNodeContext is also more lenient now with regard to undeclared entities. Fixes #727.
Nick Wellnhofer 38195cf5 2024-07-06T14:58:16 parser: Don't produce names with invalid UTF-8 in recovery mode
Nick Wellnhofer 16e7ecd4 2024-07-01T16:01:24 xinclude: Check URI length Don't report long URIs as OOM errors.
Nick Wellnhofer f505dcae 2024-06-26T14:11:34 tree: Remove underscores from xmlRegisterCallbacks
Nick Wellnhofer 598ee0d2 2024-06-26T01:18:55 error: Remove underscores from xmlRaiseError
Nick Wellnhofer 1341deac 2024-06-16T17:57:12 xmllint: Move shell to xmllint Move source code for xmllint shell to shell.c and move it from the libxml2 library to the xmllint executable. Also allow shell to run without XPath and debug modules. Add stubs for old shell API functions in legacy build mode.
Nick Wellnhofer 84666581 2024-06-15T20:34:07 catalog: Fix initialization Initialize mutex via xmlInitParser. Fix some other initialization calls.
Nick Wellnhofer 52384043 2024-06-11T19:10:41 parser: Pass resource type to resource loader
Nick Wellnhofer 89fcae4d 2024-06-11T16:19:58 parser: Don't report malloc failures when creating context We don't want messages to stderr before an error handler could be set on a parser context.
Nick Wellnhofer b9d2f3c9 2024-06-11T02:15:18 parser: Introduce new input API - xmlInputCreateUrl - xmlInputCreateMemory - xmlInputCreateString - xmlInputCreateFd - xmlInputCreateIO - xmlInputSetEncoding These functions don't take a parser context and work on xmlParserInputs, replacing functions working on xmlParserInputBuffers. xmlInputCreateUrl and xmlInputSetEncoding offer fine-grained error handling. Several XML_INPUT_* flags offer additional control.
Nick Wellnhofer ff3b0919 2024-06-11T00:00:32 parser: Implement XML_PARSE_NO_UNZIP option
Nick Wellnhofer 1432949d 2024-06-10T23:57:52 io: Pass input flags to xmlParserInputBufferCreateUrl
Nick Wellnhofer b5890cb4 2024-06-10T18:51:56 io: Remove xmlParserInputBufferCreateFilenameSafe
Nick Wellnhofer 1b1e8b3c 2024-06-10T16:39:57 io: Stop invoking generic error handler for IO errors
Nick Wellnhofer b47a95fe 2024-05-20T13:10:41 parser: Don't make xmlCtxtErrIO public
Nick Wellnhofer 6a49bb77 2024-03-17T17:16:55 tree: Introduce xmlSearchNsSafe After the failed experiment with a static XML namespace, introduce versions of xmlSearchNs that report malloc failures. Optimize the no-document case by only adding the XML namespace declaration if it wasn't found in an ancestor.
Nick Wellnhofer 047ea3ec 2024-03-17T16:23:31 Revert "tree: Allocate XML namespace statically" This reverts commit 2840e33c5e4b51589a0b96e8102638eeaea6df72.
Nick Wellnhofer 9f049afa 2024-03-11T15:57:14 tree: Refactor element creation and parsing of attribute values Replace xmlStringGetNodeList and xmlStringLenGetNodeList with xmlNodeParseContentInternal which also updates an optional parent node. Don't look up entities a second time via xmlNewReference.
Nick Wellnhofer 2840e33c 2024-03-04T07:34:25 tree: Allocate XML namespace statically
Nick Wellnhofer 84a71860 2024-02-26T15:14:28 xmlreader: Fix xmlTextReaderConstEncoding Regression from commit f1c1f5c6. Fixes #697.
Nick Wellnhofer e314109a 2024-02-16T15:42:38 save: Don't write directly to internal buffer Make sure that OOM errors are reported.
Nick Wellnhofer fbe10a46 2024-02-01T19:01:57 save: Move DTD serialization code to xmlsave.c
Nick Wellnhofer 8961056f 2024-01-23T00:47:44 parser: Make experimental input API private This needs to be reworked.
Nick Wellnhofer 37c6618b 2023-12-30T02:50:34 parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.
Nick Wellnhofer 7e0bbbc1 2023-12-27T18:33:30 parser: New input API Provide a new set of functions to create xmlParserInputs. These can be used for the document entity or from external entity loaders. - Don't require xmlParserInputBuffer. - All functions take a base URI. - All functions take an encoding as string. - xmlNewInputURL also takes a public ID. - xmlNewInputMemory takes a size_t. - Optimization hints for memory buffers. Improve documentation. Only call xmlInitParser before allocating a new parser context. Call xmlCtxtUseOptions as early as possible.
Nick Wellnhofer 6a9a88a1 2023-12-26T03:13:05 parser: Move progressive flag into input struct
Nick Wellnhofer d944a415 2023-12-26T02:10:35 parser: Fix in-parameter-entity and in-external-dtd checks Use ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.
Nick Wellnhofer 0bef93bf 2023-12-23T04:03:41 io: More refactoring and unescaping fixes Merge Windows wrappers into relevant functions. Remove more unnecessary unescaping. Merge *OpenW into *Open functions. Use unbuffered IO for output.
Nick Wellnhofer a2693410 2023-12-23T00:35:30 io: Move some code from xmlIO.c to parserInternals.c Move everything related to parser contexts to parserInternals.c.
Nick Wellnhofer 9c2c87b5 2023-12-24T15:33:12 dict: Move local RNG state to global state Don't use TLS variables directly.
Nick Wellnhofer c9a46a91 2023-12-20T20:11:09 io: Rework initialization
Nick Wellnhofer 13043691 2023-12-20T00:33:34 parser: Rename xmlErrParser to xmlCtxtErr
Nick Wellnhofer 8d0aaf4b 2023-12-19T20:47:36 parser: Remove xmlErrEncoding Use xmlFatalErr or xmlCtxtErrIO.
Nick Wellnhofer 9fbe46ba 2023-12-19T20:10:10 io: Consolidate error messages
Nick Wellnhofer 23345a1c 2023-12-19T19:52:28 io: Report IO errors through xmlCtxtErrIO This is also a new public API function to be used in external entity loaders.
Nick Wellnhofer 7e511f35 2023-12-19T15:41:37 io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile This allows to report the reason why opening a file failed to the parser context and improve error messages. Now we can also remove the stat call before opening a file.
Nick Wellnhofer 0c7a364f 2023-12-18T21:55:50 error: Remove xmlSimpleError
Nick Wellnhofer 954b8984 2023-12-18T19:39:38 xpath: Improve error handling Introduce xmlXPathSetErrorHandler allowing to set a structured error handler for an XPath context. Remove arguments from memory error handlers. Use xmlRaiseMemoryError. Remove TODO, STRANGE and CHECK_CTXT macros. Remove remaining uses of xmlGenericError.
Nick Wellnhofer 54c70ed5 2023-12-18T19:31:29 parser: Improve error handling Introduce xmlCtxtSetErrorHandler allowing to set a structured error for a parser context. There already was the "serror" SAX handler but this always receives the parser context as argument. Start to use xmlRaiseMemoryError. Remove useless arguments from memory error functions. Rename xmlErrMemory to xmlCtxtErrMemory. Remove a few calls to xmlGenericError. Remove support for runtime entity debugging.
Nick Wellnhofer c5a8aef2 2023-12-18T19:12:08 error: Refactor error reporting Introduce xmlStrVASPrintf, trying to handle buggy snprintf implementations. Introduce xmlSetError to set errors atomically. Introduce xmlUpdateError to set an error, fixing up node, file and line. Introduce helper function xmlRaiseMemoryError. Make legacy error handlers call xmlReportError, avoiding checks in xmlVRaiseError. Remove fragile support for getting file and line info from XInclude nodes.
Nick Wellnhofer f19a9510 2023-12-10T17:50:22 parser: Report malloc failures Fix many places where malloc failures aren't reported. Make xmlErrMemory public. This is useful for custom external entity loaders. Introduce new API function xmlSwitchEncodingName. Change the way how we store whether the parser is stopped. This used to be signaled by setting ctxt->instate to XML_PARSER_EOF which was misdesigned and error-prone. Set ctxt->disableSAX to 2 instead and introduce a macro PARSER_STOPPED. Also stop to remove parser inputs in xmlHaltParser. This allows to remove many checks of ctxt->instate. Introduce xmlErrParser to handle errors if a parser context is available.
Nick Wellnhofer 1a354d5b 2023-12-10T17:09:45 regexp: Report malloc failures Fix places where malloc failures aren't reported.
Nick Wellnhofer e632d9f0 2023-12-10T16:56:16 xpath: Report malloc failures Fix many places where malloc failures aren't reported. Rework XPath object cache to store free objects in a linked list to avoid allocating an additional array. Remove some unneeded object pools.
Nick Wellnhofer f3455ecd 2023-12-10T15:46:53 error: Report malloc failures Don't ignore malloc failures in xmlRaiseError and xmlCopyError. Don't print filename if context has no input. Introduce xmlVRaiseError taking a va_list.
Nick Wellnhofer aca16fb3 2023-12-10T16:37:43 tree: Report malloc failures Fix many places where malloc failures aren't reported. Make some API function return an error code. Changing the return type from void to int is technically an ABI break but should be safe on most platforms. - xmlNodeSetContent - xmlNodeSetContentLen - xmlNodeAddContent - xmlNodeAddContentLen - xmlNodeSetBase Introduce new API functions that return a separate error code if a memory allocation fails. - xmlNodeGetAttrValue - xmlNodeGetBaseSafe - xmlGetNsListSafe Introduce private functions xmlTreeEnsureXMLDecl and xmlSplitQName4. Don't report low-level errors to the global error handler. Fix tree Introduce xmlGetNsListSafe Fix tree
Nick Wellnhofer 58598494 2023-11-04T23:47:33 parser: Fix combination of hash values This bug resulted in a stuck bit in hash values which can have a severe performance impact.
Nick Wellnhofer c082ef46 2023-08-09T16:59:36 parser: Stop switching to ISO-8859-1 on encoding errors Use U+FFFD Replacement Character if invalid UTF-8 is encountered in recovery mode. Also rewrite xmlNextChar and xmlCurrentChar. Fixes #598.
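The commit above substitutes U+FFFD for invalid UTF-8 instead of switching the whole input to ISO-8859-1. One way such a substitution can work is sketched below. This is an illustration under assumptions, not the rewritten xmlCurrentChar: `sketch_decode_utf8` is a hypothetical name, it always substitutes (the commit does so only in recovery mode), and it consumes exactly one byte per invalid sequence.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decode one code point from buf[0..len).
 * On success, returns the code point and stores the number of bytes
 * used in *consumed. On invalid, overlong or truncated input, returns
 * U+FFFD and consumes a single byte. Returns -1 on empty input.
 */
int
sketch_decode_utf8(const unsigned char *buf, size_t len, size_t *consumed) {
    unsigned c, val, min;
    size_t need, i;

    assert(buf != NULL && consumed != NULL);

    *consumed = 0;
    if (len == 0)
        return -1;

    c = buf[0];
    *consumed = 1;

    if (c < 0x80)
        return (int) c;                      /* ASCII */

    if (c >= 0xC2 && c <= 0xDF) {            /* 2-byte lead */
        need = 1; val = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {         /* 3-byte lead */
        need = 2; val = c & 0x0F; min = 0x800;
    } else if (c >= 0xF0 && c <= 0xF4) {     /* 4-byte lead */
        need = 3; val = c & 0x07; min = 0x10000;
    } else {
        return 0xFFFD;                       /* stray or invalid lead byte */
    }

    if (len < need + 1)
        return 0xFFFD;                       /* truncated sequence */

    for (i = 1; i <= need; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            return 0xFFFD;                   /* not a continuation byte */
        val = (val << 6) | (buf[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values above U+10FFFF. */
    if (val < min || val > 0x10FFFF || (val >= 0xD800 && val <= 0xDFFF))
        return 0xFFFD;

    *consumed = need + 1;
    return (int) val;
}
```

Consuming a single byte after an error keeps the decoder resynchronizing on the next lead byte, so one bad byte does not swallow valid text that follows it.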
Nick Wellnhofer eb69c1d3 2023-10-02T12:16:05 parser: Fix initialization of namespace data Move initialization to xmlInitSAXParserCtxt. Also add missing XML_HIDDEN to xmlParserNsFree. Fixes #597.
Nick Wellnhofer e0dd330b 2023-09-29T00:18:44 parser: Use hash tables to avoid quadratic behavior Use a hash table to lookup namespaces by prefix. The hash table stores an index into the namespace table. Auxiliary data for namespaces is stored in a separate array along the main namespace table. Use a hash table to verify attribute uniqueness. The hash table stores an index into the attribute table. Reuse hash value from the dictionary to avoid computing them twice. See #346.
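The commit above describes a hash table whose buckets hold indices into a separate namespace table, which replaces a linear scan per lookup. A small sketch of that layout follows. All `Sketch*` and `sketch_*` names are hypothetical, the sizes are fixed for brevity, the hash is recomputed instead of being reused from the dictionary, and prefix shadowing in nested scopes is not handled.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_NS     8    /* capacity of the main namespace table */
#define SKETCH_TABLE_SIZE 16   /* power of two, larger than SKETCH_MAX_NS */

typedef struct {
    const char *prefix;
    const char *uri;
} SketchNs;

typedef struct {
    SketchNs entries[SKETCH_MAX_NS];   /* main namespace table */
    int count;
    int buckets[SKETCH_TABLE_SIZE];    /* index + 1 into entries, 0 = empty */
} SketchNsTable;

static unsigned
sketch_hash(const char *s) {
    unsigned h = 5381;
    while (*s != 0)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

void
sketch_ns_init(SketchNsTable *t) {
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Return the index of the entry with this prefix, or -1 if absent. */
int
sketch_ns_lookup(const SketchNsTable *t, const char *prefix) {
    unsigned pos;

    assert(t != NULL && prefix != NULL);
    pos = sketch_hash(prefix) & (SKETCH_TABLE_SIZE - 1);

    /* Linear probing; terminates because the table is never full. */
    while (t->buckets[pos] != 0) {
        int idx = t->buckets[pos] - 1;
        if (strcmp(t->entries[idx].prefix, prefix) == 0)
            return idx;
        pos = (pos + 1) & (SKETCH_TABLE_SIZE - 1);
    }
    return -1;
}

/* Append a namespace. Returns its index, or -1 if the prefix is
 * already present or the table is full. */
int
sketch_ns_push(SketchNsTable *t, const char *prefix, const char *uri) {
    unsigned pos;

    assert(t != NULL && prefix != NULL);
    if (t->count >= SKETCH_MAX_NS || sketch_ns_lookup(t, prefix) >= 0)
        return -1;

    pos = sketch_hash(prefix) & (SKETCH_TABLE_SIZE - 1);
    while (t->buckets[pos] != 0)
        pos = (pos + 1) & (SKETCH_TABLE_SIZE - 1);

    t->entries[t->count].prefix = prefix;
    t->entries[t->count].uri = uri;
    t->buckets[pos] = t->count + 1;
    return t->count++;
}
```

Storing indices rather than pointers keeps the buckets valid if the main table is reallocated, and the same layout can be used to detect duplicate attributes by keying on the attribute name instead of the prefix.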