testparser.c


Log

Author Commit Date Message
Nick Wellnhofer d1c3391e 2025-07-23T01:05:46 tests: Silence testparser. Regressed with bd9d5e39.
Nick Wellnhofer 8689523a 2025-07-22T23:57:03 parser: Implement xmlCtxtGetInputWindow. See #762.
Nick Wellnhofer bd9d5e39 2025-07-09T13:10:31 parser: Fix handling of invalid char refs in recovery mode. Revert to the old behavior which handles invalid char refs more gracefully. Probably regressed with 37c6618b (version 2.13.0).
Nick Wellnhofer c34742f3 2025-06-30T16:23:03 tests: Fix build --without-output
Nick Wellnhofer 1b737cc8 2025-06-27T19:52:54 parser: Another fix to ]]> detection in push parser. The original fix for issue #850 in commit 9efe1414 was incomplete.
Nick Wellnhofer 825f3a9d 2025-05-11T21:38:16 html: Always serialize attributes with double quotes. Align with HTML5.
Nick Wellnhofer 46f05ea4 2025-05-09T00:21:47 html: Rework meta charset handling. Don't use encoding from meta tags when serializing. Only use the value in `doc->encoding`, matching the XML serializer. This is the actual encoding used when parsing. Stop modifying the input document by setting meta tags before serializing. Meta tags are now injected during serialization. Add full support for <meta charset=""> which is also used when adding meta tags. Align with HTML5 and implement the "algorithm for extracting a character encoding from a meta element". Only modify the encoding substring in Content-Type meta tags. Only switch encoding once when parsing. Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading UTF-8 charset. Fixes #909.
Nick Wellnhofer 72906f16 2025-04-25T11:41:50 parser: Make undeclared entities in XML content fatal. When parsing XML content with functions like xmlParseBalancedChunk or xmlParseInNodeContext, make undeclared entities always a fatal error to match 2.13 behavior. This was deliberately changed in 4f329dc5, probably to make the tests pass. Should fix #895.
Nick Wellnhofer 8a791fdd 2025-04-21T17:31:29 save: Fix xmlDocDump with encoding. Short-lived regression.
Nick Wellnhofer 936e3d52 2025-04-20T19:25:04 save: Fix xmlSave with NULL encoding. Regressed with cc45f618.
Nick Wellnhofer a5c4a6ef 2025-03-28T16:31:14 parser: Fix XML_PARSE_NOBLANKS dropping non-whitespace text. Regressed with 1f5b5371. Fixes #884.
Nick Wellnhofer b3492259 2025-03-14T00:01:11 include: Change some return types from int to enum. This also affects some new functions from 2.13.
Nick Wellnhofer 84c6524e 2025-03-13T19:45:35 encoding: Support input-only and output-only converters. Make it possible to open an encoding handler only for input or output. This avoids the creation of unnecessary converters. Should also fix #863.
Nick Wellnhofer 69b83bb6 2025-03-10T02:18:51 encoding: Detect truncated multi-byte sequences with ICU. Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.
Nick Wellnhofer 87c9e000 2025-03-09T22:20:23 encoding: Rework custom encoding implementation API
Nick Wellnhofer 8cf6129b 2025-02-13T18:20:46 html: Stop implying <p> start tags. Only <html>, <head> or <body> should be implied. Opening extra <p> tags has always been a libxml2 quirk.
Nick Wellnhofer 79ab721c 2025-02-11T11:39:08 tests: Fix error return in testHugeEncodedChunk. Fixes #859.
Nick Wellnhofer 9efe1414 2025-01-31T13:07:35 parser: Fix detection of ']]>' when push-parsing. Fixes #850.
Nick Wellnhofer 3eced32e 2025-01-29T23:49:56 parser: Fix push parser with encoding and single chunk. When push-parsing with an encoding handler, we must convert the whole buffer in the initial conversion. Otherwise, parsing a single chunk larger than ~4KB would fail. Regressed with commit 34c9108f.
Nick Wellnhofer a8d8a70c 2025-01-27T13:31:08 uri: Fix handling of Windows drive letters. Allow drive letters in URI paths. Technically, these should be treated as URI schemes, but this is not what users expect. This also makes sure that paths with drive letters are resolved as filesystem paths and unescaped, for example when used in libxslt's document() function. Should fix #832.
Nick Wellnhofer be579a26 2025-01-15T12:52:53 reader: Fix return value of xmlTextReaderReadString again. Make sure to return NULL for node types except elements or text to match the old behavior. Note that CDATA sections are still treated like text nodes and will have their content returned. Fixes #838.
Nick Wellnhofer efb57ddb 2024-10-30T14:02:36 parser: Fix downstream code that swaps DTDs. Downstream code like the nginx xslt module can change the document's DTD pointers in a SAX callback. If an entity from a separate DTD is parsed lazily, its content must not reference the current document. Regressed with commit d025cfbb. Fixes #815.
Nick Wellnhofer 8af55c8d 2024-07-06T22:14:21 parser: Rename new input API functions. These weren't made public yet.
Nick Wellnhofer 4f329dc5 2024-07-10T03:27:47 parser: Implement xmlCtxtParseContent. This implements xmlCtxtParseContent, a better alternative to xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a parser context and a parser input, making it a lot more versatile. xmlParseInNodeContext is now implemented in terms of xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never modifies the target document, improving thread safety. xmlParseInNodeContext is also more lenient now with regard to undeclared entities. Fixes #727.
Nick Wellnhofer da686399 2024-07-09T12:29:53 io: Fix return value of xmlFileRead. This broke in commit 6d27c54. Fixes #766.
Nick Wellnhofer 944cc23c 2024-07-03T15:54:32 tree: Fix handling of empty strings in xmlNodeParseContent. We shouldn't create an empty text node to match the old behavior. Fixes #759.
Nick Wellnhofer f9065261 2024-07-02T23:43:28 SAX2: Fix HTML IDs. Short-lived regression. Fixes #755.
Nick Wellnhofer 282ec1d5 2024-06-28T19:06:57 encoding: Rework xmlCharEncodingHandler layout. Reuse some of the old members. The "input" and "output" function pointers are actually of type xmlCharEncConvFunc, accepting an additional argument. For default handlers, this argument is unused, so this should work with most ABIs. For iconv handlers, these function pointers used to be NULL but now point to a function which requires the extra argument. "iconv_in" and "iconv_out" are made void pointers. "uconv_in" and "uconv_out" are renamed and made void pointers. This is unlikely to cause issues. We now expect that the built-in conversion functions correctly report XML_ENC_ERR_SPACE. For UTF8ToHtml and the ISO-8859-X code, this will be done in the following commits.
Nick Wellnhofer 221df375 2024-06-28T00:34:52 parser: Support custom charset conversion implementations. Implement xmlCtxtSetCharEncConvImpl. I agree that the name is terrible.
Nick Wellnhofer b1a416bf 2024-06-27T12:00:45 encoding: Restore old lookup order in xmlOpenCharEncodingHandler. When looking up encodings with xmlLookupCharEncodingHandler, the returned handler can have a different name than requested (capitalization, internal aliases). This should eventually be fixed. For now we revert part of commit 5b893fa9, start the lookup with xmlFindHandler and add an explicit check for UTF-8. Should fix the encoding name issue mentioned in #749.
Nick Wellnhofer 54c6c7e4 2024-06-23T21:51:52 uri: Only set file scheme for special Windows paths. Fixes 2ce70cde. Also fix a test case.
Nick Wellnhofer 2ce70cde 2024-06-23T16:24:46 uri: Handle filesystem paths in xmlBuildRelativeURISafe. This mainly fixes issues on Windows but should also fix a few general corner cases. Should fix #745.
Nick Wellnhofer 208f27f9 2024-06-15T19:13:08 include: Don't define ATTRIBUTE_UNUSED in public header. Stop polluting namespace with unprefixed names.
Nick Wellnhofer b8597f46 2024-04-30T15:58:01 tree: Handle predefined entities in xmlBufGetEntityRefContent. It's possible to create references to predefined entities using the tree API. This edge case was exposed by making predefined entities const in commit 63ce5f9a.
Nick Wellnhofer 5aa56e73 2024-04-18T14:21:19 reader: Add tests for content accessors
Nick Wellnhofer 047ea3ec 2024-03-17T16:23:31 Revert "tree: Allocate XML namespace statically". This reverts commit 2840e33c5e4b51589a0b96e8102638eeaea6df72.
Nick Wellnhofer 2840e33c 2024-03-04T07:34:25 tree: Allocate XML namespace statically
Nick Wellnhofer 84a71860 2024-02-26T15:14:28 xmlreader: Fix xmlTextReaderConstEncoding. Regression from commit f1c1f5c6. Fixes #697.
Nick Wellnhofer b55ee729 2024-02-26T13:22:08 html: Regression test for #696. This was already fixed in the master branch, so we only add a test.
Nick Wellnhofer df618f08 2024-01-15T17:15:02 tests: Add test for issue #661
Nick Wellnhofer d2b55a7a 2024-01-05T20:31:10 writer: Implement xmlTextWriterClose. This function can be used to make sure that closing the output stream succeeded. Fixes #513.
Nick Wellnhofer 16b0dbc1 2023-12-29T18:47:30 parser: Fix XML_ERR_UNSUPPORTED_ENCODING errors. Commit 45157261 added the check in the wrong place. Also allow unsupported encoding in xmlNewInputInternal. Fixes #654.
Nick Wellnhofer ecfbcc8a 2023-12-25T04:33:00 parser: Rework general entity parsing. Don't create a new parser context but reuse the existing one. This exposes bug #601 in a more obvious way.
Nick Wellnhofer 6e3a2ac6 2023-12-22T21:38:50 xinclude: Rework xml:base fixup. The xml:base fixup was broken in more complex cases. Also avoid parsing and building the included URI multiple times.
Nick Wellnhofer ed6596a4 2023-12-18T19:47:47 reader: Simplify error handling. Only use structured error handlers for parser, Schemas and RelaxNG contexts. Also use structured error handler for XInclude context. Remove TODO macro.
Nick Wellnhofer 89d19534 2023-10-28T03:04:59 encoding: Fix decoding of large chunks. After 95e81a36, we must support XML_ENC_ERR_SPACE when using built-in encoding handlers. Should fix #610.
Nick Wellnhofer a9ada183 2023-10-22T13:56:55 tests: Start with testparser.c for extra tests. Several issues require customized tests. Start with a test that push parses large documents. See #539.