Log

Author Commit Date Message
Nick Wellnhofer e4cbc295 2025-05-20T21:57:01 parser: Check attribute normalization standalone constraint To fully implement "VC: Standalone Document Declaration", we have to check for normalization changes caused by non-CDATA attribute types declared externally. Fixes #119.
Nick Wellnhofer 682195c8 2025-05-20T22:00:57 parser: Fix "Proper Declaration/PE Nesting" validity constraint Now that we handle "WFC: PE Between Declarations" correctly, we can turn "Proper Declaration/PE Nesting" from a WFC into a VC as specified. Fixes #118.
Nick Wellnhofer 2f3655c9 2025-05-20T19:40:06 parser: Pop PEs that start markup declarations explicitly We currently only handle "Validity constraint: Proper Declaration/PE Nesting", but we must detect "Well-formedness constraint: PE Between Declarations" separately: "The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl." PEs in DeclSeps are PEs that start with a full markup declaration (or another PE). These are handled in xmlParse{Internal|External}Subset. We set a flag on these PEs and don't close them implicitly in xmlSkipBlankCharsPE. This will make unterminated declarations in such PEs cause a parser error. The PEs are closed explicitly in xmlParse{Internal|External}Subset, the only location where they are allowed to end.
Nick Wellnhofer dd1961e0 2025-05-20T16:37:18 valid: Skip more validity checks if not validating
Nick Wellnhofer 6c2bd975 2025-05-20T15:51:18 valid: Don't validate unused default attributes See erratum E9 of XML 1.0 Second Edition. See #120.
Nick Wellnhofer 2a60ca06 2025-05-20T16:50:32 valid: Don't check enum values Rely on the parser to pass valid arguments.
Nick Wellnhofer fca0860d 2025-05-19T21:17:39 tree: Deprecate public struct members related to DTDs Let's deprecate these members for now. If these are really used, they can be undeprecated later.
Nick Wellnhofer 74ff6c00 2025-05-20T22:00:29 error: Fix line number in entities Allow line numbers from more domains, see code above.
Nick Wellnhofer 4aa7192f 2025-05-21T16:32:17 tests: Add dtor for xmlElementContent in testapi.c
Nick Wellnhofer fc1cabc8 2025-05-25T14:03:50 valid: Also raise duplicate ID error without validation support Whether an error is raised should not depend on config options.
Dag-Erling Smørgrav 3ab040c2 2025-05-24T01:12:15 Fix unidiomatic use of vsnprintf(). * Don't terminate an already-terminated buffer. * Consistently use 1024-byte buffers. * While here, consistently use ap for a va_list.
Dag-Erling Smørgrav 8ea253b8 2025-05-24T01:00:25 Remove bogus casts. * Casting a string literal to `char *` and then immediately passing or assigning the result to a `const char *` makes no sense. * There is no need to cast `int` to `Py_ssize_t` as they have the same sign and the latter is at least as wide as the former.
Nick Wellnhofer 7c9b5535 2025-05-19T19:10:55 doc: Document unused error domains
Nick Wellnhofer 47aca2c6 2025-05-19T18:43:14 parser: Only check validity constraints when validating
Nick Wellnhofer 3a68d0b7 2025-05-19T18:59:51 SAX2: Handle xml:id errors separately
Nick Wellnhofer 172550d2 2025-05-18T17:45:11 parser: Only validate EnumerationTypes when requested This has quadratic behavior and is only a validity constraint.
Nick Wellnhofer 7008740a 2025-05-18T01:52:38 parser: Consolidate scanning of XML Names Use new productions by default. Fixes #194. Fixes #364. See #707.
Nick Wellnhofer 657254a8 2025-05-18T01:21:43 parser: Factor out xmlIsNameCharNew/Old
Nick Wellnhofer 315bd443 2025-05-17T18:59:52 meson: Switch to cfg_data.set10()
Nick Wellnhofer 4e5945fc 2025-05-17T14:41:28 cmake: Avoid overlinking with non-CMake libxml2-config.cmake Align libxml2-config.cmake generated by Autotools and Meson with the CMake version and only add dependencies to libraries when linking statically. Also set LIBXML_STATIC for static builds. Fixes #918.
Nick Wellnhofer faaa01b8 2025-05-17T12:20:32 cmake: Make iconv a private dependency This was only needed for the headers before 2.14.
Nick Wellnhofer 70e5d664 2025-05-17T01:30:41 doc: Don't document deprecated headers
Nick Wellnhofer 7c82391c 2025-05-17T01:01:03 codegen: Factor out code to generate range tables
Nick Wellnhofer 502c5f65 2025-05-17T00:11:03 meson: Dependency on directory doesn't work
Nick Wellnhofer 210f5a37 2025-05-16T21:18:16 chvalid: Mark functions as deprecated
Nick Wellnhofer 954aae90 2025-05-16T21:13:17 doc: Improve regexp documentation
Nick Wellnhofer cbad60ff 2025-05-16T18:31:16 xmllint: Remove unused macros
Nick Wellnhofer 2132150d 2025-05-16T18:27:00 xmllint: Switch to xmlCtxtGetDocument
Nick Wellnhofer c5b45fbc 2025-05-16T16:54:09 doc: Misc fixes
Nick Wellnhofer c4926b19 2025-05-16T02:12:23 codegen: Merge xmlunicode.c into xmlregexp.c Include generated parts. Generate xmlChRangeGroups instead of functions for Unicode blocks.
Nick Wellnhofer 4cb767e9 2025-05-16T01:52:44 codegen: Only generate tables for character ranges The rest can be easily maintained manually.
Nick Wellnhofer 770c6dec 2025-05-16T01:19:19 buf: Remove ABI compatibility hack I think this was required when some struct members like xmlParserInputBuffer::buffer were changed from xmlBuffer to xmlBuf (20+ years ago). Unfortunately, I missed the opportunity to align xmlBuffer with xmlBuf before the ABI break.
Nick Wellnhofer 344190db 2025-05-16T00:54:51 doc: Document deprecated xmlThrDef* functions
Nick Wellnhofer 6f4b4527 2025-05-15T23:43:32 parser: Stop using ctxt->linenumbers I think this was used to avoid setting the `line` member before it was added (20+ years ago).
Nick Wellnhofer 5ce48ec1 2025-05-15T22:51:54 SAX2: Rework xmlSAX2Text Simplify and make more readable.
Nick Wellnhofer d834437b 2025-05-15T19:12:25 python: Add deprecation warning
Nick Wellnhofer a05fa9a9 2025-05-15T18:41:35 codegen: Rerun codegen scripts
Nick Wellnhofer 258d8706 2025-05-15T17:49:49 codegen: Consolidate tools for code generation Move tools, source files and output tables into codegen directory. Rename some files. Adjust tools to match modified files. Remove generation date and source files from output. Distribute all tools and sources.
Nick Wellnhofer 0d34d690 2025-05-15T17:11:33 README: Update configuration options Python is disabled by default now. Mention --prefix.
Nick Wellnhofer adfbeb7e 2025-05-14T04:58:21 doc: Stop using *Ptr typedefs in documentation
Nick Wellnhofer a40f36e7 2025-05-14T04:04:28 include: Stop using *Ptr typedefs in public headers
Nick Wellnhofer 0da20b83 2025-05-14T04:20:07 autotools: Quote filenames in doc/Makefile.am
Nick Wellnhofer 2d83a84c 2025-05-14T00:29:19 doc: Misc improvements
Nick Wellnhofer 87087def 2025-05-13T16:19:42 tests: Remove result files committed by accident
Nick Wellnhofer d6151c23 2025-05-13T13:28:28 libxml2.doap: Remove inactive maintainer
Nick Wellnhofer af4fae5a 2025-05-13T12:05:15 html: Add some comments regarding HTML5 serialization It seems that the specification of the HTML output method in XSLT 1.0 had a lot of influence on how the HTML serializer in libxml2 ended up: https://www.w3.org/TR/xslt-10/#section-HTML-Output-Method There are two remaining behaviors suggested by XSLT 1.0 that don't match the HTML5 fragment serialization algorithm: We escape non-ASCII characters in URI attributes (the list of which is probably outdated). This was originally recommended in appendix B of the HTML 4.01 spec, but only for user agents: https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.1 From my experience, any tool that processes HTML should escape as little as possible. For example, we used to escape many more characters which are invalid in URIs, but often used in template languages. (Note that we still escape whitespace and control chars.) Nevertheless, I guess that some libxslt users continue to expect this behavior from libxml2. Then we collapse Boolean attributes using an outdated list. This is mostly a cosmetic issue, but a somewhat important one for libxslt users. We probably need a serialization option for the xmlsave module that enables fully HTML5-conformant output.
Nick Wellnhofer b0234633 2025-05-13T20:19:39 encoding: Preserve original encoding label When using built-in encodings, the label would be normalized which causes various issues. We now create a copy of the handler with the original name. This is somewhat dangerous as it will require users to free built-in encodings with xmlCharEncCloseFunc. But to handle the general case, this was already required. Fixes #916 in another way than originally proposed.
Nick Wellnhofer fcb7a777 2025-05-13T22:38:15 io: Make xmlOutputBufferCreate* not free encoder on error Revert a530ff12 which was an inadvertent API change.
Nick Wellnhofer 5b71dca6 2025-05-12T21:39:54 Fix -Wunterminated-string-initialization warnings Don't use strings for the table.
Nick Wellnhofer cdce17c3 2025-05-12T21:21:25 html: Only map HTML encodings from meta tag
Nick Wellnhofer 19b99311 2025-05-12T21:07:41 encoding: Fix -Wswitch warning
Nick Wellnhofer 39ae5d12 2025-05-12T21:04:41 save: Add NULL check in xmlBufDumpEntityContent Short-lived regression.
Nick Wellnhofer c2929b5d 2025-05-12T21:01:35 html: Ignore namespaces when handling meta tags Revert to old behavior to fix issues with XHTML documents.
Nick Wellnhofer 4df8d557 2025-05-12T17:31:14 io: Fix stack use after scope Short-lived regression.
Nick Wellnhofer f0983199 2025-05-12T13:00:20 html: Map some encodings according to HTML5 Windows-1252 is a superset of ISO-8859-1 and should be used instead. Same for ASCII. Also map UCS-2 and UTF-16 to UTF-16LE.
Nick Wellnhofer 93f67106 2025-05-12T12:27:54 encoding: Add HTML5 aliases
Nick Wellnhofer 628006f4 2025-05-12T11:47:40 encoding: Add windows-1252 Fixes #915.
Nick Wellnhofer a7016bae 2025-05-12T02:40:36 tools: Remove unnecessary data from iso8859x.inc
Nick Wellnhofer c92374f1 2025-05-12T02:15:11 tools: Recreate script to generate iso8859x.inc The script to create these tables was never committed to version control.
Nick Wellnhofer f602c0c1 2025-05-12T00:04:22 html: Rework serialization of meta encoding attributes Don't allocate memory.
Nick Wellnhofer 7654c2ef 2025-05-11T23:37:38 html: Rework serialization of URIs Don't allocate memory.
Nick Wellnhofer bd777e4f 2025-05-11T22:18:31 html: Speed up htmlIsBooleanAttr This is used when serializing.
Nick Wellnhofer 825f3a9d 2025-05-11T21:38:16 html: Always serialize attributes with double quotes Align with HTML5.
Nick Wellnhofer 5c4cc456 2025-05-11T21:19:22 html: Escape encoding in meta tags
Nick Wellnhofer 0674ccb7 2025-05-11T20:55:57 html: Stop omitting end tags when serializing Align with HTML5.
Nick Wellnhofer 05b8fe0a 2025-04-12T23:10:40 html: Don't escape RAWTEXT and PLAINTEXT Align with HTML5.
Nick Wellnhofer 809ded58 2025-04-12T22:50:56 html: Add more empty elements Add empty HTML5 elements <bgsound>, <keygen>, <source>, <track> and <wbr>. Make <embed> an empty element.
Nick Wellnhofer 5f8ebc88 2025-05-10T00:56:18 save: Avoid xmlOutputBufferWriteQuotedString xmlOutputBufferWriteQuotedString should be reserved for things like system IDs.
Nick Wellnhofer 0d81d6f8 2025-05-10T00:52:22 html: Use xmlOutputBufferWrite if possible
Nick Wellnhofer 89fcfe3a 2025-05-10T00:14:05 html: Start to use xmlSerializeText Avoid temporary copy to speed up serialization.
Nick Wellnhofer 777e2adf 2025-05-09T23:53:03 io: Consolidate escaping code Use generated table approach of xmlSerializeText for xmlEscapeText. Move most code to xmlIO.c.
Nick Wellnhofer cdaf657f 2025-05-09T23:02:32 html: Don't escape < and > when serializing attribute values Align with HTML5. This will break some test suites.
Nick Wellnhofer e0e0a1f0 2025-05-09T22:44:54 html: Remove special handling of &{...} when serializing See https://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1 Align with HTML5.
Nick Wellnhofer dad11630 2025-05-09T22:05:38 entities: Always replace invalid chars when escaping The previous refactor painstakingly recreated the different behavior of separate functions that were merged. It makes Optimize IS_CHAR check for non-ASCII chars.
Nick Wellnhofer c8cea39d 2025-05-09T21:31:07 save: Fix serialization of attribute defaults containing &lt; Long-standing bug that produced invalid XML.
Nick Wellnhofer 971038e5 2025-05-09T20:26:33 html: Call lower-level escaping functions Removes the need to pass a document around.
Nick Wellnhofer 63535d39 2025-05-09T20:13:43 tree: Make xmlNodeListGetStringInternal work with escape flags
Nick Wellnhofer 442c1903 2025-05-09T18:52:36 doc: Fix some damage from automated conversions Add some newlines, fix returns.
Nick Wellnhofer 98a61c9d 2025-05-09T16:48:09 doc: Fix briefs in tree docs
Nick Wellnhofer 4b4bc15a 2025-05-09T16:24:35 doc: Misc fixes to buffer docs
Nick Wellnhofer ad390a5d 2025-05-09T15:34:53 parser: Set doc properties in endDocument SAX handler
Nick Wellnhofer c7c49643 2025-05-09T15:26:15 html: Move DTD creation to endDocument SAX callback
Nick Wellnhofer 46f05ea4 2025-05-09T00:21:47 html: Rework meta charset handling Don't use encoding from meta tags when serializing. Only use the value in `doc->encoding`, matching the XML serializer. This is the actual encoding used when parsing. Stop modifying the input document by setting meta tags before serializing. Meta tags are now injected during serialization. Add full support for <meta charset=""> which is also used when adding meta tags. Align with HTML5 and implement the "algorithm for extracting a character encoding from a meta element". Only modify the encoding substring in Content-Type meta tags. Only switch encoding once when parsing. Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading UTF-8 charset. Fixes #909.
Nick Wellnhofer 9aaa52fe 2025-05-08T22:49:20 tree: Make xmlNodeAddContent work with attributes
Nick Wellnhofer 655ac5f8 2025-05-07T16:35:09 html: Add comment regarding hack for XML documents
Nick Wellnhofer f3a080bc 2025-05-07T14:32:42 html: Ignore U+0000 in body text Align with HTML5. Fixes #908.
Nick Wellnhofer a1e83b24 2025-05-07T20:16:17 io: Fix negation of potentially unsigned value
Nick Wellnhofer b3854fe9 2025-05-07T20:20:31 reader: Fix null deref on malloc failure Short-lived regression from 177067ea.
Nick Wellnhofer 6684eb93 2025-05-07T20:13:59 fuzz: Fix out-of-tree build
Nick Wellnhofer 6bd380ce 2025-05-07T14:32:26 fuzz: Update README
Nick Wellnhofer 967df734 2025-05-07T13:03:11 malloc-fail: Handle malloc failure in xmlSchemaCopyValue Avoid null pointer dereference. Fixes #905.
Pavel Kopylov 4ed71574 2025-05-09T11:58:01 python: Fix use-after-free in xmlPythonFileReadRaw() and xmlPythonFileRead() with Python 2. Fixes #910.
Nick Wellnhofer 38ea8fa9 2025-05-06T18:31:45 doc: Fix varargs
Nick Wellnhofer 9bbffec5 2025-05-06T17:42:46 doc: Move brief to top, params to bottom of doc comments
Nick Wellnhofer 7bc7ae9d 2025-05-06T15:30:46 doc: Enable Doxygen autobrief
Nick Wellnhofer ab13fbfd 2025-05-06T14:06:43 doc: Misc fixes to error docs
Nick Wellnhofer b1685459 2025-05-06T12:50:52 doc: Misc fixes to xmlsave docs
Nick Wellnhofer 7d689fab 2025-05-06T10:54:46 doc: Fix doc installation with Autotools
Nick Wellnhofer 7b59e74c 2025-05-06T10:54:18 doc: Always use case sensitive filenames with Doxygen Avoid platform-specific behavior.
Nick Wellnhofer 298f70b3 2025-05-05T21:36:36 doc: Misc fixes to HTML tree docs