|
46f05ea4
|
2025-05-09T00:21:47
|
|
html: Rework meta charset handling
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.
Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.
Add full support for <meta charset=""> which is also used when adding
meta tags.
Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.
Only switch encoding once when parsing.
Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading
UTF-8 charset.
Fixes #909.
|
|
8cf6129b
|
2025-02-13T18:20:46
|
|
html: Stop implying <p> start tags
Only <html>, <head> or <body> should be implied. Opening extra <p> tags
has always been a libxml2 quirk.
|
|
59511792
|
2024-09-03T15:52:44
|
|
html: Parse named character references according to HTML5
|
|
85c112a0
|
2017-06-12T18:26:11
|
|
Add test cases for bug 758518
test/HTML/758518-entity.html exposed a bug in pushParseTest() in
runtest.c which assumed that an input file was at least 4 bytes long.
That test case is only 3 bytes, so we now take the minimum of 4 bytes
or the length of the test input. We also now use 'chunkSize' in place
of the hard-coded value '1024' later in the function.
|