|
b3492259
|
2025-03-14T00:01:11
|
|
include: Change some return types from int to enum
This also affects some new functions from 2.13.
|
|
84c6524e
|
2025-03-13T19:45:35
|
|
encoding: Support input-only and output-only converters
Make it possible to open an encoding handler only for input or output.
This avoids the creation of unnecessary converters.
Should also fix #863.
|
|
69b83bb6
|
2025-03-10T02:18:51
|
|
encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
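The `flush` idea can be illustrated outside of libxml2. The following is a minimal sketch, assuming UTF-8 input and a hypothetical helper `demoScanUtf8` (not a libxml2 function). It only inspects lead bytes and does not validate continuation bytes. A trailing incomplete sequence is left unconsumed while more data may arrive, and reported as an error only once the caller signals end of input via `flush`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Scans in[0..len) and stores the number of bytes that belong to
 * complete sequences in *consumed.
 * Returns 0 on success, -1 if flush is set and the input ends in a
 * truncated sequence, -2 on an invalid lead byte.
 */
static int
demoScanUtf8(const unsigned char *in, size_t len, int flush,
             size_t *consumed) {
    size_t i = 0;

    assert(in != NULL && consumed != NULL);

    while (i < len) {
        unsigned char b = in[i];
        size_t need;

        if (b < 0x80)
            need = 1;
        else if ((b & 0xE0) == 0xC0)
            need = 2;
        else if ((b & 0xF0) == 0xE0)
            need = 3;
        else if ((b & 0xF8) == 0xF0)
            need = 4;
        else {
            *consumed = i;
            return -2;
        }

        if (need > len - i) {
            /* Incomplete sequence at the end of the buffer. */
            *consumed = i;
            return flush ? -1 : 0;
        }
        i += need;
    }

    *consumed = i;
    return 0;
}
```

Without `flush`, the caller keeps the unconsumed tail and retries after appending more input. With `flush`, the same tail is an error.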
|
|
ef44c240
|
2025-03-10T14:15:35
|
|
encoding: Fix memory leak in xmlCharEncNewCustomHandler
Short-lived regression.
|
|
87c9e000
|
2025-03-09T22:20:23
|
|
encoding: Rework custom encoding implementation API
|
|
38f47507
|
2025-03-05T21:06:05
|
|
encoding: Make conversion callbacks more type-safe
|
|
a846d964
|
2025-03-05T16:49:42
|
|
encoding: Remove compatibility struct members
|
|
0b27097a
|
2025-03-04T12:55:25
|
|
encoding: Rename unprefixed public functions
|
|
3793eaad
|
2025-02-16T13:54:56
|
|
fuzz: Fix build
|
|
9c16a153
|
2025-02-13T18:41:33
|
|
Revert "include: Make most IS_* macros private"
This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
|
|
cfc854b8
|
2025-02-11T00:21:12
|
|
fuzz: Work around glibc iconv() bug
|
|
c4f760be
|
2025-02-01T15:29:56
|
|
encoding: Handle iconv() returning EOPNOTSUPP on Apple
iconv() really shouldn't return undocumented error codes.
|
|
cdfb54ff
|
2025-01-31T18:38:40
|
|
Fix typos
|
|
6ec616ba
|
2025-01-24T18:26:55
|
|
encoding: Don't allow POSIX indicator suffixes in encoding names
Suffixes like "//IGNORE" change the behavior of iconv.
Also add comment on how we currently rely on GNU libiconv behavior
which technically violates the POSIX spec.
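A minimal sketch of such a check, assuming that any name containing "//" should be rejected. The helper name `demoHasIndicatorSuffix` is hypothetical, and the actual check in libxml2 may differ.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Returns 1 if the encoding name carries an iconv-style indicator
 * suffix such as "//IGNORE" or "//TRANSLIT", 0 otherwise.
 */
static int
demoHasIndicatorSuffix(const char *name) {
    assert(name != NULL);
    return strstr(name, "//") != NULL;
}
```

A caller would refuse to pass names like "UTF-8//IGNORE" on to `iconv_open` and accept plain names like "ISO-8859-1".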
|
|
fbaacfe2
|
2025-01-16T15:57:35
|
|
encoding: Clean up UCS-4 encodings
Use "UCS-*" instead of "ISO-10646-UCS-*". While the XML spec recommends
"ISO-10646-UCS-2" and "ISO-10646-UCS-4", GNU iconv doesn't understand
these names.
Ignore UCS4_2143 and UCS4_3412 which were never supported.
|
|
df0f16fa
|
2024-12-15T21:34:59
|
|
encoding: Check reallocations for overflow
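As an illustration of the general technique (not the actual libxml2 code), a buffer-doubling helper can test for overflow before computing the new size. The name `demoGrowBuffer` and the initial capacity of 16 are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Doubles the capacity of *buf. Returns 0 on success, -1 if the new
 * size would overflow size_t or if realloc fails. On failure, *buf
 * and *cap are left unchanged.
 */
static int
demoGrowBuffer(unsigned char **buf, size_t *cap) {
    unsigned char *tmp;
    size_t newCap;

    assert(buf != NULL && cap != NULL);

    /* Check before multiplying so the product cannot wrap around. */
    if (*cap > SIZE_MAX / 2)
        return -1;
    newCap = (*cap == 0) ? 16 : *cap * 2;

    tmp = realloc(*buf, newCap);
    if (tmp == NULL)
        return -1;

    *buf = tmp;
    *cap = newCap;
    return 0;
}
```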
|
|
dae160c6
|
2024-09-13T12:08:20
|
|
encoding: Fix table entry for "UTF16"
|
|
6e503eb7
|
2024-09-10T03:32:37
|
|
encoding: Handle more ICU error codes
U_ILLEGAL_ESCAPE_SEQUENCE and U_UNSUPPORTED_ESCAPE_SEQUENCE can occur
with ISO-2022.
|
|
55d36c59
|
2024-09-10T03:11:18
|
|
encoding: Fix error code in xmlUconvConvert
Broke in 46ec621e.
|
|
34c9108f
|
2024-07-07T18:38:31
|
|
encoding: Add sizeOut argument to xmlCharEncInput
When push parsing, we want to convert as much of the input as possible.
When pull parsing memory buffers, we want to convert data chunk by chunk
to save memory.
|
|
1cfc5b80
|
2024-07-12T03:07:57
|
|
entities: Rework serialization of numeric character references
|
|
69f12d6d
|
2024-07-13T00:17:18
|
|
encoding: Deprecate xmlByteConsumed
This was only used by Chromium/WebKit to detect whether xmlParseContent
really succeeded. It's a horrible, overcomplicated hack.
See 8c5848bd and #767.
|
|
d0997956
|
2024-07-10T22:26:19
|
|
encoding: Readd some UTF-8 validation to encoders
This isn't strictly needed but avoids generating invalid UTF-16 and
unsigned integer overflows.
|
|
f48eefe3
|
2024-07-09T14:09:15
|
|
encoding: Rework xmlByteConsumed
Don't loop infinitely if input buffer is too large. Allocate conversion
buffer on the heap.
|
|
f86d17c1
|
2024-07-04T15:14:54
|
|
encoding: Fix xmlParseCharEncoding
Make "UTF-16" return the UTF16LE handler as before.
Fix error return.
|
|
46ec621e
|
2024-07-03T15:48:01
|
|
encoding: Clarify xmlUconvConvert
|
|
48fec242
|
2024-07-03T15:11:20
|
|
encoding: Remove duplicate code
Fix recent commit.
|
|
71fb2579
|
2024-07-03T14:35:49
|
|
encoding: Fix ICU build
|
|
9a4770ef
|
2024-07-02T02:18:03
|
|
doc: Improve documentation
|
|
1e3da9f4
|
2024-06-27T21:37:18
|
|
encoding: Start with callbacks
|
|
6d8427dc
|
2024-06-27T20:39:52
|
|
encoding: Rework encoding lookup
Add missing xmlCharEncoding enum values.
Simplify and speed up encoding lookup by using a table mapping names to
xmlCharEncoding enums and binary search. Rearrange the default handler
table to match the enum layout.
For some encodings we now only lookup the provided or most canonical
name instead of trying several names, expecting that iconv or ICU handle
aliases:
- IBM037 (EBCDIC)
- UCS-2
- UCS-4
- Shift_JIS
|
|
0b0dd989
|
2024-06-28T23:13:38
|
|
parser: Fix EBCDIC detection
|
|
37a9ff11
|
2024-06-28T22:42:46
|
|
encoding: Simplify xmlCharEncCloseFunc
|
|
1167c334
|
2024-06-28T21:51:21
|
|
encoding: Don't include iconv.h from libxml/encoding.h
|
|
30be984a
|
2024-06-28T20:37:47
|
|
encoding: Rework ISO-8859-X conversion
Optimize code. Pass tables as context parameter. Check for
XML_ENC_ERR_SPACE.
|
|
282ec1d5
|
2024-06-28T19:06:57
|
|
encoding: Rework xmlCharEncodingHandler layout
Reuse some of the old members.
The "input" and "output" function pointers are actually of type
xmlCharEncConvFunc, accepting an additional argument. For default
handlers, this argument is unused, so this should work with most ABIs.
For iconv handlers, these function pointers used to be NULL but now
point to a function which requires the extra argument.
"iconv_in" and "iconv_out" are made void pointers. "uconv_in" and
"uconv_out" are renamed and made void pointers. This is unlikely to
cause issues.
We now expect that the built-in conversion functions correctly report
XML_ENC_ERR_SPACE. For UTF8ToHtml and the ISO-8859-X code, this will be
done in the following commits.
|
|
57e37dff
|
2017-06-17T21:43:48
|
|
encoding: Rework UTF-16 conversion functions
Optimize UTF-16 conversion functions. Avoid misaligned memory access.
Don't rely on 'sizeof(short) == 2'. Check for XML_ENC_ERR_SPACE. Add
some tests for UTF-16 conversion.
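To illustrate the points about alignment and `sizeof(short)`, a decoder can assemble each UTF-16 code unit from individual bytes instead of casting the input to a 16-bit pointer. The sketch below uses a hypothetical function `demoDecodeUtf16LE`. It is not the libxml2 implementation and decodes a single code point only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Decodes one code point from UTF-16LE input in[0..len).
 * Returns the number of bytes consumed (2 or 4), 0 if the input is
 * truncated, or -1 on an invalid surrogate sequence.
 */
static int
demoDecodeUtf16LE(const unsigned char *in, size_t len,
                  unsigned long *out) {
    unsigned long c, d;

    assert(in != NULL && out != NULL);

    if (len < 2)
        return 0;
    /* Byte-wise read: no alignment or sizeof(short) assumptions. */
    c = in[0] | ((unsigned long) in[1] << 8);

    if (c < 0xD800 || c > 0xDFFF) {
        *out = c;
        return 2;
    }
    if (c >= 0xDC00)
        return -1;          /* unpaired low surrogate */
    if (len < 4)
        return 0;           /* high surrogate split across chunks */

    d = in[2] | ((unsigned long) in[3] << 8);
    if (d < 0xDC00 || d > 0xDFFF)
        return -1;          /* high surrogate without low surrogate */

    *out = 0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00);
    return 4;
}
```

A return value of 0 lets the caller keep the remaining bytes and retry once the next chunk has arrived.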
|
|
bb8e81c7
|
2024-06-28T04:36:14
|
|
encoding: Rework simple conversion functions
Use a single function for ASCII conversion. Optimize code. Check for
XML_ENC_ERR_SPACE.
|
|
501e5d19
|
2024-06-28T04:10:03
|
|
encoding: Stop using XML_ENC_ERR_PARTIAL
|
|
c59c2449
|
2024-06-27T23:32:58
|
|
encoding: Support custom implementations
|
|
f4e63f7a
|
2024-06-27T15:15:06
|
|
Regenerate libxml2-api.xml and testapi.c
|
|
b1a416bf
|
2024-06-27T12:00:45
|
|
encoding: Restore old lookup order in xmlOpenCharEncodingHandler
When looking up encodings with xmlLookupCharEncodingHandler, the
returned handler can have a different name than requested
(capitalization, internal aliases). This should eventually be fixed.
For now we revert part of commit 5b893fa9, start the lookup with
xmlFindHandler and add an explicit check for UTF-8.
Should fix the encoding name issue mentioned in #749.
|
|
c4d8343b
|
2024-06-24T19:41:32
|
|
encoding: Make xmlFindCharEncodingHandler return UTF-8 handler
xmlFindCharEncodingHandler must always return a handler.
Remove UTF-8 handler from default handler list.
Fixes 5b893fa9.
|
|
5b893fa9
|
2024-06-22T19:15:17
|
|
encoding: Fix encoding lookup with xmlOpenCharEncodingHandler
Make xmlOpenCharEncodingHandler call xmlParseCharEncoding first so we
prefer our own handlers for names like "UTF8". Only UTF-16 needs an
exception.
Make callers check the return value. For UTF-8, a NULL encoding doesn't
mean an error.
Remove unnecessary UTF-8 check from htmlFindOutputEncoder. Don't try to
look up ASCII handler since the HTML handler is always available.
Fix return code of xmlParseCharEncoding.
Should fix #744.
|
|
2def7b4b
|
2024-06-18T13:55:34
|
|
clang-tidy: move assignments out of if
Found with bugprone-assignment-in-if-condition
Signed-off-by: Rosen Penev <rosenp@gmail.com>
|
|
63ce5f9a
|
2024-04-28T17:32:35
|
|
Make some globals const
|
|
072facc4
|
2024-03-18T14:17:57
|
|
encoding: Don't shrink input too early in xmlCharEncOutput
Some exotic encodings like ISO646-FR don't support '#' characters, so
encoding a character reference can actually fail. Don't skip the
offending input in this case so the error will be reported on the next
call.
|
|
0821efc8
|
2024-01-02T18:33:57
|
|
encoding: Check whether encoding handlers support input/output
The "HTML" encoding handler doesn't support input which could lead to a
wrong error report.
|
|
023aecc4
|
2023-12-13T23:45:53
|
|
encoding: Support ASCII in xmlLookupCharEncodingHandler
Return our built-in ASCII handler. This was never implemented and
triggered the new and stricter error checks.
|
|
bd5ad030
|
2023-12-10T14:56:21
|
|
encoding: Report malloc failures
Introduce new API functions that return a separate error code if a
memory allocation fails.
- xmlOpenCharEncodingHandler
- xmlLookupCharEncodingHandler
Fix a few places where malloc failures weren't reported.
|
|
89d19534
|
2023-10-28T03:04:59
|
|
encoding: Fix decoding of large chunks
After 95e81a36, we must support XML_ENC_ERR_SPACE when using built-in
encoding handlers.
Should fix #610.
|
|
1734d27d
|
2023-10-02T15:04:18
|
|
encoding: Suppress -Wcast-align warnings
|
|
0533daf5
|
2023-09-29T02:45:20
|
|
encoding: Fix infinite loop in xmlCharEncInput
Short-lived regression from 95e81a36.
|
|
8c084ebd
|
2023-09-21T22:57:33
|
|
doc: Make apibuild.py happy
|
|
699299ca
|
2023-09-20T18:54:39
|
|
globals: Stop including globals.h
|
|
7909ff08
|
2023-09-20T17:38:26
|
|
include: Remove unnecessary includes
- Don't include tree.h from encoding.h
- Don't include parser.h from xmlIO.h
|
|
507f11ed
|
2023-08-16T15:43:47
|
|
encoding: Remove debugging code
|
|
95e81a36
|
2023-08-08T15:21:31
|
|
parser: Decode all data in xmlCharEncInput
Even with flush set to true, xmlCharEncInput didn't guarantee to decode
all data. This complicated the push parser.
Remove the flush flag and always decode all available data.
Also fix ICU code where the flush flag has a different meaning. Always
set flush to false and retry even with empty input buffers.
|
|
4ee08155
|
2023-08-08T15:19:51
|
|
encoding: Move rawconsumed accounting to xmlCharEncInput
|
|
b236b7a5
|
2023-06-08T21:53:05
|
|
parser: Halt parser when growing buffer results in OOM
Fix short-lived regression from previous commit.
It might be safer to make xmlBufSetInputBaseCur use the original buffer
even in case of errors.
Found by OSS-Fuzz.
|
|
db21cd5d
|
2023-06-06T14:25:30
|
|
malloc-fail: Handle malloc failures in xmlAddEncodingAlias
Avoid memory errors if an allocation fails.
See #344. Fixes #553.
|
|
2f12e3a9
|
2023-04-30T18:46:05
|
|
encoding: Stop calling xmlEncodingErr
This invokes the global error handler which should be avoided.
|
|
320f5084
|
2023-04-30T18:25:09
|
|
parser: Improve handling of encoding and IO errors
Make sure that xmlCharEncInput, xmlParserInputBufferPush and
xmlParserInputBufferGrow set the correct error code in the
xmlParserInputBuffer. Handle errors when calling these functions.
|
|
3ff6abbf
|
2023-02-22T17:11:20
|
|
encoding: Rework error codes
Use an enum instead of magic numbers. Fix a few error codes. Simplify
handling of "space" and "partial" errors.
See #506.
|
|
33fb297b
|
2023-04-15T16:53:00
|
|
encoding: Fix compiler warning in ICU build
|
|
a6b9e55a
|
2023-03-26T15:42:02
|
|
encoding: Fix error code in asciiToUTF8
Use correct error code when invalid ASCII bytes are encountered.
Found by OSS-Fuzz.
|
|
98840d40
|
2023-03-21T19:07:12
|
|
parser: Rework EBCDIC code page detection
To detect EBCDIC code pages, we used to switch the encoding twice and
had to be very careful not to decode data after the XML declaration
before the second switch. This relied on a hard-coded expected size of
the XML declaration and was complicated and unreliable.
Now we convert the first 200 bytes to EBCDIC-US and parse the encoding
declaration manually.
|
|
1c5e1fc1
|
2023-02-14T13:56:21
|
|
malloc-fail: Check for malloc failure in xmlFindCharEncodingHandler
Don't return encoding handlers with a NULL name.
Found with libFuzzer, see #344.
|
|
d18f9c11
|
2023-02-14T13:50:46
|
|
malloc-fail: Fix leak of xmlCharEncodingHandler
Also free handler if its name is NULL.
Found with libFuzzer, see #344.
|
|
3cc900f0
|
2023-02-16T11:50:52
|
|
encoding: Cast toupper argument to unsigned char
Fixes undefined behavior.
Also cast return value explicitly to fix implicit-integer-sign-change
checks.
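The undefined behavior arises because `toupper` requires its argument to be representable as `unsigned char` or equal to `EOF`, so passing a plain `char` with a negative value is invalid. A minimal sketch of the cast pattern, using a hypothetical helper `demoUpperCopy` that is not part of libxml2:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Copies src to dst in upper case, truncating to dstSize - 1
 * characters and always terminating dst.
 */
static void
demoUpperCopy(char *dst, size_t dstSize, const char *src) {
    size_t i;

    assert(dst != NULL && src != NULL && dstSize > 0);

    for (i = 0; i + 1 < dstSize && src[i] != 0; i++) {
        /* Cast the argument to unsigned char, the result to char. */
        dst[i] = (char) toupper((unsigned char) src[i]);
    }
    dst[i] = 0;
}
```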
|
|
2355eac5
|
2023-01-22T14:52:06
|
|
malloc-fail: Fix null deref if growing input buffer fails
Also add some error checks.
Found with libFuzzer, see #344.
|
|
0f54af74
|
2022-12-08T18:36:45
|
|
encoding.c: Fix for documentation generator
Top-level macro invocations throw off the documentation parser.
|
|
53ab3840
|
2022-11-25T14:26:59
|
|
encoding: Make init function private
|
|
3e9d5e4f
|
2022-11-25T14:19:36
|
|
encoding: Remove unused variable xmlDefaultCharEncodingHandler
|
|
1406b20f
|
2022-11-24T19:14:33
|
|
encoding: Allocate default handlers statically
|
|
2059df53
|
2022-11-14T22:27:58
|
|
buf: Deprecate static/immutable buffers
|
|
ad338ca7
|
2022-09-01T01:18:30
|
|
Remove explicit integer casts
Remove explicit integer casts as final operation
- in assignments
- when passing arguments
- when returning values
Remove casts
- to the same type
- from certain range-bound values
The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.
Document some explicit casts as truncating and add a few missing ones.
|
|
0f568c0b
|
2022-08-26T01:22:33
|
|
Consolidate private header files
Private functions were previously declared
- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.
Consolidate all private header files in include/private.
|
|
c14cac8b
|
2022-05-25T18:13:07
|
|
xmlBufAvail() should return length without including a byte for NUL terminator
* buf.c:
(xmlBufAvail):
- Return the number of bytes available in the buffer, but do not
include a byte for the NUL terminator so that it is reserved.
* encoding.c:
(xmlCharEncFirstLineInput):
(xmlCharEncInput):
(xmlCharEncOutput):
* xmlIO.c:
(xmlOutputBufferWriteEscape):
- Remove code that subtracts 1 from the return value of
xmlBufAvail(). It was implemented inconsistently anyway.
|
|
21561e83
|
2016-05-20T15:21:43
|
|
Mark more static data as `const`
Similar to 8f5710379, mark more static data structures with
`const` keyword.
Also fix placement of `const` in encoding.c.
Original patch by Sarah Wilkin.
|
|
40483d0c
|
2022-03-06T13:55:48
|
|
Deprecate module init and cleanup functions
These functions shouldn't be part of the public API. Most init
functions are only thread-safe when called from xmlInitParser. Global
variables should only be cleaned up by calling xmlCleanupParser.
|
|
f2072a8b
|
2022-03-05T18:23:34
|
|
Fix memory leak in xmlFindCharEncodingHandler
Fix memory leak in an unlikely error condition. Thanks to Wentao Liang
for the report.
Fixes #342.
|
|
21ddad52
|
2022-03-04T01:07:40
|
|
Remove ICONV_CONST test
We can simply cast the offending pointer to (void *).
|
|
776d15d3
|
2022-03-02T00:29:17
|
|
Don't check for standard C89 headers
Don't check for
- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h
Stop including non-standard headers
- malloc.h
- strings.h
|
|
b66ce0bb
|
2022-03-01T12:39:02
|
|
Don't include ICU headers in public headers
There's no need to make these implementation details public.
|
|
c41bc10d
|
2022-02-22T19:57:12
|
|
Fix unused variable warnings with disabled features
|
|
346c3a93
|
2022-02-20T18:46:42
|
|
Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
|
|
7abc6e6a
|
2022-01-25T02:27:53
|
|
Fix integer conversion warning in xmlIconvWrapper
Use size_t for return value of iconv(3) to avoid an UBSan integer
conversion warning.
|
|
eb4c1bf8
|
2021-11-03T09:48:13
|
|
Fix random dropping of characters on dumping ASCII encoded XML
Fix a bug in the return value of xmlCharEncOutput which causes
xmlNodeDumpOutput to drop characters randomly.
xmlCharEncOutput returns zero if the length of the input buffer is
zero. It ignores the fact that it may already have encoded the input
buffer, and that the input's length is zero because xmlEncOutputChunk
returned a -2 error and the underlying code tried to fix the error by
encoding the input.
xmlCharEncOutput collects the number of bytes written to the
output buffer but returns zero instead of the total number of
bytes in this situation. This commit fixes the issue by returning
the total number of bytes instead, so xmlNodeDumpOutput continues
writing and no longer stops because it mistakenly
thinks the output buffer was not changed in that iteration.
Fixes #314
|
|
03bb9293
|
2021-07-07T18:23:18
|
|
Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8().
* encoding.c:
(UTF16LEToUTF8):
- Fix comment to describe what the code does.
(UTF16BEToUTF8):
- Fix undefined behavior which was applied to UTF16LEToUTF8() in
2f9382033e.
- Add bounds check to while() loop which was applied to
UTF16LEToUTF8() in be803967db.
- Do not return -2 when (in >= inend) to fix the bug. This was
applied to UTF16LEToUTF8() in 496a1cf592.
- Inline (<< 8) statements to match UTF16LEToUTF8().
Add the following tests and results:
test/text-4-byte-UTF-16-BE-offset.xml
test/text-4-byte-UTF-16-BE.xml
test/text-4-byte-UTF-16-LE-offset.xml
test/text-4-byte-UTF-16-LE.xml
|
|
b92b16f6
|
2021-05-19T10:15:54
|
|
Remove unused variable in xmlCharEncOutFunc
Fixes a compiler warning:
encoding.c: In function 'xmlCharEncOutFunc__internal_alias':
encoding.c:2632:9: warning: unused variable 'output' [-Wunused-variable]
2632 | int output = 0;
https://gitlab.gnome.org/GNOME/libxml2/-/issues/254
|
|
dcb80b92
|
2021-02-20T20:30:43
|
|
Fix slow parsing of HTML with encoding errors
Under certain circumstances, the HTML parser would try to guess and
switch input encodings multiple times, leading to slow processing of
documents with encoding errors. The repeated scanning of the input
buffer when guessing encodings could even lead to quadratic behavior.
The code htmlCurrentChar probably assumed that if there's an encoding
handler, it is guaranteed to produce valid UTF-8. This holds true in
general, but if the detected encoding was "UTF-8", the UTF8ToUTF8
encoding handler simply invoked memcpy without checking for invalid
UTF-8. This still must be fixed, preferably by not using this handler
at all.
Also leave a note that switching encodings twice seems impossible to
implement correctly. Add a check when handling UTF-8 encoding errors
in htmlCurrentChar to avoid this situation, even if encoders produce
invalid UTF-8.
Found by OSS-Fuzz.
|
|
649d02ea
|
2020-12-07T20:19:53
|
|
encoding: fix memleak in xmlRegisterCharEncodingHandler()
The return type of xmlRegisterCharEncodingHandler() is void, so the
caller cannot determine whether xmlRegisterCharEncodingHandler()
succeeded. When nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the
"handler" is not added to the array "handlers". As a result, the memory
of "handler" can no longer be managed or released, which is a memory leak.
So add "xmlfree(handler)" to fix the memory leak on the failure branch of
xmlRegisterCharEncodingHandler().
Reported-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
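The ownership problem described in this commit can be sketched with a small fixed-size registry. Everything here is hypothetical (`DemoHandler`, `demoRegisterHandler`, a limit of two entries) and only mirrors the pattern: when registration takes ownership of the handler but cannot store it, the failure branch must free it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_HANDLERS 2

/* Hypothetical handler type, not the libxml2 struct. */
typedef struct {
    char *name;
} DemoHandler;

static DemoHandler *demoHandlers[DEMO_MAX_HANDLERS];
static int demoNbHandlers = 0;

/* Allocates a handler with a copy of name, or returns NULL. */
static DemoHandler *
demoNewHandler(const char *name) {
    DemoHandler *h;
    size_t n;

    assert(name != NULL);

    n = strlen(name) + 1;
    h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    h->name = malloc(n);
    if (h->name == NULL) {
        free(h);
        return NULL;
    }
    memcpy(h->name, name, n);
    return h;
}

/*
 * Takes ownership of h. Returns 0 on success, -1 on failure.
 * On failure the handler is freed, so the caller must not use it.
 */
static int
demoRegisterHandler(DemoHandler *h) {
    if (h == NULL)
        return -1;
    if (demoNbHandlers >= DEMO_MAX_HANDLERS) {
        /* Registry full: free here, nobody else can. */
        free(h->name);
        free(h);
        return -1;
    }
    demoHandlers[demoNbHandlers++] = h;
    return 0;
}
```

Unlike the void function discussed in the commit message, this sketch also returns a status code so the caller can tell whether registration succeeded.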
|
|
b516ed18
|
2020-11-12T12:53:43
|
|
Fix building with ICU 68.
ICU 68 no longer defines the TRUE macro.
Closes #204.
|
|
1e41e4fa
|
2020-06-30T02:43:57
|
|
Fix return values and documentation in encoding.c
Make xmlEncInputChunk and xmlEncOutputChunk return 0 on success and
never a positive value.
Make xmlCharEncFirstLineInt, xmlCharEncFirstLineInt and
xmlCharEncOutFunc return the number of bytes written.
|
|
2f938203
|
2020-06-15T15:45:47
|
|
Fix undefined behavior in UTF16LEToUTF8
Don't perform arithmetic on null pointer.
Found with libFuzzer and UBSan.
|
|
a697ed1e
|
2020-06-15T14:49:22
|
|
Fix return value of xmlCharEncOutput
Commit 407b393d introduced a regression caused by xmlCharEncOutput
returning 0 in case of success instead of the number of bytes written.
Always use its return value for nbchars in xmlOutputBufferWrite.
Fixes #166.
|
|
20c60886
|
2020-03-08T17:19:42
|
|
Fix typos
Resolves #133.
|
|
2a350ee9
|
2019-09-30T17:04:54
|
|
Large batch of typo fixes
Closes #109.
|
|
d2293cdb
|
2018-01-30T15:04:11
|
|
Remove a misleading line from xmlCharEncOutput
Closes: https://bugzilla.gnome.org/show_bug.cgi?id=793028
It seems this line was accidentally copied over from xmlCharEncOutFunc.
In xmlCharEncOutput output is a pointer so incrementing it by ret can
point it where it wasn't supposed to be pointing. Luckily the current
implementation doesn't dereference the pointer after advancing it.
Signed-off-by: Daniel Veillard <veillard@redhat.com>
|