xmlIO.c


Log

Author Commit Date CI Message
Nick Wellnhofer 1e4d8c55 2024-11-06T16:42:05 xmlIO: Fix reading from non-regular files like pipes Commit 7e14c05d removed unnecessary copying of uncompressed input through zlib or xzlib. This broke input from non-regular files like pipes which can't be reopened. Try to detect such files by checking whether they're seekable and always pipe them through zlib or xzlib. Also remove seemingly unnecessary calls to gzread and gzrewind to support unseekable files. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/124.
Nick Wellnhofer 55ddccb6 2024-09-14T00:03:56 io: Make sure not to pass partial UTF-8 to write callback We cannot split UTF-8 at arbitrary boundaries.
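To make the constraint in 55ddccb6 concrete, here is a minimal sketch of finding a safe split point in a UTF-8 buffer. The function name `utf8_complete_prefix` is hypothetical and the code is not the library's implementation; it assumes the buffer holds valid UTF-8 that may have been cut off at an arbitrary byte.

```c
#include <stddef.h>

/*
 * Return the length of the longest prefix of buf[0..len) that does not
 * end in the middle of a UTF-8 sequence. Bytes past the returned length
 * should be held back until more data arrives.
 */
size_t
utf8_complete_prefix(const unsigned char *buf, size_t len) {
    size_t i = len;
    size_t back = 0;
    size_t need, have;
    unsigned char lead;

    /* Step back over up to 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len; /* empty, or no lead byte: pass through unchanged */

    lead = buf[i - 1];
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return len; /* invalid lead byte: pass through unchanged */

    /* Lead byte plus the continuation bytes that follow it. */
    have = len - (i - 1);
    if (have < need)
        return i - 1; /* incomplete sequence: cut before the lead byte */
    return len;
}
```

For example, a buffer ending in the first two bytes of a three-byte character is cut before that character, so a write callback only ever sees whole characters.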
triallax 67ff748c 2024-08-26T23:53:29 io: don't set the executable bit when creating files Issue seems to have been introduced in 0bef93bf24def68c448af0e71844b942e0ed93ec.
Nick Wellnhofer f2c48847 2024-08-13T14:38:07 io: Add missing calls to xmlInitParser This is required after c9a46a91. Should fix #782.
Nick Wellnhofer a530ff12 2024-07-29T14:18:57 io: Always consume encoding handler when creating output buffers Also free encoding handler in error case. Remove xmlAllocOutputBufferInternal which was identical to xmlAllocOutputBuffer.
Nick Wellnhofer 36ea881b 2024-07-26T18:07:27 malloc-fail: Fix memory leak in xmlOutputBufferCreateFilename Close encoding handler on error.
Nick Wellnhofer 7b98e8d6 2024-07-18T01:54:22 io: Don't call getcwd in xmlParserGetDirectory The "directory" value isn't used internally. Calling getcwd is unnecessary and can cause problems in sandboxed environments. Fixes #770.
Nick Wellnhofer eb66d03e 2024-07-07T23:15:54 io: Deprecate a few functions
Nick Wellnhofer 97680d6c 2024-07-07T21:29:18 io: Rework xmlParserInputBufferGrow Remove dubious (len != 4) check. Remove compression-related code. This should already be set when opening the input.
Nick Wellnhofer a6f54f05 2024-07-07T18:52:17 io: Fine-tune initial IO buffer size
Nick Wellnhofer 7148b778 2024-07-07T16:11:08 parser: Optimize memory buffer I/O Reenable zero-copy IO for zero-terminated static memory buffers. Don't stream zero-terminated dynamic memory buffers on top of creating a copy.
Nick Wellnhofer 34c9108f 2024-07-07T18:38:31 encoding: Add sizeOut argument to xmlCharEncInput When push parsing, we want to convert as much of the input as possible. When pull parsing memory buffers, we want to convert data chunk by chunk to save memory.
Nick Wellnhofer a221cd78 2024-07-07T03:01:51 buf: Rework xmlBuf code Always use what the old implementation called the "IO" allocation scheme, allowing to move the content pointer past the initial allocation. This is inexpensive and allows efficient shrinking. Optimize xmlBufGrow, reusing shrunken memory as much as possible. Simplify xmlBufAdd. Make xmlBufBackToBuffer return an error on overflow. Make "size" exclude the terminating NULL byte. Always provide an initial size. Reintroduce static buffers. Remove xmlBufResize and several other functions.
Nick Wellnhofer 8d160626 2024-07-12T02:01:06 entities: Rework text escaping
Nick Wellnhofer cc45f618 2024-07-11T22:06:31 save: Rework text escaping Stop using xmlOutputBufferWriteEscape except when using deprecated xmlSaveSetEscape. Rewrite xmlOutputBufferWriteEscape to use an extra buffer and call xmlOutputBufferWrite. Introduce xmlSerializeText to serialize both text and attribute content. Don't read encoding from document when serializing and remove all hacks that temporarily changed the document's encoding.
Nick Wellnhofer 0ab07b21 2024-07-11T20:04:39 io: Rework xmlOutputBufferWrite Simplify code, handle short writes from callback.
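Handling short writes, as mentioned in 0ab07b21, generally means calling the write callback in a loop until every byte is accepted. The sketch below shows the general pattern only. The callback type `write_cb`, the helper `write_all`, the demo sink and the choice to treat a zero-byte write as an error are all assumptions for illustration, not libxml2's API or behavior.

```c
#include <string.h>

/* Hypothetical callback: returns the number of bytes accepted
 * (possibly fewer than len) or a negative value on error. */
typedef int (*write_cb)(void *ctx, const char *buf, int len);

/* Call the callback until all bytes are written.
 * Returns 0 on success, -1 on error. */
int
write_all(write_cb cb, void *ctx, const char *buf, int len) {
    while (len > 0) {
        int n = cb(ctx, buf, len);

        if (n <= 0)
            return -1; /* error, or no progress (avoid spinning) */
        buf += n;
        len -= n;
    }
    return 0;
}

/* Demo sink that accepts at most 3 bytes per call. */
struct sink {
    char data[64];
    int used;
};

int
short_writer(void *ctx, const char *buf, int len) {
    struct sink *s = ctx;
    int n = len < 3 ? len : 3;

    if (s->used + n > (int) sizeof(s->data))
        return -1;
    memcpy(s->data + s->used, buf, (size_t) n);
    s->used += n;
    return n;
}

/* Usage sketch: returns the number of bytes that reached the sink,
 * or -1 if writing failed or the data was corrupted. */
int
demo_short_writes(void) {
    struct sink s;

    memset(&s, 0, sizeof(s));
    if (write_all(short_writer, &s, "hello world", 11) != 0)
        return -1;
    if (memcmp(s.data, "hello world", 11) != 0)
        return -1;
    return s.used;
}
```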
Nick Wellnhofer e0494c0d 2024-07-15T15:10:18 io: Add some deprecation warnings
Nick Wellnhofer da686399 2024-07-09T12:29:53 io: Fix return value of xmlFileRead This broke in commit 6d27c54. Fixes #766.
Nick Wellnhofer 84a4f84c 2024-06-22T02:11:24 build: Don't check for required headers and functions Unless we are on Windows, the following POSIX headers are required. They're part of the earliest POSIX specs and it doesn't make sense to check for them. - fcntl.h - unistd.h - sys/stat.h - sys/time.h On Windows, io.h, fcntl.h and sys/stat.h are always available.
Nick Wellnhofer dba1ed85 2024-06-12T18:19:55 ftp: Remove FTP support Remove the built-in FTP client. If you configure --with-legacy, old symbols are retained for ABI compatibility.
Nick Wellnhofer ab5e6deb 2024-06-11T18:11:51 parser: Introduce XML_INPUT_NETWORK input flag This allows to disable network access when creating parser inputs with xmlInputCreateUrl.
Nick Wellnhofer 64ad2725 2024-06-11T03:51:43 parser: Introduce per-context resource loader
Nick Wellnhofer b9d2f3c9 2024-06-11T02:15:18 parser: Introduce new input API - xmlInputCreateUrl - xmlInputCreateMemory - xmlInputCreateString - xmlInputCreateFd - xmlInputCreateIO - xmlInputSetEncoding These functions don't take a parser context and work on xmlParserInputs, replacing functions working on xmlParserInputBuffers. xmlInputCreateUrl and xmlInputSetEncoding offer fine-grained error handling. Several XML_INPUT_* flags offer additional control.
Nick Wellnhofer ff3b0919 2024-06-11T00:00:32 parser: Implement XML_PARSE_NO_UNZIP option
Nick Wellnhofer 1432949d 2024-06-10T23:57:52 io: Pass input flags to xmlParserInputBufferCreateUrl
Nick Wellnhofer b5890cb4 2024-06-10T18:51:56 io: Remove xmlParserInputBufferCreateFilenameSafe
Nick Wellnhofer 1b1e8b3c 2024-06-10T16:39:57 io: Stop invoking generic error handler for IO errors
Nick Wellnhofer a331526c 2024-06-10T16:21:12 io: Don't report write errors twice
Nick Wellnhofer 717f3a7b 2024-06-10T18:50:28 io: Fix resetting xmlParserInputBufferCreateFilename hook We don't want to invoke the default function.
Nick Wellnhofer e75e878e 2024-05-20T13:58:22 doc: Update and fix documentation
Nick Wellnhofer a4c2b723 2024-05-05T17:26:31 io: Don't set close callback in xmlParserInputBufferCreateFd
Nick Wellnhofer a279aae3 2024-03-18T14:20:19 io: Allocate output buffer with XML_BUFFER_ALLOC_IO This allows efficient shrinking of memory buffers. Support IO buffers in xmlBufDetach.
Nick Wellnhofer c1fe9e72 2024-03-06T15:21:49 io: Report more malloc failures when writing to output buffer
Nick Wellnhofer 67e475b7 2024-02-19T11:09:39 http: Improve error message for HTTPS redirects
Nick Wellnhofer e314109a 2024-02-16T15:42:38 save: Don't write directly to internal buffer Make sure that OOM errors are reported.
Nick Wellnhofer 0d170aca 2024-02-01T11:51:58 io: Report malloc failure in xmlOutputBufferWrite Fixes #676.
Nick Wellnhofer d2b55a7a 2024-01-05T20:31:10 writer: Implement xmlTextWriterClose This function can be used to make sure that closing the output stream succeeded. Fixes #513.
Nick Wellnhofer e45a4d71 2023-12-29T00:00:21 io: Always forward IO errors to global handler The HTTP module raises errors without context. This won't be fixed, so send them to the global error handler.
Nick Wellnhofer 7e0bbbc1 2023-12-27T18:33:30 parser: New input API Provide a new set of functions to create xmlParserInputs. These can be used for the document entity or from external entity loaders. - Don't require xmlParserInputBuffer. - All functions take a base URI. - All functions take an encoding as string. - xmlNewInputURL also takes a public ID. - xmlNewInputMemory takes a size_t. - Optimization hints for memory buffers. Improve documentation. Only call xmlInitParser before allocating a new parser context. Call xmlCtxtUseOptions as early as possible.
Nick Wellnhofer c2ef78f7 2023-12-24T23:56:57 io: Fix close error handling There's no way to report error codes from closing an output buffer yet.
Nick Wellnhofer 6d27c549 2023-12-24T17:59:02 io: Fix read/write error handling Handle short reads/writes from fd. Fix stdio error handling.
Nick Wellnhofer 0bef93bf 2023-12-23T04:03:41 io: More refactoring and unescaping fixes Merge Windows wrappers into relevant functions. Remove more unnecessary unescaping. Merge *OpenW into *Open functions. Use unbuffered IO for output.
Nick Wellnhofer a2693410 2023-12-23T00:35:30 io: Move some code from xmlIO.c to parserInternals.c Move everything related to parser contexts to parserInternals.c.
Nick Wellnhofer 8ab1b122 2023-12-23T00:00:15 Fix filename and URI handling Many strings are passed to the library that could be either URIs or filesystem paths. We now assume that strings are a URI if they contain the substring "://". This means that they have a scheme and an authority. Otherwise, URI resolution wouldn't make much sense. Fix xmlBuildURI to work with filesystem paths. If the base URI doesn't contain "://" it is treated as filename. The resolved URI is unescaped, appended and the result is normalized. Rewrite xmlNormalizePath to handle Windows quirks. All special handling for Windows paths is removed in xmlCanonicPath. If the path looks like a URI, only escape characters allowed in Legacy Extended IRIs. Make xmlPathToURI only call xmlCanonicPath. The additional round-trip through URI parser and serializer seems useless. Add a helper function xmlConvertUriToPath in xmlIO.c which checks for file URIs and unescapes them. Always process strings with xmlCanonicPath in xmlLoadExternalEntity. This should be harmless now. Should help with #334, #387, #611.
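The "://" heuristic described in 8ab1b122 is simple enough to sketch directly. The function name `looks_like_uri` is hypothetical; the library's own checks may differ in detail.

```c
#include <string.h>

/*
 * Return 1 if the string should be treated as a URI under the
 * heuristic from the commit message (it contains "://"), or 0 if it
 * should be treated as a filesystem path.
 */
int
looks_like_uri(const char *str) {
    return str != NULL && strstr(str, "://") != NULL;
}
```

Under this rule `http://example.com/a.xml` and `file:///tmp/a.xml` count as URIs, while `/tmp/a.xml` and `C:\dir\a.xml` count as paths.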
Nick Wellnhofer 229e5ff7 2023-12-21T18:09:42 io: Remove support for HTTP POST This feature is unlikely to be used these days.
Nick Wellnhofer 0a658c0f 2023-12-20T23:53:19 io: Don't use "-" to read from stdin To implement this feature on such a low level is a disaster waiting to happen. Remove these checks from the IO code and move them to xmllint. Note that the serialization API will still treat "-" as stdout.
Nick Wellnhofer c9a46a91 2023-12-20T20:11:09 io: Rework initialization
Nick Wellnhofer b75fc1ab 2023-12-20T20:01:19 io: Rearrange code
Nick Wellnhofer 13043691 2023-12-20T00:33:34 parser: Rename xmlErrParser to xmlCtxtErr
Nick Wellnhofer 9fbe46ba 2023-12-19T20:10:10 io: Consolidate error messages
Nick Wellnhofer 23345a1c 2023-12-19T19:52:28 io: Report IO errors through xmlCtxtErrIO This is also a new public API function to be used in external entity loaders.
Nick Wellnhofer 1ef35663 2023-12-19T19:36:35 io: Always use unbuffered input Before, we often used unbuffered input via the lzma or gzip handlers, more or less inadvertently. Change the default file handlers from buffered (stdc FILE) to unbuffered (POSIX fds).
Nick Wellnhofer 7e14c05d 2023-12-19T17:05:08 io: Fix detection of compressed streams Make sure that we don't try to open uncompressed streams with a compression handler in copying mode.
Nick Wellnhofer 7e511f35 2023-12-19T15:41:37 io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile This allows to report the reason why opening a file failed to the parser context and improve error messages. Now we can also remove the stat call before opening a file.
Nick Wellnhofer b2dbcc43 2023-12-19T13:33:59 io: Rework default callbacks Register a dummy callback struct for default callbacks. Handle them in a separate function which will later allow to return meaningful error codes.
Nick Wellnhofer 54c70ed5 2023-12-18T19:31:29 parser: Improve error handling Introduce xmlCtxtSetErrorHandler allowing to set a structured error for a parser context. There already was the "serror" SAX handler but this always receives the parser context as argument. Start to use xmlRaiseMemoryError. Remove useless arguments from memory error functions. Rename xmlErrMemory to xmlCtxtErrMemory. Remove a few calls to xmlGenericError. Remove support for runtime entity debugging.
Nick Wellnhofer c5a8aef2 2023-12-18T19:12:08 error: Refactor error reporting Introduce xmlStrVASPrintf, trying to handle buggy snprintf implementations. Introduce xmlSetError to set errors atomically. Introduce xmlUpdateError to set an error, fixing up node, file and line. Introduce helper function xmlRaiseMemoryError. Make legacy error handlers call xmlReportError, avoiding checks in xmlVRaiseError. Remove fragile support for getting file and line info from XInclude nodes.
Nick Wellnhofer c2bbeed1 2023-12-12T23:51:32 io: Fix memory lifetime issue with input buffers xmlParserInputBufferCreateMem must make a copy of the buffer. This fixes a regression from 2.11 which could cause reads from freed memory depending on the use case. Undeprecate xmlParserInputBufferCreateStatic which can avoid copying the whole buffer.
Nick Wellnhofer f19a9510 2023-12-10T17:50:22 parser: Report malloc failures Fix many places where malloc failures aren't reported. Make xmlErrMemory public. This is useful for custom external entity loaders. Introduce new API function xmlSwitchEncodingName. Change the way how we store whether the parser is stopped. This used to be signaled by setting ctxt->instate to XML_PARSER_EOF which was misdesigned and error-prone. Set ctxt->disableSAX to 2 instead and introduce a macro PARSER_STOPPED. Also stop to remove parser inputs in xmlHaltParser. This allows to remove many checks of ctxt->instate. Introduce xmlErrParser to handle errors if a parser context is available.
Nick Wellnhofer 455c61d6 2023-11-23T15:59:41 Remove VMS support This was last updated 10 years ago and is most likely broken.
Nick Wellnhofer 11a1839d 2023-09-20T17:54:48 globals: Move remaining globals back to correct header files This undoes a lot of damage.
Nick Wellnhofer 4e1c13eb 2023-09-18T14:45:10 debug: Remove debugging code This is barely useful these days and only clutters the code base.
Nick Wellnhofer 95e81a36 2023-08-08T15:21:31 parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.
Nick Wellnhofer 834b8123 2023-08-08T15:21:28 parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, it was probably envisioned to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid copying (possibly gigabytes) of memory temporarily, we now stream in-memory input just like content read from files in a chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now make use of strnlen which unfortunately isn't part of the standard C library and only mandated since POSIX 2008.
Nick Wellnhofer 4ee08155 2023-08-08T15:19:51 encoding: Move rawconsumed accounting to xmlCharEncInput
Nick Wellnhofer ec7be506 2023-08-08T15:19:46 parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.
Nick Wellnhofer b230861d 2023-04-30T18:38:16 xmlIO: Remove some calls to xmlIOErr The xmlIOErr functions use the global error handler and should be avoided if possible.
Nick Wellnhofer 320f5084 2023-04-30T18:25:09 parser: Improve handling of encoding and IO errors Make sure that xmlCharEncInput, xmlParserInputBufferPush and xmlParserInputBufferGrow set the correct error code in the xmlParserInputBuffer. Handle errors when calling these functions.
Nick Wellnhofer 97086fd7 2023-02-14T14:45:58 malloc-fail: Fix memory leak in xmlParserInputBufferCreateMem Found with libFuzzer, see #344.
Nick Wellnhofer 2355eac5 2023-01-22T14:52:06 malloc-fail: Fix null deref if growing input buffer fails Also add some error checks. Found with libFuzzer, see #344.
Nick Wellnhofer 2059df53 2022-11-14T22:27:58 buf: Deprecate static/immutable buffers
Nick Wellnhofer 249cee4b 2022-11-13T20:19:13 io: Fix a few integer overflows in I/O statistics There are still many places where arithmetic on "consumed" stats isn't checked for overflow, affecting platforms with a 32-bit long type.
Nick Wellnhofer 1ef4938f 2022-11-13T17:55:28 io: Rework xmlParserInputBufferGrow with encodings Read data directly into the "raw" buffer when converting encodings. Make sure not to grow memory input buffers.
Nick Wellnhofer 46cd7d22 2022-11-13T16:30:46 io: Remove xmlInputReadCallbackNop In some cases, for example when using encoders, the read callback was set to NULL, in other cases it was set to xmlInputReadCallbackNop. xmlGROW only tested for xmlInputReadCallbackNop, resulting in errors when parsing large encoded content from memory. Always use a NULL callback for memory buffers to avoid ambiguities. Fixes #262.
Nick Wellnhofer 22d879bf 2022-11-13T15:08:44 io: Fix "buffer full" error with certain buffer sizes Remove a useless check in xmlParserInputBufferGrow that could be triggered after changing xmlBufAvail in c14cac8b. Fixes #438.
Nick Wellnhofer 5bffa33a 2022-09-02T05:03:03 Stop including sys/types.h
Nick Wellnhofer 2cac6269 2022-09-01T03:14:13 Don't use sizeof(xmlChar) or sizeof(char)
Nick Wellnhofer ad338ca7 2022-09-01T01:18:30 Remove explicit integer casts Remove explicit integer casts as final operation - in assignments - when passing arguments - when returning values Remove casts - to the same type - from certain range-bound values The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.
Nick Wellnhofer 0f568c0b 2022-08-26T01:22:33 Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.
David Kilzer c14cac8b 2022-05-25T18:13:07 xmlBufAvail() should return length without including a byte for NUL terminator * buf.c: (xmlBufAvail): - Return the number of bytes available in the buffer, but do not include a byte for the NUL terminator so that it is reserved. * encoding.c: (xmlCharEncFirstLineInput): (xmlCharEncInput): (xmlCharEncOutput): * xmlIO.c: (xmlOutputBufferWriteEscape): - Remove code that subtracts 1 from the return value of xmlBufAvail(). It was implemented inconsistently anyway.
Mehltretter Karl c1632fbd 2022-05-06T10:58:58 fix typo in comment
David Kilzer 21561e83 2016-05-20T15:21:43 Mark more static data as `const` Similar to 8f5710379, mark more static data structures with `const` keyword. Also fix placement of `const` in encoding.c. Original patch by Sarah Wilkin.
Joey Arhar b7b29df9 2022-03-29T16:07:51 Add windows includes to xmlIO.c xmlIO.c calls read() and getcwd() which need io.h and direct.h respectively when compiling on windows. Otherwise, a compiler error may be raised saying that read() and getcwd() were used implicitly. This was regressed recently, I'm guessing it was due to the changes to win32config.h in commit 84085a26
Nick Wellnhofer d99ddd9b 2022-03-05T21:46:40 Improve buffer allocation scheme In most places, we really need the double-it scheme to avoid quadratic behavior. The hybrid scheme still can cause many reallocations and the bounded scheme doesn't seem to provide meaningful protection in xmlreader.c.
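The "double-it" scheme from d99ddd9b avoids quadratic behavior because each reallocation at least doubles the capacity, so the total amount of data copied while appending n bytes stays proportional to n. Growing by a fixed increment instead copies on the order of n squared bytes in total. A minimal sketch follows; the function name `grow_capacity` and the initial size of 16 are assumptions for illustration, not values from buf.c.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return a capacity of at least `needed` bytes, obtained by repeatedly
 * doubling `cap`. Returns 0 if doubling would overflow size_t.
 */
size_t
grow_capacity(size_t cap, size_t needed) {
    if (cap == 0)
        cap = 16; /* arbitrary starting size for this sketch */
    while (cap < needed) {
        if (cap > SIZE_MAX / 2)
            return 0; /* doubling would overflow */
        cap *= 2;
    }
    return cap;
}
```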
Nick Wellnhofer 776d15d3 2022-03-02T00:29:17 Don't check for standard C89 headers Don't check for - ctype.h - errno.h - float.h - limits.h - math.h - signal.h - stdarg.h - stdlib.h - string.h - time.h Stop including non-standard headers - malloc.h - strings.h
Nick Wellnhofer b094e814 2022-03-01T00:02:59 Remove broken Windows CE support
Nick Wellnhofer 655cf3f4 2022-02-28T23:39:00 Always fopen files with "rb" We never want translation of newlines when reading files, so it should be safe to always specify "rb". On sane platforms, the "b" flag is simply ignored.
Nick Wellnhofer 3f8655db 2022-02-28T23:22:50 Remove __DJGPP__ checks Drop broken support for DJGPP.
Nick Wellnhofer 2489c1d0 2022-02-28T22:42:10 Remove useless __CYGWIN__ checks From what I can tell, some really early Cygwin versions from around 1998-2000 used to erroneously define _WIN32. This was eventually fixed, but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is unnecessary. Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether to use __declspec.
Nick Wellnhofer c41bc10d 2022-02-22T19:57:12 Fix unused variable warnings with disabled features
Nick Wellnhofer 346c3a93 2022-02-20T18:46:42 Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.
David King d7f11fd0 2021-07-14T17:03:46 Fix leak in __xmlOutputBufferCreateFilename Found by Coverity. https://bugzilla.redhat.com/show_bug.cgi?id=1938806
Nick Wellnhofer dea91c97 2021-07-27T16:12:54 Fix buffering in xmlOutputBufferWrite Fix a regression introduced with commit a697ed1e which caused xmlOutputBufferWrite to flush internal buffers too late. Fixes #296.
Nick Wellnhofer a697ed1e 2020-06-15T14:49:22 Fix return value of xmlCharEncOutput Commit 407b393d introduced a regression caused by xmlCharEncOutput returning 0 in case of success instead of the number of bytes written. Always use its return value for nbchars in xmlOutputBufferWrite. Fixes #166.
Nick Wellnhofer 20c60886 2020-03-08T17:19:42 Fix typos Resolves #133.
Nick Wellnhofer c2e09f44 2020-02-11T11:32:23 Add xmlPopOutputCallbacks Add function to pop a single set of output callbacks from the stack. This was only implemented for input callbacks before. Fixes #135.
Nick Wellnhofer 40e00bc5 2019-10-14T16:56:59 Fix integer overflow when counting written bytes Check for integer overflow when updating the `written` member of struct xmlOutputBuffer in xmlIO.c. Closes #112. Resolves !54 and !55.
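An overflow check of the kind described in 40e00bc5 can be sketched as a saturating addition. The helper name `add_written` is hypothetical, and the sketch assumes an `int` byte counter that should stick at `INT_MAX` rather than wrap around; the actual code in xmlIO.c may handle the overflow differently.

```c
#include <limits.h>

/*
 * Add a byte count to a running total without signed overflow.
 * Negative counts are ignored; the result saturates at INT_MAX.
 */
int
add_written(int total, int n) {
    if (n <= 0)
        return total;
    if (total > INT_MAX - n)
        return INT_MAX; /* adding n would overflow: saturate */
    return total + n;
}
```

The comparison is done against `INT_MAX - n` so that the check itself never overflows.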
Jared Yanovich 2a350ee9 2019-09-30T17:04:54 Large batch of typo fixes Closes #109.
Nick Wellnhofer 6705f4d2 2019-09-16T15:45:27 Remove executable bit from non-executable files
zhouzhongyuan 4f67dbb0 2019-07-09T15:11:01 fix memory leak in xmlAllocOutputBuffer