tree.c


Log

Author Commit Date CI Message
Nick Wellnhofer 59f2f60e 2022-09-02T00:27:57 Remove "runtime debugging" This doesn't seem useful as configuration option.
Nick Wellnhofer bdcf842c 2022-09-01T20:45:35 Move xmlIsXHTML to tree.c It's declared in tree.h and not guarded by LIBXML_OUTPUT_ENABLED like the other functions in xmlsave.c.
Nick Wellnhofer 2cac6269 2022-09-01T03:14:13 Don't use sizeof(xmlChar) or sizeof(char)
Nick Wellnhofer ad338ca7 2022-09-01T01:18:30 Remove explicit integer casts Remove explicit integer casts as final operation - in assignments - when passing arguments - when returning values Remove casts - to the same type - from certain range-bound values The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.
Nick Wellnhofer d7a334f2 2022-08-26T14:43:28 Silence -Warray-bounds warning This is a hack, but works for now. Fixes #389.
Nick Wellnhofer 0f568c0b 2022-08-26T01:22:33 Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.
Nick Wellnhofer 39745c92 2022-07-19T21:23:44 Improve documentation of tree manipulation API - Discourage use of node constructors without document. - Mention that xmlReconciliateNs is crucial when moving nodes from one document to another.
Nick Wellnhofer 3e7b4f37 2022-05-20T23:28:25 Avoid calling xmlSetTreeDoc Create text nodes with xmlNewDocText or set the document directly to avoid xmlSetTreeDoc being called when the node is inserted.
Nick Wellnhofer 823bf161 2022-05-20T22:38:38 Simplify xmlFreeNode
Nick Wellnhofer a17a1f56 2022-05-18T02:17:31 Don't reset nsDef when changing node content nsDef is only used for element nodes.
Nick Wellnhofer 24646525 2022-05-18T02:16:34 Fix unintended fall-through in xmlNodeAddContentLen
David Kilzer 6ef16dee 2022-05-13T14:43:33 Reserve byte for NUL terminator and report errors consistently in xmlBuf and xmlBuffer This is a follow-up to commit 6c283d83. * buf.c: (xmlBufGrowInternal): - Call xmlBufMemoryError() when the buffer size would overflow. - Account for NUL terminator byte when using XML_MAX_TEXT_LENGTH. - Do not include NUL terminator byte when returning length. (xmlBufAdd): - Call xmlBufMemoryError() when the buffer size would overflow. * tree.c: (xmlBufferGrow): - Call xmlTreeErrMemory() when the buffer size would overflow. - Do not include NUL terminator byte when returning length. (xmlBufferResize): - Update error message in xmlTreeErrMemory() to be consistent with other similar messages. (xmlBufferAdd): - Call xmlTreeErrMemory() when the buffer size would overflow. (xmlBufferAddHead): - Add overflow checks similar to those in xmlBufferAdd().
David Kilzer 4ce2abf6 2022-05-29T09:46:00 Fix missing NUL terminators in xmlBuf and xmlBuffer functions * buf.c: (xmlBufAddLen): - Change check for remaining space to account for the NUL terminator. When adding a length exactly equal to the number of unused bytes, a NUL terminator was not written. (xmlBufResize): - Set `buf->use` and NUL terminator when allocating a new buffer. * tree.c: (xmlBufferResize): - Set `buf->use` and NUL terminator when allocating a new buffer. (xmlBufferAddHead): - Set NUL terminator before returning early when shifting contents.
David Kilzer a6df42e6 2022-05-28T08:08:29 Fix integer overflow in xmlBufferDump() * tree.c: (xmlBufferDump): - Cap the return value to INT_MAX.
David Kilzer 461ef8ac 2022-05-25T14:19:10 Fix double colon typos in xmlBufferResize() Introduced in commit 6c283d83e.
David Kilzer 4bc3ebf3 2022-03-19T17:17:40 Fix ownership of xmlNodePtr & xmlAttrPtr fields in xmlSetTreeDoc() When changing `doc` on an xmlNodePtr or xmlAttrPtr, certain fields must either be a free-standing string, or they must be owned by `doc->dict`. The code to make this change was simply missing, so the crash happened when an xmlAttrPtr was being torn down after `doc` changed from non-NULL to NULL, but the `name` field was not copied. This is scenario 1 below. The xmlNodePtr->name and xmlNodePtr->content fields are also fixed at the same time. Note that xmlNodePtr->content is never added to the dictionary, so NULL is used instead of `newDict` to force a free-standing copy. This change covers all cases of dictionary changes: 1. Owned by old dictionary -> NULL new dictionary - Create free-standing copy of string. 2. Owned by old dictionary -> Non-NULL new dictionary - Get string from new dictionary pool. 3. Not owned by old dictionary -> Non-NULL new dictionary - No action necessary (already a free-standing string). 4. Not owned by old dictionary -> NULL new dictionary - No action necessary (already a free-standing string). * tree.c: (_copyStringForNewDictIfNeeded): Add. (xmlSetTreeDoc): - Update xmlNodePtr->name, xmlNodePtr->content and xmlAttrPtr->name when changing the document, if needed. Found by OSS-Fuzz Issue 45132.
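The four dictionary cases listed in commit 4bc3ebf3 reduce to a decision on two inputs. The following is a minimal sketch of that decision only, not the code in tree.c: the enum, the function name, and the two flags are hypothetical stand-ins for what the real code would learn from the old and new dictionaries.

```c
/* Hypothetical illustration of the four ownership cases; not libxml2 code. */
typedef enum {
    STR_KEEP,          /* already free-standing: no action (cases 3 and 4) */
    STR_COPY_FREE,     /* create a free-standing copy (case 1) */
    STR_INTERN_IN_NEW  /* get the string from the new dictionary (case 2) */
} str_action;

/*
 * owned_by_old: nonzero if the string is owned by the old dictionary.
 * has_new_dict: nonzero if the new document has a dictionary.
 */
str_action
action_for_new_dict(int owned_by_old, int has_new_dict)
{
    if (!owned_by_old)
        return STR_KEEP;
    return has_new_dict ? STR_INTERN_IN_NEW : STR_COPY_FREE;
}
```

Under this sketch, the note about xmlNodePtr->content corresponds to always passing 0 for `has_new_dict`, which forces the free-standing copy.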
Nick Wellnhofer 6c283d83 2022-03-08T20:10:02 [CVE-2022-29824] Fix integer overflows in xmlBuf and xmlBuffer In several places, the code handling string buffers didn't check for integer overflow or used wrong types for buffer sizes. This could result in out-of-bounds writes or other memory errors when working on large, multi-gigabyte buffers. Thanks to Felix Wilhelm for the report.
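The class of bug described in commit 6c283d83 can be illustrated with a size computation that checks for overflow before adding. This is a hedged sketch, not the actual fix: `checked_grow_size` is a hypothetical helper, and reserving one extra byte for the NUL terminator is an assumption borrowed from the follow-up commit 6ef16dee.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not libxml2 code: compute the size needed to append
 * len bytes to a buffer holding use bytes, plus one byte for the NUL
 * terminator. Returns 0 on success, -1 if the sum would overflow size_t.
 */
int
checked_grow_size(size_t use, size_t len, size_t *out)
{
    /* Rearranged so that the comparison itself cannot overflow. */
    if (use >= SIZE_MAX || len > SIZE_MAX - use - 1)
        return -1;
    *out = use + len + 1;
    return 0;
}
```

A caller would report a memory error and leave the buffer untouched when the helper returns -1, rather than allocating a wrapped-around size.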
Nick Wellnhofer d314046f 2022-04-23T17:41:44 Don't try to copy children of entity references This would result in an error, aborting the whole copy operation. Regressed in commit 7618a3b1. Fixes #371.
Nick Wellnhofer 41afa89f 2022-04-10T14:09:29 Fix short-lived regression in xmlStaticCopyNode Commit 7618a3b1 didn't account for coalesced text nodes. I think it would be better if xmlStaticCopyNode didn't try to coalesce text nodes at all. This code path can only be triggered if some other code doesn't coalesce text nodes properly. In this case, OSS-Fuzz found such behavior in xinclude.c.
Nick Wellnhofer 7618a3b1 2022-02-06T21:11:38 Make xmlStaticCopyNode non-recursive
Nick Wellnhofer d99ddd9b 2022-03-05T21:46:40 Improve buffer allocation scheme In most places, we really need the double-it scheme to avoid quadratic behavior. The hybrid scheme still can cause many reallocations and the bounded scheme doesn't seem to provide meaningful protection in xmlreader.c.
Nick Wellnhofer 4a8c71eb 2022-03-04T03:35:57 Remove DOCBparser This code has been broken and deprecated since version 2.6.0, released in 2003. Because of a bug in commit 961b535c, DOCBparser.c was never compiled since 2012. I couldn't find a Debian package using any of its symbols, so it seems safe to remove this module.
Nick Wellnhofer 776d15d3 2022-03-02T00:29:17 Don't check for standard C89 headers Don't check for - ctype.h - errno.h - float.h - limits.h - math.h - signal.h - stdarg.h - stdlib.h - string.h - time.h Stop including non-standard headers - malloc.h - strings.h
Nick Wellnhofer c41bc10d 2022-02-22T19:57:12 Fix unused variable warnings with disabled features
Nick Wellnhofer 346c3a93 2022-02-20T18:46:42 Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.
Nick Wellnhofer 57b3abd5 2022-02-07T22:09:25 Fix xmlSetTreeDoc with entity references The children member of entity reference nodes points to the entity declaration and must never be followed when traversing a tree. In the worst case, this could lead to an infinite loop. It's somewhat unclear how moving entity references to other documents should work exactly. For now we simply set the children pointer to NULL to avoid a reference to the original document. Fixes #42.
Nick Wellnhofer ea53fc18 2022-02-07T18:24:03 Properly handle nested documents in xmlFreeNode Client code should never add document nodes as children of other nodes, but even our own XPointer code has a bug that can produce such trees. Make sure to really free nested documents. Also see commits 0815302d and 0762c9b6. Should fix #269.
Nick Wellnhofer ae728bb8 2022-01-16T15:05:41 Fix null pointer deref in xmlStringGetNodeList Check for malloc failure to avoid null deref.
Nick Wellnhofer e20c9c14 2021-03-13T18:41:47 Fix xmlGetNodePath with invalid node types Make xmlGetNodePath return NULL instead of invalid XPath when hitting unsupported node types like DTD content. Reported here: https://mail.gnome.org/archives/xml/2021-January/msg00012.html Original report: https://bugs.php.net/bug.php?id=80680
Nick Wellnhofer ad101bb5 2021-03-02T13:32:53 Clarify xmlNewDocProp documentation
Nick Wellnhofer a6e6498f 2021-03-02T13:09:06 Stop checking attributes for UTF-8 validity I can't see a reason to check attribute content for UTF-8 validity. Other parts of the API like xmlNewText have always assumed valid UTF-8 as extra checks only slow down processing. Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not freeing the old encoding would cause a memory leak. Note that this was last changed in 2008 with commit 6f8611fd which removed unnecessary encoding/decoding steps. Setting attributes should be even faster now. Found by OSS-Fuzz.
Nick Wellnhofer 688b41a0 2021-03-01T14:17:42 Fix quadratic behavior when looking up xml:* attributes Add a special case for the predefined XML namespace when looking up DTD attribute defaults in xmlGetPropNodeInternal to avoid calling xmlGetNsList. This fixes quadratic behavior in - xmlNodeGetBase - xmlNodeGetLang - xmlNodeGetSpacePreserve Found by OSS-Fuzz.
Nick Wellnhofer 01411e7c 2021-02-08T20:58:32 Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> But the XML spec clearly states that this is illegal: > If the entities lt or amp are declared, they MUST be declared as > internal entities whose replacement text is a character reference to > the respective character (less-than sign or ampersand) being escaped; > the double escaping is REQUIRED for these entities so that references > to them produce a well-formed result. Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
SVGAnimate 07920b43 2021-01-26T05:42:48 Add the copy of type from original xmlDoc in xmlCopyDoc() A bug related to php DOMDocument: https://bugs.php.net/bug.php?id=80665 When copy/clone an html document, the xmlDoc->type goes from XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
Nick Wellnhofer 1d73f07d 2020-12-18T00:55:00 Fix null deref in xmlStringGetNodeList Check for malloc failure to avoid null deref. Found with libFuzzer.
Nick Wellnhofer 20c60886 2020-03-08T17:19:42 Fix typos Resolves #133.
Nick Wellnhofer b0725121 2020-01-10T15:55:07 Fix integer overflow in xmlBufferResize Found by OSS-Fuzz.
Nick Wellnhofer 0815302d 2019-12-06T12:27:29 Fix freeing of nested documents Apparently, some libxslt RVTs can contain nested document nodes, see issue #132. I'm not sure how this happens exactly but it can cause a segfault in xmlFreeNodeList after the changes in commit 0762c9b6. Make sure not to touch the (nonexistent) `content` member of xmlDocs.
Nick Wellnhofer db0c0450 2019-11-02T15:14:10 Enable more undefined behavior sanitizers Minor fix to xmlStringLenGetNodeList to avoid a pointer overflow during API test. Enable pointer-overflow and unsigned-integer-overflow sanitizers in CI tests. Technically, unsigned integer overflows aren't undefined behavior, but they typically indicate programming errors. Some hash functions that really require unsigned integer overflows have already been annotated.
Jared Yanovich 2a350ee9 2019-09-30T17:04:54 Large batch of typo fixes Closes #109.
Nick Wellnhofer 0762c9b6 2019-09-23T17:07:40 Make xmlFreeNodeList non-recursive Avoid call stack overflow when freeing deeply nested documents.
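One way to free a tree without recursion, as commit 0762c9b6 describes for xmlFreeNodeList, is to walk down to a leaf, free it, and then move to its sibling or back up to its parent. The sketch below shows that idea on a hypothetical three-pointer node; `tnode`, `add_child` and `free_subtree` are illustrative names, and the real function handles many node types and fields that are omitted here.

```c
#include <stdlib.h>

/* Hypothetical minimal node; the real xmlNode has many more fields. */
typedef struct tnode {
    struct tnode *parent;
    struct tnode *children;  /* first child */
    struct tnode *next;      /* next sibling */
} tnode;

/* Prepend a freshly allocated child to parent; returns the child or NULL. */
tnode *
add_child(tnode *parent)
{
    tnode *n = calloc(1, sizeof(*n));

    if (n == NULL)
        return NULL;
    n->parent = parent;
    if (parent != NULL) {
        n->next = parent->children;
        parent->children = n;
    }
    return n;
}

/*
 * Free root and all of its descendants using a loop instead of recursion,
 * so the call stack stays flat however deep the tree is. Siblings of root
 * are not touched. Returns the number of nodes freed.
 */
size_t
free_subtree(tnode *root)
{
    size_t freed = 0;
    tnode *cur = root;

    while (cur != NULL) {
        tnode *next;

        /* Descend to a leaf first. */
        while (cur->children != NULL)
            cur = cur->children;

        if (cur == root) {
            next = NULL;                /* the whole subtree is done */
        } else if (cur->next != NULL) {
            next = cur->next;           /* continue with the next sibling */
        } else {
            next = cur->parent;         /* last child: go back up */
            next->children = NULL;      /* the parent is now a leaf */
        }
        free(cur);
        freed++;
        cur = next;
    }
    return freed;
}
```

The sketch relies on every child having a correct `parent` pointer, which is an assumption of this example.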
Jan Pokorný 39f10232 2019-08-09T09:44:11 Fix typos: tree: move{ -> s}, reconcil{i -> }ed, h{o -> e}ld by... ...seems to { -> be to} add. Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Nick Wellnhofer cb5541c9 2017-11-13T17:08:38 Fix libz and liblzma detection If libz or liblzma are detected with pkg-config, AC_CHECK_HEADERS must not be run because the correct CPPFLAGS aren't set. It is actually not required have separate checks for LIBXML_ZLIB_ENABLED and HAVE_ZLIB_H. Only check for LIBXML_ZLIB_ENABLED and remove HAVE_ZLIB_H macro. Fixes bug 764657, bug 787041.
J. Peter Mugaas d2c329a9 2017-10-21T13:49:31 Fix -Wimplicit-fallthrough warnings Add "falls through" comments to quench implicit-fallthrough warnings which are enabled by -Wextra under GCC 7.
Nick Wellnhofer d422b954 2017-10-09T13:37:42 Fix pointer/int cast warnings on 64-bit Windows On 64-bit Windows, `long` is 32 bits wide and can't hold a pointer. Switch to ptrdiff_t instead which should be the same size as a pointer on every somewhat sane platform without requiring C99 types like intptr_t. Fixes bug 788312. Thanks to J. Peter Mugaas for the report and initial patch.
Daniel Veillard f19385a5 2017-08-28T20:40:19 Fix a couple of misleading indentation errors Raised by gcc as potential error, no semantic change needed but fixed the indentation
Stéphane Michaut 454e397e 2017-08-28T14:30:43 Porting libxml2 on zOS encoding of code First set of patches for zOS - entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c: ask conversion of code to ISO Latin 1 to avoid having the compiler assume EBCDIC codepoint for characters. - xmlmodule.c: make sure we have support for modules - xmlIO.c: zOS path names are special, avoid some of the expectations from Unix/Windows
Nick Wellnhofer 5a0ae66d 2017-06-17T23:20:38 Documentation fixes Fixes bug 347465, bug 599433, bug 624550, bug 698253.
Nick Wellnhofer 8c82f5de 2017-06-07T18:32:49 Fix memory leak in xmlStringLenGetNodeList Avoid expanding the entity recursively. Use the same prevention mechanism as in xmlStringGetNodeList. xmlStringGetNodeList on the other hand wasn't fixing up the 'last' pointer. I think the memory leak can only be triggered in recovery mode. Found with libFuzzer and ASan.
Daniel Veillard bdd66182 2016-05-23T12:27:58 Avoid building recursive entities For https://bugzilla.gnome.org/show_bug.cgi?id=762100 When we detect a recursive entity we should really not build the associated data. Moreover, if someone bypasses libxml2 fatal errors and still tries to serialize a broken entity, make sure we don't risk getting into a recursion * parser.c: xmlParserEntityCheck() don't build if entity loop were found and remove the associated text content * tree.c: xmlStringGetNodeList() avoid a potential recursion
Jan Pokorný bb654feb 2016-04-13T16:56:07 Fix typos: dictio{ nn -> n }ar{y,ies} Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Nick Wellnhofer 220a7bae 2014-12-23T21:28:37 Don't add IDs in xmlSetTreeDoc This partially reverts my previous commit fixing bug #741919.
Nick Wellnhofer f54d6a92 2014-12-19T00:08:35 Account for ID attributes in xmlSetTreeDoc
Philip Withnall 57941042 2014-10-26T18:08:04 Remove various unused value assignments As detected by Coverity (CIDs 60467–60472). https://bugzilla.gnome.org/show_bug.cgi?id=739220
Kurt Roeckx 95ebe53b 2014-10-13T16:06:21 Fix and add const qualifiers For https://bugzilla.gnome.org/show_bug.cgi?id=689483 It seems there are functions that do use the const qualifier for some of the arguments, but it seems that there are a lot of functions that don't use it and probably should. So I created a patch against 2.9.0 that makes as much as possible const in tree.h, and changed other files as needed. There were a lot of cases like "const xmlNodePtr node". This doesn't actually do anything, there the *pointer* is constant not the object it points to. So I changed those to "const xmlNode *node". I also removed some consts, mostly in the Copy functions, because those functions can actually modify the doc or node they copy from
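The distinction drawn in commit 95ebe53b between "const xmlNodePtr node" and "const xmlNode *node" can be reproduced with any pointer typedef. In this small sketch `item` and `itemPtr` are hypothetical stand-ins for xmlNode and xmlNodePtr.

```c
/* Hypothetical stand-ins for xmlNode and xmlNodePtr. */
typedef struct item { int line; } item;
typedef item *itemPtr;

/* "const itemPtr" means "item *const": the pointer is const, the item is not. */
int
bump_line(const itemPtr it)
{
    it->line += 1;      /* compiles: the pointed-to object is still writable */
    return it->line;
}

/* "const item *" protects the object itself; only reads are allowed. */
int
read_line(const item *it)
{
    /* it->line += 1;   would be rejected by the compiler here */
    return it->line;
}
```

Only the second signature tells callers that the function leaves the object unchanged.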
Gaurav Gupta 6d93e9ea 2014-10-06T20:20:00 Unreachable code in tree.c For https://bugzilla.gnome.org/show_bug.cgi?id=705392 Cut out an unused block
Kyle VanderBeek 1db99699 2014-07-29T00:32:15 Support element node traversal in document fragments. https://bugzilla.gnome.org/show_bug.cgi?id=733900
Daniel Veillard 42870f46 2014-07-26T21:04:54 Add couple of missing Null checks For https://bugzilla.gnome.org/show_bug.cgi?id=733710 Reported by Gaurav but with slightly different fixes
Tristan Van Berkom f0dd6e11 2014-04-22T21:15:05 xmlNodeSetName: Allow setting the name to a substring of the currently set name Avoid freeing the currently set name until after having assigned the new name, this allows one to call xmlNodeSetName (node, node->name + 1) to set the new name of the node to a substring of the current name without introducing any crash and without requiring an extra strdup().
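The ordering described in commit f0dd6e11 (assign the new name first, free the old one afterwards) can be sketched as follows. This is an illustration under simplifying assumptions, not the xmlNodeSetName implementation: `named` and `set_name` are hypothetical, and the name is always a plain heap string here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type that owns its name; not the real xmlNode. */
typedef struct named { char *name; } named;

/*
 * Replace n->name with a copy of name. The copy is made before the old
 * string is freed, so name may point into the old string.
 * Returns 0 on success, -1 on allocation failure (the old name is kept).
 */
int
set_name(named *n, const char *name)
{
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    char *old;

    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);

    old = n->name;
    n->name = copy;   /* assign first ... */
    free(old);        /* ... and free the old name afterwards */
    return 0;
}
```

With this ordering, a call such as `set_name(&n, n.name + 1)` reads the old string while it is still valid.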
Daniel Veillard 7e35abeb 2014-03-28T22:55:31 Fix a doc typo Raised by Blasius Bieselbert on IRC
Nicolas Le Cam 41586ca6 2013-06-17T13:01:33 Fix compilation with minimum and xinclude. xinclude needs xmlAddNextSibling(). Compile out use of xmlLocationSetPtr when xptr is disabled. Include xpath header.
Nicolas Le Cam 77b5b464 2014-02-10T10:32:45 Legacy needs xmlSAX2StartElement() and xmlSAX2EndElement(). Fix compilation with minimum and legacy.
Jan Pokorný 75801652 2013-12-19T15:09:14 Fix typos in {tree,xpath}.c (errror) Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Gaurav 98a4e712 2013-11-29T23:28:21 Fix a couple of missing NULL checks For https://bugzilla.gnome.org/show_bug.cgi?id=708681
Daniel Veillard 75d13092 2013-09-11T15:11:27 Fix a potential NULL dereference in tree code https://bugzilla.gnome.org/show_bug.cgi?id=707750 Also reported by Gaurav, simple fix to check the pointer before dereference
Daniel Veillard 81b96178 2013-07-22T13:01:11 Two small namespace tweaks An improvement of the documentation, and an extra safety check for xmlSetNs()
Michael Wood fb27e2cd 2012-09-28T08:59:33 Fix spelling of "length".
Daniel Veillard 7d4c529a 2012-09-05T11:45:32 Improve HTML escaping of attribute on output Handle special cases of &{...} constructs as hinted in the spec http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1 and special values as comment <!-- ... --> used for server side includes This is limited to attribute values in HTML content.
Daniel Veillard 968a03a2 2012-08-13T12:41:33 Add support for big line numbers in error reporting Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com> * parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser option not switch on by default, it's an opt-in * SAX2.c: if XML_PARSE_BIG_LINES is set store the long line numbers in the psvi field of text nodes * tree.c: expand xmlGetLineNo to extract those informations, also make sure we can't fail on recursive behaviour * error.c: in __xmlRaiseError, if a node is provided, call xmlGetLineNo() if we can't get a valid line number. * xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint
Daniel Veillard 28cc42d0 2012-08-10T10:00:18 Regenerating docs and API files Various cleanups * configure.in: force regeneration of APIs in my environment * buf.c buf.h enc.h encoding.c include/libxml/tree.h include/libxml/xmlerror.h save.h tree.c: various comment cleanups pointed by apibuild * doc/apibuild.py: added the 3 new internal headers in the excludes * doc/libxml2-api.xml doc/libxml2-refs.xml: regenerated the API * doc/symbols.xml: listing new entry points for 2.9.0 * doc/devhelp/*: regenerated
Daniel Veillard 3e62adbe 2012-08-09T14:24:02 Adding various checks on node type through the API Specifically checking against namespace nodes before accessing node pointers
Daniel Veillard 6ca24a39 2012-08-08T15:31:55 Namespace nodes can't be unlinked with xmlUnlinkNode
Daniel Veillard c15df7d4 2012-08-07T15:15:04 Avoid using xmlBuffer for serialization Mostly an optimization to avoid xmlBuffer->xmlBuf conversions and use the new code.
Daniel Veillard dddeede0 2012-07-16T14:44:26 Provide new xmlBuf based saving functions * include/libxml/tree.h: adds xmlBufGetNodeContent and xmlBufNodeDump as xmlBuf based equivalents of xmlNodeGetContent and xmlNodeDump * tree.c: implements one new routine and converts xmlNodeBufGetContent to use the xmlBuf equivalent. It should behave better as a result in case of data larger than 2GB.
Daniel Veillard 94431ecb 2012-05-15T10:45:05 Fix various bugs in new code raised by the API checking * testapi.c: regenerated and covering new APIs * tree.c: xmlBufferDetach can't work on immutable buffers * xzlib.c: fix a deallocation error
Daniel Veillard 79ee284a 2012-05-15T10:25:31 Fix various problems with "make dist" * tree.c: missing documentation for xmlBufferDetach * doc/symbols.xml: add two new symbols xmlTextReaderRelaxNGValidateCtxt and xmlBufferDetach * doc/apibuild.py: ignore internal header xzlib.h
Conrad Irwin 7d0d2a50 2012-05-14T14:18:58 Use a hybrid allocation scheme in xmlNodeSetContent On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote: >  Hi Conrad, > > that's interesting ! I was initially afraid of a sudden explosion of > memory allocations for building a tree since by default buffers tend to > "waste" memory by using doubling allocations, but that's not the case. >  xmllint --noout doc/libxml2-api.xml > when compiled with memory debug produce > > paphio:~/XML -> cat .memdump >      MEMORY ALLOCATED : 0, MAX was 12756699 > > and without your patch 12755657, i.e. the increase is minimal. Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This is because EXACT adds 10bytes "spare" on each alloc, and that interestingly wastes about the same amount of space as XML_ALLOC_DOUBLEIT on this example (see below). So it turns out that the default realloc() on my system actually handles this case really well — and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the underlying realloc() after all (sorry for misleading you). If you replace the realloc() with a bad one (like valgrind's), then the performance degrades severely. This patch implements a HYBRID allocator which has the behaviour you describe (it's like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT after 4kb) — that gets the memory back down to 12755657, with no noticeable impact on the performance of the synthetic pathological example under valgrind. In summary: max_memory on ./xmllint --noout doc/libxml2-api.xml, valgrind time on https://gist.github.com/2656940 max_memory valgrind time before | 12755657 | 29:18.2 EXACT | 12756699 | 2:58.6 <-- this is the state after the first patch. DOUBLEIT | 12756727 | 0:02.7 HYBRID | 12755754 | 0:02.7 <-- this is the state with both patches. > > There is also the cost of creating the buffers all the time. 
> I need to read the code and check but I may be interested in an hybrid > approach where we switch to buffer only when the text node starts to > become too big (4k would remove nearly all usuall types of "document" > usage, i.e. not blocks of data) I tried to avoid too much buffer creation by introducing the xmlBufferDetach function, which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack" in API terms though I thought the gains would be worth it. Conrad ------8<------ To keep memory usage tight in normal conditions it's desirable to only allocate as much space as is needed. Unfortunately this can lead to problems when constructing a long string out of small chunks, because every chunk you add will need to resize the buffer. To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big) from using exact allocations to doubling buffer size every time it is full. This limits the number of buffer resizes to O(log n) (down from O(n)), and thus greatly increases the performance of constructing very large strings in this manner.
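The hybrid policy described in commit 7d0d2a50 (exact allocation while the buffer is small, doubling once it reaches 4kb) can be sketched as a pure size calculation. This is a hedged illustration rather than the xmlBuffer code: `hybrid_capacity` and `HYBRID_THRESHOLD` are hypothetical names, and the overflow return value is an assumption of this example.

```c
#include <stddef.h>

#define HYBRID_THRESHOLD 4096  /* the "4kb" switch point from the commit message */

/*
 * Hypothetical sketch of the hybrid policy, not libxml2 code: return the
 * capacity to allocate when a buffer of capacity cur must hold at least
 * needed bytes. Returns 0 if doubling would overflow size_t.
 */
size_t
hybrid_capacity(size_t cur, size_t needed)
{
    size_t cap;

    if (needed <= cur)
        return cur;                 /* already large enough */
    if (cur < HYBRID_THRESHOLD)
        return needed;              /* small buffer: exact allocation */

    cap = cur;
    while (cap < needed) {          /* large buffer: double until it fits */
        if (cap > (size_t)-1 / 2)
            return 0;
        cap *= 2;
    }
    return cap;
}
```

Appending many small chunks to a large buffer then triggers a resize only each time the capacity doubles, which is where the O(log n) bound in the message comes from.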
Conrad Irwin 7d553f83 2012-05-10T20:17:25 Use buffers when constructing string node lists. Hi Veillard and all, Firstly, thanks for libxml: it's awesome! I noticed recently that libxml was taking a surprisingly long time to perform some operations (many minutes instead of milliseconds), and so I did some digging. It turns out that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can be called many (many) times when assigning some content into a node. For background, I'm dealing with XML that contains emails, these can have large attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends with &#13;. This means that xmlNodeAddContentLen() is being called about 200,000 times, and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic example of this at https://gist.github.com/2656940) The attached patch works around that problem by using the existing buffer API to merge the strings together before even creating the text node, this keeps the number of realloc()s at a managable level. I'd love feedback on the patch, and am happy to fix problems with it, or explore other solutions if you think that this is barking up the wrong tree :). Thanks, Conrad P.S. Should I create a bug for this too? ------8<------ Before this change xmlStringGetNodeList would perform a realloc() of the entire new content for every XML entity in the assigned text in order to merge together adjacent text nodes. This had the effect of making xmlSetNodeContent O(n^2), which led to unexpectedly bad performance on inputs that contained a large number of XML entities. After this change the memory management is done by the buffer API, avoiding the need to continually re-measure and realloc() the string. For my test data (6MB of 80 character lines, each ending with &#13;) this takes the time to xmlSetNodeContent from about 500 seconds to around 50ms. 
I have not profiled smaller cases, though I tried to minimize the performance impact of my change by avoiding unnecessary string copying. Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>
Daniel Veillard 39d027cd 2012-05-11T12:38:23 Fix html serialization error and htmlSetMetaEncoding() For https://bugzilla.gnome.org/show_bug.cgi?id=630682 The python tests were reporting errors, some of it was due to a small change in case encoding, but the main one was about htmlSetMetaEncoding(doc, NULL) being broken by not removing the associated meta tag anymore
Daniel Veillard a6b14bf9 2012-01-26T17:44:35 Clarify the need to use xmlFreeNode after xmlUnlinkNode Just add one small sentence to the xmlUnlinkNode function comments
Daniel Veillard aa54d37c 2010-09-09T18:17:47 Fix handling of XML-1.0 XML namespace declaration Usually 'xml' namespace for XML-1.0 declaration does not need to be carried but Mike Hommey raised the problem that the SVG XSD file fails to parse due to a mishandling. - SAX2.c: failure to create a namespace should not be interpreted as a memory allocation error - tree.c: document better xmlNewNs behaviour, and fix it in the case the 'xml' prefix is being used.
Daniel Veillard e4d1849c 2010-03-09T11:12:30 Fix xmlNodeSetBase() comment
François Delyon 2f700908 2010-02-03T17:32:37 xmlPreviousElementSibling mistake * tree.c: xmlPreviousElementSibling it should look for preceding sibling never for the following ones...
Rob Richards ddb01cbf 2010-01-29T13:32:12 Fix lost namespace when copying node * tree.c: reconcile namespace if not found
Martin Trappel f3703105 2010-01-22T12:08:00 Fix a const warning in xmlNodeSetBase * tree.c: xmlNodeSetName: Remove const from declaration since it is used non-const anyway. Remove unnecessary cast on xmlFree later on.
Daniel Veillard 594e5dfb 2009-09-07T14:58:47 Chasing dead assignments reported by clang-scan * SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c xmlschemas.c xpath.c xpointer.c: mostly removing unneeded assignments, but this led to a few real bugs and some part not yet understood (relaxng/interleave)
Daniel Veillard 76d36458 2009-09-07T11:19:33 Fixing assorted potential problems raised by scan * encoding.c parser.c relaxng.c runsuite.c tree.c xmlreader.c xmlschemas.c: nothing really serious but better safe than sorry
Daniel Veillard ee20cd7e 2009-08-22T15:18:31 574017 Realloc too expensive on most platform * tree.c: even on BSD there is too much of a penalty hit, to use the doubling buffer size strategy on all arches not just Windows.
Daniel Veillard 8ed1072c 2009-08-20T19:17:36 Add symbol versioning to libxml2 shared libs * libxml2.syms: the symbols with history, going back to 2.4.30 * Makefile.am configure.in: linking flags detection and use * parser.c tree.c valid.c xpointer.c: various cleanup of functions which could be made static or simply discarded, not that many
Petr Pajas 2afca4a1 2009-07-30T17:47:32 Preserve attributes of include start on tree copy * tree.c: copy attributes and namespaces for that kind of node
Daniel Veillard ab2a763d 2009-07-09T08:45:03 A bit of cleanups * tree.c: avoid calling xmlAddID with NULL values * parser.c: add a few xmlInitParser in some entry points
Daniel Veillard 43bc89c1 2009-03-23T19:32:04 add a missing check in xmlAddSibling, patch by Kris Breuker avoid * tree.c: add a missing check in xmlAddSibling, patch by Kris Breuker * xmlIO.c: avoid xmlAllocOutputBuffer using XML_BUFFER_EXACT which leads to performances problems especially on Windows. daniel svn path=/trunk/; revision=3820
Rob Richards 810a78b3 2008-12-31T22:13:57 set doc on last child tree in xmlAddChildList for bug #546772. Fix problem * tree.c: set doc on last child tree in xmlAddChildList for bug #546772. Fix problem adding an attribute via with xmlAddChild reported by Kris Breuker. svn path=/trunk/; revision=3806
Daniel Veillard be2bd6ac 2008-11-27T15:26:28 adds element traversal support avoid a warning regenerated daniel * include/libxml/tree.h tree.c python/generator.py: adds element traversal support * valid.c: avoid a warning * doc/*: regenerated daniel svn path=/trunk/; revision=3804
Daniel Veillard 1dc9feb0 2008-11-17T15:59:21 fix for CVE-2008-4226, a memory overflow when building gigantic text * SAX2.c parser.c: fix for CVE-2008-4226, a memory overflow when building gigantic text nodes, and a bit of cleanup to better handled out of memory problem in that code. * tree.c: fix for CVE-2008-4225, lack of testing leads to a busy loop test assuming one have enough core memory. Daniel svn path=/trunk/; revision=3803
Daniel Veillard da3fee40 2008-09-01T13:08:57 Borland C fix from Moritz Both regenerate, workaround a problem for buffer * trionan.c: Borland C fix from Moritz Both * testapi.c: regenerate, workaround a problem for buffer testing * xmlIO.c HTMLtree.c: new internal entry point to hide even better xmlAllocOutputBufferInternal * tree.c: harden the code around buffer allocation schemes * parser.c: restore the warning when namespace names are not absolute URIs * runxmlconf.c: continue regression tests if we get the expected number of errors * Makefile.am: run the python tests on make check * xmlsave.c: handle the HTML documents and trees * python/libxml.c: convert python serialization to the xmlSave APIs and avoid some horrible hacks Daniel svn path=/trunk/; revision=3790
Daniel Veillard 1572425c 2008-08-30T15:01:04 preparing 2.7.0 release remove some testing traces remove some warnings * configure.in, doc/*: preparing 2.7.0 release * tree.c: remove some testing traces * parser.c xmlIO.c xmlschemas.c: remove some warnings Daniel svn path=/trunk/; revision=3788
Daniel Veillard e83e93e7 2008-08-30T12:52:26 make a new kind of buffer where shrinking and adding in head can avoid * include/libxml/tree.h tree.c: make a new kind of buffer where shrinking and adding in head can avoid reallocation or full buffer memmoves * encoding.c xmlIO.c: use the new kind of buffers for output buffers Daniel svn path=/trunk/; revision=3787
Daniel Veillard 2cba4158 2008-08-27T11:45:41 fix a small initialization problem raised by Ashwin increase testing * threads.c: fix a small initialization problem raised by Ashwin * testapi.c gentest.py: increase testing especially for document with an internal subset, and entities * tree.c: fix a deallocation issue when unlinking entities from a document. * valid.c: fix a missing entry point test not found previously. * doc/*: regenerated the APIs, docs etc. daniel svn path=/trunk/; revision=3778
Daniel Veillard aa6de47e 2008-08-25T14:53:31 applied patch from Aswin to fix tree skipping fixed a comment and added a * xmlreader.c: applied patch from Aswin to fix tree skipping * include/libxml/entities.h entities.c: fixed a comment and added a new xmlNewEntity() entry point * runtest.c: be less verbose * tree.c: space and tabs cleanups daniel svn path=/trunk/; revision=3774