Log

Author Commit Date CI Message
Patrick Gansterer aabc0847 2012-08-10T12:34:24 Fix compiler warnings of wincecompat.c For https://bugzilla.gnome.org/show_bug.cgi?id=681592 *) Add an explicit cast when converting FILE* to int *) Don't assign a c-string to the element of a char-array
Patrick Gansterer fd4f6fdd 2012-08-13T17:54:20 Fix non __GNUC__ build For https://bugzilla.gnome.org/show_bug.cgi?id=681590 Length member of _xmlDictEntry is called "len" and not "l"
Daniel Veillard 3b666224 2012-08-13T17:49:15 Fix const qualifier to definition of xmlBufferDetach For https://bugzilla.gnome.org/show_bug.cgi?id=676629 As the buffer is being modified by the call the const doesn't make sense.
Patrick Gansterer 5a82e48e 2012-08-13T17:39:06 Fix windows unicode build For https://bugzilla.gnome.org/show_bug.cgi?id=638650 After much discussions in the list: https://mail.gnome.org/archives/xml/2012-May/msg00062.html The simplest at this point is to fallback to only officially supporting ASCII names in those APIs, document it and use the "A" entry points on Windows.
Roumen Petrov c3b1d09b 2012-08-13T16:50:48 clean redefinition of {v}snprintf in C-source as those from *config.h are preferable (e.g. win32config.h)
Roumen Petrov 1f0453f7 2012-08-13T16:56:11 minimize use of HAVE_CONFIG_H as build process for supported platforms provide "config.h" header file
Roumen Petrov 8886f335 2012-08-13T16:38:09 fixup regression in Various "make distcheck" and portability fixups Was using the wrong variable and adds proper m4 quoting
Daniel Veillard 968a03a2 2012-08-13T12:41:33 Add support for big line numbers in error reporting Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com> * parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser option not switched on by default, it's an opt-in * SAX2.c: if XML_PARSE_BIG_LINES is set store the long line numbers in the psvi field of text nodes * tree.c: expand xmlGetLineNo to extract that information, also make sure we can't fail on recursive behaviour * error.c: in __xmlRaiseError, if a node is provided, call xmlGetLineNo() if we can't get a valid line number. * xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint
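The big-line-number scheme described in 968a03a2 can be pictured with a minimal sketch. It assumes a node whose regular line field is a 16-bit value that saturates, plus a spare pointer slot that text nodes can reuse for the full number. The `sketch_node` type, its field names, and the `SKETCH_LINE_MAX` constant are hypothetical and are not the real libxml2 structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tree node; not the real xmlNode layout. */
typedef struct sketch_node {
    int is_text;          /* non-zero for text nodes */
    unsigned short line;  /* short line field, saturates at SKETCH_LINE_MAX */
    void *extra;          /* spare pointer slot reused for big line numbers */
} sketch_node;

#define SKETCH_LINE_MAX 65535

/* Record a line number; with big_lines set, text nodes keep the full value. */
static void sketch_set_line(sketch_node *n, long line, int big_lines)
{
    assert(n != NULL && line >= 0);
    if (line < SKETCH_LINE_MAX) {
        n->line = (unsigned short) line;
        n->extra = NULL;
    } else {
        n->line = SKETCH_LINE_MAX;
        n->extra = (big_lines && n->is_text) ? (void *) (intptr_t) line : NULL;
    }
}

/* Return the best line number available for the node. */
static long sketch_get_line(const sketch_node *n)
{
    assert(n != NULL);
    if (n->line == SKETCH_LINE_MAX && n->is_text && n->extra != NULL)
        return (long) (intptr_t) n->extra;
    return (long) n->line;
}
```

Without the opt-in, a node past the short field's range simply reports the saturated value, which is presumably the "lack of line number" the commit refers to.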
Daniel Veillard 264cee69 2012-08-13T12:40:53 Add a missing element check
Daniel Veillard aa017c54 2012-08-10T10:42:56 Release candidate 1 of libxml2-2.9.0 * configure.in libxml.spec.in python/setup.py: bumped release numbers * doc//*: regenerated as part of the release
Daniel Veillard 28cc42d0 2012-08-10T10:00:18 Regenerating docs and API files Various cleanups * configure.in: force regeneration of APIs in my environment * buf.c buf.h enc.h encoding.c include/libxml/tree.h include/libxml/xmlerror.h save.h tree.c: various comment cleanups pointed by apibuild * doc/apibuild.py: added the 3 new internal headers in the excludes * doc/libxml2-api.xml doc/libxml2-refs.xml: regenerated the API * doc/symbols.xml: listing new entry points for 2.9.0 * doc/devhelp/*: regenerated
Daniel Veillard 3e62adbe 2012-08-09T14:24:02 Adding various checks on node type through the API Specifically checking against namespace nodes before accessing node pointers
Daniel Veillard 6ca24a39 2012-08-08T15:31:55 Namespace nodes can't be unlinked with xmlUnlinkNode
Roumen Petrov 89b6f73a 2012-08-04T05:09:56 use xmlBuf... if DEBUG_INPUT is defined
Daniel Veillard c15df7d4 2012-08-07T15:15:04 Avoid using xmlBuffer for serialization Mostly an optimization to avoid xmlBuffer->xmlBuf conversions and use the new code.
Daniel Veillard 7f713494 2012-08-07T14:34:53 Improve compatibility between xmlBuf and xmlBuffer An old xsltproc binary now works correctly with the new libxml2
Daniel Richard G 495a73df 2012-08-07T10:14:56 fix runtests to use pthreads support for various Unix platforms The runtests program currently fails with Specific platform thread support not detected on HP-UX, AIX and other Unix systems which do not match the conditional #if defined(linux) || defined(__sun) || defined(__APPLE_CC__) It is silly to try to enumerate all systems which use pthreads in a conditional like this. I am attaching a patch (against git master) that rewrites the cpp conditional structure so that pthreads is used if HAVE_PTHREAD_H is defined, and moves that section of code down below the Win32 and BeOS cases so that native thread libraries are used preferentially in those two cases.
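The reordered conditional described in 495a73df might look roughly like the sketch below: native thread libraries are tested first, and pthreads is selected by a configure-style `HAVE_PTHREAD_H` check rather than by a list of operating systems. The `_WIN32` and `__BEOS__` tests and the `SKETCH_THREAD_BACKEND` macro are assumptions made for illustration; the actual patch may test different macros.

```c
#include <assert.h>
#include <stddef.h>

/* Native thread libraries come first; pthreads is the generic fallback. */
#if defined(_WIN32)
#  define SKETCH_THREAD_BACKEND "win32"
#elif defined(__BEOS__)
#  define SKETCH_THREAD_BACKEND "beos"
#elif defined(HAVE_PTHREAD_H)
#  define SKETCH_THREAD_BACKEND "pthreads"
#else
#  define SKETCH_THREAD_BACKEND "none"
#endif

/* Report which backend the preprocessor selected for this build. */
static const char *sketch_thread_backend(void)
{
    const char *name = SKETCH_THREAD_BACKEND;
    assert(name != NULL);
    return name;
}
```

Keying on a feature macro means a new Unix platform needs no source change, only a successful header check at configure time.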
Daniel Richard G 5d6c02ba 2012-08-07T10:05:34 Various "make distcheck" and portability fixups 2nd part doc/examples/Makefile.am: * Use $(VAR), not @VAR@ * Use $(MKDIR_P) instead of $(mkinstalldirs), as the latter is an * obsolete name * Added $(srcdir) qualification to the various test program invocations * in the "tests" target. More work is needed here (notably, when the reference output contains the path to the input file), but this gets things a lot closer to working correctly in an out-of-source build. doc/examples/reader4.res: * Added "./" path qualifiers so that the reader4 test continues to pass cleanly for in-source builds python/tests/Makefile.am: * Symlink in test input files for out-of-source builds
Daniel Richard G 5706b6d8 2012-08-06T11:32:54 Various "make distcheck" and portability fixups Makefile.am: * Don't use @VAR@, use $(VAR). Autoconf's AC_SUBST provides us the Make variable, it allows overriding the value at the command line, and (notably) it avoids a Make parse error in the libxml2_la_LDFLAGS assignment when @MODULE_PLATFORM_LIBS@ is empty * Changed how the THREADS_W32 mechanism switches the build between testThreads.c and testThreadsWin32.c as appropriate; using AM_CONDITIONAL allows this to work cleanly and plays well with dependencies * testapi.c should be specified as BUILT_SOURCES * Create symlinks to the test/ and result/ subdirs so that the runtests target is usable in out-of-source-tree builds * Don't do MAKEFLAGS+=--silent as this is not portable to non-GNU Makes * Fixed incorrect find(1) syntax in the "cleanup" rule, and doing "rm -f" instead of just "rm" is good form * (DIST)CLEANFILES needed a bit more coverage to allow "make distcheck" to pass configure.in: * Need AC_PROG_LN_S to create test/ and result/ symlinks in Makefile.am * AC_LIBTOOL_WIN32_DLL and AM_PROG_LIBTOOL are obsolete; these have been superceded by LT_INIT * Don't rebuild docs by default, as this requires GNU Make (as implemented) * Check for uint32_t as some platforms don't provide it * Check for some more functions, and undefine HAVE_MMAP if we don't also HAVE_MUNMAP (one system I tested on actually needed this) * Changed THREADS_W32 from a filename insert into an Automake conditional * The "Copyright" file will not be in the current directory if builddir != srcdir doc/Makefile.am: * EXTRA_DIST cannot use wildcards when they refer to generated files; this breaks dependencies. What I did was define EXTRA_DIST_wc, which uses GNU Make $(wildcard) directives to build up a list of files, and EXTRA_DIST, as a literal expansion of EXTRA_DIST_wc. I also added a new rule, "check-extra-dist", to simplify checking that the two variables are equivalent. 
(Note that this works only when builddir == srcdir) (I can implement this differently if desired; this is just one way of doing it) * Don't define an "all" target; this steps on Automake's toes * Fixed up the "libxml2-api.xml ..." rule by using $(wildcard) for dependencies (as Make doesn't process the wildcards otherwise) and qualifying appropriate files with $(srcdir) (Note that $(srcdir) is not needed in the dependencies, thanks to VPATH, which we can count on as this is GNU-Make-only code anyway) doc/devhelp/Makefile.am: * Qualified appropriate files with $(srcdir) * Added an "uninstall-local" rule so that "make distcheck" passes doc/examples/Makefile.am: * Rather than use a wildcard that doesn't work, use a substitution that most Make programs can handle doc/examples/index.py: * Do the same here include/libxml/nanoftp.h: * Some platforms (e.g. MSVC 6) already #define INVALID_SOCKET: user@host:/cygdrive/c/Program Files/Microsoft Visual Studio/VC98/\ Include$ grep -R INVALID_SOCKET . ./WINSOCK.H:#define INVALID_SOCKET (SOCKET)(~0) ./WINSOCK2.H:#define INVALID_SOCKET (SOCKET)(~0) include/libxml/xmlversion.h.in: * Support ancient GCCs (I was actually able to build the library with 2.5 but for this bit) python/Makefile.am: * Expanded CLEANFILES to allow "make distcheck" to pass python/tests/Makefile.am: * Define CLEANFILES instead of a "clean" rule, and added tmp.xml to allow "make distcheck" to pass testRelax.c: * Use HAVE_MMAP instead of the less explicit HAVE_SYS_MMAN_H (as some systems have the header but not the function) testSchemas.c: * Use HAVE_MMAP instead of the less explicit HAVE_SYS_MMAN_H testapi.c: * Don't use putenv() if it's not available threads.c: * This fixes the following build error on Solaris 8: libtool: compile: cc -DHAVE_CONFIG_H -I. 
-I./include -I./include \ -D_REENTRANT -D__EXTENSIONS__ -D_REENTRANT -Dsparc -Xa -mt -v \ -xarch=v9 -xcrossfile -xO5 -c threads.c -KPIC -DPIC -o threads.o "threads.c", line 442: controlling expressions must have scalar type "threads.c", line 512: controlling expressions must have scalar type cc: acomp failed for threads.c *** Error code 1 trio.c: * Define isascii() if the system doesn't provide it trio.h: * The trio library's HAVE_CONFIG_H header is not the same as LibXML2's HAVE_CONFIG_H header; this change is needed to avoid a double-inclusion win32/configure.js: * Added support for the LZMA compression option win32/Makefile.{bcb,mingw,msvc}: * Added appropriate bits to support WITH_LZMA=1 * Install the header files under $(INCPREFIX)\libxml2\libxml instead of $(INCPREFIX)\libxml, to mirror the install location on Unix+Autotools xml2-config.in: * @MODULE_PLATFORM_LIBS@ (usually "-ldl") needs to be in there in order for `xml2-config --libs` to provide a complete set of dependencies xmllint.c: * Use HAVE_MMAP instead of the less-explicit HAVE_SYS_MMAN_H
Daniel Veillard e258adec 2012-08-06T11:16:30 Provide new accessors for xmlOutputBuffer To avoid digging into buf->buffer internal structure the two new entry points xmlOutputBufferGetContent() and xmlOutputBufferGetSize() should make the code cleaner. * include/libxml/xmlIO.h: add two new functions * xmlIO.c: implement the 2 functions based on the new buffer entry points
Daniel Veillard 187e5290 2012-08-06T10:27:58 Fix make dist to include new private header files
Daniel Veillard 18e1f1f1 2012-08-06T10:16:41 Improvements for old buffer compatibility Now tree.h exports LIBXML2_NEW_BUFFER macro indicating that the API uses the new buffers, important to keep code working with both versions. * tree.h buf.h: also export xmlBufContent(), xmlBufEnd(), and xmlBufUse() to help port the old code * buf.c: make sure the compatibility counters are updated on buffer usage, to keep proper working of application compiled against the old structures, but take care of int overflow
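The "compatibility counters" idea in 18e1f1f1 can be sketched as follows, assuming the new buffer keeps authoritative `size_t` counters and mirrors them into narrower legacy fields that old binaries may still read. The `sketch_compat_buf` type and its field names are hypothetical, and clamping at `INT_MAX` is one plausible reading of "take care of int overflow".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical buffer: legacy fields first, authoritative counters after. */
typedef struct sketch_compat_buf {
    unsigned int compat_use;   /* legacy view of the bytes in use */
    unsigned int compat_size;  /* legacy view of the allocated size */
    size_t use;                /* authoritative bytes in use */
    size_t size;               /* authoritative allocated size */
} sketch_compat_buf;

/* Refresh the legacy counters, clamping instead of wrapping around. */
static void sketch_update_compat(sketch_compat_buf *b)
{
    assert(b != NULL);
    b->compat_use = (b->use < (size_t) INT_MAX)
        ? (unsigned int) b->use : (unsigned int) INT_MAX;
    b->compat_size = (b->size < (size_t) INT_MAX)
        ? (unsigned int) b->size : (unsigned int) INT_MAX;
}
```

Calling such a helper after every operation that changes `use` or `size` keeps old readers from seeing a wrapped value once the buffer grows past what an `int` can hold.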
Daniel Veillard 3f0c613f 2012-08-03T12:04:09 Expand the limit test program
Daniel Veillard 5353bbf7 2012-08-03T12:03:31 More fixups on the push parser behaviour
Daniel Veillard 2b52aa00 2012-07-31T10:53:47 Strengthen behaviour of the push parser in problematic situations Implement the maximum lookahead strategy, and fix some handling of DTD to speed up processing.
Daniel Veillard e7bf892d 2012-07-30T20:09:25 Improve error reporting on parser errors The extra string was being dismissed when provided. * parser.c: handle both cases properly * result/: this changes a few error reports
Daniel Veillard 48b4cdde 2012-07-30T16:16:04 Enforce XML_PARSER_EOF state handling through the parser That condition is one raised when the parser should positively stop processing further even to report errors. It is best to test it after most GROW calls, especially within loops
Daniel Veillard 0df83cae 2012-07-30T15:41:10 Fixup limits parser
Daniel Veillard cd852ad1 2012-07-30T10:12:18 Implement some default limits in the XPath module This adds some internal limitations on XPath expression complexity, and limits at runtime like depth of the stack and maximum size for nodeset. * xpath.c: implement the above as well as the maximum Name length
Daniel Veillard 52d8ade7 2012-07-30T10:08:45 Introduce some default parser limits Those can be overridden by the XML_PARSE_HUGE option, they are just default limits for Name length, dictionary size limits and maximum amount of parser lookup. * include/libxml/parserInternals.h: define the limits * include/libxml/xmlerror.h: add a new error * parser.c parserInternals.c: implements the new limits
Daniel Veillard 7c693dad 2012-07-25T16:32:18 Cleanups and new limit APIs for dictionaries * include/libxml/dict.h dict.c: adding 2 new functions xmlDictGetUsage and xmlDictSetLimit allowing to review the amount of memory allocated for dictionary strings. Also cleanup of various signed int used as size values in the code.
Daniel Veillard 6f6feba8 2012-07-25T16:30:56 Fixup for buf.c
Daniel Veillard 57560386 2012-07-24T11:44:23 Cleanup URI module memory allocation code * uri.c: cleanup the code doing the allocations, set up a structured error handler to report memory errors, and set up an arbitrary limit on URI saving size * error.c include/libxml/xmlerror.h: add a new FROM_URI indication for structured error reporting, also adding strings for schematron and buffer which were missing
Daniel Veillard 747c2c10 2012-07-19T20:36:43 Extend testlimits
Daniel Veillard f572a78d 2012-07-19T20:36:25 More avoid quadratic behaviour
Daniel Veillard 51304816 2012-07-19T20:34:26 Impose a reasonable limit on PI size Unless the XML_PARSE_HUGE option is given to the parser, the value is XML_MAX_TEXT_LENGTH, i.e. the same as for a text node within content. Also cleanup some unsigned int used for memory size.
Daniel Veillard 0de1f311 2012-07-18T17:43:34 first version of testlimits new test Used to check behaviour on various parsing limits
Daniel Veillard 65686451 2012-07-19T18:25:01 Avoid quadratic behaviour in some push parsing cases avoid rescanning over and over a very long input, just check the incoming chunks
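The fix in 65686451 is described only as "just check the incoming chunks". One way to get that effect, sketched here under the assumption that the parser is looking for a terminator string in a growing input buffer, is to remember how far earlier scans got. The `sketch_scan` type and `sketch_find` function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scan state kept between pushes of new data. */
typedef struct sketch_scan {
    size_t checked;  /* offsets below this cannot start the terminator */
} sketch_scan;

/* Look for term in buf[0..len); return its offset, or -1 if absent.
 * On failure, remember where to resume so old bytes are not rescanned. */
static long sketch_find(sketch_scan *s, const char *buf, size_t len,
                        const char *term)
{
    size_t tlen, i;

    assert(s != NULL && buf != NULL && term != NULL);
    tlen = strlen(term);
    assert(tlen > 0);
    for (i = s->checked; i + tlen <= len; i++) {
        if (memcmp(buf + i, term, tlen) == 0) {
            s->checked = 0;  /* reset for the next construct */
            return (long) i;
        }
    }
    s->checked = i;  /* a match may still begin here once more data arrives */
    return -1;
}
```

Each byte is then examined a bounded number of times regardless of how many chunks the input arrives in, instead of once per chunk.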
Daniel Veillard 58f73aca 2012-07-19T11:58:47 Impose a reasonable limit on comment size Unless the XML_PARSE_HUGE option is given to the parser, the value is XML_MAX_TEXT_LENGTH, i.e. the same as for a text node within content. Also cleanup some unsigned int used for memory size.
Daniel Veillard e17db994 2012-07-19T11:25:16 Impose a reasonable limit on attribute size Unless the XML_PARSE_HUGE option is given to the parser, the value is XML_MAX_TEXT_LENGTH, i.e. the same as for a text node within content.
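The size limits on PIs, comments and attributes above all follow the same pattern: a default cap that an explicit option lifts. A minimal sketch of that check follows; `SKETCH_MAX_TEXT_LENGTH`, its value, `SKETCH_OPT_HUGE` and the `sketch_parser` type are placeholders invented for illustration and stand in for XML_MAX_TEXT_LENGTH, XML_PARSE_HUGE and the real parser context.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_TEXT_LENGTH 10000000u  /* placeholder cap, assumed value */
#define SKETCH_OPT_HUGE        (1 << 0)   /* hypothetical opt-out flag */

/* Hypothetical parser context carrying only the option bits. */
typedef struct sketch_parser {
    int options;
} sketch_parser;

/* Return 1 if a construct of len bytes is acceptable, 0 otherwise. */
static int sketch_length_ok(const sketch_parser *p, size_t len)
{
    assert(p != NULL);
    if (p->options & SKETCH_OPT_HUGE)
        return 1;  /* caller opted out of the default limits */
    return len <= SKETCH_MAX_TEXT_LENGTH;
}
```

Sharing one cap across text nodes, attributes, comments and PIs keeps the default behaviour easy to reason about: a single number bounds every variable-length construct.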
Daniel Veillard b60e612e 2012-07-18T16:21:17 Small cleanup of unused variables in test
Daniel Veillard 9ee02f80 2012-07-16T19:57:42 Harden the buffer code and make it more compatible Mimic the old xmlBuffer structure in xmlBuf to avoid catastrophic failures in case of old code directly reading ctxt->input->buf->buffer Check on all buffer entry points if an error previously occurred on the buffer, and fail the operation if this is the case, the buffer becomes immutable and unreadable.
Daniel Veillard 00ac0d3b 2012-07-16T18:03:01 More cleanups for input/buffers code When calling xmlParserInputBufferPush, the buffer may be reallocated and at the input level the pointers for base, cur and end need to be reevaluated. * buf.c buf.h: add two new functions, one to get the base from the input of the buffer, and another one to reset the pointers based on the cur and base index * HTMLparser.c parser.c: cleanup to use the new helper functions as well as making sure size_t is used for the indexes computations
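The hardening rule in 9ee02f80, where a buffer that has seen an error refuses every later operation, can be illustrated with a minimal sketch. The `sketch_ebuf` type and its functions are hypothetical and much simpler than the real buffer module; in particular this sketch grows by exact amounts only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer with a sticky error flag. */
typedef struct sketch_ebuf {
    char *data;
    size_t use;
    size_t size;
    int error;  /* once set, every entry point fails */
} sketch_ebuf;

/* Append len bytes; return 0 on success, -1 on failure. */
static int sketch_ebuf_add(sketch_ebuf *b, const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    if (b == NULL || b->error)
        return -1;  /* the same check guards every entry point */
    if (len == 0)
        return 0;
    if (len > b->size - b->use) {
        size_t nsize;
        char *tmp;

        if (len > (size_t) -1 - b->use) {  /* size computation would wrap */
            b->error = 1;
            return -1;
        }
        nsize = b->use + len;
        tmp = realloc(b->data, nsize);
        if (tmp == NULL) {
            b->error = 1;
            return -1;
        }
        b->data = tmp;
        b->size = nsize;
    }
    memcpy(b->data + b->use, s, len);
    b->use += len;
    return 0;
}

/* Readers are refused too once the buffer has failed. */
static const char *sketch_ebuf_content(const sketch_ebuf *b)
{
    if (b == NULL || b->error)
        return NULL;
    return b->data;
}
```

Making the failure sticky means a caller that ignores one return value cannot go on to read or extend a half-updated buffer.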
Daniel Veillard 61551a1e 2012-07-16T16:28:47 Cleanup function xmlBufResetInput() to set input from Buffer This was scattered in a number of modules, xmlParserInputPtr have usually their base, cur and end pointer set from an xmlBuf used as input. * buf.c buf.h: add a new function implementing this setup * parser.c HTMLparser.c catalog.c parserInternals.c xmlreader.c use the new function instead of digging into the buffer in all those modules
Daniel Veillard 145477d8 2012-07-16T14:59:29 Switch the test program for characters to new input buffers it was manipulating the buffer content and structures directly this cleans it up
Daniel Veillard 7b9b0719 2012-07-16T14:58:02 Convert the HTML tree module to the new buffers The new input buffers induced a couple of changes, the others are related to the switch to xmlBuf in saving routines.
Daniel Veillard a78d8036 2012-07-16T14:56:50 Convert of the HTML parser to new input buffers Changes similar to the ones done in the XML parser for the routines which are not shared.
Daniel Veillard dbf5411b 2012-07-16T14:54:45 Convert the writer to new output buffer and save APIs Only a handful of places had to be converted for xmlBuf and the new saving entry point.
Daniel Veillard 8aebce3e 2012-07-16T14:42:31 Convert XMLReader to the new input buffers A few direct access were replaced, and also one internal xmlBuffer structure is converted to use xmlBuf instead
Daniel Veillard 50cdab55 2012-07-16T14:52:00 New saving functions using xmlBuf and conversion * save.h: new header providing new functions currently internal and xmlBuf counterparts of old xmlBuffer based ones * xmlsave.c: convert functions to use xmlBuf as much as possible
Daniel Veillard dddeede0 2012-07-16T14:44:26 Provide new xmlBuf based saving functions * include/libxml/tree.h: adds xmlBufGetNodeContent and xmlBufNodeDump as xmlBuf based equivalents of xmlNodeGetContent and xmlNodeDump * tree.c: implements one new routine and converts xmlNodeBufGetContent to use the xmlBuf equivalent. It should behave better as a result in case of data larger than 2GB.
Daniel Veillard 345ee8b6 2012-07-16T14:40:37 Convert XInclude to the new input buffers A few xmlBuffer...() calls changed to their xmlBuf...() counterparts
Daniel Veillard 2a1d2422 2012-07-16T14:38:14 Convert catalog code to the new input buffers Only one place where the buffers fields were accessed directly
Daniel Veillard 53aa293d 2012-07-16T14:37:00 Convert C14N to the new Input buffer one case of direct access cleaned up
Daniel Veillard a6a6e70c 2012-07-16T14:22:54 Convert xmlIO.c to the new input and output buffers Relatively mechanical changes, this also led to a couple of fixes upon review of the I/O code on buffer usage.
Daniel Veillard 768eb3b8 2012-07-16T14:19:49 Convert XML parser to the new input buffers The main changes are when the internals of the buffer structures were addressed directly, we now use routines coming from buf.h The routine xmlParserInputRead() which wasn't used anywhere is deprecated too.
Daniel Veillard 65c7d3b2 2012-07-16T14:13:58 Incompatible change to the Input and Output buffers Since the whole set of structures was public, the only way to switch to size_t clean buffer is to introduce an incompatible API change. Modifying the xmlParserInputBuffer and xmlOutputBuffer structures is the best place to make this change as those structures are deep into the parser feeding data, and no public API suggests building those manually.
Daniel Veillard 18d0db25 2012-07-13T19:51:15 Adding new encoding function to deal with the new structures * encoding.c: adds xmlCharEncFirstLineInput, xmlCharEncInput and xmlCharEncOutput * enc.h: the functions are not made public but added to this new header
Daniel Veillard ade10f2c 2012-07-12T09:43:27 Convert XPath to xmlBuf Easy as no buffer was exported in the APIs
Daniel Veillard bca22f40 2012-07-11T16:48:47 Adding a new buf module for buffers This also adds converter functions between xmlBuf and xmlBuffer * buf.c buf.h: the old xmlBuffer routines but modified for size_t and using xmlBuf instead of xmlBuffer * Makefile.am: add the 2 new files * include/libxml/xmlerror.h: add an entry for the new module * include/libxml/tree.h: expose the xmlBufPtr type but not the structure which stays private
Daniel Veillard 4629ee02 2012-07-23T14:15:40 Do not fetch external parsed entities Unless explicitly asked for when validating or replacing entities with their value. Problem pointed out by Tom Lane <tgl@redhat.com> * parser.c: do not load external parsed entities unless needed * test/errors/extparsedent.xml result/errors/extparsedent.xml*: add a regression test to avoid change of the behaviour in the future
Aron Xu baaf03f8 2012-07-20T15:41:34 Fix an error in previous commit
Daniel Veillard 4f9fdc70 2012-07-18T11:38:17 Fix entities local buffers size problems
Daniel Veillard 459eeb9d 2012-07-17T16:19:17 Fix parser local buffers size problems
Daniel Veillard 740cb1a4 2012-07-18T16:05:37 Memory error within SAX2 reuse common framework There is no reason for that class of errors to not use the same handling allowing structured error processing.
Daniel Veillard c508fa3f 2012-07-18T17:39:56 Fix a failure to report xmlreader parsing failures Related to https://bugzilla.gnome.org/show_bug.cgi?id=654567 the problem is that the provided patch failed to raise an error on xmlTextReaderRead() return when an actual parsing error occurred
Daniel Veillard 549f06a8 2012-07-11T15:21:12 Expand .gitignore with more files
Daniel Veillard 8fc913fc 2012-06-06T11:29:29 Fix compilation on older Visual Studio For https://bugzilla.gnome.org/show_bug.cgi?id=666491 Reported by Matt Budd <matt.budd@gmail.com>, the added support for VS 2010 broke older versions 2005 and 2008 because it assumed some of the defines were present in all versions, fix that to check the version of VS
Daniel Veillard 2e1eaca6 2012-05-25T16:44:20 Fix xmllint --xpath node initialization By default it's more sensible to initialize it to the document itself than the root element
Daniel Veillard c943f708 2012-05-23T17:10:59 Release of libxml2-2.8.0 - Makefile.am: don't package .git - configure.in : update to new release - doc/xml.html: added the new release - doc/* testapi.c: regenerated
Daniel Veillard 22030ef8 2012-05-23T15:52:45 Restore code for Windows compilation Try to keep as close to rc1 but still allow the change from Roumen for mingw
Daniel Veillard ee8f1d4c 2012-05-21T11:16:12 Cleanups before 2.8.0-rc2 new symbols, a missing comment and a fix on symbol release
Roumen Petrov 978ff224 2012-05-20T16:07:54 use mingw C99 compatible functions {v}snprintf instead those from MSVC runtime
Daniel Veillard f27c6683 2012-05-21T10:15:40 New symbols added for the next release
Daniel Veillard 59df1e4f 2012-05-21T10:14:34 Avoid an extra operation In the catalog code, tsan also complained of testing the variable without locking and that was done a few lines below
Daniel Veillard d495e6a8 2012-05-20T20:48:34 Part for rand_r checking missing Forgot to push that change in previous commit
Daniel Veillard 379ebc1d 2012-05-18T15:41:31 Cleanup on randomization tsan reported that rand() is not thread safe, so create a thread safe wrapper, use rand_r() if available. Consolidate the function, initialization and cleanup in dict.c and make sure it is initialized in xmlInitParser()
Andy Lutomirski 9d9685ad 2012-05-15T20:10:25 xmlTextReader bails too quickly on error For https://bugzilla.gnome.org/show_bug.cgi?id=654567 I use xmlTextReader to parse files that might be incomplete. These files are the beginning of a well-formed file, but the end is missing so the file as a whole is not well-formed. The problem is that xmlTextReader starts returning errors when it encounters the early EOF, even though I haven't finished reading all of the valid data in the file. It would be helpful if xmlTextReader kept working until the very end.
Pacho Ramos 1ea6b141 2012-05-15T19:36:02 Fix undefined reference in python module For https://bugzilla.gnome.org/show_bug.cgi?id=622023 when compiled with LDFLAGS="${LDFLAGS} -Wl,-z,-defs -Wl,--no-undefined" the python module would fail due to the undefined reference. This adds an explicit reference to python lib.
Daniel Veillard 0d51cfeb 2012-05-15T11:18:40 Fix a race in xmlNewInputStream For https://bugzilla.gnome.org/show_bug.cgi?id=643148 Reported by Bill Clarke <llib@computer.org>, it used a global variable as a counter for the input id and this was not thread safe. To avoid the race without adding unneeded locking in the parser path, move the id to the parser context instead.
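The race fix in 0d51cfeb moves the input id counter from a global variable into the parser context. A minimal sketch of the idea, using hypothetical `sketch_ctxt` and `sketch_input` types rather than the real parser structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser context owning its own id counter. */
typedef struct sketch_ctxt {
    int input_id;  /* next id to hand out, private to this context */
} sketch_ctxt;

/* Hypothetical input stream record. */
typedef struct sketch_input {
    int id;
} sketch_input;

/* Assign the next id from the context rather than from a shared global. */
static void sketch_new_input(sketch_ctxt *ctxt, sketch_input *in)
{
    assert(ctxt != NULL && in != NULL);
    in->id = ctxt->input_id++;
}
```

Assuming each context is driven by one thread at a time, the increment then touches no state shared between threads, so no lock is needed on the parsing path.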
Noam 9313ae85 2012-05-15T11:03:46 Fix weird streaming RelaxNG errors For https://bugzilla.gnome.org/show_bug.cgi?id=512454 The bug was to use compiled deterministic automata when the content model was found to be non-deterministic, leading to random parsing errors.
Daniel Veillard 94431ecb 2012-05-15T10:45:05 Fix various bugs in new code raised by the API checking * testapi.c: regenerated and covering new APIs * tree.c: xmlBufferDetach can't work on immutable buffers * xzlib.c: fix a deallocation error
Daniel Veillard 79ee284a 2012-05-15T10:25:31 Fix various problems with "make dist" * tree.c: missing documentation for xmlBufferDetach * doc/symbols.xml: add two new symbols xmlTextReaderRelaxNGValidateCtxt and xmlBufferDetach * doc/apibuild.py: ignore internal header xzlib.h
Daniel Veillard 9f3cdef0 2012-05-15T09:38:13 Fix a memory leak in the xzlib code The freeing function wasn't called due to a bogus #ifdef surrounding value. Also switch the code to use the normal libxml2 allocation and freeing routines.
Conrad Irwin 7d0d2a50 2012-05-14T14:18:58 Use a hybrid allocation scheme in xmlNodeSetContent On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote: >  Hi Conrad, > > that's interesting ! I was initially afraid of a sudden explosion of > memory allocations for building a tree since by default buffers tend to > "waste" memory by using doubling allocations, but that's not the case. >  xmllint --noout doc/libxml2-api.xml > when compiled with memory debug produce > > paphio:~/XML -> cat .memdump >      MEMORY ALLOCATED : 0, MAX was 12756699 > > and without your patch 12755657, i.e. the increase is minimal. Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This is because EXACT adds 10bytes "spare" on each alloc, and that interestingly wastes about the same amount of space as XML_ALLOC_DOUBLEIT on this example (see below). So it turns out that the default realloc() on my system actually handles this case really well — and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the underlying realloc() after all (sorry for misleading you). If you replace the realloc() with a bad one (like valgrind's), then the performance degrades severely. This patch implements a HYBRID allocator which has the behaviour you describe (it's like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT after 4kb) — that gets the memory back down to 12755657, with no noticeable impact on the performance of the synthetic pathological example under valgrind. In summary: max_memory on ./xmllint --noout doc/libxml2-api.xml, valgrind time on https://gist.github.com/2656940 max_memory valgrind time before | 12755657 | 29:18.2 EXACT | 12756699 | 2:58.6 <-- this is the state after the first patch. DOUBLEIT | 12756727 | 0:02.7 HYBRID | 12755754 | 0:02.7 <-- this is the state with both patches. > > There is also the cost of creating the buffers all the time. 
> I need to read the code and check but I may be interested in an hybrid > approach where we switch to buffer only when the text node starts to > become too big (4k would remove nearly all usuall types of "document" > usage, i.e. not blocks of data) I tried to avoid too much buffer creation by introducing the xmlBufferDetach function, which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack" in API terms though I thought the gains would be worth it. Conrad ------8<------ To keep memory usage tight in normal conditions it's desirable to only allocate as much space as is needed. Unfortunately this can lead to problems when constructing a long string out of small chunks, because every chunk you add will need to resize the buffer. To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big) from using exact allocations to doubling buffer size every time it is full. This limits the number of buffer resizes to O(log n) (down from O(n)), and thus greatly increases the performance of constructing very large strings in this manner.
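The hybrid policy described in 7d0d2a50 (exact allocations while the buffer is small, doubling once it reaches 4kb) can be sketched as a pure size computation. The function name, the `SKETCH_HYBRID_THRESHOLD` constant and the exact switch-over condition are assumptions for illustration and may differ from the committed code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HYBRID_THRESHOLD 4096  /* assumed switch-over point (4kb) */

/* Return the capacity to allocate so that `extra` more bytes fit in a
 * buffer currently holding `use` bytes out of `size`; 0 means overflow. */
static size_t sketch_hybrid_grow(size_t size, size_t use, size_t extra)
{
    size_t needed, nsize;

    assert(use <= size);
    if (extra > (size_t) -1 - use)
        return 0;                 /* use + extra would wrap around */
    needed = use + extra;
    if (needed <= size)
        return size;              /* already large enough */
    if (size < SKETCH_HYBRID_THRESHOLD)
        return needed;            /* small buffer: exact allocation */
    nsize = size;
    while (nsize < needed) {      /* large buffer: double until it fits */
        if (nsize > (size_t) -1 / 2)
            return needed;        /* cannot double further, fall back */
        nsize *= 2;
    }
    return nsize;
}
```

Small text nodes therefore waste no memory, while a string built from many small chunks triggers only O(log n) reallocations after it crosses the threshold.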
Conrad Irwin 7d553f83 2012-05-10T20:17:25 Use buffers when constructing string node lists. Hi Veillard and all, Firstly, thanks for libxml: it's awesome! I noticed recently that libxml was taking a surprisingly long time to perform some operations (many minutes instead of milliseconds), and so I did some digging. It turns out that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can be called many (many) times when assigning some content into a node. For background, I'm dealing with XML that contains emails, these can have large attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends with &#13;. This means that xmlNodeAddContentLen() is being called about 200,000 times, and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic example of this at https://gist.github.com/2656940) The attached patch works around that problem by using the existing buffer API to merge the strings together before even creating the text node, this keeps the number of realloc()s at a managable level. I'd love feedback on the patch, and am happy to fix problems with it, or explore other solutions if you think that this is barking up the wrong tree :). Thanks, Conrad P.S. Should I create a bug for this too? ------8<------ Before this change xmlStringGetNodeList would perform a realloc() of the entire new content for every XML entity in the assigned text in order to merge together adjacent text nodes. This had the effect of making xmlSetNodeContent O(n^2), which led to unexpectedly bad performance on inputs that contained a large number of XML entities. After this change the memory management is done by the buffer API, avoiding the need to continually re-measure and realloc() the string. For my test data (6MB of 80 character lines, each ending with &#13;) this takes the time to xmlSetNodeContent from about 500 seconds to around 50ms. 
I have not profiled smaller cases, though I tried to minimize the performance impact of my change by avoiding unnecessary string copying. Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>
Denis Pauk a0cd075d 2012-05-11T19:31:12 HTML parser error with <noscript> in the <head> For https://bugzilla.gnome.org/show_bug.cgi?id=615785 When the <noscript> is found, <head> is closed and a <body> element is created. The real <body id="xxx"> gets skipped over, so I can't see any of the body's attributes. Just don't close <head> when encountering a <noscript> Add a regression test too
Remi Gacogne 4609e6c9 2012-05-11T15:31:05 XSD: optional element in complex type extension For https://bugzilla.gnome.org/show_bug.cgi?id=609796 Libxml2 fails to validate an instance document against a schema if an element whose type is a complex extension of some base type with an optional child element and that child element is not specified in the instance document. For example, suppose I have some complex type BaseType that is defined to have one child element in a sequence group that has minOccurs set to 0
Daniel Veillard 39d027cd 2012-05-11T12:38:23 Fix html serialization error and htmlSetMetaEncoding() For https://bugzilla.gnome.org/show_bug.cgi?id=630682 The python tests were reporting errors, some of it was due to a small change in case encoding, but the main one was about htmlSetMetaEncoding(doc, NULL) being broken by not removing the associated meta tag anymore
Daniel Veillard 2c437da7 2012-05-11T12:08:15 Fix a wrong return value in previous patch
Daniel Veillard ed35d3d7 2012-05-11T10:52:27 Fix an uninitialized variable use When compiled without SAX1 support
Brandon Slack 0c7109c8 2012-05-11T10:50:59 Fix a compilation problem with --minimum For https://bugzilla.gnome.org/show_bug.cgi?id=636750 Moved a #endif /* LIBXML_OUTPUT_ENABLED */ a few lines down to avoid referencing an undefined variable
Daniel Veillard 399aaba1 2012-05-11T10:09:32 Remove redundant and unguarded include of resolv.h For https://bugzilla.gnome.org/show_bug.cgi?id=617053 This broke the build on Interix-6.0
Christian Dywan 040dcb59 2012-05-10T22:55:07 Remove git error message during configure For https://bugzilla.gnome.org/show_bug.cgi?id=635531 If git is not installed but .git was found configure would emit an error message
Patrick R. Gansterer 023206fc 2012-05-10T22:17:51 xmllint: Build fix for endTimer if !defined(HAVE_GETTIMEOFDAY) For https://bugzilla.gnome.org/show_bug.cgi?id=638649 code was broken !
John Hein a4fe9b26 2012-05-10T22:12:46 Remove a bashism in configure.in Not portable, broke on old FreeBSD
Shaun McCance 4cf7325e 2012-05-10T20:59:33 xinclude with parse="text" does not use the entity loader For https://bugzilla.gnome.org/show_bug.cgi?id=552479 The code for xinclude parse="text" was not using the registered entity loader, defeating attempts to control loading of files.
Denis Pauk fdf990c2 2012-05-10T20:40:49 Allow to parse 1 byte HTML files For https://bugzilla.gnome.org/show_bug.cgi?id=605740 Files 1 byte long were not accepted by the HTML push parser
Patrick R. Gansterer 204f1f14 2012-05-10T20:24:00 undef ERROR if already defined
Martin Schröder b91111b4 2012-05-10T18:52:37 Patch that fixes the skipping of the HTML_PARSE_NOIMPLIED flag For https://bugzilla.gnome.org/show_bug.cgi?id=642916 I just noticed that the HTML_PARSE_NOIMPLIED flag that you can pass to the HTML-Parser methods doesn't do anything. Its intended purpose is to stop the HTML-parser from forcibly adding a pair of html/body tags if the stream does not contain any. This is highly useful when you don't need this level of strictness. Unfortunately, specifying it doesn't work, because the option is not copied into the parsing context.