Log

Author Commit Date CI Message
Daniel Veillard 87f3287d 2013-04-01T13:33:42 Fix tree iterators broken by 2to3 script
Daniel Veillard 2cb6bf8e 2013-03-30T21:38:20 update all tests for Python3 and Python2
Daniel Veillard 3798c4ad 2013-03-29T13:46:24 Fix compilation on Python3 while still compiling on recent Python2: - change the handling of files, tweak the generator, get the fd instead of the FILE *, dup it and fdopen based on mode, add a Release function on Python3 and call to flush from the generated python stubs - switch to using Capsules instead of CObjects - fix PyString to PyBytes - fix PyInt to PyLong - tweak the module registration to compile on both versions - drop PyInstance check for passed xmlNodes and instead check attributes presence Daniel
Daniel Veillard d8a75bff 2013-03-28T00:16:42 Converting apibuild.py to python3 not finished ....
Daniel Veillard 6f184651 2013-03-29T15:17:40 A few more fixes for python 3 affecting libxml2.py need a few changes to the generator and the libxml.py stub
Daniel Veillard 3cb1ae26 2013-03-27T22:40:54 First pass at starting porting to python3
Daniel Veillard a5e513a5 2013-03-29T14:36:15 Fix an unneeded and wrong extra link parameter
Daniel Veillard b8e3f80d 2013-03-28T09:46:20 updated configure.in for python3
Daniel Veillard 0ab8ce53 2013-03-28T08:47:42 Switched comment in file to UTF-8 encoding
Daniel Veillard 215a7296 2013-03-28T11:23:45 Extend gitignore
Shaun McCance 519bc6a3 2012-09-19T13:41:56 Add support for xpathRegisterVariable in Python
Daniel Veillard 483272f3 2013-03-27T13:37:14 Added a regression test from bug 694228, data provided by Mark Rowe <mrowe@apple.com>
Daniel Veillard ab0e3504 2013-03-27T13:21:38 Activate detection of encoding in external subset https://bugzilla.gnome.org/show_bug.cgi?id=694228 the ctxt->encoding was percolated down when parsing the external subset leading to failures
Daniel Veillard 113384f1 2013-03-27T11:43:41 Add documentation for xmllint --xpath https://bugzilla.gnome.org/show_bug.cgi?id=694822 this wasn't documented in the man page, and there was a typo in xmllint help output.
Mikhail Titov 8e2098ae 2013-03-27T11:00:31 Fix an output buffer flushing conversion bug for https://bugzilla.gnome.org/show_bug.cgi?id=694982 On a flush operation, everything must be converted
Denis Pauk e1631e1c 2013-03-10T12:47:37 Few cleanup patches for Windows https://bugzilla.gnome.org/show_bug.cgi?id=690878 provided by Cole <coleharrisjohnson@gmail.com>
Daniel Veillard f7aeda24 2013-03-23T10:31:26 Fix the URL of the SAX documentation from James as it has moved
Csaba László 1f6c42cf 2013-03-18T15:30:00 Fix an old bug in xmlSchemaValidateOneElement Recently I have run into the very same problem Tiberius Duluman did back in Wed, 13 May 2009 15:56:55 +0300 ([xml] Bug in xmlSchemaValidateOneElement function). I can now prove that his problem is a valid problem. I checked the latest available version of xmlschemas.c (2.9.0.) and the problem is still there! I think I have found a solution to the problem which I'd like to check with you: My quick solution to the problem is to replace line 27849 in xmlschemas.c (v2.9.0.) in function xmlSchemaVDocWalk valRoot = xmlDocGetRootElement(vctxt->doc); with this one: valRoot = vctxt->validationRoot ? vctxt->validationRoot : xmlDocGetRootElement(vctxt->doc); Currently I'm using version 2.7.8. in Windows and this change seems to solve the problem.
Daniel Veillard cff2546f 2013-03-11T15:57:55 Cache presence of '<' in entities content slightly modify how ent->checked is used, and use the lowest bit to keep the information
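The lowest-bit trick mentioned in cff2546f can be illustrated with a minimal sketch. This is not the libxml2 code: the helper names and the choice to keep a counter in the upper bits are assumptions made for illustration only.

```python
# Hypothetical illustration (not libxml2's implementation): keep a
# "content contains '<'" flag in bit 0 of an integer whose remaining
# bits hold a check counter.
HAS_LT_FLAG = 1  # lowest bit


def pack_checked(count, content):
    """Store count in the upper bits and the '<' flag in the lowest bit."""
    value = count << 1
    if "<" in content:
        value |= HAS_LT_FLAG
    return value


def checked_count(value):
    """Recover the counter by discarding the flag bit."""
    return value >> 1


def has_lt(value):
    """Return True if the cached flag says the content contained '<'."""
    return bool(value & HAS_LT_FLAG)
```

Caching the flag this way means the content only has to be scanned for '<' once, when the value is packed.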
Daniel Veillard a3f1e3e5 2013-03-11T13:57:53 Avoid extra processing on entities If an entity has already been checked for correctness no need to check it on every reference
Gilles Espinasse a0989068 2013-03-04T22:46:21 Fix configure "cannot remove" messages This is the other way to solve the ./configure "cannot remove" messages, by simply removing the rm detection in configure.in. There is already a raw 'rm -f' at the end of configure.in
Daniel Veillard c100e69c 2013-02-28T19:02:32 fix schema validation in combination with xsi:nil Based on Thomas Gamper <icicle@cg.tuwien.ac.at> findings and initial patch There is no point doing a regexp validation of further content if there actually is no further content because the element is nilled.
Steve Wolf 19d785b5 2013-02-28T18:22:46 xmlCtxtReadFile doesn't work with literal IPv6 URLs https://bugzilla.gnome.org/show_bug.cgi?id=694185 RedHat Bug 624626 discusses the new behavior of libxml regarding brackets around IPv6 addresses. In earlier versions such as 2.6.27, uri.c stripped the brackets (e.g. uri->server == "fdf2:1e39:73d1:934e::119"); in the current version it returns IPv6 addresses with brackets intact (e.g. uri->server == "[fdf2:1e39:73d1:934e::119]"). Thus in 2.9.0, xmlCtxtReadFile() has a problem when it is passed a URL containing a literal IPv6 address. xmlCtxReadFile() and its subroutines pass uri->server unchanged to getaddrinfo(), which doesn't recognize a bracketed IPv6 address, so the read fails. This strips the [ and ] from IPv6 addresses allowing getaddrinfo() to work properly with such URIs.
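The bracket handling described in 19d785b5 can be sketched as follows. This is a minimal Python illustration, not the C patch itself; the function name `strip_ipv6_brackets` is hypothetical.

```python
def strip_ipv6_brackets(host):
    """Return host without the surrounding [ ] of a bracketed IPv6 literal.

    Hosts that are not bracketed (names, IPv4 literals) are returned
    unchanged, so the result can be handed to a resolver either way.
    """
    if len(host) >= 2 and host.startswith("[") and host.endswith("]"):
        return host[1:-1]
    return host
```

With the address from the commit message, `strip_ipv6_brackets("[fdf2:1e39:73d1:934e::119]")` yields the bare address. The standard `ipaddress.ip_address()` parser, for instance, accepts the bare form and rejects the bracketed one.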
Alexey Neyman d749528a 2013-02-27T13:11:47 Silence the new python test on input Just make it silent if there is no error
Alexey Neyman a9016c49 2013-02-25T16:07:09 Fix a few problems with setEntityLoader 1. Setting entity loader does not increment the refcount on the Python object passed in. This works only if the object is not deleted. For example, the following code results in segmentation fault in Python interpreter when attempting to process any document: [[[ def register_entity_loader(): def entity_loader(URL, ID, ctxt): ... libxml2.setEntityLoader(entity_loader) register_entity_loader() ]]] 2. setEntityLoader() does not verify if the passed object is callable. If it is not, current implementation attempts to call it anyway and failing that, silently moves on to default entity loader. Attached patch makes setEntityLoader raise ValueError exception if non-callable object is passed. 3. In debug mode, pythonExternalEntityLoader() outputs the result object to stderr, while the messages before and after the object (description + newline) go to stdout. Attached patch makes them all go to stdout.
Alexey Neyman 48da90bc 2013-02-25T15:54:25 Python binding for xmlRegisterInputCallback It is possible to make xmlIO handle any protocol by means of xmlRegisterInputCallback(). However, that function is currently only available in C API. So, the natural solution seems to be implementing Python bindings for the xmlRegisterInputCallback. * python/generator.py: skip xmlPopInputCallbacks * python/libxml.c python/libxml.py python/libxml_wrap.h: implement the wrappers * python/tests/input_callback.py python/tests/Makefile.am: also add a test case
Alexey Neyman e32ceb93 2013-02-20T18:28:25 Python bindings: DOM casts everything to xmlNode I noticed another issue with Python bindings of libxml: the access methods do not cast the pointers to specific classes such as xmlDtd, xmlEntityDecl, etc. For example, with the following document: <?xml version="1.0"?> <!DOCTYPE root [<!ELEMENT root EMPTY>]> <root/> the following script: import libxml2 doc = libxml2.readFile("c.xml", None, libxml2.XML_PARSE_DTDLOAD) print repr(doc.children) prints: <xmlNode (root) object at 0xb74963ec> With properly cast nodes, it outputs the following: <xmlDtd (root) object at 0xb746352c> The latter object (xmlDtd) enables one to use DTD-specific methods such as debugDumpDTD(), copyDTD(), and so on.
Daniel Veillard 23f05e0c 2013-02-19T10:21:49 Detect excessive entities expansion upon replacement If entities expansion in the XML parser is asked for, it is possible to craft a relatively small input document leading to excessive on-the-fly content generation. This patch accounts for those replacements and stops parsing after a given threshold. It can be bypassed as usual with the HUGE parser option.
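The accounting idea in 23f05e0c can be illustrated with a toy expander. This is only a sketch under stated assumptions: the function name, the `limit` value, and the `huge` flag (standing in for the HUGE parser option) are hypothetical, and entity definitions are assumed to be non-recursive.

```python
import re

_REF = re.compile(r"&(\w+);")


class ExpansionLimitError(Exception):
    """Raised when replacement content exceeds the allowed budget."""


def expand_entities(text, entities, limit=1000, huge=False):
    """Expand &name; references, counting every generated character.

    Assumes entity definitions do not reference themselves.
    """
    produced = 0  # total characters generated by entity replacement

    def expand(s):
        def substitute(match):
            nonlocal produced
            name = match.group(1)
            if name not in entities:
                return match.group(0)  # leave unknown references untouched
            replacement = expand(entities[name])
            produced += len(replacement)
            if not huge and produced > limit:
                raise ExpansionLimitError("entity expansion budget exceeded")
            return replacement

        return _REF.sub(substitute, s)

    return expand(text)
```

A three-level definition such as `{"a": "x" * 10, "b": "&a;" * 10, "c": "&b;" * 10}` is only a few dozen characters of input but expands `&c;` to 1000 characters, which is the kind of amplification the threshold is meant to stop.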
Daniel Veillard bf058dce 2013-02-13T18:19:42 Fix the flushing out of raw buffers on encoding conversions https://bugzilla.gnome.org/show_bug.cgi?id=692915 the new set of converting functions tried to limit the encoding conversion of the raw buffer to the consumption one to work in a more progressive fashion. Unfortunately this was bad for performance and led to errors on progressive parsing when a very large chunk was close to the end of the document. Fix the new internal function and switch back to the old way of converting. Fix another bug in the process.
Daniel Veillard de0cc20c 2013-02-12T16:55:34 Fix some buffer conversion issues https://bugzilla.gnome.org/show_bug.cgi?id=690202 Buffer overflow errors originating from xmlBufGetInputBase in 2.9.0 The pointers from the context input were not properly reset after that call which can do reallocations.
Mark Salter 60adeea9 2013-02-11T12:45:56 Fix rpmbuild --nocheck if the %check section was omitted some of the files needed for packaging would not be generated, move the generation to the proper place.
Daniel Veillard 23922c53 2013-02-11T11:52:44 When calling xmlNodeDump make sure we grow the buffer quickly Make sure the underlying new buffer allocated use a double-it scheme for the time of the dump.
Daniel Veillard 2af19f98 2013-01-28T17:44:53 Cleanup of a duplicate test in an and expression, pointed out by Thomas Jarosch <thomas.jarosch@intra2net.com> Daniel
Daniel Veillard eea38159 2013-01-28T16:55:30 Cleanup on duplicate test expressions As pointed out by Thomas Jarosch <thomas.jarosch@intra2net.com> Daniel
Patrick Gansterer 9c8eaabe 2013-01-04T12:41:53 Fix compiler warning after 153cf15905cf4ec080612ada6703757d10caba1e Add missing cast for xmlNop to silence a compiler warning.
Dan Winship cf8f0424 2012-12-21T11:13:31 Fix an error in the progressive DTD parsing code For https://bugzilla.gnome.org/show_bug.cgi?id=689958 We were looking for the wrong character in the input stream
Daniel Veillard e4d16d79 2012-12-21T10:58:14 xmllint should not load DTD by default when using the reader
Daniel Richard a0571ebe 2012-12-12T17:16:00 Fix for win32/configure.js and WITH_THREAD_ALLOC Building git master gives me the following error on Windows; this patch fixes it: icl /EP /nologo /I..\include /D "NOLIBTOOL" /D "_REENTRANT" libxml2.def.src > int.msvc\libxml2.def libxml2.def.src Z:\...\libxml2-git8123c4f6_debug\win32\../include/libxml/xmlversion.h(105): error: unrecognized token #if @WITH_THREAD_ALLOC@ ^ Z:\...\libxml2-git8123c4f6_debug\win32\../include/libxml/xmlversion.h(105): error: expected an expression #if @WITH_THREAD_ALLOC@ ^ Z:\...\libxml2-git8123c4f6_debug\win32\../include/libxml/xmlversion.h(105): error: unrecognized token #if @WITH_THREAD_ALLOC@ ^ NMAKE : fatal error U1077: 'icl' : return code '0x2' Stop.
Petr Sumbera 6f49c73b 2012-12-12T15:41:30 Try IBM-037 when looking for EBCDIC handlers http://en.wikipedia.org/wiki/EBCDIC_037 as it is another variant of EBCDIC
Daniel Veillard 8123c4f6 2012-11-08T16:24:07 Fix Broken multi-arch support in xml2-config partial revert of 87b4d6f6105658a99b976f812223c8edf4469265 coming from Fedora/RHEL/... but breaking other distros as pointed out by Daniel Richard
Michael Wood fb27e2cd 2012-09-28T08:59:33 Fix spelling of "length".
Tim Starling 0ad948ed 2012-10-29T13:41:55 Define LIBXML_THREAD_ALLOC_ENABLED via xmlversion.h Otherwise, direct calls to xmlFree() etc. from the application will use a different set of allocation functions to what was used to allocate the memory internally.
Daniel Veillard 6a36fbe3 2012-10-29T10:39:55 Fix potential out of bound access
Daniel Veillard 4ea74a44 2012-10-29T10:27:18 Fix a portability issue for GCC < 3.4.0
Daniel Veillard 153cf159 2012-10-26T13:50:47 Fix large parse of file from memory https://bugzilla.redhat.com/show_bug.cgi?id=862969 The new code trying to detect excessive input lookup would just get wrong sometimes in the case of very large file parsed directly from memory.
Daniel Veillard 711b15d5 2012-10-25T19:23:26 Fix a bug in the nsclean option of the parser Raised as a side effect of: https://bugzilla.gnome.org/show_bug.cgi?id=663844
Daniel Veillard a7982ce2 2012-10-25T15:39:39 Adding streaming validation to runtest checks
Daniel Veillard 1abd221b 2012-10-25T15:37:50 Add a --pushsmall option to xmllint To test the push parser with small chunks of 10 bytes
Daniel Veillard 6c91aa38 2012-10-25T15:33:59 Fix a regression in 2.9.0 breaking validation while streaming https://bugzilla.gnome.org/show_bug.cgi?id=684774 with help from Kjell Ahlstedt <kjell.ahlstedt@bredband.net>
Daniel Veillard 87b4d6f6 2012-10-11T14:44:22 Spec cleanups and a fix for multiarch support
Daniel Veillard 7457c67f 2012-10-11T12:25:51 Remove potential calls to exit()
Daniel Veillard 713434d2 2012-09-26T10:21:06 Silence a clang warning as reported by Hans Wennborg <hans@chromium.org>
Daniel Veillard 7e86eb5d 2012-09-20T21:46:19 Cleanup the Copyright to be pure MIT Licence wording
Daniel Richard bbe19451 2012-09-18T11:15:06 Windows build fixes Building 2.9.0 on MSVC7.1 was failing This is because HAVE_CONFIG_H is not #defined The patch addresses the above, adds testrecurse.exe and the standard "make check" suite of tests to the MSVC makefile, and also fixes the following (MSVC7.1) warnings: buf.c(674) : warning C4028: formal parameter 1 different from declaration libxml2\timsort.h(71) : warning C4028: formal parameter 1 different from declaration
Friedrich Haubensak 3f6cfbd1 2012-09-12T17:34:53 Fix a thread portability problem cannot compile libxml2-2.9.0 using studio 12.1 compiler on solaris 10 I.M.O. structure initializer (as PTHREAD_ONCE_INIT) cannot be used in a structure assignment anyway
Wouter Van Rooy e7715a59 2012-09-14T14:39:42 rand_seed should be static in dict.c For https://bugzilla.gnome.org/show_bug.cgi?id=683933 rand_seed should be a static variable in dict.c We ran into a problem with another library that exports rand_seed as a function. Combined with 2.7.8 this was not a problem but later versions have this problem.
Jan Pokorný 81d7a824 2012-09-13T15:56:51 Fix typos in parser comments Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Daniel Veillard 5d04ad11 2012-09-11T17:17:15 Downgrade autoconf requirement to 2.63 It was automatically bumped to 2.68 and that's not needed
Daniel Veillard 38bbd341 2012-09-11T15:00:08 Release of libxml2-2.9.0 * libxml.spec.in: update * doc/*: updated and regenerated * libxml2.syms testapi.c: regenerated
Daniel Veillard 7651606f 2012-09-11T14:02:08 Various cleanups to avoid compiler warnings
Daniel Veillard 742a0bbb 2012-09-11T13:37:30 Keep libxml2.syms when running "make distclean"
Daniel Veillard f8e3db04 2012-09-11T13:26:36 Big space and tab cleanup Remove all space before tabs and space and tabs at end of lines.
Csaba Raduly 429d3a0a 2012-09-11T11:50:25 Allow to set the quoting character of an xmlWriter It's otherwise impossible to set the quoting character of attribute values of an xmlWriter.
Daniel Veillard e00778b4 2012-09-08T21:09:26 Followup to LibXML2 docs/examples cleanup patch
Daniel Veillard f933c898 2012-09-07T19:32:12 Keep non-significant blanks node in HTML parser For https://bugzilla.gnome.org/show_bug.cgi?id=681822 Regardless if the option HTML_PARSE_NOBLANKS is set or not, blank nodes are removed from a HTML document, for example: <html> <head> <title>This is a test.</title> </head> <body> <p>This is a test.</p> </body> </html> is read as: <html><head><title>This is a test.</title></head><body> <p>This is a test.</p> </body></html> This changes the default behaviour but the old behaviour is available as expected when using the parser flag HTML_PARSE_NOBLANKS Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com> * HTMLparser.c: change various places in the parser where ignorable_space SAX callback was called without checking for the parser flag preference * xmllint.c: make sure we use the new flag even for HTML parsing * result/HTML/*: this modifies the output of a number of tests
Daniel Richard 878ec9db 2012-09-07T14:52:17 Second round of cleanups for LibXML2 docs/examples configure.am: * Explicitly disallow --enable-rebuild-docs when builddir != srcdir, per what you said about needing to build docs with an in-source build doc/Makefile.am: * Ensure that xmlversion.h is in the source tree before running apibuild.py, to avoid generating an incomplete libxml2-api.xml * Update the .PHONY target (forgot to do this earlier) doc/devhelp/Makefile.am: * Wrap the doc-generating rule in an "if REBUILD_DOCS" conditional so it doesn't cause trouble for regular users * Added a handy-dandy "rebuild" target doc/examples/index.py: * NOTE: You need to run this script to regenerate the files it creates, and then commit the newly-updated files! The generated files currently in git master (e.g. doc/examples/Makefile.am) are out of date even before this patch! * index.html really needs to be in EXTRA_DIST * Wrap the doc-generating rules in an "if REBUILD_DOCS" conditional, because they shouldn't be active otherwise
Daniel Veillard 47881284 2012-09-07T14:24:50 Add a forbidden variable error number and message to XPath Related to https://bugzilla.gnome.org/show_bug.cgi?id=680938 When the XML_XPATH_NOVAR flag is being used it means that variables are forbidden, not that they are missing
Michael Stahl 55b899a2 2012-09-07T12:14:00 Support long path names on WNT so we've got this patch to libxml2 2.7.6 in the LibreOffice code base, inherited from OOo. it fixes a definite problem, which is that Windows has a rather low maximum path length restriction, and there is a special trick on NT whereby path names can be prefixed with "\\?\", in which case the maximum length is 32k, which ought to be sufficient even for bloated office suites :) I'll attach the patch to the xmlCanonicPath function. note that i didn't write this and am by no means an expert on either Microsoftean platforms or libxml so maybe it's not the best way to do it.
Daniel Veillard 1bd45d13 2012-09-05T15:35:19 Change the XPath code to percolate allocation errors looping 1000 times on an error stating that a nodeset has grown out of control is useless, make sure we percolate errors up to the various loops and break when errors occur
Daniel Veillard 7d4c529a 2012-09-05T11:45:32 Improve HTML escaping of attribute on output Handle special cases of &{...} constructs as hinted in the spec http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1 and special values as comment <!-- ... --> used for server side includes This is limited to attribute values in HTML content.
Daniel Veillard 857104cd 2012-09-04T14:25:23 Remove all .cvsignore as they are not used anymore For https://bugzilla.gnome.org/show_bug.cgi?id=682985 suggested by Adrian Bunk <bunk@stusta.de>
Daniel Veillard 7a2215db 2012-09-04T12:05:17 Fix reuse of xmlInitParser While xmlCleanupParser() should not be used unless complete control is ensured over the program, making sure libxml2 is not in use anywhere, it should still be usable and allow a sequence of xmlInitParser(); xmlCleanupParser(); calls if needed. The problem is that the thread key wasn't reallocated on subsequent xmlInitParser() calls, leading to corruption of pthread keys used by the program. * threads.c: make sure xmlCleanupParser() resets the pthread_once() global variable driving thread key allocation.
Daniel Veillard 510e7583 2012-09-04T11:50:36 Fix a Timsort function helper comment
Daniel Veillard 28f5e1a2 2012-09-04T11:18:39 Fix potential crash on entities errors Related to https://bugs.launchpad.net/lxml/+bug/502959 Basically the core of the issue is that if an entity references another entity, then in case we are replacing entities content, we should always do so by copying the referenced content as long as the reference is done within the entity. Otherwise, if for some reason there is a later parsing error that entity content may be freed. Complex scenario exposed by command: thinkpad:~/XML/diveintopython-5.4/xml -> valgrind --db-attach=yes ../../xmllint --loaddtd --noout --noent diveintopython.xml Document references &a;. a references &b;, and we reference b content directly by linking it into the a content. a has an error further down, so we free a, freeing the chunk from b. Document references &b; after &a;: we try to copy b content, but it was freed already => segfault * parser.c: never reference directly entity content without copying if we aren't in the document main entity
Christian Weisgerber 3b6d7b9a 2012-08-28T23:40:56 xml2-config.1 markup error There is a spurious ".l" in the xml2-config.1 man page. This line can simply be removed. $ mandoc -Tlint -Werror xml2-config.1 xml2-config.1:12:2: ERROR: skipping unknown macro: .l
Arfrever Frehtes Taifersar Arahesis 1f01f49b 2012-08-28T22:16:50 Handle ICU_LIBS as LIBADD, not LDFLAGS to prevent linking errors For https://bugzilla.gnome.org/show_bug.cgi?id=677606 For https://bugs.gentoo.org/show_bug.cgi?id=417539 If libxml2-2.8.0 is built with --with-icu --with-python on a system that has an older version of libxml2 installed, then during "make install", libxml2mod.so gets relinked to the systemwide version of libxml2.so.2 instead of libxml2.so.2 from the build tree, and fails at runtime if symbol versions from the older libxml2.so.2 are not available. This effectively makes it impossible to build a libxml2-2.8.0 binary package on a system that does not already have libxml2-2.8.0 installed. Investigation by Rafał Mużyło and Arfrever Frehtes Taifersar Arahesis revealed the cause of the problem to be that libxml2's configure was adding ICU_LIBS to LDFLAGS instead of to LIBADD. This resulted in GNU libtool using the wrong argument order in its relinking command that gets run during "make install".
Akira TAGOH 961b535c 2012-07-03T14:13:59 Bug 676544 - fails to build with --without-sax1 Added some ifdef'd LIBXML_SAX1_ENABLED to make it buildable with --without-sax1 configure option.
Rob Richards 236ea1ea 2012-08-27T11:56:07 fix builds not having stdint.h
Rob Richards 8f2d6b57 2012-08-27T05:08:54 initialize var
Daniel Veillard 8880170e 2012-08-27T16:20:05 Fix the XPath arity check to also check the XPath stack limits Example xmlXPathNormalizeFunction() would do CHECK_ARITY(1) and then expect valuePop(ctxt); to return an object, except now valuePop() looks at the XPath stack frames and fails returning NULL, and we end up crashing dereferencing the object. Real solution is to extend CHECK_ARITY() and recompile all XPath functions using it.
Pietro Cerutti 890faa54 2012-08-27T13:24:08 Fix problem with specific and generic error handlers It seems that setting up both xmlTextReaderSetStructuredErrorHandler and xmlSetStructuredErrorFunc confuses the code around error.c:592 and following This patch works with any combinations of using xmlSetStructuredErrorFunc, xmlTextReaderSetStructuredErrorHandler, both, or none.
Daniel Veillard 466fcdaa 2012-08-27T12:03:40 Avoid a potential infinite recursion Which can happen when eliminating epsilon transitions, as reported by Pavel Madr <pmadr@opentext.com>
Vojtech Fried 3e031b7d 2012-08-24T16:52:44 Switching XPath node sorting to Timsort I use libxml xpath engine on quite large (and mostly "flat") xml files. It seems that Shellsort, that is used in xmlXPathNodeSetSort is a performance bottleneck for my case. I have read some posts about sorting in libxml in the libxml archive, but I agree that qsort was not the way to go. I experimented with Timsort instead and my results were good for me. For about 10000 nodes, my test was about 5x faster with Timsort, for 1000 nodes about 10% faster, for small data files, the difference was not measurable. * timsort.h: the algorithm, kept in a separate header * xpath.c: plug in the new algorithm in xmlXPathNodeSetSort * Makefile.am: add the header to the EXTRA_DIST * doc/apibuild.py: avoid indexing the new header
Daniel Veillard 73f94c60 2012-08-24T16:38:54 Small cleanup for valgrind target
Nick Wellnhofer 62270539 2012-08-19T19:42:38 Optimizing '//' in XPath expressions When investigating the libxslt performance problem reported in bug #657665, I found that '//' in XPath expressions can be very slow when working on large subtrees. One of the reasons is the seemingly quadratic time complexity of the duplicate checks when merging result nodes. The other is a missed optimization for expressions of the form 'descendant-or-self::node()/axis::test'. Since '//' is expanded to '/descendant-or-self::node()/', this type of expression is quite common. Depending on the axis of the expression following the 'descendant-or-self' step, the following replacements can be made: from descendant-or-self::node()/child::test to descendant::test from descendant-or-self::node()/descendant::test to descendant::test from descendant-or-self::node()/self::test to descendant-or-self::test from descendant-or-self::node()/descendant-or-self::test to descendant-or-self::test 'test' can be any kind of node test. With these replacements the possibly huge result of 'descendant-or-self::node()' doesn't have to be stored temporarily, but can be processed in one pass. If the resulting nodeset is small, the duplicate checks aren't a problem. I found that there already is a function called xmlXPathRewriteDOSExpression which performs this optimization for a very limited set of cases. It employs a complicated iteration scheme for rewritten expressions. AFAICS, this can be avoided by simply changing the axis of the expression like described above. With the attached patch against libxml2 and the files from bug #657665 I got the following results.
Before: $ time xsltproc/xsltproc --noout service-names-port-numbers.xsl service-names-port-numbers.xml real 2m56.213s user 2m56.123s sys 0m0.080s After: $ time xsltproc/xsltproc --noout service-names-port-numbers.xsl service-names-port-numbers.xml real 0m3.836s user 0m3.764s sys 0m0.060s I also ran the libxml2 and libxslt test suites with the patch and couldn't detect any breakage. Nick >From e0f5a8261760e4f257b90410be27657e984237c8 Mon Sep 17 00:00:00 2001 From: Nick Wellnhofer <wellnhofer@aevum.de> Date: Sun, 19 Aug 2012 18:20:22 +0200 Subject: [PATCH] Optimizations for descendant-or-self::node() Currently, the function xmlXPathRewriteDOSExpression optimizes expressions of type '//child'. Instead of adding a 'rewriteType' and doing a compound traversal, the same can be achieved simply by setting the axis of the node test from 'child' to 'descendant'. There are also many other cases that can be optimized similarly. This commit augments xmlXPathRewriteDOSExpression to essentially rewrite the following subexpressions: - descendant-or-self::node()/child:: to descendant:: - descendant-or-self::node()/descendant:: to descendant:: - descendant-or-self::node()/self:: to descendant-or-self:: - descendant-or-self::node()/descendant-or-self:: to descendant-or-self:: Since the '//' shortcut in XPath is translated to '/descendant-or-self::node()/', this greatly speeds up expressions using '//' on large subtrees.
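The four axis replacements listed in 62270539 can be sketched over a simplified step list. This is an illustration only, not the libxml2 implementation: the `(axis, test)` tuple representation and the function name are assumptions, and predicates are deliberately left out because a positional predicate (as in `//para[1]`) would make the rewrite change the meaning.

```python
# Replacement table taken from the commit message: the axis that follows
# descendant-or-self::node() maps to the single axis that replaces both steps.
REWRITES = {
    "child": "descendant",
    "descendant": "descendant",
    "self": "descendant-or-self",
    "descendant-or-self": "descendant-or-self",
}


def rewrite_dos_steps(steps):
    """Merge 'descendant-or-self::node()' with a following rewritable step.

    steps is a list of (axis, node_test) tuples without predicates.
    """
    out = []
    i = 0
    while i < len(steps):
        axis, test = steps[i]
        is_dos = axis == "descendant-or-self" and test == "node()"
        if is_dos and i + 1 < len(steps) and steps[i + 1][0] in REWRITES:
            next_axis, next_test = steps[i + 1]
            out.append((REWRITES[next_axis], next_test))
            i += 2  # both steps were consumed by the rewrite
        else:
            out.append((axis, test))
            i += 1
    return out
```

For example, the expanded form of `//item`, that is `[("descendant-or-self", "node()"), ("child", "item")]`, becomes `[("descendant", "item")]`, so the intermediate node set never has to be materialized.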
Daniel Veillard c70d185a 2012-08-23T23:28:04 Fix an XSD error when generating internal automata When generating a sequence add an extra epsilon transition to avoid further constructs from entering via the last state Bug reported by Johan Corveleyn <jcorvel@gmail.com>
Daniel Veillard 82cdfc4e 2012-08-22T11:05:09 Expose xmlBufShrink in the public tree API As suggested by Andrew W. Nosenko: Proposal: expose the new xmlBufShrink() to the "public" API for compatibility with xmlBufUse(). Reason: the following scenario: 1. Read something into xmlParserInputBuffer (e.g. using xmlParserInputBufferRead()) 2. Extract content through xmlBufContent() 3. Extract content length through xmlBufUse(). Result have type 'size_t'. 4. Use this content 5. Now, you need to shrink the buffer. How to do it? Doing that through legacy xmlBufferShrink() is unsafe because it uses 'unsigned int' and the whole point of introducing the new API was handling the cases, when 'unsigned int' is not enough. Therefore, need to use the new xmlBufShrink(). But it is "private". Therefore, I propose to expose the new xmlBufShrink() in the same way, as xmlBufContent() and xmlBufUse() are exposed.
Daniel Veillard ff7227f2 2012-08-20T20:58:24 Patch for portability of latin characters in C files Coming from LibreOffice repository: http://cgit.freedesktop.org/libreoffice/core/plain/libxml2/libxml2-latin.patch
Vitaly Ostanin dce1c8ba 2012-08-17T20:42:52 Patch for xinclude of text using multibyte characters for bug https://bugzilla.gnome.org/show_bug.cgi?id=633166 When you xinclude a text file, reading the buffer in portions of 4000 bytes incorrectly handled the case where a portion ends in the middle of a multibyte character.
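The chunk-boundary problem behind dce1c8ba can be reproduced in a few lines. This is a Python analogy rather than the libxml2 fix: it uses the standard `codecs` incremental decoder, which holds back an incomplete trailing sequence until the next chunk arrives. The 4000-byte chunk size mirrors the commit message; the function name is hypothetical.

```python
import codecs


def decode_in_chunks(data, chunk_size=4000, encoding="utf-8"):
    """Decode bytes chunk by chunk without breaking multibyte characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # An incomplete trailing sequence is buffered by the decoder
        # and completed by the next chunk.
        pieces.append(decoder.decode(chunk))
    # final=True reports an error if the input ends mid-character.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)
```

With input such as `("a" + "é" * 3000).encode("utf-8")`, byte 4000 falls inside a two-byte character, so decoding the first 4000 bytes on their own fails while the incremental version round-trips the text.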
Daniel Veillard 40851d0c 2012-08-17T20:34:05 Fix a segfault on XSD validation on pattern error As reported by Sven <sven@e7o.de>: The following pattern will cause a segmentation fault in my Apache (using PHP5 to validate a XML against a XSD): <xs:pattern value="(.*)|"/> Fix a cascade of error handling failures which led to the crash in that scenario.
Conrad Irwin b60061a7 2012-07-27T15:42:27 Visible HTML elements close the head tag In HTML email it's common to find arbitrary fragments of HTML, the one that triggered this change was of the form: <meta><font></font><div>... Before this change the <font> tag was part of the implicit <head> that gets created for the <meta> tag, after this change, it is part of the <body>, which more closely matches the behaviour of modern HTML implementations.
John Bradshaw c9a575cf 2012-08-17T11:59:01 libxml(3) manpage typo fix
Daniel Veillard dfc0aa0a 2012-08-17T11:04:24 GetProcAddressA is available only on WinCE As Roumen pointed out "After recent commits I could not link build for mingw* host as GetProcAddressA is missing." Looking around a bit it seems you are right: http://voidnish.wordpress.com/2005/06/14/getprocaddress-in-unicode-builds/ except it was introduced in Windows CE http://msdn.microsoft.com/en-us/library/ms885634.aspx
Daniel Richard G ec4fc529 2012-08-17T10:04:30 More updates and cleanups on autotools and Makefiles Makefile.am, example/Makefile.am: * Replaced the obsolete INCLUDES variable with AM_CPPFLAGS/AM_CFLAGS acinclude.m4: * autoupdate replaced AC_FD_CC with AS_MESSAGE_LOG_FD autogen.sh: * Added -Wall to the autoreconf invocation, which turned up a whole slew of warnings that are fixed by this patch configure.in: * Most of the changes are due to autoupdate, with subsequent manual tidying * Note that autoupdate bumped the AC_PREREQ version from 2.59 to 2.68. If you normally use an older version of Autoconf, and everything works fine if you comment out that directive, feel free to bump down the version accordingly. * Ensure that #include directives in C fragments always have no whitespace to the left of the '#' mark, as some preprocessors need that to be in the first column example/Makefile.am: * Don't need DEPS * Use plain LDADD instead of LDADDS; if all programs in this file need to link against the same set of libraries, then this is all you need
Daniel Richard G 6842ee81 2012-08-17T09:58:38 More cleanups to the documentation part of libxml2 doc/Makefile.am: * Build what's in doc/ before doc/devhelp/, as the dependency graph flows that way * Add "--path $(srcdir)" so that xsltproc can find DTDs in srcdir * Replaced $(top_srcdir)/doc with an equivalent $(srcdir) * Qualified libxml2-api.xml with $(srcdir) as it's always generated there * Rewrote the dependencies for libxml2-api.xml so that xmlversion.h doesn't throw everything off doc/devhelp/Makefile.am: * Use Automake constructs to install the HTML files instead of an install-data-local rule * Reorganized the file a bit (hello whitespace!) * EXTRA_DIST doesn't need to list so many files now that dist_devhelp_DATA is being used * Only print "Rebuilding devhelp files" if rebuilding is actually occurring doc/examples/index.py: * Make the "this file is auto-generated" banner more prominent * Autotools updates: Use AM_CPPFLAGS/AM_CFLAGS instead of INCLUDES * Got rid of DEPS as it's not needed (Automake already sees the dependency on libxml2.la by way of LDADD(S)) * Replaced LDADDS with LDADD, which is applied to all programs listed in the file. Since all the test programs have the same link dependencies, this way is more concise yet equivalent. 
* Remove the *.tmp files via "make clean" instead of having the test programs do it themselves (more on this later) * Invoke index.py in srcdir, as it pretty much needs to run there * Restructured the index.html rule so that only the xmllint invocation is allowed to fail * Use $(MKDIR_P) instead of $(mkinstalldirs), $(VAR) instead of @VAR@ * Remove symlinks for test?.xml in an out-of-source build * Sort lists for neatness * Better formatting for EXTRA_DIST and noinst_PROGRAMS variables * Simplified the Automake bits printed for each program: *_LDFLAGS doesn't need to be specified as it's empty anyway, *_DEPENDENCIES is redundant, *_LDADD isn't needed due to the global LDADD * Added a bit that symlinks in test?.xml from srcdir in out-of-source builds. This allows the reader4 test to read these files in the current directory, which ensures that the output always looks the same (i.e. does not contain references to srcdir) * Don't hide the test program invocation (or else it's hard to tell which test failed), and don't use superfluous parentheses * NOTE: If you check in these changes, be sure to run this script and also check in the updated files that it generates! doc/examples/*.c: * Updated the test: lines so that + "&&" is used to separate commands instead of ";" so that errors are not masked + reference files are qualified with $(srcdir)/ + no "rm" takes place -- these are a problem because (1) if a test fails, it's useful to have the output file ready for inspection; (2) the "rm" invocation masks a potential non-zero exit status from diff (This is why I added the CLEANFILES line above) doc/examples/io1.res: * Updated this ref file so that the test passes. (This is correct, right?) 
doc/examples/reader4.res: * Changed this back to its original form, as the symlinking of test?.xml means this file no longer has to contain path prefixes on the filenames doc/examples/testWriter.c: * Changed the output filenames to *.tmp instead of *.res, partly for consistency, partly to not have to add special cases to CLEANFILES doc/examples/xpath1.c: * Removed the "./" prefix on the test invocation, which is redundant as index.py already adds one
Eric Zurcher e0286980 2012-08-15T16:30:10 More changes for Win32 compilation
Eric Zurcher 414f269a 2012-08-15T13:52:09 Basic changes for Win32 builds of release 2.9.0: compile buf.c Make builds on Windows (whether by MSVC, BCB, or MinGW) compile buf.c
Daniel Veillard 1f972e9f 2012-08-15T10:16:37 Cleanup some of the parser code Prefetching assumptions about the amount of data read in GROW should be backed up with test for 0 termination when at the end of the buffer.
Daniel Veillard ef4526ad 2012-08-15T09:14:31 Fix a variable name in comment
Daniel Veillard baaeadcf 2012-08-15T09:13:54 Regenerated testapi.c