|
87f3287d
|
2013-04-01T13:33:42
|
|
Fix tree iterators broken by 2to3 script
|
|
2cb6bf8e
|
2013-03-30T21:38:20
|
|
update all tests for Python3 and Python2
|
|
3798c4ad
|
2013-03-29T13:46:24
|
|
Fix compilation on Python3
while still compiling on recent Python2:
- change the handling of files, tweak the generator, get the fd
instead of the FILE *, dup it and fdopen based on mode, add a
Release function on Python3 and call to flush from the generated
python stubs
- switch to using Capsules instead of CObjects
- fix PyString to PyBytes
- fix PyInt to PyLong
- tweak the module registration to compile on both versions
- drop PyInstance check for passed xmlNodes and instead check
attributes presence
Daniel
|
|
d8a75bff
|
2013-03-28T00:16:42
|
|
Converting apibuild.py to python3
not finished ....
|
|
6f184651
|
2013-03-29T15:17:40
|
|
A few more fixes for python 3 affecting libxml2.py
need a few changes to the generator and the libxml.py stub
|
|
3cb1ae26
|
2013-03-27T22:40:54
|
|
First pass at starting porting to python3
|
|
a5e513a5
|
2013-03-29T14:36:15
|
|
Fix an unneeded and wrong extra link parameter
|
|
b8e3f80d
|
2013-03-28T09:46:20
|
|
updated configure.in for python3
|
|
0ab8ce53
|
2013-03-28T08:47:42
|
|
Switched comment in file to UTF-8 encoding
|
|
215a7296
|
2013-03-28T11:23:45
|
|
Extend gitignore
|
|
519bc6a3
|
2012-09-19T13:41:56
|
|
Add support for xpathRegisterVariable in Python
|
|
483272f3
|
2013-03-27T13:37:14
|
|
Added a regression tests from bug 694228 data
Provided by Mark Rowe <mrowe@apple.com>
|
|
ab0e3504
|
2013-03-27T13:21:38
|
|
Activate detection of encoding in external subset
https://bugzilla.gnome.org/show_bug.cgi?id=694228
the ctxt->encoding was percolated down when parsing the external
subset leading to failures
|
|
113384f1
|
2013-03-27T11:43:41
|
|
Add documentation for xmllint --xpath
https://bugzilla.gnome.org/show_bug.cgi?id=694822
this wasn't documented in the man page, and there was a typo in
xmllint help output.
|
|
8e2098ae
|
2013-03-27T11:00:31
|
|
Fix an output buffer flushing conversion bug
for https://bugzilla.gnome.org/show_bug.cgi?id=694982
On a flush operation, everything must be converted
|
|
e1631e1c
|
2013-03-10T12:47:37
|
|
Few cleanup patches for Windows
https://bugzilla.gnome.org/show_bug.cgi?id=690878
provided by Cole <coleharrisjohnson@gmail.com>
|
|
f7aeda24
|
2013-03-23T10:31:26
|
|
Fix the URL of the SAX documentation from James
as it has moved
|
|
1f6c42cf
|
2013-03-18T15:30:00
|
|
Fix an old bug in xmlSchemaValidateOneElement
Recently I have run into the very same problem Tiberius Duluman did back in
Wed, 13 May 2009 15:56:55 +0300 ([xml] Bug in xmlSchemaValidateOneElement
function). I can now prove that his problem is a valid one. I checked
the latest available version of xmlschemas.c (2.9.0) and the problem is still
there!
I think I have found a solution to the problem which I'd like to verify with you:
My quick solution to the problem is to replace line 27849 in xmlschemas.c
(v2.9.0) in function xmlSchemaVDocWalk
valRoot = xmlDocGetRootElement(vctxt->doc);
with this one:
valRoot = vctxt->validationRoot ? vctxt->validationRoot : xmlDocGetRootElement(vctxt->doc);
Currently I'm using version 2.7.8. in Windows and this change seems to solve
the problem.
|
|
cff2546f
|
2013-03-11T15:57:55
|
|
Cache presence of '<' in entities content
slightly modify how ent->checked is used, and use the lowest bit to
keep the information
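A minimal sketch of the low-bit caching idea, written in Python rather than the parser's C. The helper names and the choice to keep a counter in the upper bits are assumptions made for illustration, not the actual layout used in parser.c.

```python
# Hypothetical packing: bit 0 records "content contains '<'",
# the remaining bits keep a non-zero "already checked" counter.
HAS_LT_FLAG = 1


def pack_checked(count, content):
    """Return a 'checked' value carrying the counter and the '<' flag."""
    value = count << 1  # keep the lowest bit free for the flag
    if "<" in content:
        value |= HAS_LT_FLAG
    return value


def was_checked(value):
    """True when the counter part (upper bits) is non-zero."""
    return (value >> 1) != 0


def has_lt(value):
    """Read the cached flag instead of rescanning the entity content."""
    return bool(value & HAS_LT_FLAG)
```

With this layout a single integer answers both "was this entity checked?" and "does it contain '<'?" without a second scan of the content.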
|
|
a3f1e3e5
|
2013-03-11T13:57:53
|
|
Avoid extra processing on entities
If an entity has already been checked for correctness no
need to check it on every reference
|
|
a0989068
|
2013-03-04T22:46:21
|
|
Fix configure cannot remove messages
this is the other way to solve ./configure cannot remove messages by
simply removing rm detection in configure.in
There is already a raw 'rm -f' at the end on configure.in
|
|
c100e69c
|
2013-02-28T19:02:32
|
|
fix schema validation in combination with xsi:nil
Based on Thomas Gamper <icicle@cg.tuwien.ac.at> findings and
initial patch
There is no point doing a regexp validation of further
content if there actually is no further content because the
element is nilled.
|
|
19d785b5
|
2013-02-28T18:22:46
|
|
xmlCtxtReadFile doesn't work with literal IPv6 URLs
https://bugzilla.gnome.org/show_bug.cgi?id=694185
RedHat Bug 624626 discusses the new behavior of libxml regarding brackets
around IPv6 addresses. In earlier versions such as 2.6.27, uri.c stripped the
brackets (e.g. uri->server == "fdf2:1e39:73d1:934e::119"); in the current
version it returns IPv6 addresses with brackets intact (e.g. uri->server
== "[fdf2:1e39:73d1:934e::119]").
Thus in 2.9.0, xmlCtxtReadFile() has a problem when it is passed a URL
containing a literal IPv6 address. xmlCtxtReadFile() and its subroutines pass
uri->server unchanged to getaddrinfo(), which doesn't recognize a bracketed
IPv6 address, so the read fails.
This strips the [ and ] from IPv6 addresses allowing getaddrinfo()
to work properly with such URIs.
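The bracket stripping described above can be sketched as follows. This is an illustrative Python helper, not the C code of the patch; the function name is hypothetical.

```python
def strip_ipv6_brackets(server):
    """Drop one pair of surrounding brackets from a literal IPv6 host.

    Hosts without brackets (names, IPv4 literals) are returned unchanged.
    """
    if len(server) >= 2 and server.startswith("[") and server.endswith("]"):
        return server[1:-1]
    return server


# The bracketed form is what the URI parser returns; the stripped form
# is what a resolver call expects.
host = strip_ipv6_brackets("[fdf2:1e39:73d1:934e::119]")
```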
|
|
d749528a
|
2013-02-27T13:11:47
|
|
Silence the new Python test on input
Just make it silent if there is no error
|
|
a9016c49
|
2013-02-25T16:07:09
|
|
Fix a few problems with setEntityLoader
1. Setting entity loader does not increment the refcount on the Python object
passed in. This works only if the object is not deleted. For example, the
following code results in segmentation fault in Python interpreter when
attempting to process any document:
[[[
import libxml2

def register_entity_loader():
    def entity_loader(URL, ID, ctxt):
        ...
    libxml2.setEntityLoader(entity_loader)
register_entity_loader()
]]]
2. setEntityLoader() does not verify if the passed object is callable. If it
is not, current implementation attempts to call it anyway and failing that,
silently moves on to default entity loader. Attached patch makes
setEntityLoader raise ValueError exception if non-callable object is
passed.
3. In debug mode, pythonExternalEntityLoader() outputs the result object to
stderr, while the messages before and after the object (description + newline)
go to stdout. Attached patch makes them all go to stdout.
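Points 1 and 2 can be illustrated with a small pure-Python stand-in. The `LoaderRegistry` class and its method names are hypothetical and are not part of the libxml2 bindings; the sketch only shows the intended behaviour: hold a reference to the loader and reject non-callable objects with ValueError.

```python
class LoaderRegistry:
    """Hypothetical stand-in for the binding's loader slot (not libxml2 API)."""

    def __init__(self):
        self._loader = None

    def set_entity_loader(self, loader):
        # Point 2: refuse objects that cannot be called.
        if not callable(loader):
            raise ValueError("entity loader is not callable")
        # Point 1: storing the object keeps a reference to it, so it
        # stays alive after the registering function returns.
        self._loader = loader

    def load(self, url, public_id, ctxt):
        if self._loader is None:
            return None  # a real binding would fall back to the default loader
        return self._loader(url, public_id, ctxt)


registry = LoaderRegistry()


def register_entity_loader():
    def entity_loader(url, public_id, ctxt):
        return "loaded:" + url
    registry.set_entity_loader(entity_loader)


register_entity_loader()
```

The nested `entity_loader` goes out of scope when `register_entity_loader()` returns, which is the situation from point 1; it remains usable only because the registry still references it.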
|
|
48da90bc
|
2013-02-25T15:54:25
|
|
Python binding for xmlRegisterInputCallback
It is possible to make xmlIO handle any protocol by means of
xmlRegisterInputCallback(). However, that function is currently only
available in C API. So, the natural solution seems to be implementing Python
bindings for the xmlRegisterInputCallback.
* python/generator.py: skip xmlPopInputCallbacks
* python/libxml.c python/libxml.py python/libxml_wrap.h: implement the
wrappers
* python/tests/input_callback.py python/tests/Makefile.am: also add a test case
|
|
e32ceb93
|
2013-02-20T18:28:25
|
|
Python bindings: DOM casts everything to xmlNode
I noticed another issue with Python bindings of libxml: the access methods do
not cast the pointers to specific classes such as xmlDtd, xmlEntityDecl, etc.
For example, with the following document:
<?xml version="1.0"?>
<!DOCTYPE root [<!ELEMENT root EMPTY>]>
<root/>
the following script:
import libxml2
doc = libxml2.readFile("c.xml", None, libxml2.XML_PARSE_DTDLOAD)
print repr(doc.children)
prints:
<xmlNode (root) object at 0xb74963ec>
With properly cast nodes, it outputs the following:
<xmlDtd (root) object at 0xb746352c>
The latter object (xmlDtd) enables one to use DTD-specific methods such as
debugDumpDTD(), copyDTD(), and so on.
|
|
23f05e0c
|
2013-02-19T10:21:49
|
|
Detect excessive entities expansion upon replacement
If entities expansion in the XML parser is asked for,
it is possible to craft a relatively small input document leading
to excessive on-the-fly content generation.
This patch accounts for those replacements and stops parsing
after a given threshold. It can be bypassed as usual with the
HUGE parser option.
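A rough sketch of the accounting idea, in Python. The `limit` value, the `huge` switch and all names are assumptions for illustration; they do not reflect the thresholds or data structures used by the parser. Entities are assumed to be non-recursive, as XML requires, and unknown names raise KeyError.

```python
import re

ENTITY_REF = re.compile(r"&(\w+);")


class ExpansionLimitError(Exception):
    """Raised when replacement text exceeds the configured budget."""


def expand(text, entities, limit=10_000, huge=False):
    """Expand &name; references, counting every replacement character."""
    generated = 0

    def walk(fragment):
        nonlocal generated
        out = []
        pos = 0
        for match in ENTITY_REF.finditer(fragment):
            out.append(fragment[pos:match.start()])
            replacement = entities[match.group(1)]
            generated += len(replacement)  # account for this replacement
            if not huge and generated > limit:
                raise ExpansionLimitError("excessive entity expansion")
            out.append(walk(replacement))  # nested references count too
            pos = match.end()
        out.append(fragment[pos:])
        return "".join(out)

    return walk(text)
```

Because the counter is shared across nested expansions, a small document whose entities multiply each other is stopped once the total generated text passes the budget, while `huge=True` disables the check.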
|
|
bf058dce
|
2013-02-13T18:19:42
|
|
Fix the flushing out of raw buffers on encoding conversions
https://bugzilla.gnome.org/show_bug.cgi?id=692915
the new set of converting functions tried to limit the encoding
conversion of the raw buffer to the consumption one to work in
a more progressive fashion. Unfortunately this was bad for
performance and led to errors on progressive parsing when
a very large chunk was close to the end of the document. Fix
the new internal function and switch back to the old way of
converting. Fix another bug in the process.
|
|
de0cc20c
|
2013-02-12T16:55:34
|
|
Fix some buffer conversion issues
https://bugzilla.gnome.org/show_bug.cgi?id=690202
Buffer overflow errors originating from xmlBufGetInputBase in 2.9.0
The pointers from the context input were not properly reset after
that call which can do reallocations.
|
|
60adeea9
|
2013-02-11T12:45:56
|
|
Fix rpmbuild --nocheck
If the %check section was omitted, some of the files needed for
packaging would not be generated; move the generation to the
proper place.
|
|
23922c53
|
2013-02-11T11:52:44
|
|
When calling xmlNodeDump make sure we grow the buffer quickly
Make sure the underlying new buffer allocated uses a double-it scheme
for the time of the dump.
|
|
2af19f98
|
2013-01-28T17:44:53
|
|
Cleanup of a duplicate test
in an 'and' expression, pointed out by Thomas Jarosch <thomas.jarosch@intra2net.com>
Daniel
|
|
eea38159
|
2013-01-28T16:55:30
|
|
Cleanup on duplicate test expressions
As pointed out by Thomas Jarosch <thomas.jarosch@intra2net.com>
Daniel
|
|
9c8eaabe
|
2013-01-04T12:41:53
|
|
Fix compiler warning after 153cf15905cf4ec080612ada6703757d10caba1e
Add missing cast for xmlNop to silence a compiler warning.
|
|
cf8f0424
|
2012-12-21T11:13:31
|
|
Fix an error in the progressive DTD parsing code
For https://bugzilla.gnome.org/show_bug.cgi?id=689958
We were looking for the wrong character in the input stream
|
|
e4d16d79
|
2012-12-21T10:58:14
|
|
xmllint should not load DTD by default when using the reader
|
|
a0571ebe
|
2012-12-12T17:16:00
|
|
Fix for win32/configure.js and WITH_THREAD_ALLOC
Building git master gives me the following error on Windows; this patch
fixes it:
icl /EP /nologo /I..\include /D "NOLIBTOOL" /D "_REENTRANT" libxml2.def.src > int.msvc\libxml2.def
libxml2.def.src
Z:\...\libxml2-git8123c4f6_debug\win32\../include/libxml/xmlversion.h(105):
error: unrecognized token
#if @WITH_THREAD_ALLOC@
^
Z:\...\libxml2-git8123c4f6_debug\win32\../include/libxml/xmlversion.h(105):
error: expected an expression
#if @WITH_THREAD_ALLOC@
^
Z:\...\libxml2-git8123c4f6_debug\win32\../include/libxml/xmlversion.h(105):
error: unrecognized token
#if @WITH_THREAD_ALLOC@
^
NMAKE : fatal error U1077: 'icl' : return code '0x2'
Stop.
|
|
6f49c73b
|
2012-12-12T15:41:30
|
|
Try IBM-037 when looking for EBCDIC handlers
http://en.wikipedia.org/wiki/EBCDIC_037
as it is another variant of EBCDIC
|
|
8123c4f6
|
2012-11-08T16:24:07
|
|
Fix Broken multi-arch support in xml2-config
partial revert of 87b4d6f6105658a99b976f812223c8edf4469265
coming from Fedora/RHEL/... but breaking other distros
as pointed out by Daniel Richard
|
|
fb27e2cd
|
2012-09-28T08:59:33
|
|
Fix spelling of "length".
|
|
0ad948ed
|
2012-10-29T13:41:55
|
|
Define LIBXML_THREAD_ALLOC_ENABLED via xmlversion.h
Otherwise, direct calls to xmlFree() etc. from the application will
use a different set of allocation functions to what was used to allocate
the memory internally.
|
|
6a36fbe3
|
2012-10-29T10:39:55
|
|
Fix potential out of bound access
|
|
4ea74a44
|
2012-10-29T10:27:18
|
|
Fix a portability issue for GCC < 3.4.0
|
|
153cf159
|
2012-10-26T13:50:47
|
|
Fix large parse of file from memory
https://bugzilla.redhat.com/show_bug.cgi?id=862969
The new code trying to detect excessive input lookup would
just get wrong sometimes in the case of very large file parsed
directly from memory.
|
|
711b15d5
|
2012-10-25T19:23:26
|
|
Fix a bug in the nsclean option of the parser
Raised as a side effect of:
https://bugzilla.gnome.org/show_bug.cgi?id=663844
|
|
a7982ce2
|
2012-10-25T15:39:39
|
|
Adding streaming validation to runtest checks
|
|
1abd221b
|
2012-10-25T15:37:50
|
|
Add a --pushsmall option to xmllint
To test the push parser with small chunks of 10 bytes
|
|
6c91aa38
|
2012-10-25T15:33:59
|
|
Fix a regression in 2.9.0 breaking validation while streaming
https://bugzilla.gnome.org/show_bug.cgi?id=684774
with help from Kjell Ahlstedt <kjell.ahlstedt@bredband.net>
|
|
87b4d6f6
|
2012-10-11T14:44:22
|
|
Spec cleanups and a fix for multiarch support
|
|
7457c67f
|
2012-10-11T12:25:51
|
|
Remove potential calls to exit()
|
|
713434d2
|
2012-09-26T10:21:06
|
|
Silence a clang warning
as reported by Hans Wennborg <hans@chromium.org>
|
|
7e86eb5d
|
2012-09-20T21:46:19
|
|
Cleanup the Copyright to be pure MIT Licence wording
|
|
bbe19451
|
2012-09-18T11:15:06
|
|
Windows build fixes
Building 2.9.0 on MSVC7.1 was failing
This is because HAVE_CONFIG_H is not #defined
The patch addresses the above, adds testrecurse.exe and the
standard "make check" suite of tests to the MSVC makefile, and also
fixes the following (MSVC7.1) warnings:
buf.c(674) : warning C4028: formal parameter 1 different from
declaration
libxml2\timsort.h(71) : warning C4028: formal parameter 1 different from
declaration
|
|
3f6cfbd1
|
2012-09-12T17:34:53
|
|
Fix a thread portability problem
cannot compile libxml2-2.9.0 using studio 12.1 compiler on solaris 10
I.M.O. structure initializer (as PTHREAD_ONCE_INIT) cannot be used in
a structure assignment anyway
|
|
e7715a59
|
2012-09-14T14:39:42
|
|
rand_seed should be static in dict.c
For https://bugzilla.gnome.org/show_bug.cgi?id=683933
rand_seed should be a static variable in dict.c
We ran into a problem with another library that exports rand_seed as a
function. Combined with 2.7.8 this was not a problem but later versions
have this problem.
|
|
81d7a824
|
2012-09-13T15:56:51
|
|
Fix typos in parser comments
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
|
|
5d04ad11
|
2012-09-11T17:17:15
|
|
Downgrade autoconf requirement to 2.63
It was automatically bumped to 2.68 and that's not needed
|
|
38bbd341
|
2012-09-11T15:00:08
|
|
Release of libxml2-2.9.0
* libxml.spec.in: update
* doc/*: updated and regenerated
* libxml2.syms testapi.c: regenerated
|
|
7651606f
|
2012-09-11T14:02:08
|
|
Various cleanups to avoid compiler warnings
|
|
742a0bbb
|
2012-09-11T13:37:30
|
|
Keep libxml2.syms when running "make distclean"
|
|
f8e3db04
|
2012-09-11T13:26:36
|
|
Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
|
|
429d3a0a
|
2012-09-11T11:50:25
|
|
Allow to set the quoting character of an xmlWriter
It's otherwise impossible to set the quoting character of
attribute values of an xmlWriter.
|
|
e00778b4
|
2012-09-08T21:09:26
|
|
Followup to LibXML2 docs/examples cleanup patch
|
|
f933c898
|
2012-09-07T19:32:12
|
|
Keep non-significant blanks node in HTML parser
For https://bugzilla.gnome.org/show_bug.cgi?id=681822
Regardless if the option HTML_PARSE_NOBLANKS is set or not, blank nodes
are removed from a HTML document, for example:
<html>
<head>
<title>This is a test.</title>
</head>
<body>
<p>This is a test.</p>
</body>
</html>
is read as:
<html><head><title>This is a test.</title></head><body>
<p>This is a test.</p>
</body></html>
This changes the default behaviour but the old behaviour is available
as expected when using the parser flag HTML_PARSE_NOBLANKS
Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com>
* HTMLparser.c: change various places in the parser where ignorable_space
SAX callback was called without checking for the parser flag preference
* xmllint.c: make sure we use the new flag even for HTML parsing
* result/HTML/*: this modifies the output of a number of tests
|
|
878ec9db
|
2012-09-07T14:52:17
|
|
Second round of cleanups for LibXML2 docs/examples
configure.am:
* Explicitly disallow --enable-rebuild-docs when builddir != srcdir, per
what you said about needing to build docs with an in-source build
doc/Makefile.am:
* Ensure that xmlversion.h is in the source tree before running
apibuild.py, to avoid generating an incomplete libxml2-api.xml
* Update the .PHONY target (forgot to do this earlier)
doc/devhelp/Makefile.am:
* Wrap the doc-generating rule in an "if REBUILD_DOCS" conditional so it
doesn't cause trouble for regular users
* Added a handy-dandy "rebuild" target
doc/examples/index.py:
* NOTE: You need to run this script to regenerate the files it creates,
and then commit the newly-updated files! The generated files currently
in git master (e.g. doc/examples/Makefile.am) are out of date even
before this patch!
* index.html really needs to be in EXTRA_DIST
* Wrap the doc-generating rules in an "if REBUILD_DOCS" conditional,
because they shouldn't be active otherwise
|
|
47881284
|
2012-09-07T14:24:50
|
|
Add a forbidden variable error number and message to XPath
Related to https://bugzilla.gnome.org/show_bug.cgi?id=680938
When the XML_XPATH_NOVAR flag is being used it means that
variables are forbidden, not that they are missing
|
|
55b899a2
|
2012-09-07T12:14:00
|
|
Support long path names on WNT
so we've got this patch to libxml2 2.7.6 in the LibreOffice code base,
inherited from OOo. it fixes a definite problem, which is that Windows
has a rather low maximum path length restriction, and there is a special
trick on NT whereby path names can be prefixed with "\\?\", in which
case the maximum length is 32k, which ought to be sufficient even for
bloated office suites :)
I'll attach the patch to the xmlCanonicPath function. note that i
didn't write this and am by no means an expert on either Microsoftean
platforms or libxml so maybe it's not the best way to do it.
|
|
1bd45d13
|
2012-09-05T15:35:19
|
|
Change the XPath code to percolate allocation errors
Looping 1000 times on an error stating that a nodeset has
grown out of control is useless; make sure we percolate
errors up to the various loops and break when errors occur
|
|
7d4c529a
|
2012-09-05T11:45:32
|
|
Improve HTML escaping of attribute on output
Handle special cases of &{...} constructs as hinted in the spec
http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
|
|
857104cd
|
2012-09-04T14:25:23
|
|
Remove all .cvsignore as they are not used anymore
For https://bugzilla.gnome.org/show_bug.cgi?id=682985
suggested by Adrian Bunk <bunk@stusta.de>
|
|
7a2215db
|
2012-09-04T12:05:17
|
|
Fix reuse of xmlInitParser
While xmlCleanupParser() should not be used unless complete control
is ensured over the program, making sure libxml2 is not in use anywhere,
it should still be usable and allow a sequence of
xmlInitParser();
xmlCleanupParser();
calls if needed. The problem is that the thread key wasn't reallocated
on subsequent xmlInitParser() calls, leading to corruption of pthread
keys used by the program.
* threads.c: make sure xmlCleanupParser() resets the pthread_once()
global variable driving thread key allocation.
|
|
510e7583
|
2012-09-04T11:50:36
|
|
Fix a Timsort function helper comment
|
|
28f5e1a2
|
2012-09-04T11:18:39
|
|
Fix potential crash on entities errors
Related to https://bugs.launchpad.net/lxml/+bug/502959
Basically the core of the issue is that if an entity references another
entity, then in case we are replacing entities content, we should always
do so by copying the referenced content as long as the reference is
done within the entity. Otherwise, if for some reason there is a later
parsing error that entity content may be freed.
Complex scenario exposed by command:
thinkpad:~/XML/diveintopython-5.4/xml -> valgrind --db-attach=yes
../../xmllint --loaddtd --noout --noent diveintopython.xml
Document references &a;
a references &b;
we reference b's content directly by linking it into a's content
a has an error further down
we free a, freeing the chunk from b
Document references &b; after &a;
we try to copy b content, but it was freed already => segfault
* parser.c: never reference directly entity content without copying if
we aren't in the document main entity
|
|
3b6d7b9a
|
2012-08-28T23:40:56
|
|
xml2-config.1 markup error
There is a spurious ".l" in the xml2-config.1 man page. This line can
simply be removed.
$ mandoc -Tlint -Werror xml2-config.1
xml2-config.1:12:2: ERROR: skipping unknown macro: .l
|
|
1f01f49b
|
2012-08-28T22:16:50
|
|
Handle ICU_LIBS as LIBADD, not LDFLAGS to prevent linking errors
For https://bugzilla.gnome.org/show_bug.cgi?id=677606
For https://bugs.gentoo.org/show_bug.cgi?id=417539
If libxml2-2.8.0 is built with --with-icu --with-python on a system that has an
older version of libxml2 installed, then during "make install", libxml2mod.so
gets relinked to the systemwide version of libxml2.so.2 instead of libxml2.so.2
from the build tree, and fails at runtime if symbol versions from the older
libxml2.so.2 are not available. This effectively makes it impossible to build a
libxml2-2.8.0 binary package on a system that does not already have
libxml2-2.8.0 installed.
Investigation by Rafał Mużyło and Arfrever Frehtes Taifersar Arahesis revealed
the cause of the problem to be that libxml2's configure was adding ICU_LIBS to
LDFLAGS instead of to LIBADD. This resulted in GNU libtool using the wrong
argument order in its relinking command that gets run during "make install".
|
|
961b535c
|
2012-07-03T14:13:59
|
|
Bug 676544 - fails to build with --without-sax1
Added some ifdef'd LIBXML_SAX1_ENABLED to make it buildable with
--without-sax1 configure option.
|
|
236ea1ea
|
2012-08-27T11:56:07
|
|
fix builds not having stdint.h
|
|
8f2d6b57
|
2012-08-27T05:08:54
|
|
initialize var
|
|
8880170e
|
2012-08-27T16:20:05
|
|
Fix the XPath arity check to also check the XPath stack limits
For example, xmlXPathNormalizeFunction() would do CHECK_ARITY(1)
and then expect valuePop(ctxt); to return an object, except
now valuePop() looks at the XPath stack frames and fails, returning
NULL, and we end up crashing dereferencing the object.
The real solution is to extend CHECK_ARITY() and recompile all
XPath functions using it.
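The failure mode and the extended check can be modelled with a toy stack in Python. The class, its fields and the error messages are hypothetical; the sketch only mirrors the idea that an arity check should also confirm the current frame actually holds that many values.

```python
class XPathStack:
    """Toy value stack with a frame floor (hypothetical, not libxml2's types)."""

    def __init__(self):
        self.values = []
        self.frame = 0  # index below which the current function may not pop

    def push(self, value):
        self.values.append(value)

    def pop(self):
        # Refuse to pop values that belong to an outer frame.
        if len(self.values) <= self.frame:
            return None
        return self.values.pop()


def check_arity(stack, nargs, expected):
    """Verify the argument count and that the frame holds enough values."""
    if nargs != expected:
        raise ValueError("invalid arity")
    if len(stack.values) < stack.frame + expected:
        raise ValueError("stack usage error")
```

Checking the stack depth up front turns a later None dereference into an explicit error at the start of the function.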
|
|
890faa54
|
2012-08-27T13:24:08
|
|
Fix problem with specific and generic error handlers
It seems that setting up both xmlTextReaderSetStructuredErrorHandler and
xmlSetStructuredErrorFunc confuses the code around error.c:592 and following
This patch works with any combinations of using xmlSetStructuredErrorFunc,
xmlTextReaderSetStructuredErrorHandler, both, or none.
|
|
466fcdaa
|
2012-08-27T12:03:40
|
|
Avoid a potential infinite recursion
Which can happen when eliminating epsilon transitions, as reported
by Pavel Madr <pmadr@opentext.com>
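One common way to keep epsilon handling from recursing forever is to remember which states were already visited. The sketch below shows that general technique in Python; it is not a description of the actual fix in the regexp code, and the dictionary-based automaton is an assumption for illustration.

```python
def epsilon_closure(state, epsilon_edges, seen=None):
    """Collect states reachable from `state` through epsilon transitions only."""
    if seen is None:
        seen = set()
    if state in seen:
        return seen  # already visited: stop instead of recursing again
    seen.add(state)
    for target in epsilon_edges.get(state, ()):
        epsilon_closure(target, epsilon_edges, seen)
    return seen


# A cycle of epsilon transitions: 0 -> 1 -> 2 -> 0
cycle = {0: [1], 1: [2], 2: [0]}
reachable = epsilon_closure(0, cycle)
```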
|
|
3e031b7d
|
2012-08-24T16:52:44
|
|
Switching XPath node sorting to Timsort
I use libxml xpath engine on quite large (and mostly "flat") xml files.
It seems that Shellsort, that is used in xmlXPathNodeSetSort is a
performance bottleneck for my case. I have read some posts about sorting
in libxml in the libxml archive, but I agree that qsort was not the way
to go. I experimented with Timsort instead and my results were good for
me. For about 10000 nodes, my test was about 5x faster with Timsort,
for 1000 nodes about 10% faster, for small data files, the difference
was not measurable.
* timsort.h: the algorithm, kept in a separate header
* xpath.c: plug in the new algorithm in xmlXPathNodeSetSort
* Makefile.am: add the header to the EXTRA_DIST
* doc/apibuild.py: avoid indexing the new header
|
|
73f94c60
|
2012-08-24T16:38:54
|
|
Small cleanup for valgrind target
|
|
62270539
|
2012-08-19T19:42:38
|
|
Optimizing '//' in XPath expressions
When investigating the libxslt performance problem reported in bug
#657665, I found that '//' in XPath expressions can be very slow when
working on large subtrees.
One of the reasons is the seemingly quadratic time complexity of the
duplicate checks when merging result nodes. The other is a missed
optimization for expressions of the form
'descendant-or-self::node()/axis::test'. Since '//' is expanded to
'/descendant-or-self::node()/', this type of expression is quite common.
Depending on the axis of the expression following the
'descendant-or-self' step, the following replacements can be made:
from descendant-or-self::node()/child::test
to descendant::test
from descendant-or-self::node()/descendant::test
to descendant::test
from descendant-or-self::node()/self::test
to descendant-or-self::test
from descendant-or-self::node()/descendant-or-self::test
to descendant-or-self::test
'test' can be any kind of node test.
With these replacements the possibly huge result of
'descendant-or-self::node()' doesn't have to be stored temporarily, but
can be processed in one pass. If the resulting nodeset is small, the
duplicate checks aren't a problem.
I found that there already is a function called
xmlXPathRewriteDOSExpression which performs this optimization for a very
limited set of cases. It employs a complicated iteration scheme for
rewritten expressions. AFAICS, this can be avoided by simply changing
the axis of the expression like described above.
With the attached patch against libxml2 and the files from bug #657665 I
got the following results.
Before:
$ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
service-names-port-numbers.xml
real 2m56.213s
user 2m56.123s
sys 0m0.080s
After:
$ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
service-names-port-numbers.xml
real 0m3.836s
user 0m3.764s
sys 0m0.060s
I also ran the libxml2 and libxslt test suites with the patch and
couldn't detect any breakage.
Nick
From e0f5a8261760e4f257b90410be27657e984237c8 Mon Sep 17 00:00:00 2001
From: Nick Wellnhofer <wellnhofer@aevum.de>
Date: Sun, 19 Aug 2012 18:20:22 +0200
Subject: [PATCH] Optimizations for descendant-or-self::node()
Currently, the function xmlXPathRewriteDOSExpression optimizes expressions
of type '//child'. Instead of adding a 'rewriteType' and doing a compound
traversal, the same can be achieved simply by setting the axis of the node
test from 'child' to 'descendant'.
There are also many other cases that can be optimized similarly. This
commit augments xmlXPathRewriteDOSExpression to essentially rewrite the
following subexpressions:
- descendant-or-self::node()/child:: to descendant::
- descendant-or-self::node()/descendant:: to descendant::
- descendant-or-self::node()/self:: to descendant-or-self::
- descendant-or-self::node()/descendant-or-self:: to descendant-or-self::
Since the '//' shortcut in XPath is translated to
'/descendant-or-self::node()/', this greatly speeds up expressions using
'//' on large subtrees.
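The four rewrites listed above can be sketched as a pass over a list of location steps. The `(axis, node_test)` tuple representation and the function name are hypothetical, and the sketch ignores predicates entirely, so it is not a faithful model of xmlXPathRewriteDOSExpression.

```python
# Axis of the step following descendant-or-self::node() -> replacement axis.
REWRITES = {
    "child": "descendant",
    "descendant": "descendant",
    "self": "descendant-or-self",
    "descendant-or-self": "descendant-or-self",
}


def rewrite_dos(steps):
    """Merge 'descendant-or-self::node()' with the step after it when possible.

    Each step is an (axis, node_test) tuple.
    """
    result = []
    i = 0
    while i < len(steps):
        axis, test = steps[i]
        if (axis, test) == ("descendant-or-self", "node()") and i + 1 < len(steps):
            next_axis, next_test = steps[i + 1]
            if next_axis in REWRITES:
                # Replace the two steps with a single step on the new axis.
                result.append((REWRITES[next_axis], next_test))
                i += 2
                continue
        result.append((axis, test))
        i += 1
    return result
```

Steps on other axes, such as `parent`, are left untouched because the commit message lists no replacement for them.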
|
|
c70d185a
|
2012-08-23T23:28:04
|
|
Fix an XSD error when generating internal automata
When generating a sequence add an extra epsilon transition
to avoid further constructs from entering via the last state
Bug reported by Johan Corveleyn <jcorvel@gmail.com>
|
|
82cdfc4e
|
2012-08-22T11:05:09
|
|
Expose xmlBufShrink in the public tree API
As suggested by Andrew W. Nosenko:
Proposal: expose the new xmlBufShrink() to the "public" API for
compatibility with xmlBufUse().
Reason: the following scenario:
1. Read something into xmlParserInputBuffer (e.g. using
xmlParserInputBufferRead())
2. Extract content through xmlBufContent()
3. Extract content length through xmlBufUse(). Result has type
'size_t'.
4. Use this content
5. Now, you need to shrink the buffer. How to do it? Doing that
through legacy xmlBufferShrink() is unsafe because it uses 'unsigned
int' and the whole point of introducing the new API was handling the
cases, when 'unsigned int' is not enough. Therefore, need to use the
new xmlBufShrink(). But it is "private".
Therefore, I propose to expose the new xmlBufShrink() in the same way,
as xmlBufContent() and xmlBufUse() are exposed.
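The concern in step 5 can be shown with a little arithmetic. This Python sketch assumes a 32-bit 'unsigned int' and models only the truncation; it does not call any libxml2 function.

```python
UINT_MAX = 0xFFFFFFFF  # assuming a 32-bit unsigned int


def as_unsigned_int(length):
    """Model passing a size_t length through an 'unsigned int' parameter."""
    return length & UINT_MAX


# A 5 GiB length reported as size_t no longer fits in 32 bits.
five_gib = 5 * 1024 ** 3
truncated = as_unsigned_int(five_gib)
```

Under that assumption the value that reaches the legacy function differs from the length the caller intended, which is why a size_t-based shrink function is needed.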
|
|
ff7227f2
|
2012-08-20T20:58:24
|
|
Patch for portability of latin characters in C files
Coming from LibreOffice repository:
http://cgit.freedesktop.org/libreoffice/core/plain/libxml2/libxml2-latin.patch
|
|
dce1c8ba
|
2012-08-17T20:42:52
|
|
Patch for xinclude of text using multibyte characters
for bug https://bugzilla.gnome.org/show_bug.cgi?id=633166
When you xinclude a text file, reading the buffer in portions (of 4000
bytes) incorrectly handled the case where a portion ends in the middle
of a multibyte character.
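The general technique for this class of problem is to carry an incomplete trailing byte sequence over to the next chunk. A minimal Python sketch using the standard incremental decoder follows; the function name and the sample text are illustrative, and this is not the code of the xinclude patch.

```python
import codecs
import io


def read_text_in_chunks(stream, chunk_size=4000, encoding="utf-8"):
    """Decode a byte stream chunk by chunk without breaking multibyte characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # The decoder buffers an incomplete trailing sequence until the
        # next chunk supplies the remaining bytes.
        pieces.append(decoder.decode(chunk))
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


# A tiny chunk size forces chunk boundaries inside two-byte characters.
sample = io.BytesIO("héllo wörld".encode("utf-8"))
text = read_text_in_chunks(sample, chunk_size=2)
```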
|
|
40851d0c
|
2012-08-17T20:34:05
|
|
Fix a segfault on XSD validation on pattern error
As reported by Sven <sven@e7o.de>:
The following pattern will cause a segmentation fault in my
Apache (using PHP5 to validate a XML against a XSD):
<xs:pattern value="(.*)|"/>
Fix a cascade of error handling failures which led to the
crash in that scenario.
|
|
b60061a7
|
2012-07-27T15:42:27
|
|
Visible HTML elements close the head tag
In HTML email it's common to find arbitrary fragments of HTML, the one
that triggered this change was of the form:
<meta><font></font><div>...
Before this change the <font> tag was part of the implicit <head> that
gets created for the <meta> tag, after this change, it is part of the
<body>, which more closely matches the behaviour of modern HTML
implementations.
|
|
c9a575cf
|
2012-08-17T11:59:01
|
|
libxml(3) manpage typo fix
|
|
dfc0aa0a
|
2012-08-17T11:04:24
|
|
GetProcAddressA is available only on WinCE
As Roumen pointed out
"After recent commits I count not link build for mingw* host as
GetProcAddressA is missing."
Looking around a bit it seems you are right:
http://voidnish.wordpress.com/2005/06/14/getprocaddress-in-unicode-builds/
except it was introduced in Windows CE
http://msdn.microsoft.com/en-us/library/ms885634.aspx
|
|
ec4fc529
|
2012-08-17T10:04:30
|
|
More updates and cleanups on autotools and Makefiles
Makefile.am, example/Makefile.am:
* Replaced the obsolete INCLUDES variable with AM_CPPFLAGS/AM_CFLAGS
acinclude.m4:
* autoupdate replaced AC_FD_CC with AS_MESSAGE_LOG_FD
autogen.sh:
* Added -Wall to the autoreconf invocation, which turned up a whole slew
of warnings that are fixed by this patch
configure.in:
* Most of the changes are due to autoupdate, with subsequent manual
tidying
* Note that autoupdate bumped the AC_PREREQ version from 2.59 to 2.68. If
you normally use an older version of Autoconf, and everything works fine
if you comment out that directive, feel free to bump down the version
accordingly.
* Ensure that #include directives in C fragments always have no whitespace
to the left of the '#' mark, as some preprocessors need that to be in
the first column
example/Makefile.am:
* Don't need DEPS
* Use plain LDADD instead of LDADDS; if all programs in this file need to
link against the same set of libraries, then this is all you need
|
|
6842ee81
|
2012-08-17T09:58:38
|
|
More cleanups to the documentation part of libxml2
doc/Makefile.am:
* Build what's in doc/ before doc/devhelp/, as the dependency graph flows
that way
* Add "--path $(srcdir)" so that xsltproc can find DTDs in srcdir
* Replaced $(top_srcdir)/doc with an equivalent $(srcdir)
* Qualified libxml2-api.xml with $(srcdir) as it's always generated there
* Rewrote the dependencies for libxml2-api.xml so that xmlversion.h
doesn't throw everything off
doc/devhelp/Makefile.am:
* Use Automake constructs to install the HTML files instead of an
install-data-local rule
* Reorganized the file a bit (hello whitespace!)
* EXTRA_DIST doesn't need to list so many files now that dist_devhelp_DATA
is being used
* Only print "Rebuilding devhelp files" if rebuilding is actually
occurring
doc/examples/index.py:
* Make the "this file is auto-generated" banner more prominent
* Autotools updates: Use AM_CPPFLAGS/AM_CFLAGS instead of INCLUDES
* Got rid of DEPS as it's not needed (Automake already sees the dependency
on libxml2.la by way of LDADD(S))
* Replaced LDADDS with LDADD, which is applied to all programs listed
in the file. Since all the test programs have the same link
dependencies, this way is more concise yet equivalent.
* Remove the *.tmp files via "make clean" instead of having the test
programs do it themselves (more on this later)
* Invoke index.py in srcdir, as it pretty much needs to run there
* Restructured the index.html rule so that only the xmllint invocation is
allowed to fail
* Use $(MKDIR_P) instead of $(mkinstalldirs), $(VAR) instead of @VAR@
* Remove symlinks for test?.xml in an out-of-source build
* Sort lists for neatness
* Better formatting for EXTRA_DIST and noinst_PROGRAMS variables
* Simplified the Automake bits printed for each program: *_LDFLAGS doesn't
need to be specified as it's empty anyway, *_DEPENDENCIES is redundant,
*_LDADD isn't needed due to the global LDADD
* Added a bit that symlinks in test?.xml from srcdir in out-of-source
builds. This allows the reader4 test to read these files in the current
directory, which ensures that the output always looks the same (i.e.
does not contain references to srcdir)
* Don't hide the test program invocation (or else it's hard to tell which
test failed), and don't use superfluous parentheses
* NOTE: If you check in these changes, be sure to run this script and also
check in the updated files that it generates!
doc/examples/*.c:
* Updated the test: lines so that
+ "&&" is used to separate commands instead of ";" so that errors are
not masked
+ reference files are qualified with $(srcdir)/
+ no "rm" takes place -- these are a problem because (1) if a test
fails, it's useful to have the output file ready for inspection; (2)
the "rm" invocation masks a potential non-zero exit status from diff
(This is why I added the CLEANFILES line above)
doc/examples/io1.res:
* Updated this ref file so that the test passes. (This is correct, right?)
doc/examples/reader4.res:
* Changed this back to its original form, as the symlinking of test?.xml
means this file no longer has to contain path prefixes on the filenames
doc/examples/testWriter.c:
* Changed the output filenames to *.tmp instead of *.res, partly for
consistency, partly to not have to add special cases to CLEANFILES
doc/examples/xpath1.c:
* Removed the "./" prefix on the test invocation, which is redundant as
index.py already adds one
|
|
e0286980
|
2012-08-15T16:30:10
|
|
More changes for Win32 compilation
|
|
414f269a
|
2012-08-15T13:52:09
|
|
Basic changes for Win32 builds of release 2.9.0: compile buf.c
Makes builds on Windows (whether by MSVC, BCB, or MinGW) compile buf.c
|
|
1f972e9f
|
2012-08-15T10:16:37
|
|
Cleanup some of the parser code
Prefetching assumptions about the amount of data read in GROW
should be backed up with test for 0 termination when at the
end of the buffer.
|
|
ef4526ad
|
2012-08-15T09:14:31
|
|
Fix a variable name in comment
|
|
baaeadcf
|
2012-08-15T09:13:54
|
|
Regenerated testapi.c
|