xmlregexp.c


Log

Author Commit Date CI Message
Nick Wellnhofer 5e7c72cd 2025-06-03T00:59:10 doc: Misc fixes
Nick Wellnhofer 479f26f9 2025-06-03T00:28:16 regexp: Remove unfinished reimplementation This was never enabled.
Nick Wellnhofer 6a6a46f0 2025-05-28T16:02:41 doc: Fix autolink errors Fix links, remove links to internal functions.
Nick Wellnhofer 7bd8d1d9 2025-05-28T15:53:38 doc: Prefix autolinks with '#' Use `#func` instead of `func()` to ignore parameters and make all autolinks work.
Nick Wellnhofer 954aae90 2025-05-16T21:13:17 doc: Improve regexp documentation
Nick Wellnhofer c4926b19 2025-05-16T02:12:23 codegen: Merge xmlunicode.c into xmlregexp.c Include generated parts. Generate xmlChRangeGroups instead of functions for Unicode blocks.
Nick Wellnhofer a40f36e7 2025-05-14T04:04:28 include: Stop using *Ptr typedefs in public headers
Nick Wellnhofer 442c1903 2025-05-09T18:52:36 doc: Fix some damage from automated conversions Add some newlines, fix returns.
Nick Wellnhofer 9bbffec5 2025-05-06T17:42:46 doc: Move brief to top, params to bottom of doc comments
Nick Wellnhofer e78e05c9 2025-05-02T17:32:51 doc: Fix autolinks to functions Unfortunately, autolinks in .c files aren't converted by Doxygen for some reason.
Nick Wellnhofer f7c41287 2025-05-02T15:57:17 doc: Remove more comment block headers
Nick Wellnhofer e525564f 2025-05-01T19:20:06 doc: Remove empty lines at start of block These lines were left over after automatic conversion.
Nick Wellnhofer e549622b 2025-04-28T15:11:24 doc: Convert documentation to Doxygen Automated conversion based on a few regexes.
Nick Wellnhofer 69879da8 2025-04-28T14:04:30 doc: Remove email addresses from documentation Also remove authorship information from generated files, hash.c and globals.c which were rewritten.
Nick Wellnhofer 61890e39 2025-04-27T21:50:15 doc: Prepare for conversion to Doxygen Fix many params in internal functions (not really necessary but Doxygen warns about that in XML mode). Fix formatting in a few corner cases that automatic conversion can't handle. Rearrange some DOC_DISABLE blocks.
Nick Wellnhofer 03a8d5f9 2025-03-04T16:00:08 unicode: Make Unicode functions private
Nick Wellnhofer 6fc26076 2025-02-22T20:31:45 regexp: Hide debugging code behind DEBUG_REGEXP xmlRegexpPrint is now a deprecated no-op.
Florin Haja 4649f28f 2025-02-22T19:29:07 xmlregexp: add support for compact form of automata in xmlRegexpPrint
Nick Wellnhofer c82270a9 2025-02-22T18:51:38 regexp: Avoid dangling start/stop pointers in atom States could be eliminated later, so set start/stop pointers to NULL after they're used in xmlFAGenerateTransitions.
Nick Wellnhofer 9c16a153 2025-02-13T18:41:33 Revert "include: Make most IS_* macros private" This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
Nick Wellnhofer 84a6c82f 2024-12-19T20:59:10 include: Make most IS_* macros private Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.
Nick Wellnhofer 0d6136da 2024-12-15T23:23:10 regexp: Check reallocations for overflow
Nick Wellnhofer 5d36664f 2024-07-16T00:35:53 memory: Deprecate xmlGcMemSetup
Nick Wellnhofer 2dcd561d 2024-07-15T14:54:37 regexp: Don't print to stderr
Nick Wellnhofer 6be79014 2024-07-15T14:18:26 Remove unused code
Nick Wellnhofer 598ee0d2 2024-06-26T01:18:55 error: Remove underscores from xmlRaiseError
Rosen Penev 217e9b7a 2024-06-08T12:27:45 clang-tidy: don't return in void functions Found with readability-redundant-control-flow Signed-off-by: Rosen Penev <rosenp@gmail.com>
Nick Wellnhofer fa01278d 2024-06-16T00:11:41 regexp: Hide experimental legacy code This was never made public.
Nick Wellnhofer 10d60d15 2024-06-16T00:04:46 regexp: Stop using LIBXML_AUTOMATA_ENABLED This macro always equals LIBXML_REGEXP_ENABLED.
Nick Wellnhofer 0651ad66 2024-05-05T20:20:22 valid: Report malloc failure after xmlRegExecPushString
Nick Wellnhofer 05d9bacd 2023-12-18T21:39:51 regexp: Improve error handling Handle malloc failure from xmlRaiseError. Use xmlRaiseMemoryError. Remove argument from memory error handler. Remove TODO macro.
Nick Wellnhofer 1a354d5b 2023-12-10T17:09:45 regexp: Report malloc failures Fix places where malloc failures aren't reported.
Nick Wellnhofer 3e7673bc 2023-09-23T17:31:55 malloc-fail: Report malloc failure in xmlFARegExec
Nick Wellnhofer b7d56ef7 2023-09-22T17:03:56 malloc-fail: Report malloc failure in xmlRegEpxFromParse Also check whether malloc failures are reported when fuzzing.
Nick Wellnhofer f98fa863 2023-09-22T15:25:40 regexp: Fix status codes and handle invalid UTF-8 Fixes #561.
Nick Wellnhofer 4e1c13eb 2023-09-18T14:45:10 debug: Remove debugging code This is barely useful these days and only clutters the code base.
Nick Wellnhofer a800b7e0 2023-05-04T12:47:00 regexp: Fix null deref in xmlFAFinishReduceEpsilonTransitions Short-lived regression found by OSS-Fuzz.
Nick Wellnhofer c613ab14 2023-05-02T00:32:50 regexp: Fix mistake in previous commit The `ret = 0` line should have been deleted. Fixes #531.
Nick Wellnhofer a06eaa61 2023-03-09T06:58:24 regexp: Fix determinism checks Swap arguments in initial call to xmlFARecurseDeterminism. Fix the check whether we revisit the initial state in xmlFARecurseDeterminism. If there are transitions with equal atoms and targets but different counters, treat the regex as deterministic but mark the transitions as non-deterministic internally. Don't overwrite zero return value of xmlFAComputesDeterminism with non-zero value from xmlFARecurseDeterminism. Most of these errors lead to non-deterministic regexes not being detected which typically isn't an issue. The improved code may break users who relied on buggy behavior or cause other bugs to become visible. Fixes #469.
Nick Wellnhofer e301865e 2023-03-09T05:34:38 regexp: Fix checks for eliminated transitions 'to' can be set to -1 or -2 when eliminating transitions, so check for all negative values.
Nick Wellnhofer 90759c59 2023-03-09T16:34:11 regexp: Simplify xmlFAReduceEpsilonTransitions
Nick Wellnhofer 9f7b1142 2023-03-09T05:25:09 regexp: Fix cycle check in xmlFAReduceEpsilonTransitions The visited flag must only be reset after the first call to xmlFAReduceEpsilonTransitions has finished. Visiting states multiple times could lead to unnecessary processing of duplicate transitions. Similar to 68eadabd.
Nick Wellnhofer 85057e51 2023-02-21T15:24:19 regexp: Add sanity check in xmlRegCalloc2 These arguments should be non-zero, but add a sanity check to avoid division by zero. Fixes #450.
Nick Wellnhofer 1743c4c3 2023-02-17T15:53:07 malloc-fail: Fix OOB read after xmlRegGetCounter Found with libFuzzer, see #344.
Nick Wellnhofer 40bc1c69 2023-02-17T15:40:32 malloc-fail: Fix memory leak in xmlFAParseCharProp Found with libFuzzer, see #344.
Nick Wellnhofer e64653c0 2023-02-17T15:20:33 malloc-fail: Fix leak of xmlRegAtom Found with libFuzzer, see #344.
Nick Wellnhofer ed615967 2023-02-17T15:23:42 malloc-fail: Fix memory leak in xmlRegexpCompile Found with libFuzzer, see #344.
Nick Wellnhofer e60c9f4c 2023-02-15T01:00:03 malloc-fail: Fix memory leak after xmlRegNewState Invoke xmlRegNewState from xmlRegStatePush to simplify error handling. Found with libFuzzer, see #344.
Nick Wellnhofer bd33331b 2023-02-17T15:19:37 regexp: Simplify xmlRegAtomPush
Nick Wellnhofer 0f568c0b 2022-08-26T01:22:33 Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.
Nick Wellnhofer 14517012 2022-04-23T19:19:33 Fix parsing of subtracted regex character classes Fixes #370.
Nick Wellnhofer ebb17970 2022-03-04T02:31:59 Remove unneeded #includes
Damjan Jovanovic 37ebf8a8 2021-05-31T07:45:18 Document support for the non-standard escape sequences. Support non-BMP code points in surrogate pairs of '\uXXXX\uXXXX'.
Damjan Jovanovic b66c1961 2021-05-30T11:11:33 Use strtoul() instead of sscanf, and correct data types that break GCC.
Damjan Jovanovic ec8ff95c 2021-05-29T16:36:44 Add support for some non-standard escapes in regular expressions. This adds support for some non-standard escape sequences observed in Microsoft's MSXML DLLs and used by Windows apps, and thus needed by Wine. Some are also used in other XML implementations, eg. Java's. This isn't intended to be final. We probably wish to toggle these non-standard escape sequences on and off somehow, as needed by the caller. Further discussion: https://gitlab.gnome.org/GNOME/libxml2/-/issues/260
Nick Wellnhofer 776d15d3 2022-03-02T00:29:17 Don't check for standard C89 headers Don't check for - ctype.h - errno.h - float.h - limits.h - math.h - signal.h - stdarg.h - stdlib.h - string.h - time.h Stop including non-standard headers - malloc.h - strings.h
Nick Wellnhofer ea6e8f99 2021-12-20T00:34:58 Fix certain combinations of regex range quantifiers Fix regex transitions that have both min/max and a counter. In this case, we want to save the regex state before incrementing the counter. Fixes #301 and the issue reported here: https://mail.gnome.org/archives/xml/2016-April/msg00017.html
Nick Wellnhofer 382fb056 2021-12-20T00:31:41 Fix range quantifier on subregex Make sure to add counted exit transitions before other counter transitions. Otherwise, we won't backtrack correctly. Fixes #65.
Nick Wellnhofer 346c3a93 2022-02-20T18:46:42 Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.
Arne Becker ec6e3efb 2021-07-06T21:56:04 Patch to forbid epsilon-reduction of final states
    When building the internal representation of a regexp, it is possible that a lot of empty transitions are created. Therefore there is a step to reduce them in the function xmlFAEliminateSimpleEpsilonTransitions. There is an error there for this case:
    * State 1 has a transition with an atom (in this case "a") to state 2.
    * State 2 is final and has an epsilon transition to state 1.
    After reduction it looked like:
    * State 1 has a transition with an atom (in this case "a") to itself and is final.
    In other words, the empty string is accepted when it shouldn't be. The attached patch skips the reduction step for final states. An alternative would be to insert or increment counters when reducing a final state, but this seemed error prone and unnecessary, since there aren't that many final states.
    Fixes #282
Nick Wellnhofer 7d6837ba 2020-10-25T20:21:43 Fix caret in regexp character group Apply Per Hedeland's patch from https://bugzilla.gnome.org/show_bug.cgi?id=779751 Fixes #188.
Nick Wellnhofer 68eadabd 2020-07-11T21:32:10 Fix exponential runtime in xmlFARecurseDeterminism
    In order to prevent visiting a state twice, states must be marked as visited for the whole duration of graph traversal because states might be reached by different paths. Otherwise state graphs like the following can lead to exponential runtime:
    ->O-->O-->O-->O-->O->
       \ / \ / \ / \ /
        O   O   O   O
    Reset the "visited" flag only after the graph was traversed.
    xmlFAComputesDeterminism still has massive performance problems when handling fuzzed input. By design, it has quadratic time complexity in the number of reachable states. Some issues might also stem from redundant epsilon transitions. With this fix, fuzzing regexes with a maximum length of 100 becomes feasible at least.
    Found with libFuzzer.
Nick Wellnhofer fc842f6e 2020-07-06T15:22:12 Limit regexp nesting depth Enforce a maximum nesting depth of 50 for regular expressions. Avoids stack overflows with deeply nested regexes. Found by OSS-Fuzz.
Nick Wellnhofer f8329fdc 2020-07-02T11:51:31 Report error for invalid regexp quantifiers
Nick Wellnhofer 1e7851b5 2020-06-25T12:17:50 Fix integer overflow in xmlFAParseQuantExact Found by OSS-Fuzz.
Nick Wellnhofer 20c60886 2020-03-08T17:19:42 Fix typos Resolves #133.
Nick Wellnhofer 52649b63 2020-01-02T14:45:28 Check for overflow when allocating two-dimensional arrays Found by lgtm.com
Nick Wellnhofer 9bd7abfb 2020-01-02T14:14:48 Remove useless comparisons Found by lgtm.com
Jared Yanovich 2a350ee9 2019-09-30T17:04:54 Large batch of typo fixes Closes #109.
Nick Wellnhofer 99a864a1 2019-09-25T15:27:45 Fix Regextests - One of the bug316338 test cases is expected to succeed. - Memory leak in testRegexp.c. - Refcount handling in xmlExpHashGetEntry.
Nick Wellnhofer c2b0a184 2019-09-25T13:57:42 Fix empty branch in regex Fixes bug 649244: https://bugzilla.gnome.org/show_bug.cgi?id=649244 Closes #57.
Nick Wellnhofer e8c9cd5c 2019-09-16T15:36:02 Fix Schema determinism check of ##other namespaces Non-compound (##local) and compound string atoms are always disjoint regardless of whether the compound atom is negated (##other). Closes #40.
zhouzhongyuan 0b793591 2019-08-26T15:24:12 Fix memory leak in xmlRegEpxFromParse Merge request !39
Nick Wellnhofer 09797c13 2019-03-05T15:14:34 Fix null deref in xmlregexp error path Thanks to Shaobo He for the report.
J. Peter Mugaas d2c329a9 2017-10-21T13:49:31 Fix -Wimplicit-fallthrough warnings Add "falls through" comments to quench implicit-fallthrough warnings which are enabled by -Wextra under GCC 7.
David Kilzer fb56f80e 2017-07-04T18:38:03 Heap-buffer-overflow read of size 1 in xmlFAParsePosCharGroup Credit to OSS-Fuzz. Add a check to xmlFAParseCharRange() for the end of the buffer to prevent reading past the end of it. This fixes Bug 784017.
Nick Wellnhofer 8a0c6698 2017-07-04T17:13:06 Fix NULL pointer deref in xmlFAParseCharClassEsc Found with libFuzzer.
Nick Wellnhofer 34e44567 2017-05-31T16:48:27 Fix undefined behavior in xmlRegExecPushStringInternal It's stupid, but the behavior of memcpy(NULL, NULL, 0) is undefined.
Pranjal Jumde cbb27165 2016-03-07T06:34:26 Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup <https://bugzilla.gnome.org/show_bug.cgi?id=757711> * xmlregexp.c: (xmlFAParseCharRange): Only advance to the next character if there is no error. Advancing to the next character in case of an error while parsing regexp leads to an out of bounds access.
Daniel Veillard 34b35004 2016-05-09T09:28:38 Fix an error with regexp on nullable counted char transition This is the first of the two issues raised by Pete Cordell in https://mail.gnome.org/archives/xml/2016-April/msg00030.html
Jan Pokorný bb654feb 2016-04-13T16:56:07 Fix typos: dictio{ nn -> n }ar{y,ies} Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Gaurav 41b0d1c4 2014-05-09T16:52:32 Avoid Double Null Check Cleanup For https://bugzilla.gnome.org/show_bug.cgi?id=729851
Gaurav 2671b013 2013-09-11T14:59:06 Fix potential NULL pointer dereferences in regexp code https://bugzilla.gnome.org/show_bug.cgi?id=707749 Fix 3 cases where we might dereference NULL
Michael Wood fb27e2cd 2012-09-28T08:59:33 Fix spelling of "length".
Daniel Veillard f8e3db04 2012-09-11T13:26:36 Big space and tab cleanup Remove all space before tabs and space and tabs at end of lines.
Daniel Veillard 466fcdaa 2012-08-27T12:03:40 Avoid a potential infinite recursion Which can happen when eliminating epsilon transitions, as reported by Pavel Madr <pmadr@opentext.com>
Daniel Veillard 40851d0c 2012-08-17T20:34:05 Fix a segfault on XSD validation on pattern error As reported by Sven <sven@e7o.de>: The following pattern will cause a segmentation fault in my Apache (using PHP5 to validate a XML against a XSD): <xs:pattern value="(.*)|"/> Fix a cascade of error handling failures which led to the crash in that scenario.
Patrick R. Gansterer 204f1f14 2012-05-10T20:24:00 undef ERROR if already defined
Daniel Veillard 9543aee9 2010-03-15T11:13:39 Fix broken escape behaviour in regexp ranges
Daniel Veillard 9332b48f 2009-09-23T18:28:43 Fix a Relaxng bug raised by libvirt test suite * xmlregexp.c: other fixes in 2.7.4 raised this internal error when comparing ranges, this affects among others detection of the determinism * test/relaxng/libvirt* result/relaxng/libvirt*: add a test case based on libvirt schemas and tests
Daniel Veillard 29341682 2009-09-10T18:23:39 Release of libxml2-2.7.4 * configure.in: new version * libxml.spec.in: cleanup * xmlregexp.c: fix a comment * doc/apibuild.py: update * doc/*: regenerate everything
Daniel Veillard 594e5dfb 2009-09-07T14:58:47 Chasing dead assignments reported by clang-scan * SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c xmlschemas.c xpath.c xpointer.c: mostly removing unneeded assignments, but this led to a few real bugs and some part not yet understood (relaxng/interleave)
Daniel Veillard 13cee4e3 2009-09-05T14:52:55 Fix a bunch of scan 'dead increments' and cleanup * HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c: fix unused variables, or unneeded increments as well as a couple of space issues * runtest.c: check for NULL before calling unlink()
Daniel Veillard 1ba2aca3 2009-08-31T16:47:39 492317 Fix Relax-NG validation problems * relaxng.c xmlregexp.c: a subtle problem when checking for compilable content model, if using the same elements in cases of choices. Handled by adding a special flag to the regexp compilation to detect transitions with different atoms using same strings. * test/relaxng/492317* result/relaxng/492317*: add the test to the regression suite
Daniel Veillard d80d0728 2009-08-22T18:56:01 559410 - Regexp bug on (...)? constructs * xmlregexp.c: fix a regexp bug on some (...)? constructs * test/schemas/nvdcve* result/schemas/nvdcve*: add the tests to the regression suite
Daniel Veillard 11e28e4d 2009-08-12T12:21:42 570702 fix a bug in regexp determinism checking * xmlregexp.c: xmlFAComputesDeterminism was buggy: it removed, as coalesced, transitions with the same source, destination and atoms without looking at counters
Daniel Veillard bf9c1dad 2008-08-26T07:46:42 add the testchar to 'make check' Volker Grabsch pointed out a typo * Makefile.am: add the testchar to 'make check' * xmlschemas.c: Volker Grabsch pointed out a typo * xmlregexp.c: production [19] from XML Schemas regexps was a mistake, removed in version REC-xmlschema-2-20041028, Volker Grabsch provided a patch to remove it * test/schemas/regexp-char-ref_0.xml test/schemas/regexp-char-ref_0.xsd test/schemas/regexp-char-ref_1.xsd result/schemas/regexp-char-ref_0_0 result/schemas/regexp-char-ref_1_0: Volker Grabsch also provided regression tests for this Daniel svn path=/trunk/; revision=3776
Daniel Veillard ad55998f 2008-05-12T13:15:35 avoid a regexp crash, should fix #523738 Daniel * xmlregexp.c: avoid a regexp crash, should fix #523738 Daniel svn path=/trunk/; revision=3744
Daniel Veillard 10bda629 2008-03-13T07:27:24 found a nasty bug in regexp automata build, reported by Ashwin and Bjorn * xmlregexp.c: found a nasty bug in regexp automata build, reported by Ashwin and Bjorn Reese Daniel svn path=/trunk/; revision=3705
Daniel Veillard 041b687e 2008-02-08T10:37:18 apply patch from Andrew Tosh to fix behaviour when '.' is used in a * xmlregexp.c: apply patch from Andrew Tosh to fix behaviour when '.' is used in a posCharGroup * test/schemas/poschargrp0_0.* result/schemas/poschargrp0_0_0*: added the test to the regression suite Daniel svn path=/trunk/; revision=3687
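
Two of the commits above (52649b63 and 85057e51) deal with overflow and zero-size checks when allocating two-dimensional arrays. The kind of check they describe might look roughly like the sketch below. The helper name `calloc2_checked` and its signature are hypothetical, chosen for illustration only; this is not the actual xmlRegCalloc2 code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not libxml2 code): allocate a zeroed
 * dim1 x dim2 array of elements of elemSize bytes each.
 * Returns NULL on zero sizes, on overflow, or on malloc failure.
 */
static void *
calloc2_checked(size_t dim1, size_t dim2, size_t elemSize) {
    size_t numElems, totalSize;
    void *ret;

    /* Sanity check: zero sizes would make the divisions below invalid. */
    if (dim1 == 0 || dim2 == 0 || elemSize == 0)
        return NULL;

    /* dim1 * dim2 must fit in size_t. */
    if (dim1 > SIZE_MAX / dim2)
        return NULL;
    numElems = dim1 * dim2;

    /* numElems * elemSize must fit in size_t as well. */
    if (numElems > SIZE_MAX / elemSize)
        return NULL;
    totalSize = numElems * elemSize;

    ret = malloc(totalSize);
    if (ret != NULL)
        memset(ret, 0, totalSize);
    return ret;
}
```

Dividing SIZE_MAX by one factor before multiplying avoids relying on wrap-around of the product, and the early zero check is what keeps those divisions well defined. A caller would treat a NULL return as an allocation failure and release a successful result with free().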