lib/dfa.c


Log

Author Commit Date CI Message
Paul Eggert 83006fa8 2023-03-18T21:53:12 dfa: pacify Oracle Solaris Studio 12.6 Without this patch, the compiler complains “statement not reached”. * lib/dfa.c (lex): Refactor to omit unreachable statement.
Bruno Haible 5513c3fc 2023-02-28T05:59:25 dfa: Tweak the last patch. Suggested by Alexei Podtelezhnikov <apodtele@gmail.com>. * lib/dfa.c (FALLTHROUGH): Assume that Apple clang, in C mode, supports __attribute__ ((__fallthrough__)) starting with version 1200. References: https://en.wikipedia.org/wiki/Xcode#Xcode_11.0_-_14.x_(since_SwiftUI_framework)_2 https://github.com/apple/llvm-project/blob/swift-5.3-RELEASE/clang/test/Sema/fallthrough-attr.c
Bruno Haible c7d7c31c 2023-02-26T16:56:19 dfa: Avoid warnings with some Apple clang versions. Reported by Werner Lemberg <wl@gnu.org> in <https://lists.gnu.org/archive/html/bug-gnulib/2023-02/msg00159.html>. * lib/dfa.c (FALLTHROUGH): When __apple_build_version__ is defined, ignore __clang_major__.
Paul Eggert 8805a44c 2023-01-01T22:06:10 dfa: work around Clang 15 bug Problem reported by Kenton Groombridge in: https://lists.gnu.org/archive/html/bug-gawk/2022-12/msg00010.html On x86-64, Clang 15 gets confused by a call (X ? dfaerror : dfawarn) (Y) and generates the wrong code, presumably because dfaerror is _Noreturn and dfawarn is not. * lib/dfa.c (parse_bracket_exp): Reword to have one call for dfaerror, the other for dfawarn.
Simon Josefsson 32a72f45 2023-01-01T01:14:21 maint: run 'make update-copyright'
Paul Eggert 0153035f 2022-06-03T18:46:37 dfa: do not warn about \] and \} * lib/dfa.c (lex): Do not warn about \] and \}, since they’re surely universally supported even though POSIX says their interpretation is undefined.
Paul Eggert 88d3598a 2022-05-24T16:03:29 dfa: new options DFA_STAR_WARN, DFA_PLUS_WARN This lets ‘grep -E '(*a|+b)'’ warn about the * and the +. * lib/dfa.h (DFA_STAR_WARN, DFA_PLUS_WARN): New flags. * lib/dfa.c (lex): Support them.
Paul Eggert 55d1a73c 2022-05-23T12:17:49 dfa: '\n' is not governed by RE_LIMITED_OPS * lib/dfa.c (lex): Pay no attention to RE_LIMITED_OPS when deciding how to parse '\n', since regcomp.c doesn’t.
Paul Eggert 873de4f5 2022-05-23T12:05:14 dfa: new option DFA_STRAY_BACKSLASH_WARN This is for grep, which wants to warn about stray backslashes that lead to unspecified behavior. For example, "grep -oi '\a'" surprisingly is not equivalent to "grep -oi 'a'", so the stray backslash should be warned about. * lib/dfa.c: Include wctype.h, for iswprint and iswspace. (lex): Add support for DFA_STRAY_BACKSLASH_WARN. * lib/dfa.h (DFA_STRAY_BACKSLASH_WARN): New constant.
Paul Eggert ba80f149 2022-05-23T10:04:18 dfa: new option DFA_CONFUSING_BRACKETS_ERROR This is for grep, which wants [:alpha:] to be an error at the top level. * lib/dfa.c (struct regex_syntax): New member dfaopts, replacing anchor. All uses changed. (parse_bracket_exp): Error, not warn, if DFA_CONFUSING_BRACKETS_ERROR. * lib/dfa.h (DFA_CONFUSING_BRACKETS_ERROR): New constant.
Paul Eggert 48d43583 2022-05-20T16:55:34 dfa: steer cleer of POSIX-reserved symbols * lib/dfa.c (str_eq): Rename from streq. All uses changed. (c_isdigit): Rename from isasciidigit. The function worked in EBCDIC so it wasn’t ASCII-specific anyway. All uses changed.
Paul Eggert b19a1077 2022-05-13T23:23:35 dfa: fix bug with ‘.’ and UTF-8 Hangul Syllables This fixes a bug introduced in 2019-12-18T05:41:27Z!eggert@cs.ucla.edu, an earlier patch that fixed dfa.c to not match invalid UTF-8. Unfortunately that patch had a couple of typos when dfa.c is matching against the regular expression ‘.’ (dot). One typo caused dfa.c to incorrectly reject the valid UTF-8 sequences (ED)(90-9F)(80-BF) corresponding to U+D400 through U+D7FF, which are some Hangul Syllables and Hangul Jamo Extended-B. The other typo caused dfa.c to incorrectly reject the valid sequences (F4)(88-8F)(80-BF)(80-BF) which correspond to U+108000 through U+10FFFF (Supplemental Private Use Area plane B). * lib/dfa.c (utf8_classes): Fix typos. * tests/test-dfa-match.sh: Test the fix.
Bernhard Voelker 87e6634b 2022-01-04T00:16:50 license: fix GPLv3 texts to use a comma instead of semicolon. See: https://www.gnu.org/licenses/gpl-3.0.html#howto Run: $ git grep -l 'Foundation; either version 3' \ | xargs sed -i '/Foundation; either version 3/ s/n; e/n, e/' * All files using GPLv3: Adjust via the above command.
Paul Eggert eec12c00 2022-01-01T09:43:19 maint: run 'make update-copyright'
Paul Eggert 65e69185 2021-08-24T20:12:19 dfa: prefer idx_t to ptrdiff_t for nonnegative * lib/dfa.c (struct dfa, dfaexec_main, dfaexec_mb, dfaexec_sb) (dfaexec_noop, dfaexec): * lib/dfa.h (dfaparse, dfacomp, dfaexec): Prefer idx_t to ptrdiff_t for counts, which should be nonnegative. * lib/dfa.h: Include idx.h.
Paul Eggert b03cdc1b 2021-08-02T11:36:30 dfa: omit unneeded malloc+free Problem indirectly found by Coverity. * lib/dfa.c (enlistnew): New function, with most of the body of the old ‘enlist’. It assumes its arg NEW has been malloced and can be freed eventually. (enlist, addlists, dfamust): Use it. (dfamust): Omit an unnecessary malloc+free.
Paul Eggert b4018b8c 2021-08-01T17:28:01 dfa: improve -fanalyzer malloc checking
Paul Eggert 0a7236ef 2021-06-11T17:18:57 dfa: prefer idx_t for indexes * lib/dfa.c (mbs_to_wchar, state_index, dfaoptimize, dfaanalyze) (icatalloc, enlist, allocmust, dfamust): Prefer idx_t to size_t for indexes, and use idx_t-related allocators.
Paul Eggert 2e1b7bf9 2021-05-30T10:02:20 dfa, etc.: prefer xreallocarray to older name * lib/dfa.c (addtok_mb, realloc_trans_if_necessary, enlist): * lib/readtokens.c (readtokens): * tests/uninorm/test-u32-normalize-big.c: (read_normalization_test_file): Prefer xreallocarray to the equivalent xnrealloc. The newer name follows the glibc lead of ‘reallocarray’.
Paul Eggert e54b645f 2021-03-28T20:02:21 xalloc: new function xpalloc, from dfa Move xpalloc from dfa.c to xmalloc.c and change it from static to extern. The function is useful in other contexts; I’m about to use it in coreutils. * lib/dfa.c: Include idx.h, instead of rolling our own idx_t and IDX_MAX. Do not include intprops.h; no longer needed. (xpalloc): Move from here ... * lib/xmalloc.c (xpalloc): ... to here, and make it extern. Include intprops.h and minmax.h, needed by xpalloc. * lib/xalloc.h: Include idx.h, for idx_t. * modules/dfa (Depends-on): Add idx; remove intprops. * modules/xalloc (Depends-on): Add idx, intprops, minmax.
Paul Eggert 4b948321 2021-01-01T07:28:52 maint: run 'make update-copyright'
Norihiro Tanaka b5b7b9d2 2020-11-01T16:31:38 dfa: retain sequences of similar nodes in optimization DFA was merging similar nodes when it should not. For example, it would convert a+a+a to a+a. Now, a sequence of similar nodes is not merged. Problem reported by Gonzalo Padrino in https://bugs.gnu.org/44351 * lib/dfa.c (merge_nfa_state): Skip the follow for repetition in optimization.
Norihiro Tanaka 25f80339 2020-09-26T09:50:01 dfa: remove unused the member of structure * lib/dfa.c (struct dfa): Remove unused member 'first_end'.
Norihiro Tanaka 4a3aec70 2020-09-20T16:00:04 dfa: make dfasupported a global function * lib/dfa.c (dfasupported): Rename, and make it global. Update caller. * lib/dfa.h (dfasupported): Add prototype.
Paul Eggert 91e8dccc 2020-09-15T13:44:34 dfa: remove dfa-heap-overrun workaround * lib/dfa.c (reorder_tokens): Go back to a single pass that both sets map[*] and does other things. This reverts 2020-09-14T01:20:01Z!eggert@cs.ucla.edu, which is no longer needed now that 2020-09-14T13:21:05Z!noritnk@kcn.ne.jp fixed the underlying problem.
Norihiro Tanaka 3669512e 2020-09-14T22:21:05 dfa: fix failure in removal of epsilon closure If there are a espilon in a branch and the closure is iterated, maybe fails in removal of the node. The bug is introduced in commit da0e8454a8e68035ef4b87dbb9097f85df6ece27. * lib/dfa.c (dfaanalyze): Calculate backward transition for not only concatenation but closure.
Paul Eggert 46bdd627 2020-09-13T18:40:08 dfa: avoid use of uninitialized constraint * lib/dfa.c (merge_nfa_state): Do not initialize the constraint to zero here. (dfaoptimize): Do it here instead, via xcalloc. This prevents the use of an uninitialized constraint by later code when ! (flags[i] & OPT_QUEUED) means merge_nfa_state was not called to initialize the constraint. Problem found by running 'valgrind src/grep -E '(^| )*(a|b)*(c|d)*( |$)' < /dev/null' on Ubuntu 18.04.5 x86-64.
Paul Eggert 5332a21b 2020-09-13T18:27:07 dfa: assume C99 in reorder_tokens * lib/dfa.c (reorder_tokens): Assume C99 and simplify.
Paul Eggert cafb6153 2020-09-13T18:20:01 dfa: fix dfa-heap-overrun failure * lib/dfa.c (reorder_tokens): When setting map[d->follows[i].elems[j].index], instead of incorrectly assuming that (i < d->follows[i].elems[j].index), use two loops, one to set the map array and the other to use it. The incorrect assumption caused some elements to be missed, and this in turn caused grep's dfa-heap-overrun test to fail on Solaris 10 sparc when compiled with GCC. I found this bug while investigating https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-sparc/builds/183 and I think the bug also occurs on GNU/Linux but with more-subtle symptoms. The bug predates the recent dfa.c changes; perhaps the recent changes make the bug more likely.
Paul Eggert 4e14bef8 2020-09-12T18:51:55 dfa: epsilon-closure tweaks (Bug#40634) Rename BACKWORD to BACKWARD consistently. * lib/dfa.c (struct dfa): Reorder members to reduce fragmentation. (addtok_mb): Redo slightly to make it act more like a state machine. Check depth only when it increases. (epsclosure): Let the switch test the tokens. (dfaanalyze): Cache tindex. Simplify position loops. Prefer xcalloc to xnmalloc + explicit zeroing. Free BACKWARD only if it is not null, since we're testing that anyway. (dfaanalyze, build_state): Use merge2 instead of doing it by hand.
Norihiro Tanaka da0e8454 2020-09-12T18:51:55 dfa: use backward set in removal of epsilon closure When removing in epsilon closure, the code searched all nodes sequentially, and this was slow for some cases. Build a backward set before search, and only check previous position with the set. Problem reported in <https://bugs.gnu.org/40634>. * lib/dfa.c (struct dfa): New member 'epsilon'. (addtok_mb): Check whether a pattern has epsilon node or not. (epsclosure): New arg BACKWORD; caller changed. When removing epsilon node and reconnecting, check only previous positions. Treat BEG as if it were character. (dfaanalyze): Build backward set.
Bruno Haible d371c576 2020-08-09T20:09:36 Silence warnings from clang 10 with -Wimplicit-fallthrough. * lib/dfa.c (FALLTHROUGH): Use __attribute__ __fallthrough__ also on clang >= 10. * lib/fnmatch.c (FALLTHROUGH): Likewise. * lib/fts.c (FALLTHROUGH): Likewise. * tests/macros.h (FALLTHROUGH): Likewise. * lib/regex_internal.h (FALLTHROUGH): Likewise. * config/srclist.txt: Mark it as needing sync with glibc.
Bruno Haible 7ea899a9 2020-07-24T14:25:30 dfa: Revert breaking gawk. Reported by Arnold Robbins <arnold@skeeve.com>. * lib/dfa.c (compare): Don't reference the _GL_CMP macro.
Bruno Haible efa15594 2020-07-22T18:35:20 Optimize three-valued comparison between integers. (a > b ? 1 : a < b ? -1 : 0) is the same as (a > b) - (a < b). * m4/gnulib-common.m4 (gl_COMMON): Define _GL_CMP. * lib/c-strcasecmp.c (c_strcasecmp): Use _GL_CMP. * lib/c-strncasecmp.c (c_strncasecmp): Likewise. * lib/dfa.c (compare): Likewise. * lib/fts.c (fts_compare_ino): Likewise. * lib/mbmemcasecmp.c (mbmemcasecmp): Likewise. * lib/mbscasecmp.c (mbscasecmp): Likewise. * lib/mbsncasecmp.c (mbsncasecmp): Likewise. * lib/memcasecmp.c (memcasecmp): Likewise. * lib/memcmp2.c (memcmp2): Likewise. * lib/savedir.c (direntry_cmp_inode): Likewise. * lib/strcasecmp.c (strcasecmp): Likewise. * lib/strncasecmp.c (strncasecmp): Likewise. * lib/unistr/u-cmp2.h (FUNC): Likewise.
Paul Eggert c08f85d7 2020-05-01T18:12:12 attribute: new module This simplifies use of GCC and C2X attributes like ‘deprecated’. * MODULES.html.sh: Add attribute. * doc/attribute.texi, lib/attribute.h, modules/attribute: New files. * doc/gnulib.texi (Particular Modules): Add Attributes. * lib/backupfile.c, lib/fnmatch.c, lib/freopen-safer.c: * lib/mbrtoc32.c, lib/mbrtowc.c, lib/nstrftime.c, lib/quotearg.c: * lib/savewd.c, lib/unistr/u8-uctomb-aux.c, lib/unistr/u8-uctomb.c: * lib/vasnprintf.c: Include attribute.h, and let it define FALLTHROUGH. * lib/bitset/base.h, lib/c-stack.c (__attribute__): Remove macro. * lib/bitset/base.h (ATTRIBUTE_UNUSED): Define in terms of _GL_ATTRIBUTE_MAYBE_UNUSED, for forwards compatibility to C2X. * lib/dfa.c (FALLTHROUGH): Define consistently with gl_COMMON_BODY. This is a copy since Gawk doesn’t use Gnulib. * lib/di-set.h (_GL_ATTRIBUTE_NONNULL): Remove definition that is incompatible with gl_COMMON_BODY’s. All uses changed. * lib/fts.c: Include attribte.h, for FALLTHROUGH. Keep the existing FALLTHROUGH definition since Glibc might use it, and it does no harm to Gnulib’s FALLTHROUGH. * lib/fts_.h, lib/inttostr.h: (__GNUC_PREREQ): Remove; no longer needed. (__attribute_warn_unused_result__): Remove. All uses replaced by _GL_ATTRIBUTE_NODISCARD. * lib/gl_list.h, lib/gl_map.h, lib/gl_omap.h, lib/gl_oset.h: * lib/gl_set.h: Prefer _GL_ATTRIBUTE_NODISCARD to an ifdeffed __attribute__ ((__warn_unused_result__)), for forward compatibility to C2X. * lib/hash.h (_GL_ATTRIBUTE_WUR): Remove. All uses replaced by _GL_ATTRIBUTE_NODISCARD. (_GL_ATTRIBUTE_DEPRECATED): Remove, since gl_COMMON_BODY defines it. * lib/ino-map.h (_GL_ATTRIBUTE_NONNULL): Remove. All uses replaced by gl_COMMON_BODY’s implementation, which has a slightly different signature. * lib/safe-alloc.h (_GL_ATTRIBUTE_RETURN_CHECK): Remove. All uses replaced by _GL_ATTRIBUTE_NODISCARD. * lib/unused-parameter.h (_GL_UNUSED_PARAMETER): Define in terms of _GL_ATTRIBUTE_MAYBE_UNUSED. No doubt all uses should be replaced, at some point. * m4/gnulib-common.m4 (_GL_GNUC_PREREQ): New macro. (_Noreturn): Use it. (_GL_HAS_ATTRIBUTE, _GL_ATTRIBUTE_ALLOC_SIZE) (_GL_ATTRIBUTE_ALWAYS_INLINE, _GL_ATTRIBUTE_ARTIFICIAL) (_GL_ATTRIBUTE_COLD) (_GL_ATTRIBUTE_DEPRECATED, _GL_ATTRIBUTE_ERROR) (_GL_ATTRIBUTE_WARNING, _GL_ATTRIBUTE_EXTERNALLY_VISIBLE) (_GL_ATTRIBUTE_FALLTHROUGH, _GL_ATTRIBUTE_FORMAT) (_GL_ATTRIBUTE_LEAF, _GL_ATTRIBUTE_MAY_ALIAS) (_GL_ATTRIBUTE_MAYBE_UNUSED) (_GL_ATTRIBUTE_NODISCARD, _GL_ATTRIBUTE_NOINLINE) (_GL_ATTRIBUTE_NONNULL, _GL_ATTRIBUTE_NONSTRING) (_GL_ATTRIBUTE_NOTHROW, _GL_ATTRIBUTE_PACKED, _GL_ATTRIBUTE_PURE) (_GL_ATTRIBUTE_SENTINEL): New macros. * modules/backup-rename, modules/backupfile, modules/c-vasnprintf: * modules/fnmatch, modules/freopen-safer, modules/fts: * modules/mbrtoc32, modules/mbrtowc, modules/nstrftime: * modules/quotearg, modules/savewd: * modules/unistdio/u16-u16-vasnprintf: * modules/unistdio/u16-vasnprintf: * modules/unistdio/u32-u32-vasnprintf: * modules/unistdio/u32-vasnprintf: * modules/unistdio/u8-u8-vasnprintf: * modules/unistdio/u8-vasnprintf: * modules/unistdio/ulc-vasnprintf: * modules/unistr/u8-uctomb, modules/vasnprintf: (Depends-on:): Add attribute module.
Paul Eggert b58d0cb8 2020-01-29T16:47:51 dfa: do not depend on isblank This removes a difference between Gawk dfa.c and Gnulib dfa.c. * lib/dfa.c (isblank): Define if neither system nor Gnulib does. * modules/dfa (Depends-on): Remove isblank. * modules/isblank: Add a module indicator, for lib/dfa.c.
Paul Eggert baa7dd2b 2020-01-29T10:58:26 dfa: do not assume 64-bit int Problem reported for VAX/VMS C (!) by Arnold Robbins in: https://lists.gnu.org/r/bug-gnulib/2020-01/msg00173.html * lib/dfa.c (CHARCLASS_PAIR): Bring back this macro. (CHARCLASS_WORD_BITS, charclass_word) [!UINT_LEAST64_MAX]: Fall back to 32-bit words. (CHARCLASS_INIT): Go back to having 8 32-bit args instead of 4 64-bit args. All uses changed.
Paul Eggert 2cdc1baf 2020-01-01T00:00:18 maint: Run 'make update-copyright'
Paul Eggert 6dabeec0 2019-12-19T14:35:59 dfa: struct dfamust now uses flexible array * lib/dfa.c: Include flexmember.h. (dfamust, dfamustfree): Adjust to struct dfamust change. This saves a call to malloc+free. * lib/dfa.h (struct dfamust): Make the final member a flexible array member. * modules/dfa (Depends-on): Add flexmember.
Paul Eggert 5b570e9c 2019-12-19T13:37:45 dfa: fast->small for array elements * lib/dfa.c (charclass_word): Use uint_least64_t not uint_fast64_t, since this type is used in arrays. This change is more for documentation than for any practical effect, since the two types are the same on all known platforms.
Paul Eggert 1219c343 2019-12-17T21:41:27 dfa: do not match invalid UTF-8 * lib/dfa.c (struct dfa): Grow utf8_anychar_classes member array from 5 to 9 tokens; this is needed due to the changes to add_utf8_anychar. (charclass_index): 2nd arg is now pointer-to-const. (add_utf8_anychar): Match only valid UTF-8 byte sequences instead of allowing overlong encodings or surrogate halves.
Paul Eggert 8df5ec4b 2019-12-17T14:08:33 dfa: simplify charclass by assuming C99 * lib/dfa.c (CHARCLASS_WORD_BITS): Now always 64. (charclass_word): Now always uint_fast64_t. (CHARCLASS_PAIR): Remove. (CHARCLASS_INIT): Take 4 arguments instead of 8. All uses changed.
Paul Eggert d19dc900 2019-12-17T13:07:15 dfa: tune via xzalloc * lib/dfa.c (dfaoptimize): Prefer xzalloc to xmalloc + memset.
Paul Eggert 83e9cc89 2019-12-17T00:20:53 dfa: new function dfacopysyntax * lib/dfa.c (struct dfa): Move syntax member later so that dfacopysyntax can easily clear earlier members. (dfacopysyntax): New function, used by Gawk.
Paul Eggert 23c8b490 2019-12-16T17:00:18 dfa: remove one dependency on MB_CUR_MAX * lib/dfa.c (dfamust): No need to refer to MB_CUR_MAX here.
Paul Eggert 99969d03 2019-12-16T14:38:23 dfa: remove struct lexer_state.cur_mb_len * lib/dfa.c (struct lexer_state): Remove cur_mb_len member, as it’s not needed and the code is simpler without it. All uses removed.
Paul Eggert ecb700c1 2019-12-16T00:32:01 dfa: make dfasyntax thread-safe Problem reported by Bruno Haible in: https://lists.gnu.org/r/bug-gnulib/2019-12/msg00099.html * lib/dfa.c: Do not include locale.h. (struct dfa): Remove simple_locale member. All uses replaced by localeinfo.simple. (using_simple_locale): Remove; now present (with some changes) in localeinfo.c. (dfasyntax): No need to initialize removed member.
Paul Eggert 73de34d3 2019-12-12T14:11:06 dfa: prefer ptrdiff_t for API, too Also, use ‘idx_t’ for ptrdiff_t values that must be nonnegative, but do this only for internal use for now. * NEWS: Mention the API change. * lib/dfa.c (idx_t, IDX_MAX): New type and max value, for internal use for now. Use them instead of ptrdiff_t and PTRDIFF_MAX for values known to be nonnegative. (dfaparse, dfaexec_mb, dfaexec_sb, dfaexec_noop, dfaexec): Prefer idx_t or ptrdiff_t to size_t for API. * lib/dfa.h (dfaparse, dfacomp, dfaexec): Prefer ptrdiff_t to size_t for API.
Paul Eggert c532bd73 2019-12-11T15:08:35 dfa: prefer signed integers for internals Signed integers can be checked more easily for integer overflow. * lib/dfa.c (position, struct lexer_state, struct parser_state) (struct dfa, mbs_to_wchar, fetch_wc, parse_bracket_exp) (struct lexptr, lex, addtok_mb, add_utf8_anychar, atom) (nsubtoks, copytoks, closure, alloc_position_set, delete) (replace, state_index, epsclosure, charclass_context) (state_separate_contexts, merge_nfa_state, dfaoptimize) (dfaanalyze, build_state, dfaexec_main, dfa_supported) (maybe_disable_superset_dfa, dfassbuild, dfafree, enlist) (comsubs, inboth, allocmust): Prefer a signed to an unsigned integer when calculating indexes, unless the integer is part of the external API (a bigger deal, and to be done later).
Paul Eggert d8322e93 2019-12-11T13:44:03 dfa: fix index overflow * lib/dfa.c (compare): Avoid integer overflow when analyzing very large regular expressions.
Paul Eggert 360cbd3b 2019-12-11T13:40:01 dfa: update commentary for previous change * NEWS: Mention the change. * lib/dfa.c, lib/dfa.h (dfaparse, dfamust, dfacomp): Update comments.
Norihiro Tanaka ac924922 2019-12-11T12:53:09 dfa: separate parse and compile phase ‘dfamust’ must be called after parsing and before tokens are reordered, but both are executed in the compilation phase. Token reordering was introduced in Gnulib commit 2018-10-22T15:01:08Z!noritnk@kcn.ne.jp (5c7a0371823876cca7a1347fa09ca26bbbff0c98). * lib/dfa.c (dfaparse): Change it to global function. (dfacomp): If first argument is NULL, skip parse. * lib/dfa.h: (dfaparse): Add a prototype.
Paul Eggert e6633650 2019-01-01T00:25:11 maint: Run 'make update-copyright'
Jim Meyering 5d6a3cdd 2018-12-20T19:51:48 revert v0.1-2213-gae4b73e28 and part of v0.1-2281-g95cd86dd7 v0.1-2213-gae4b73e28 caused a regression in grep-3.2 (no match): echo '123-x'|LC_ALL=C grep -E '.\bx' The goal is to revert the first, but reverting it requires to restore the function deleted in the second. I ran this to restore the deleted function: git show v0.1-2281-g95cd86dd7 lib/dfa.c \ | perl -0777 -pe 's/^@@[^\n]*dfaan.*//ms' \ | patch -R -p1 * lib/dfa.c (charclass_context): Restore deleted function. Reverting the primary commit removes this change: dfa: Simplify a building state * lib/dfa.c (build_state): Simplify a building state.
Jim Meyering 95cd86dd 2018-12-15T10:09:35 dfa: avoid new warnings from gcc These would prevent building with -Werror and a Dec snapshot of gcc. * lib/dfa.c (dfaanalyze): Avoid shadowing warnings for "pos". Rename each inner instance to "p". (charclass_context): Remove unused static function.
Norihiro Tanaka ae4b73e2 2018-10-23T00:02:16 dfa: Simplify a building state dfa.c (build_state): Simplify a building state.
Norihiro Tanaka 5c7a0371 2018-10-23T00:01:08 dfa: reorder tokens before execution Reorder tokens before execution. It improves efficiency to access memory in building states. For example, A(BCD|E(F|G)|HI) are reorderda as following. (Before reorder) A:1 - B:2 - C:3 - D:4 ` E:5 - F:6 ` G:7 ` H:8 - I:9 (After reorder) A:1 - B:2 - C:5 - D:6 ` E:3 - F:7 ` G:8 ` H:4 - I:9 dfa.c (compare, reorder_tokens): New function. (reorder_tokens): Call them.
Norihiro Tanaka fb03ea32 2018-10-22T23:51:20 dfa: a state has a set of current positions. Up to now, a state had a set of follow-on positions. It is replaced a set of current positions. This change will save memory space. dfa.c (leaf_set): Remove it. (struct dfa): Add new member constraints and separates. (append): New function. (state_index): Bring constraint from pre-calculated. (state_separate_contexts): Bring separate contexts from pre-calculated. Change argument, update callers. (merge_nfa_state): Pre-calculate constraints for END. and remove END. No longer END is not used after here. (dfaoptimize): Initialize added member constraints. (dfaanalyze): Pre-calculate seprate contexts. (build_state): Change for this update. (dfassbuild): Initialize new members . (dfafree): Free memory for new members.
Norihiro Tanaka a37603c6 2018-10-22T23:35:50 dfa: simplify dfa optimization dfa.c (merge_nfa_state, dfaoptimize): Simplify dfa optimization.
Norihiro Tanaka eb8189b3 2018-10-22T23:31:26 dfa: position set sorts increasing order Change the order of position set from decreasing to increaing, then even after dfa is optimized, it is guaranteed that the number of a position is smaller than the subsequent one's number. dfa.c (insert, merged_constrained, delete): Reverse the direction of an inequality sign. (dfaanalyze): Position set sorts increasing order.
Norihiro Tanaka 8ef5a439 2018-10-22T23:22:40 dfa: remove unneeded code By the addition of beg, a code for the initial state is unnecessary, so remove it. dfa.c (epsclosure): Remove a code for the initial state. (dfaanalyze): Print follows for BEG in debug mode.
Norihiro Tanaka 617a6097 2018-09-19T08:35:49 dfa: optimization for state merge * lib/dfa.c (merge2): New function. (merge_nfa_state): Use it.
Jim Meyering 5007177e 2018-09-18T21:26:01 dfa: trivial comment fix: s/is/if/ * lib/dfa.c (maybe_disable_superset_dfa): Fix comment typo.
Paul Eggert bc7c0f01 2018-09-18T19:12:06 dfa: use more-informative function name * lib/dfa.c (maybe_disable_superset_dfa): Rename from dfautf8noss. Use change.
Paul Eggert 28d7f171 2018-09-18T19:05:26 dfa: tweak allocation performance * lib/dfa.c (merge_nfa_state, dfaoptimize): Prefer ptrdiff_t for indexes some more. Use char for flags, as it’s wide enough. Allocate queue and flags together, with one malloc call. No need to use xnmalloc since the multiplication and addition cannot overflow (it’s already been checked by earlier allocation). Prefer memset to open-coding.
Paul Eggert cbf5523b 2018-09-18T18:24:27 dfa: prune states as we go * lib/dfa.c (prune): Remove. (merge_nfa_state): Prune as we go instead of at the end. Prefer ptrdiff_t for indexes, as this helps the compiler a bit. Simplify constraint checking a bit.
Paul Eggert 5eae16cf 2018-09-18T15:25:51 dfa: reorder enum for efficiency * lib/dfa.c (END): Now -1 again. Reorder other elements of the enumeration to make it easier for GCC to generate efficient code by using fewer comparisons to check for ranges of values. (atom): Take advantage of the reordering.
Norihiro Tanaka 4299106c 2018-09-18T10:07:23 dfa: optimize alternation in NFA Even when similar states exist in alternation, the DFA treats them as separate items, which may complicate the transition in NFA and cause slowdown. This change assembles the states into one. For example, ab|ac is changed into a(b|c). This change speeds-up matching for many branched patterns. For example, grep speeds up more than 30× in: seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in time -p env LC_ALL=C grep -vf in in * lib/dfa.c (prune): New function. (merge_nfa_state): New function. It merges similar NFA states. (dfaoptimize): New function. It seeks merged and removed nodes. (dfaanalyze): Call new function. (dfautf8noss): Change name from dfaoptimize because of addition of new function. (dfacomp): Update caller.
Norihiro Tanaka e648401b 2018-09-18T09:58:22 dfa: simplify initial state Simplifying the initial state enables easier optimization of the NFA. * lib/dfa.c (enum token): Add new element BEG. (prtok): Adjust due to adding element BEG. (dfaparse): Put BEG at a head of tokens. (state_index): Adjust due to adding element BEG. (dfaanalyze): Concatenate BEG to other tokens, and simplify to build initial state. (dfamust): Adjust due to adding element BEG. DFAMUST ignores it.
Assaf Gordon c5e76a95 2018-08-01T22:14:21 dfa: fix memory leak * lib/dfa.c (dfafree): Add missing free() on dfa->superset.
Bruno Haible d7cf3b8d 2018-06-25T21:45:23 Continue to use spaces for indentation, not tabs.
Paul Eggert 281b825e 2018-01-01T00:57:25 maint: Run 'make update-copyright'
Jim Meyering 809f19db 2017-05-19T17:02:56 dfa: two small simplifications * lib/dfa.c (build_state): Avoid repeating longer expressions.
Paul Eggert 5e22aee7 2017-05-16T09:23:52 manywarnings: update for GCC 7 * build-aux/gcc-warning.spec: * m4/manywarnings.m4 (gl_MANYWARN_ALL_GCC): Add GCC 7 warnings, notably -Wimplicit-fallthrough=5, which requires a non-comment fallthrough attribute. This is a bit cleaner than the comment versions. * lib/strftime.c, lib/dfa.c, lib/fnmatch.c, lib/mbrtowc.c: * lib/vasnprintf.c, tests/macros.h (FALLTHROUGH): New macro. Use it whenever one switch case falls through into the next.
Paul Eggert 6afba02d 2017-03-21T19:05:17 dfa: make [0-9] faster in non-C locales Problem reported by John P. Linderman (Bug#26193). * lib/dfa.c (parse_bracket_exp): Remove redundant assignment. If both ends of the range are ASCII digits, do not worry about multi-character collating sequences and the like. Be consistent about using isalpha as a precondition for setbit_case_fold_c.
Norihiro Tanaka e0e362e7 2017-01-19T07:44:13 dfa: fix memory leak in parse Problem reported by Arnold Robbins in: http://lists.gnu.org/archive/html/bug-grep/2017-01/msg00006.html * lib/dfa.c (epsclosure): Do it.
Paul Eggert 61c27fe8 2017-01-15T17:18:10 dfa: port to gcc -fsanitize=undefined * lib/dfa.c (copy): Don’t pass NULL with size 0 to memcpy, as this runs afoul of gcc -fsanitize=undefined.
Paul Eggert 836ac768 2017-01-10T01:27:44 dfa: minor simplification with emptyset * lib/dfa.c (build_state): Simplify by using emptyset.
Paul Eggert 2db74de1 2017-01-09T22:48:46 dfa: shrink constraints from 4 bits to 3 * lib/dfa.c (newline_constraint, letter_constraint) (other_constraint, prev_newline_dependent) (prev_letter_dependent, NO_CONSTRAINT, BEGLINE_CONSTRAINT) (ENDLINE_CONSTRAINT, BEGWORD_CONSTRAINT, ENDWORD_CONSTRAINT) (LIMWORD_CONSTRAINT, NOTLIMWORD_CONSTRAINT): Constraints need only 3 bits, not 4. Using smaller integers shrinks the code a bit and makes grep a tad faster on x86-64.
Paul Eggert 9d561f0d 2017-01-09T20:28:11 dfa: omit unnecessary ptrdiff_t check * lib/dfa.c (alloc_position_set): Do not worry about ptrdiff_t overflow, since xnmalloc does that now.
Paul Eggert 7fbe8c09 2017-01-09T20:26:02 dfa: omit unnecessary allocation * lib/dfa.c (dfaanalyze): Do not allocate follow set, since an all-zero follow set works just fine.
Paul Eggert 8d3c4933 2017-01-09T16:30:41 dfa: omit unused local * lib/dfa.c (build_state): Fix up recent change.
Norihiro Tanaka 7c345c68 2017-01-09T08:21:21 dfa: melt down dfastate into build_state * src/dfa.c (dfastate): Remove it. (build_state): Insert content of dfastate() to bottom.
Norihiro Tanaka aff55692 2017-01-09T07:46:13 dfa: simplify transition table allocation * src/dfa.c (realloc_trans_if_necessary): Remove second argument. Its value is derived from other variable. Update callers. (dfastate): Remove calculation of max number of state.
Paul Eggert 823b5cb5 2017-01-08T12:44:29 dfa: fix reallocation bug when matching newlines Problem reported for sed by S. Gilles (Bug#25390). * lib/dfa.c (realloc_trans_if_necessary): Move earlier. (dfastate): Reallocate before moving any newline transition ... (build_state): ... instead of reallocating here, where it is too late.
Paul Eggert f0f371e1 2017-01-06T12:49:39 dfa: fix 'return' typo Problem reported by Nelson H. F. Beebe. * lib/dfa.c (merge): Fix typo that Sun compilers rejected.
Paul Eggert 74557b94 2017-01-02T12:22:17 dfa: prefer functions to FETCH_WC macro * lib/dfa.c (FETCH_WC): Remove, replacing with ... (fetch_wc, bracket_fetch_wc): ... new functions. These store the wint_t result into DFA->lex.wctok instead of to a separate arg. All callers changed. Move more local decls closer to where they're used.
Paul Eggert c8355b77 2017-01-02T10:49:17 dfa: narrow more local var scopes * lib/dfa.c: Move more local decls to be more local.
Paul Eggert 5c810462 2017-01-02T08:27:12 dfa: remove duplicate assignment Problem reported by Bruno Haible in: http://lists.gnu.org/archive/html/bug-gnulib/2017-01/msg00007.html * lib/dfa.c (parse_bracket_exp): Simplify.
Paul Eggert e210a3cb 2017-01-01T21:43:26 dfa: simplify constraint-dependency checking * lib/dfa.c (prev_newline_constraint, prev_letter_constraint) (prev_other_constraint): Remove. (prev_newline_dependent, prev_letter_dependent): Simplify, to avoid an unnecessary bitwise AND operation.
Paul Eggert 760d5b7d 2017-01-01T21:23:18 dfa: prefer functions and constants to macros * lib/dfa.c: Prefer constants to macros where either will do. (streq, isasciidigit, newline_constraint) (letter_constraint, other_constraint, succeeds_in_context) (prev_newline_constraint, prev_letter_constraint) (prev_other_constraint, prev_newline_dependent) (prev_letter_dependent, accepting, accepts_in_context): Now static functions instead of function-like macros. Use lower-case names accordingly. All uses changed.
Paul Eggert 51536cbb 2017-01-01T20:51:46 dfa: narrow more local var scopes * lib/dfa.c: Move some more local decls down to nearer where they're needed.
Jim Meyering 387fd77e 2016-12-31T08:06:24 dfa: narrow the scope of many local variables * lib/dfa.c: Now that we are no longer constrained to c89, move declarations of many variables (often indices) "down" into the scope(s) where used or to the point of definition. This is a no-semantic-change diff.
Paul Eggert a3fd683d 2017-01-01T02:59:23 version-etc: new year * build-aux/gendocs.sh (version): * doc/gendocs_template: * doc/gendocs_template_min: * doc/gnulib.texi: * lib/version-etc.c (COPYRIGHT_YEAR): Update copyright dates by hand in templates and the like. * all files: Run 'make update-copyright'.
Paul Eggert b724c4e6 2016-12-30T00:57:21 dfa: shorten sbit, success * lib/dfa.c (struct regex_syntax.sbit): (struct dfa.success): Use char, not int, for array elements, since they are all in the range 0..7.
Paul Eggert e0a498ec 2016-12-30T00:42:22 dfa: simplify multibyte_prop etc. This follows up on a change made when dfa.c was in grep, namely grep commit c797046c7c13c2647182b919a79a4c5b4ecf82b1 dated 2015-08-12 07:35:03 -0700, which removed unused multibyte support. That earlier simplification allows for some more simplification and trimming down here. * lib/dfa.c (struct mb_char_classes): New member nchars_alloc. (struct lexer_state): New mamber brack. (struct dfa, addtok_mb): multibyte_prop elements are now char, not int, since they must be in the range 0..3 now. Remove members mbcsets, nmbcsets, mbcsets_alloc, since the brack member now supersedes them. (parse_bracket_exp): Update dfa->lex.brack instead of dfa->mbcsets. (addtok): Use dfa->lex.brack instead of dfa->mbcsets. (dfaparse): Remove unnecessary initializations of already-0 storage. (free_mbdata): Free d->lex.brack.chars instead of d->mbcsets. (dfassbuild): No need to clear sup->mbcsets.
Paul Eggert 959c5a30 2016-12-29T23:02:45 dfa: minor performance tweak * lib/dfa.c (setbit_wc): Test < 0, not == EOF.
Paul Eggert 88125b5e 2016-12-29T21:44:17 dfa: wrap charclass inside a struct This lets GCC check for aliases more accurately. On my platform (gcc Ubuntu 5.4.0-6ubuntu1~16.04.4 x86-64, en_US.utf8 locale) this makes 'grep -Fi -f list.txt list.txt >out' about 1% faster, where list.txt is generated by 'aspell dump master | head -n 100000 >list.txt'. Also, it shrinks the grep text by 64 bytes, woohoo! See Bug#22239. * lib/dfa.c (charclass): Wrap inside a struct. All uses changed. (CHARCLASS_INIT, tstbit, setbit, clrbit, zeroset, fillset, notset) (equal, emptyset, charclass_index, setbit_wc, setbit_case_fold_c): Adjust to this, e.g., by using charclass * rather than charclass. All callers changed as needed. (copyset): Remove. All uses changed to simple assignment. (parse_bracket_exp): Use zeroset instead of memset.
Paul Eggert b783f1eb 2016-12-18T17:35:19 dfa: improve worst-case 'replace' performance See my note in Bug#22357#71. * lib/dfa.c (insert, delete): Rework to avoid duplicate test. (merge_constrained): New function, which is like the old 'merge' function, except with a new argument C2. Simplify the body by avoiding the need for different sections of code depending on whether one input is exhausted. (merge): Use the new function. (delete): Return the constraint of the deleted position, not the entire position. Caller changed. (replace): Change from O(N*(N + log N)) to O(N log N) algorithm.
Norihiro Tanaka d6df3873 2016-12-14T16:31:57 dfa: performance improvement for removal of epsilon closure * lib/dfa.c (delete): Use binary search to find deleted index. (replace): New function. It replaces a position with the followed set. (epsclosure): Replace it with a new algorithm. Update caller.