|
83006fa8
|
2023-03-18T21:53:12
|
|
dfa: pacify Oracle Solaris Studio 12.6
Without this patch, the compiler complains “statement not reached”.
* lib/dfa.c (lex): Refactor to omit unreachable statement.
|
|
5513c3fc
|
2023-02-28T05:59:25
|
|
dfa: Tweak the last patch.
Suggested by Alexei Podtelezhnikov <apodtele@gmail.com>.
* lib/dfa.c (FALLTHROUGH): Assume that Apple clang, in C mode, supports
__attribute__ ((__fallthrough__)) starting with version 1200.
References:
https://en.wikipedia.org/wiki/Xcode#Xcode_11.0_-_14.x_(since_SwiftUI_framework)_2
https://github.com/apple/llvm-project/blob/swift-5.3-RELEASE/clang/test/Sema/fallthrough-attr.c
|
|
c7d7c31c
|
2023-02-26T16:56:19
|
|
dfa: Avoid warnings with some Apple clang versions.
Reported by Werner Lemberg <wl@gnu.org> in
<https://lists.gnu.org/archive/html/bug-gnulib/2023-02/msg00159.html>.
* lib/dfa.c (FALLTHROUGH): When __apple_build_version__ is defined,
ignore __clang_major__.
|
|
8805a44c
|
2023-01-01T22:06:10
|
|
dfa: work around Clang 15 bug
Problem reported by Kenton Groombridge in:
https://lists.gnu.org/archive/html/bug-gawk/2022-12/msg00010.html
On x86-64, Clang 15 gets confused by a call (X ? dfaerror :
dfawarn) (Y) and generates the wrong code, presumably because
dfaerror is _Noreturn and dfawarn is not.
* lib/dfa.c (parse_bracket_exp): Reword to have one call for
dfaerror, the other for dfawarn.
|
|
32a72f45
|
2023-01-01T01:14:21
|
|
maint: run 'make update-copyright'
|
|
0153035f
|
2022-06-03T18:46:37
|
|
dfa: do not warn about \] and \}
* lib/dfa.c (lex): Do not warn about \] and \}, since they’re
surely universally supported even though POSIX says their
interpretation is undefined.
|
|
88d3598a
|
2022-05-24T16:03:29
|
|
dfa: new options DFA_STAR_WARN, DFA_PLUS_WARN
This lets ‘grep -E '(*a|+b)'’ warn about the * and the +.
* lib/dfa.h (DFA_STAR_WARN, DFA_PLUS_WARN): New flags.
* lib/dfa.c (lex): Support them.
|
|
55d1a73c
|
2022-05-23T12:17:49
|
|
dfa: '\n' is not governed by RE_LIMITED_OPS
* lib/dfa.c (lex): Pay no attention to RE_LIMITED_OPS when
deciding how to parse '\n', since regcomp.c doesn’t.
|
|
873de4f5
|
2022-05-23T12:05:14
|
|
dfa: new option DFA_STRAY_BACKSLASH_WARN
This is for grep, which wants to warn about stray backslashes that
lead to unspecified behavior. For example, "grep -oi '\a'"
surprisingly is not equivalent to "grep -oi 'a'", so the stray
backslash should be warned about.
* lib/dfa.c: Include wctype.h, for iswprint and iswspace.
(lex): Add support for DFA_STRAY_BACKSLASH_WARN.
* lib/dfa.h (DFA_STRAY_BACKSLASH_WARN): New constant.
|
|
ba80f149
|
2022-05-23T10:04:18
|
|
dfa: new option DFA_CONFUSING_BRACKETS_ERROR
This is for grep, which wants [:alpha:] to be an error
at the top level.
* lib/dfa.c (struct regex_syntax): New member dfaopts,
replacing anchor. All uses changed.
(parse_bracket_exp): Error, not warn, if DFA_CONFUSING_BRACKETS_ERROR.
* lib/dfa.h (DFA_CONFUSING_BRACKETS_ERROR): New constant.
|
|
48d43583
|
2022-05-20T16:55:34
|
|
dfa: steer cleer of POSIX-reserved symbols
* lib/dfa.c (str_eq): Rename from streq. All uses changed.
(c_isdigit): Rename from isasciidigit. The function worked in
EBCDIC so it wasn’t ASCII-specific anyway. All uses changed.
|
|
b19a1077
|
2022-05-13T23:23:35
|
|
dfa: fix bug with ‘.’ and UTF-8 Hangul Syllables
This fixes a bug introduced in 2019-12-18T05:41:27Z!eggert@cs.ucla.edu,
an earlier patch that fixed dfa.c to not match invalid UTF-8.
Unfortunately that patch had a couple of typos when dfa.c is
matching against the regular expression ‘.’ (dot). One typo
caused dfa.c to incorrectly reject the valid UTF-8 sequences
(ED)(90-9F)(80-BF) corresponding to U+D400 through U+D7FF, which
are some Hangul Syllables and Hangul Jamo Extended-B. The other
typo caused dfa.c to incorrectly reject the valid sequences
(F4)(88-8F)(80-BF)(80-BF) which correspond to U+108000 through
U+10FFFF (Supplemental Private Use Area plane B).
* lib/dfa.c (utf8_classes): Fix typos.
* tests/test-dfa-match.sh: Test the fix.
|
|
87e6634b
|
2022-01-04T00:16:50
|
|
license: fix GPLv3 texts to use a comma instead of semicolon.
See: https://www.gnu.org/licenses/gpl-3.0.html#howto
Run:
$ git grep -l 'Foundation; either version 3' \
| xargs sed -i '/Foundation; either version 3/ s/n; e/n, e/'
* All files using GPLv3: Adjust via the above command.
|
|
eec12c00
|
2022-01-01T09:43:19
|
|
maint: run 'make update-copyright'
|
|
65e69185
|
2021-08-24T20:12:19
|
|
dfa: prefer idx_t to ptrdiff_t for nonnegative
* lib/dfa.c (struct dfa, dfaexec_main, dfaexec_mb, dfaexec_sb)
(dfaexec_noop, dfaexec):
* lib/dfa.h (dfaparse, dfacomp, dfaexec):
Prefer idx_t to ptrdiff_t for counts, which should be nonnegative.
* lib/dfa.h: Include idx.h.
|
|
b03cdc1b
|
2021-08-02T11:36:30
|
|
dfa: omit unneeded malloc+free
Problem indirectly found by Coverity.
* lib/dfa.c (enlistnew): New function, with most of the body of
the old ‘enlist’. It assumes its arg NEW has been malloced and
can be freed eventually.
(enlist, addlists, dfamust): Use it.
(dfamust): Omit an unnecessary malloc+free.
|
|
b4018b8c
|
2021-08-01T17:28:01
|
|
dfa: improve -fanalyzer malloc checking
|
|
0a7236ef
|
2021-06-11T17:18:57
|
|
dfa: prefer idx_t for indexes
* lib/dfa.c (mbs_to_wchar, state_index, dfaoptimize, dfaanalyze)
(icatalloc, enlist, allocmust, dfamust):
Prefer idx_t to size_t for indexes, and use idx_t-related allocators.
|
|
2e1b7bf9
|
2021-05-30T10:02:20
|
|
dfa, etc.: prefer xreallocarray to older name
* lib/dfa.c (addtok_mb, realloc_trans_if_necessary, enlist):
* lib/readtokens.c (readtokens):
* tests/uninorm/test-u32-normalize-big.c:
(read_normalization_test_file):
Prefer xreallocarray to the equivalent xnrealloc.
The newer name follows the glibc lead of ‘reallocarray’.
|
|
e54b645f
|
2021-03-28T20:02:21
|
|
xalloc: new function xpalloc, from dfa
Move xpalloc from dfa.c to xmalloc.c and change it from static to
extern. The function is useful in other contexts; I’m about to
use it in coreutils.
* lib/dfa.c: Include idx.h, instead of rolling our own idx_t and
IDX_MAX. Do not include intprops.h; no longer needed.
(xpalloc): Move from here ...
* lib/xmalloc.c (xpalloc): ... to here, and make it extern.
Include intprops.h and minmax.h, needed by xpalloc.
* lib/xalloc.h: Include idx.h, for idx_t.
* modules/dfa (Depends-on): Add idx; remove intprops.
* modules/xalloc (Depends-on): Add idx, intprops, minmax.
|
|
4b948321
|
2021-01-01T07:28:52
|
|
maint: run 'make update-copyright'
|
|
b5b7b9d2
|
2020-11-01T16:31:38
|
|
dfa: retain sequences of similar nodes in optimization
DFA was merging similar nodes when it should not. For example,
it would convert a+a+a to a+a. Now, a sequence of similar nodes
is not merged.
Problem reported by Gonzalo Padrino in https://bugs.gnu.org/44351
* lib/dfa.c (merge_nfa_state): Skip the follow for repetition in
optimization.
|
|
25f80339
|
2020-09-26T09:50:01
|
|
dfa: remove unused the member of structure
* lib/dfa.c (struct dfa): Remove unused member 'first_end'.
|
|
4a3aec70
|
2020-09-20T16:00:04
|
|
dfa: make dfasupported a global function
* lib/dfa.c (dfasupported): Rename, and make it global.
Update caller.
* lib/dfa.h (dfasupported): Add prototype.
|
|
91e8dccc
|
2020-09-15T13:44:34
|
|
dfa: remove dfa-heap-overrun workaround
* lib/dfa.c (reorder_tokens): Go back to a single pass that
both sets map[*] and does other things. This reverts
2020-09-14T01:20:01Z!eggert@cs.ucla.edu, which is no longer
needed now that 2020-09-14T13:21:05Z!noritnk@kcn.ne.jp
fixed the underlying problem.
|
|
3669512e
|
2020-09-14T22:21:05
|
|
dfa: fix failure in removal of epsilon closure
If there are a espilon in a branch and the closure is iterated, maybe fails
in removal of the node. The bug is introduced in
commit da0e8454a8e68035ef4b87dbb9097f85df6ece27.
* lib/dfa.c (dfaanalyze): Calculate backward transition for not only
concatenation but closure.
|
|
46bdd627
|
2020-09-13T18:40:08
|
|
dfa: avoid use of uninitialized constraint
* lib/dfa.c (merge_nfa_state): Do not initialize the constraint
to zero here.
(dfaoptimize): Do it here instead, via xcalloc. This prevents the
use of an uninitialized constraint by later code when ! (flags[i]
& OPT_QUEUED) means merge_nfa_state was not called to initialize
the constraint. Problem found by running 'valgrind src/grep -E
'(^| )*(a|b)*(c|d)*( |$)' < /dev/null' on Ubuntu 18.04.5 x86-64.
|
|
5332a21b
|
2020-09-13T18:27:07
|
|
dfa: assume C99 in reorder_tokens
* lib/dfa.c (reorder_tokens): Assume C99 and simplify.
|
|
cafb6153
|
2020-09-13T18:20:01
|
|
dfa: fix dfa-heap-overrun failure
* lib/dfa.c (reorder_tokens): When setting
map[d->follows[i].elems[j].index], instead of incorrectly assuming
that (i < d->follows[i].elems[j].index), use two loops, one to set
the map array and the other to use it. The incorrect assumption
caused some elements to be missed, and this in turn caused grep's
dfa-heap-overrun test to fail on Solaris 10 sparc when compiled
with GCC. I found this bug while investigating
https://buildfarm.opencsw.org/buildbot/builders/ggrep-solaris10-sparc/builds/183
and I think the bug also occurs on GNU/Linux but with more-subtle
symptoms. The bug predates the recent dfa.c changes; perhaps the
recent changes make the bug more likely.
|
|
4e14bef8
|
2020-09-12T18:51:55
|
|
dfa: epsilon-closure tweaks (Bug#40634)
Rename BACKWORD to BACKWARD consistently.
* lib/dfa.c (struct dfa): Reorder members to reduce fragmentation.
(addtok_mb): Redo slightly to make it act more like a state machine.
Check depth only when it increases.
(epsclosure): Let the switch test the tokens.
(dfaanalyze): Cache tindex. Simplify position loops.
Prefer xcalloc to xnmalloc + explicit zeroing. Free BACKWARD
only if it is not null, since we're testing that anyway.
(dfaanalyze, build_state): Use merge2 instead of doing it by hand.
|
|
da0e8454
|
2020-09-12T18:51:55
|
|
dfa: use backward set in removal of epsilon closure
When removing in epsilon closure, the code searched all nodes
sequentially, and this was slow for some cases. Build a backward
set before search, and only check previous position with the set.
Problem reported in <https://bugs.gnu.org/40634>.
* lib/dfa.c (struct dfa): New member 'epsilon'.
(addtok_mb): Check whether a pattern has epsilon node or not.
(epsclosure): New arg BACKWORD; caller changed. When removing
epsilon node and reconnecting, check only previous positions.
Treat BEG as if it were character.
(dfaanalyze): Build backward set.
|
|
d371c576
|
2020-08-09T20:09:36
|
|
Silence warnings from clang 10 with -Wimplicit-fallthrough.
* lib/dfa.c (FALLTHROUGH): Use __attribute__ __fallthrough__ also on
clang >= 10.
* lib/fnmatch.c (FALLTHROUGH): Likewise.
* lib/fts.c (FALLTHROUGH): Likewise.
* tests/macros.h (FALLTHROUGH): Likewise.
* lib/regex_internal.h (FALLTHROUGH): Likewise.
* config/srclist.txt: Mark it as needing sync with glibc.
|
|
7ea899a9
|
2020-07-24T14:25:30
|
|
dfa: Revert breaking gawk.
Reported by Arnold Robbins <arnold@skeeve.com>.
* lib/dfa.c (compare): Don't reference the _GL_CMP macro.
|
|
efa15594
|
2020-07-22T18:35:20
|
|
Optimize three-valued comparison between integers.
(a > b ? 1 : a < b ? -1 : 0) is the same as (a > b) - (a < b).
* m4/gnulib-common.m4 (gl_COMMON): Define _GL_CMP.
* lib/c-strcasecmp.c (c_strcasecmp): Use _GL_CMP.
* lib/c-strncasecmp.c (c_strncasecmp): Likewise.
* lib/dfa.c (compare): Likewise.
* lib/fts.c (fts_compare_ino): Likewise.
* lib/mbmemcasecmp.c (mbmemcasecmp): Likewise.
* lib/mbscasecmp.c (mbscasecmp): Likewise.
* lib/mbsncasecmp.c (mbsncasecmp): Likewise.
* lib/memcasecmp.c (memcasecmp): Likewise.
* lib/memcmp2.c (memcmp2): Likewise.
* lib/savedir.c (direntry_cmp_inode): Likewise.
* lib/strcasecmp.c (strcasecmp): Likewise.
* lib/strncasecmp.c (strncasecmp): Likewise.
* lib/unistr/u-cmp2.h (FUNC): Likewise.
|
|
c08f85d7
|
2020-05-01T18:12:12
|
|
attribute: new module
This simplifies use of GCC and C2X attributes like ‘deprecated’.
* MODULES.html.sh: Add attribute.
* doc/attribute.texi, lib/attribute.h, modules/attribute: New files.
* doc/gnulib.texi (Particular Modules): Add Attributes.
* lib/backupfile.c, lib/fnmatch.c, lib/freopen-safer.c:
* lib/mbrtoc32.c, lib/mbrtowc.c, lib/nstrftime.c, lib/quotearg.c:
* lib/savewd.c, lib/unistr/u8-uctomb-aux.c, lib/unistr/u8-uctomb.c:
* lib/vasnprintf.c:
Include attribute.h, and let it define FALLTHROUGH.
* lib/bitset/base.h, lib/c-stack.c (__attribute__): Remove macro.
* lib/bitset/base.h (ATTRIBUTE_UNUSED): Define in terms of
_GL_ATTRIBUTE_MAYBE_UNUSED, for forwards compatibility to C2X.
* lib/dfa.c (FALLTHROUGH): Define consistently with gl_COMMON_BODY.
This is a copy since Gawk doesn’t use Gnulib.
* lib/di-set.h (_GL_ATTRIBUTE_NONNULL): Remove definition that
is incompatible with gl_COMMON_BODY’s. All uses changed.
* lib/fts.c: Include attribte.h, for FALLTHROUGH.
Keep the existing FALLTHROUGH definition since Glibc might use it,
and it does no harm to Gnulib’s FALLTHROUGH.
* lib/fts_.h, lib/inttostr.h:
(__GNUC_PREREQ): Remove; no longer needed.
(__attribute_warn_unused_result__): Remove. All uses
replaced by _GL_ATTRIBUTE_NODISCARD.
* lib/gl_list.h, lib/gl_map.h, lib/gl_omap.h, lib/gl_oset.h:
* lib/gl_set.h: Prefer _GL_ATTRIBUTE_NODISCARD to an ifdeffed
__attribute__ ((__warn_unused_result__)), for forward
compatibility to C2X.
* lib/hash.h (_GL_ATTRIBUTE_WUR): Remove. All uses replaced by
_GL_ATTRIBUTE_NODISCARD.
(_GL_ATTRIBUTE_DEPRECATED): Remove, since gl_COMMON_BODY defines it.
* lib/ino-map.h (_GL_ATTRIBUTE_NONNULL): Remove. All uses
replaced by gl_COMMON_BODY’s implementation, which has a
slightly different signature.
* lib/safe-alloc.h (_GL_ATTRIBUTE_RETURN_CHECK):
Remove. All uses replaced by _GL_ATTRIBUTE_NODISCARD.
* lib/unused-parameter.h (_GL_UNUSED_PARAMETER):
Define in terms of _GL_ATTRIBUTE_MAYBE_UNUSED.
No doubt all uses should be replaced, at some point.
* m4/gnulib-common.m4 (_GL_GNUC_PREREQ): New macro.
(_Noreturn): Use it.
(_GL_HAS_ATTRIBUTE, _GL_ATTRIBUTE_ALLOC_SIZE)
(_GL_ATTRIBUTE_ALWAYS_INLINE, _GL_ATTRIBUTE_ARTIFICIAL)
(_GL_ATTRIBUTE_COLD)
(_GL_ATTRIBUTE_DEPRECATED, _GL_ATTRIBUTE_ERROR)
(_GL_ATTRIBUTE_WARNING, _GL_ATTRIBUTE_EXTERNALLY_VISIBLE)
(_GL_ATTRIBUTE_FALLTHROUGH, _GL_ATTRIBUTE_FORMAT)
(_GL_ATTRIBUTE_LEAF, _GL_ATTRIBUTE_MAY_ALIAS)
(_GL_ATTRIBUTE_MAYBE_UNUSED)
(_GL_ATTRIBUTE_NODISCARD, _GL_ATTRIBUTE_NOINLINE)
(_GL_ATTRIBUTE_NONNULL, _GL_ATTRIBUTE_NONSTRING)
(_GL_ATTRIBUTE_NOTHROW, _GL_ATTRIBUTE_PACKED, _GL_ATTRIBUTE_PURE)
(_GL_ATTRIBUTE_SENTINEL): New macros.
* modules/backup-rename, modules/backupfile, modules/c-vasnprintf:
* modules/fnmatch, modules/freopen-safer, modules/fts:
* modules/mbrtoc32, modules/mbrtowc, modules/nstrftime:
* modules/quotearg, modules/savewd:
* modules/unistdio/u16-u16-vasnprintf:
* modules/unistdio/u16-vasnprintf:
* modules/unistdio/u32-u32-vasnprintf:
* modules/unistdio/u32-vasnprintf:
* modules/unistdio/u8-u8-vasnprintf:
* modules/unistdio/u8-vasnprintf:
* modules/unistdio/ulc-vasnprintf:
* modules/unistr/u8-uctomb, modules/vasnprintf:
(Depends-on:): Add attribute module.
|
|
b58d0cb8
|
2020-01-29T16:47:51
|
|
dfa: do not depend on isblank
This removes a difference between Gawk dfa.c and Gnulib dfa.c.
* lib/dfa.c (isblank): Define if neither system nor Gnulib does.
* modules/dfa (Depends-on): Remove isblank.
* modules/isblank: Add a module indicator, for lib/dfa.c.
|
|
baa7dd2b
|
2020-01-29T10:58:26
|
|
dfa: do not assume 64-bit int
Problem reported for VAX/VMS C (!) by Arnold Robbins in:
https://lists.gnu.org/r/bug-gnulib/2020-01/msg00173.html
* lib/dfa.c (CHARCLASS_PAIR): Bring back this macro.
(CHARCLASS_WORD_BITS, charclass_word) [!UINT_LEAST64_MAX]:
Fall back to 32-bit words.
(CHARCLASS_INIT): Go back to having 8 32-bit args instead
of 4 64-bit args. All uses changed.
|
|
2cdc1baf
|
2020-01-01T00:00:18
|
|
maint: Run 'make update-copyright'
|
|
6dabeec0
|
2019-12-19T14:35:59
|
|
dfa: struct dfamust now uses flexible array
* lib/dfa.c: Include flexmember.h.
(dfamust, dfamustfree): Adjust to struct dfamust change.
This saves a call to malloc+free.
* lib/dfa.h (struct dfamust): Make the final member a
flexible array member.
* modules/dfa (Depends-on): Add flexmember.
|
|
5b570e9c
|
2019-12-19T13:37:45
|
|
dfa: fast->small for array elements
* lib/dfa.c (charclass_word): Use uint_least64_t not uint_fast64_t,
since this type is used in arrays. This change is more for
documentation than for any practical effect, since the two types
are the same on all known platforms.
|
|
1219c343
|
2019-12-17T21:41:27
|
|
dfa: do not match invalid UTF-8
* lib/dfa.c (struct dfa): Grow utf8_anychar_classes member array
from 5 to 9 tokens; this is needed due to the changes to
add_utf8_anychar.
(charclass_index): 2nd arg is now pointer-to-const.
(add_utf8_anychar): Match only valid UTF-8 byte sequences
instead of allowing overlong encodings or surrogate halves.
|
|
8df5ec4b
|
2019-12-17T14:08:33
|
|
dfa: simplify charclass by assuming C99
* lib/dfa.c (CHARCLASS_WORD_BITS): Now always 64.
(charclass_word): Now always uint_fast64_t.
(CHARCLASS_PAIR): Remove.
(CHARCLASS_INIT): Take 4 arguments instead of 8. All uses changed.
|
|
d19dc900
|
2019-12-17T13:07:15
|
|
dfa: tune via xzalloc
* lib/dfa.c (dfaoptimize): Prefer xzalloc to xmalloc + memset.
|
|
83e9cc89
|
2019-12-17T00:20:53
|
|
dfa: new function dfacopysyntax
* lib/dfa.c (struct dfa): Move syntax member later so
that dfacopysyntax can easily clear earlier members.
(dfacopysyntax): New function, used by Gawk.
|
|
23c8b490
|
2019-12-16T17:00:18
|
|
dfa: remove one dependency on MB_CUR_MAX
* lib/dfa.c (dfamust): No need to refer to MB_CUR_MAX here.
|
|
99969d03
|
2019-12-16T14:38:23
|
|
dfa: remove struct lexer_state.cur_mb_len
* lib/dfa.c (struct lexer_state): Remove cur_mb_len member,
as it’s not needed and the code is simpler without it.
All uses removed.
|
|
ecb700c1
|
2019-12-16T00:32:01
|
|
dfa: make dfasyntax thread-safe
Problem reported by Bruno Haible in:
https://lists.gnu.org/r/bug-gnulib/2019-12/msg00099.html
* lib/dfa.c: Do not include locale.h.
(struct dfa): Remove simple_locale member.
All uses replaced by localeinfo.simple.
(using_simple_locale): Remove; now present (with some
changes) in localeinfo.c.
(dfasyntax): No need to initialize removed member.
|
|
73de34d3
|
2019-12-12T14:11:06
|
|
dfa: prefer ptrdiff_t for API, too
Also, use ‘idx_t’ for ptrdiff_t values that must be nonnegative,
but do this only for internal use for now.
* NEWS: Mention the API change.
* lib/dfa.c (idx_t, IDX_MAX): New type and max value, for internal
use for now. Use them instead of ptrdiff_t and PTRDIFF_MAX for
values known to be nonnegative.
(dfaparse, dfaexec_mb, dfaexec_sb, dfaexec_noop, dfaexec):
Prefer idx_t or ptrdiff_t to size_t for API.
* lib/dfa.h (dfaparse, dfacomp, dfaexec):
Prefer ptrdiff_t to size_t for API.
|
|
c532bd73
|
2019-12-11T15:08:35
|
|
dfa: prefer signed integers for internals
Signed integers can be checked more easily for integer overflow.
* lib/dfa.c (position, struct lexer_state, struct parser_state)
(struct dfa, mbs_to_wchar, fetch_wc, parse_bracket_exp)
(struct lexptr, lex, addtok_mb, add_utf8_anychar, atom)
(nsubtoks, copytoks, closure, alloc_position_set, delete)
(replace, state_index, epsclosure, charclass_context)
(state_separate_contexts, merge_nfa_state, dfaoptimize)
(dfaanalyze, build_state, dfaexec_main, dfa_supported)
(maybe_disable_superset_dfa, dfassbuild, dfafree, enlist)
(comsubs, inboth, allocmust):
Prefer a signed to an unsigned integer when calculating indexes,
unless the integer is part of the external API (a bigger deal,
and to be done later).
|
|
d8322e93
|
2019-12-11T13:44:03
|
|
dfa: fix index overflow
* lib/dfa.c (compare): Avoid integer overflow when analyzing
very large regular expressions.
|
|
360cbd3b
|
2019-12-11T13:40:01
|
|
dfa: update commentary for previous change
* NEWS: Mention the change.
* lib/dfa.c, lib/dfa.h (dfaparse, dfamust, dfacomp): Update comments.
|
|
ac924922
|
2019-12-11T12:53:09
|
|
dfa: separate parse and compile phase
‘dfamust’ must be called after parsing and before tokens are
reordered, but both are executed in the compilation phase.
Token reordering was introduced in Gnulib commit
2018-10-22T15:01:08Z!noritnk@kcn.ne.jp
(5c7a0371823876cca7a1347fa09ca26bbbff0c98).
* lib/dfa.c (dfaparse): Change it to global function.
(dfacomp): If first argument is NULL, skip parse.
* lib/dfa.h: (dfaparse): Add a prototype.
|
|
e6633650
|
2019-01-01T00:25:11
|
|
maint: Run 'make update-copyright'
|
|
5d6a3cdd
|
2018-12-20T19:51:48
|
|
revert v0.1-2213-gae4b73e28 and part of v0.1-2281-g95cd86dd7
v0.1-2213-gae4b73e28 caused a regression in grep-3.2 (no match):
echo '123-x'|LC_ALL=C grep -E '.\bx'
The goal is to revert the first, but reverting it requires to restore
the function deleted in the second. I ran this to restore the deleted
function:
git show v0.1-2281-g95cd86dd7 lib/dfa.c \
| perl -0777 -pe 's/^@@[^\n]*dfaan.*//ms' \
| patch -R -p1
* lib/dfa.c (charclass_context): Restore deleted function.
Reverting the primary commit removes this change:
dfa: Simplify a building state
* lib/dfa.c (build_state): Simplify a building state.
|
|
95cd86dd
|
2018-12-15T10:09:35
|
|
dfa: avoid new warnings from gcc
These would prevent building with -Werror and a Dec snapshot of gcc.
* lib/dfa.c (dfaanalyze): Avoid shadowing warnings for "pos".
Rename each inner instance to "p".
(charclass_context): Remove unused static function.
|
|
ae4b73e2
|
2018-10-23T00:02:16
|
|
dfa: Simplify a building state
dfa.c (build_state): Simplify a building state.
|
|
5c7a0371
|
2018-10-23T00:01:08
|
|
dfa: reorder tokens before execution
Reorder tokens before execution. It improves efficiency to access
memory in building states. For example, A(BCD|E(F|G)|HI) are reorderda
as following.
(Before reorder)
A:1 - B:2 - C:3 - D:4
` E:5 - F:6
` G:7
` H:8 - I:9
(After reorder)
A:1 - B:2 - C:5 - D:6
` E:3 - F:7
` G:8
` H:4 - I:9
dfa.c (compare, reorder_tokens): New function.
(reorder_tokens): Call them.
|
|
fb03ea32
|
2018-10-22T23:51:20
|
|
dfa: a state has a set of current positions.
Up to now, a state had a set of follow-on positions. It is replaced a
set of current positions. This change will save memory space.
dfa.c (leaf_set): Remove it.
(struct dfa): Add new member constraints and separates.
(append): New function.
(state_index): Bring constraint from pre-calculated.
(state_separate_contexts): Bring separate contexts from pre-calculated.
Change argument, update callers.
(merge_nfa_state): Pre-calculate constraints for END. and remove END.
No longer END is not used after here.
(dfaoptimize): Initialize added member constraints.
(dfaanalyze): Pre-calculate seprate contexts.
(build_state): Change for this update.
(dfassbuild): Initialize new members .
(dfafree): Free memory for new members.
|
|
a37603c6
|
2018-10-22T23:35:50
|
|
dfa: simplify dfa optimization
dfa.c (merge_nfa_state, dfaoptimize): Simplify dfa optimization.
|
|
eb8189b3
|
2018-10-22T23:31:26
|
|
dfa: position set sorts increasing order
Change the order of position set from decreasing to increaing, then even
after dfa is optimized, it is guaranteed that the number of a position
is smaller than the subsequent one's number.
dfa.c (insert, merged_constrained, delete): Reverse the direction of an
inequality sign.
(dfaanalyze): Position set sorts increasing order.
|
|
8ef5a439
|
2018-10-22T23:22:40
|
|
dfa: remove unneeded code
By the addition of beg, a code for the initial state is unnecessary, so
remove it.
dfa.c (epsclosure): Remove a code for the initial state.
(dfaanalyze): Print follows for BEG in debug mode.
|
|
617a6097
|
2018-09-19T08:35:49
|
|
dfa: optimization for state merge
* lib/dfa.c (merge2): New function.
(merge_nfa_state): Use it.
|
|
5007177e
|
2018-09-18T21:26:01
|
|
dfa: trivial comment fix: s/is/if/
* lib/dfa.c (maybe_disable_superset_dfa): Fix comment typo.
|
|
bc7c0f01
|
2018-09-18T19:12:06
|
|
dfa: use more-informative function name
* lib/dfa.c (maybe_disable_superset_dfa):
Rename from dfautf8noss. Use change.
|
|
28d7f171
|
2018-09-18T19:05:26
|
|
dfa: tweak allocation performance
* lib/dfa.c (merge_nfa_state, dfaoptimize):
Prefer ptrdiff_t for indexes some more.
Use char for flags, as it’s wide enough.
Allocate queue and flags together, with one malloc call.
No need to use xnmalloc since the multiplication and
addition cannot overflow (it’s already been checked by
earlier allocation). Prefer memset to open-coding.
|
|
cbf5523b
|
2018-09-18T18:24:27
|
|
dfa: prune states as we go
* lib/dfa.c (prune): Remove.
(merge_nfa_state): Prune as we go instead of at the end.
Prefer ptrdiff_t for indexes, as this helps the compiler a bit.
Simplify constraint checking a bit.
|
|
5eae16cf
|
2018-09-18T15:25:51
|
|
dfa: reorder enum for efficiency
* lib/dfa.c (END): Now -1 again. Reorder other elements
of the enumeration to make it easier for GCC to generate
efficient code by using fewer comparisons to check for
ranges of values.
(atom): Take advantage of the reordering.
|
|
4299106c
|
2018-09-18T10:07:23
|
|
dfa: optimize alternation in NFA
Even when similar states exist in alternation, the DFA treats them
as separate items, which may complicate the transition in NFA and
cause slowdown. This change assembles the states into one. For
example, ab|ac is changed into a(b|c). This change speeds-up
matching for many branched patterns. For example, grep speeds up
more than 30× in:
seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in
time -p env LC_ALL=C grep -vf in in
* lib/dfa.c (prune): New function.
(merge_nfa_state): New function. It merges similar NFA states.
(dfaoptimize): New function. It seeks merged and removed nodes.
(dfaanalyze): Call new function.
(dfautf8noss): Change name from dfaoptimize because of addition of new
function.
(dfacomp): Update caller.
|
|
e648401b
|
2018-09-18T09:58:22
|
|
dfa: simplify initial state
Simplifying the initial state enables easier optimization of the NFA.
* lib/dfa.c (enum token): Add new element BEG.
(prtok): Adjust due to adding element BEG.
(dfaparse): Put BEG at a head of tokens.
(state_index): Adjust due to adding element BEG.
(dfaanalyze): Concatenate BEG to other tokens, and simplify to
build initial state.
(dfamust): Adjust due to adding element BEG. DFAMUST ignores it.
|
|
c5e76a95
|
2018-08-01T22:14:21
|
|
dfa: fix memory leak
* lib/dfa.c (dfafree): Add missing free() on dfa->superset.
|
|
d7cf3b8d
|
2018-06-25T21:45:23
|
|
Continue to use spaces for indentation, not tabs.
|
|
281b825e
|
2018-01-01T00:57:25
|
|
maint: Run 'make update-copyright'
|
|
809f19db
|
2017-05-19T17:02:56
|
|
dfa: two small simplifications
* lib/dfa.c (build_state): Avoid repeating longer expressions.
|
|
5e22aee7
|
2017-05-16T09:23:52
|
|
manywarnings: update for GCC 7
* build-aux/gcc-warning.spec:
* m4/manywarnings.m4 (gl_MANYWARN_ALL_GCC):
Add GCC 7 warnings, notably -Wimplicit-fallthrough=5, which
requires a non-comment fallthrough attribute. This is a bit
cleaner than the comment versions.
* lib/strftime.c, lib/dfa.c, lib/fnmatch.c, lib/mbrtowc.c:
* lib/vasnprintf.c, tests/macros.h (FALLTHROUGH): New macro.
Use it whenever one switch case falls through into the next.
|
|
6afba02d
|
2017-03-21T19:05:17
|
|
dfa: make [0-9] faster in non-C locales
Problem reported by John P. Linderman (Bug#26193).
* lib/dfa.c (parse_bracket_exp): Remove redundant assignment.
If both ends of the range are ASCII digits, do not worry about
multi-character collating sequences and the like. Be consistent
about using isalpha as a precondition for setbit_case_fold_c.
|
|
e0e362e7
|
2017-01-19T07:44:13
|
|
dfa: fix memory leak in parse
Problem reported by Arnold Robbins in:
http://lists.gnu.org/archive/html/bug-grep/2017-01/msg00006.html
* lib/dfa.c (epsclosure): Do it.
|
|
61c27fe8
|
2017-01-15T17:18:10
|
|
dfa: port to gcc -fsanitize=undefined
* lib/dfa.c (copy): Don’t pass NULL with size 0 to memcpy,
as this runs afoul of gcc -fsanitize=undefined.
|
|
836ac768
|
2017-01-10T01:27:44
|
|
dfa: minor simplification with emptyset
* lib/dfa.c (build_state): Simplify by using emptyset.
|
|
2db74de1
|
2017-01-09T22:48:46
|
|
dfa: shrink constraints from 4 bits to 3
* lib/dfa.c (newline_constraint, letter_constraint)
(other_constraint, prev_newline_dependent)
(prev_letter_dependent, NO_CONSTRAINT, BEGLINE_CONSTRAINT)
(ENDLINE_CONSTRAINT, BEGWORD_CONSTRAINT, ENDWORD_CONSTRAINT)
(LIMWORD_CONSTRAINT, NOTLIMWORD_CONSTRAINT):
Constraints need only 3 bits, not 4. Using smaller integers
shrinks the code a bit and makes grep a tad faster on x86-64.
|
|
9d561f0d
|
2017-01-09T20:28:11
|
|
dfa: omit unnecessary ptrdiff_t check
* lib/dfa.c (alloc_position_set): Do not worry about ptrdiff_t
overflow, since xnmalloc does that now.
|
|
7fbe8c09
|
2017-01-09T20:26:02
|
|
dfa: omit unnecessary allocation
* lib/dfa.c (dfaanalyze): Do not allocate follow set, since
an all-zero follow set works just fine.
|
|
8d3c4933
|
2017-01-09T16:30:41
|
|
dfa: omit unused local
* lib/dfa.c (build_state): Fix up recent change.
|
|
7c345c68
|
2017-01-09T08:21:21
|
|
dfa: melt down dfastate into build_state
* src/dfa.c (dfastate): Remove it.
(build_state): Insert content of dfastate() to bottom.
|
|
aff55692
|
2017-01-09T07:46:13
|
|
dfa: simplify transition table allocation
* src/dfa.c (realloc_trans_if_necessary): Remove second argument.
Its value is derived from other variable. Update callers.
(dfastate): Remove calculation of max number of state.
|
|
823b5cb5
|
2017-01-08T12:44:29
|
|
dfa: fix reallocation bug when matching newlines
Problem reported for sed by S. Gilles (Bug#25390).
* lib/dfa.c (realloc_trans_if_necessary): Move earlier.
(dfastate): Reallocate before moving any newline transition ...
(build_state): ... instead of reallocating here, where it is too late.
|
|
f0f371e1
|
2017-01-06T12:49:39
|
|
dfa: fix 'return' typo
Problem reported by Nelson H. F. Beebe.
* lib/dfa.c (merge): Fix typo that Sun compilers rejected.
|
|
74557b94
|
2017-01-02T12:22:17
|
|
dfa: prefer functions to FETCH_WC macro
* lib/dfa.c (FETCH_WC): Remove, replacing with ...
(fetch_wc, bracket_fetch_wc): ... new functions. These store the
wint_t result into DFA->lex.wctok instead of to a separate arg.
All callers changed. Move more local decls closer to where
they're used.
|
|
c8355b77
|
2017-01-02T10:49:17
|
|
dfa: narrow more local var scopes
* lib/dfa.c: Move more local decls to be more local.
|
|
5c810462
|
2017-01-02T08:27:12
|
|
dfa: remove duplicate assignment
Problem reported by Bruno Haible in:
http://lists.gnu.org/archive/html/bug-gnulib/2017-01/msg00007.html
* lib/dfa.c (parse_bracket_exp): Simplify.
|
|
e210a3cb
|
2017-01-01T21:43:26
|
|
dfa: simplify constraint-dependency checking
* lib/dfa.c (prev_newline_constraint, prev_letter_constraint)
(prev_other_constraint): Remove.
(prev_newline_dependent, prev_letter_dependent):
Simplify, to avoid an unnecessary bitwise AND operation.
|
|
760d5b7d
|
2017-01-01T21:23:18
|
|
dfa: prefer functions and constants to macros
* lib/dfa.c: Prefer constants to macros where either will do.
(streq, isasciidigit, newline_constraint)
(letter_constraint, other_constraint, succeeds_in_context)
(prev_newline_constraint, prev_letter_constraint)
(prev_other_constraint, prev_newline_dependent)
(prev_letter_dependent, accepting, accepts_in_context):
Now static functions instead of function-like macros.
Use lower-case names accordingly. All uses changed.
|
|
51536cbb
|
2017-01-01T20:51:46
|
|
dfa: narrow more local var scopes
* lib/dfa.c: Move some more local decls down to nearer where
they're needed.
|
|
387fd77e
|
2016-12-31T08:06:24
|
|
dfa: narrow the scope of many local variables
* lib/dfa.c: Now that we are no longer constrained to c89, move
declarations of many variables (often indices) "down" into the
scope(s) where used or to the point of definition. This is a
no-semantic-change diff.
|
|
a3fd683d
|
2017-01-01T02:59:23
|
|
version-etc: new year
* build-aux/gendocs.sh (version):
* doc/gendocs_template:
* doc/gendocs_template_min:
* doc/gnulib.texi:
* lib/version-etc.c (COPYRIGHT_YEAR):
Update copyright dates by hand in templates and the like.
* all files: Run 'make update-copyright'.
|
|
b724c4e6
|
2016-12-30T00:57:21
|
|
dfa: shorten sbit, success
* lib/dfa.c (struct regex_syntax.sbit):
(struct dfa.success): Use char, not int, for array elements, since
they are all in the range 0..7.
|
|
e0a498ec
|
2016-12-30T00:42:22
|
|
dfa: simplify multibyte_prop etc.
This follows up on a change made when dfa.c was in grep, namely grep
commit c797046c7c13c2647182b919a79a4c5b4ecf82b1
dated 2015-08-12 07:35:03 -0700, which removed unused multibyte support.
That earlier simplification allows for some more simplification
and trimming down here.
* lib/dfa.c (struct mb_char_classes): New member nchars_alloc.
(struct lexer_state): New mamber brack.
(struct dfa, addtok_mb): multibyte_prop elements are now char, not int,
since they must be in the range 0..3 now.
Remove members mbcsets, nmbcsets, mbcsets_alloc, since
the brack member now supersedes them.
(parse_bracket_exp): Update dfa->lex.brack instead of dfa->mbcsets.
(addtok): Use dfa->lex.brack instead of dfa->mbcsets.
(dfaparse): Remove unnecessary initializations of already-0 storage.
(free_mbdata): Free d->lex.brack.chars instead of d->mbcsets.
(dfassbuild): No need to clear sup->mbcsets.
|
|
959c5a30
|
2016-12-29T23:02:45
|
|
dfa: minor performance tweak
* lib/dfa.c (setbit_wc): Test < 0, not == EOF.
|
|
88125b5e
|
2016-12-29T21:44:17
|
|
dfa: wrap charclass inside a struct
This lets GCC check for aliases more accurately.
On my platform (gcc Ubuntu 5.4.0-6ubuntu1~16.04.4 x86-64,
en_US.utf8 locale) this makes 'grep -Fi -f list.txt list.txt >out'
about 1% faster, where list.txt is generated by 'aspell dump
master | head -n 100000 >list.txt'. Also, it shrinks the grep
text by 64 bytes, woohoo! See Bug#22239.
* lib/dfa.c (charclass): Wrap inside a struct. All uses changed.
(CHARCLASS_INIT, tstbit, setbit, clrbit, zeroset, fillset, notset)
(equal, emptyset, charclass_index, setbit_wc, setbit_case_fold_c):
Adjust to this, e.g., by using charclass * rather than charclass.
All callers changed as needed.
(copyset): Remove. All uses changed to simple assignment.
(parse_bracket_exp): Use zeroset instead of memset.
|
|
b783f1eb
|
2016-12-18T17:35:19
|
|
dfa: improve worst-case 'replace' performance
See my note in Bug#22357#71.
* lib/dfa.c (insert, delete): Rework to avoid duplicate test.
(merge_constrained): New function, which is like
the old 'merge' function, except with a new argument C2.
Simplify the body by avoiding the need for different sections
of code depending on whether one input is exhausted.
(merge): Use the new function.
(delete): Return the constraint of the deleted position,
not the entire position. Caller changed.
(replace): Change from O(N*(N + log N)) to O(N log N) algorithm.
|
|
d6df3873
|
2016-12-14T16:31:57
|
|
dfa: performance improvement for removal of epsilon closure
* lib/dfa.c (delete): Use binary search to find deleted index.
(replace): New function. It replaces a position with the followed set.
(epsclosure): Replace it with a new algorithm. Update caller.
|