|
28492cce
|
2005-08-31T22:51:09
|
|
On 64-bit hosts (where size_t is 64 bits and int is 32 bits), the
old glibc regex code mishandles strings longer than 2**31 bytes.
This patch fixes this when the regex code is used in gnulib
(i.e., outside glibc).
* lib/regex.h (_REGEX_LARGE_OFFSETS): New feature-test macro,
governing whether the rest of this patch is active. By default,
the macro is disabled and the patch has no effect.
(regoff_t) [defined _REGEX_LARGE_OFFSETS]: Define to off_t, not int.
(__re_idx_t, __re_size_t, __re_long_size_t): New types.
(struct re_pattern_buffer, re_search, re_search_2, re_match):
(re_match_2, re_set_registers): Use the new types.
* lib/regex_internal.h (Idx, re_hashval_t): New types.
(REG_MISSING, REG_ERROR, REG_VALID_INDEX, REG_VALID_NONZERO_INDEX):
New macros.
(re_node_set, re_charset_t, re_token_t, re_string_realloc_buffers):
(re_string_context_at, bin_tree_t, re_dfastate_t):
(struct re_state_table_entry, state_array_t, re_sub_match_last_t):
(re_sub_match_top_t, re_match_context_t, re_sift_context_t):
(struct re_fail_stack_ent_t, struct re_fail_stack_t, struct re_dfa_t):
(re_string_char_size_at, re_string_wchar_at):
(re_string_elem_size_at):
Use the new types and macros to port to 64-bit hosts.
Use unsigned types for internal values, so that the code
mostly works even for arrays larger than SSIZE_MAX.
* lib/regcomp.c (re_compile_internal, init_dfa, duplicate_node):
(search_duplicated_node, calc_eclosure_iter, fetch_number):
(parse_reg_exp, parse_branch, parse_expression, parse_sub_exp):
(build_equiv_class, build_charclass, re_compile_fastmap_iter):
(free_dfa_content, create_initial_state, optimize_utf8, analyze):
(optimize_subexps, calc_first, link_nfa_nodes, duplicate_node_closure):
(calc_inveclosure, parse_dup_op, build_range_exp):
(build_collating_symbol, parse_bracket_exp, build_charclass_op):
(create_token_tree, mark_opt_subexp):
Likewise.
* lib/regex_internal.c
(re_string_construct_common, create_ci_newstate):
(create_cd_newstate, re_string_allocate, re_string_construct):
(re_string_realloc_buffers, build_wcs_upper_buffer):
(re_string_skip_chars, build_upper_buffer, re_string_translate_buffer):
(re_string_reconstruct, re_string_peek_byte_case):
(re_string_fetch_byte_case, re_string_context_at):
(re_node_set_alloc, re_node_set_init_1, re_node_set_init_2):
(re_node_set_init_copy, re_node_set_add_intersect):
(re_node_set_init_union, re_node_set_merge, re_node_set_insert):
(re_node_set_insert_last, re_node_set_compare, re_node_set_contains):
(re_node_set_remove_at, re_dfa_add_node, calc_state_hash):
(re_acquire_state, re_acquire_state_context, register_state):
Likewise.
* lib/regexec.c
(match_ctx_init, match_ctx_add_entry, search_cur_bkref_entry):
(match_ctx_add_subtop, match_ctx_add_sublast, sift_ctx_init):
(re_search_internal, re_search_2_stub, re_search_stub):
(re_copy_regs, check_matching, check_halt_state_context, update_regs):
(push_fail_stack, sift_states_iter_mb, build_sifted_states):
(update_cur_sifted_state, check_dst_limits):
(check_dst_limits_calc_pos_1, check_dst_limits_calc_pos):
(check_subexp_limits, sift_states_bkref, merge_state_array):
(check_subexp_matching_top, get_subexp, get_subexp_sub):
(find_subexp_node, check_arrival, check_arrival_add_next_nodes):
(check_arrival_expand_ecl, check_arrival_expand_ecl_sub):
(expand_bkref_cache, check_node_accept_bytes):
(group_nodes_into_DFAstates, check_node_accept, regexec, re_match):
(re_search, re_match_2, re_search_2, prune_impossible_nodes):
(acquire_init_state_context, check_halt_node_context):
(proceed_next_node, pop_fail_stack, set_regs, free_fail_stack_return):
(sift_states_backward, clean_state_log_if_needed):
(sub_epsilon_src_nodes, add_epsilon_src_nodes, merge_state_with_log):
(find_recover_state, transit_state_sb, transit_state_mb):
(transit_state_bkref, build_trtable, match_ctx_clean):
Likewise.
* lib/regcomp.c (parse_dup_op): Add an extra test if Idx is unsigned,
to work around an assumption that REG_MISSING is negative.
* m4/regex.m4 (gl_REGEX): Require AC_SYS_LARGEFILE. Define
_REGEX_LARGE_OFFSETS. Test for regoff_t/off_t bug in 64-bit
and large-file glibc and in 32-bit large-file Solaris.
* config/srclist.txt: Add glibc bug 1281.
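
The following is only an illustrative sketch of the two ideas in this
entry: an offset of 2**31 or more does not fit in a 32-bit int, and an
unsigned index type needs explicit sentinel values because a test such
as "n < 0" can never succeed. All names starting with demo_ or DEMO_
are hypothetical stand-ins; the actual Idx, REG_MISSING, REG_ERROR and
REG_VALID_INDEX definitions are in lib/regex_internal.h and may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the definitions from regex_internal.h.  */
typedef uint64_t demo_idx;              /* unsigned index type */
#define DEMO_MISSING ((demo_idx) -1)    /* "no such index" sentinel */
#define DEMO_ERROR ((demo_idx) -2)      /* "operation failed" sentinel */

/* Nonzero if N is a usable index, i.e. neither sentinel.  */
#define DEMO_VALID_INDEX(n) ((demo_idx) (n) < DEMO_ERROR)

/* Nonzero if OFFSET can be stored in an int without loss, which is
   what code using int offsets silently assumes.  */
static int
demo_fits_int (uint64_t offset)
{
  return offset <= (uint64_t) INT_MAX;
}

/* With an unsigned index, the sentinel must be tested by equality;
   a "negative" check would always be false.  */
static int
demo_is_missing (demo_idx n)
{
  return n == DEMO_MISSING;
}
```

On a host where int is 32 bits, demo_fits_int rejects an offset of
2**31 while DEMO_VALID_INDEX still accepts it.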
|
|
083768e3
|
2005-08-25T05:08:59
|
|
* config/srclist.txt: Add glibc bug 1240.
* lib/regcomp.c (regerror): 2nd arg is 'restrict', as per POSIX.
* lib/regex.h (regerror): Likewise.
|
|
dad0bacf
|
2005-08-24T23:09:33
|
|
[_REGEX_SOURCE]: Define re_fastmap_accurate too; this was
inadvertently omitted from the previous patch.
|
|
d6103c5e
|
2005-08-24T22:29:38
|
|
* config/srclist.txt:
Remove glibc bug 1233 and add 1236, which supersedes it.
* lib/regex.h: Fix a multitude of POSIX name space violations.
These changes have an effect only for programs that define
_POSIX_C_SOURCE, _POSIX_SOURCE, or _XOPEN_SOURCE; they
do not change anything for programs compiled in the normal way.
Also, there is no effect on the ABI.
(_REGEX_SOURCE): New macro.
Do not include <stddef.h> if _XOPEN_SOURCE and VMS are both
defined and _GNU_SOURCE is not; this fixes a name space violation.
Rename the following macros to obey POSIX requirements.
The old names are still visible as macros if _REGEX_SOURCE is defined.
(REG_BACKSLASH_ESCAPE_IN_LISTS): renamed from
RE_BACKSLASH_ESCAPE_IN_LISTS.
(REG_BK_PLUS_QM): renamed from RE_BK_PLUS_QM.
(REG_CHAR_CLASSES): renamed from RE_CHAR_CLASSES.
(REG_CONTEXT_INDEP_ANCHORS): renamed from RE_CONTEXT_INDEP_ANCHORS.
(REG_CONTEXT_INDEP_OPS): renamed from RE_CONTEXT_INDEP_OPS.
(REG_CONTEXT_INVALID_OPS): renamed from RE_CONTEXT_INVALID_OPS.
(REG_DOT_NEWLINE): renamed from RE_DOT_NEWLINE.
(REG_DOT_NOT_NULL): renamed from RE_DOT_NOT_NULL.
(REG_HAT_LISTS_NOT_NEWLINE): renamed from RE_HAT_LISTS_NOT_NEWLINE.
(REG_INTERVALS): renamed from RE_INTERVALS.
(REG_LIMITED_OPS): renamed from RE_LIMITED_OPS.
(REG_NEWLINE_ALT): renamed from RE_NEWLINE_ALT.
(REG_NO_BK_BRACES): renamed from RE_NO_BK_BRACES.
(REG_NO_BK_PARENS): renamed from RE_NO_BK_PARENS.
(REG_NO_BK_REFS): renamed from RE_NO_BK_REFS.
(REG_NO_BK_VBAR): renamed from RE_NO_BK_VBAR.
(REG_NO_EMPTY_RANGES): renamed from RE_NO_EMPTY_RANGES.
(REG_UNMATCHED_RIGHT_PAREN_ORD): renamed from
RE_UNMATCHED_RIGHT_PAREN_ORD.
(REG_NO_POSIX_BACKTRACKING): renamed from RE_NO_POSIX_BACKTRACKING.
(REG_NO_GNU_OPS): renamed from RE_NO_GNU_OPS.
(REG_DEBUG): renamed from RE_DEBUG.
(REG_INVALID_INTERVAL_ORD): renamed from RE_INVALID_INTERVAL_ORD.
(REG_IGNORE_CASE): renamed from RE_ICASE. This renaming is a bit
unusual, since we can't clash with the POSIX REG_ICASE.
(REG_CARET_ANCHORS_HERE): renamed from RE_CARET_ANCHORS_HERE.
(REG_CONTEXT_INVALID_DUP): renamed from RE_CONTEXT_INVALID_DUP.
(REG_NO_SUB): renamed from RE_NO_SUB.
(REG_SYNTAX_EMACS): renamed from RE_SYNTAX_EMACS.
(REG_SYNTAX_AWK): renamed from RE_SYNTAX_AWK.
(REG_SYNTAX_GNU_AWK): renamed from RE_SYNTAX_GNU_AWK.
(REG_SYNTAX_POSIX_AWK): renamed from RE_SYNTAX_POSIX_AWK.
(REG_SYNTAX_GREP): renamed from RE_SYNTAX_GREP.
(REG_SYNTAX_EGREP): renamed from RE_SYNTAX_EGREP.
(REG_SYNTAX_POSIX_EGREP): renamed from RE_SYNTAX_POSIX_EGREP.
(REG_SYNTAX_ED): renamed from RE_SYNTAX_ED.
(REG_SYNTAX_SED): renamed from RE_SYNTAX_SED.
(_REG_SYNTAX_POSIX_COMMON): renamed from _RE_SYNTAX_POSIX_COMMON.
(REG_SYNTAX_POSIX_BASIC): renamed from RE_SYNTAX_POSIX_BASIC.
(REG_SYNTAX_POSIX_MINIMAL_BASIC): renamed from
RE_SYNTAX_POSIX_MINIMAL_BASIC.
(REG_SYNTAX_POSIX_EXTENDED): renamed from RE_SYNTAX_POSIX_EXTENDED.
(REG_SYNTAX_POSIX_MINIMAL_EXTENDED): renamed from
RE_SYNTAX_POSIX_MINIMAL_EXTENDED.
(REG_DUP_MAX): renamed from RE_DUP_MAX. No need to undef it.
(REG_UNALLOCATED): Renamed from REGS_UNALLOCATED.
(REG_REALLOCATE): Renamed from REGS_REALLOCATE.
(REG_FIXED): Renamed from REGS_FIXED.
(REG_NREGS): Renamed from RE_NREGS.
(REG_ICASE, REG_NEWLINE, REG_NOSUB): Do not depend on the values
of other REG_* macros, since POSIX says the user is allowed to
#undef these macros selectively.
(reg_errcode_t): Update comment stating what other tables need
to be consistent.
Rename the following enum values to obey POSIX requirements.
The old names are still visible as macros.
(_REG_ENOSYS): Renamed from REG_ENOSYS. Define even if _XOPEN_SOURCE
is not defined, since GNU is supposed to be a superset of POSIX as
much as possible, and since we want reg_errcode_t to be a signed
type for implementation consistency.
(_REG_NOERROR): Renamed from REG_NOERROR.
(_REG_NOMATCH): Renamed from REG_NOMATCH.
(_REG_BADPAT): Renamed from REG_BADPAT.
(_REG_ECOLLATE): Renamed from REG_ECOLLATE.
(_REG_ECTYPE): Renamed from REG_ECTYPE.
(_REG_EESCAPE): Renamed from REG_EESCAPE.
(_REG_ESUBREG): Renamed from REG_ESUBREG.
(_REG_EBRACK): Renamed from REG_EBRACK.
(_REG_EPAREN): Renamed from REG_EPAREN.
(_REG_EBRACE): Renamed from REG_EBRACE.
(_REG_BADBR): Renamed from REG_BADBR.
(_REG_ERANGE): Renamed from REG_ERANGE.
(_REG_ESPACE): Renamed from REG_ESPACE.
(_REG_BADRPT): Renamed from REG_BADRPT.
(_REG_EEND): Renamed from REG_EEND.
(_REG_ESIZE): Renamed from REG_ESIZE.
(_REG_ERPAREN): Renamed from REG_ERPAREN.
(REG_ENOSYS, REG_NOERROR, REG_NOMATCH, REG_BADPAT, REG_ECOLLATE):
(REG_ECTYPE, REG_EESCAPE, REG_ESUBREG, REG_EBRACK, REG_EPAREN):
(REG_EBRACE, REG_BADBR, REG_ERANGE, REG_ESPACE, REG_BADRPT, REG_EEND):
(REG_ESIZE, REG_ERPAREN): Now macros, not enum constants.
(_REG_RE_NAME, _REG_RM_NAME): New macros.
(REG_TRANSLATE_TYPE): Renamed from RE_TRANSLATE_TYPE. All uses
changed. But support the old name if the new one is not defined
and if _REGEX_SOURCE.
Change the following member names in struct re_pattern_buffer.
The old names are still supported if _REGEX_SOURCE.
The new names are always supported, regardless of _REGEX_SOURCE.
(re_buffer): Renamed from buffer.
(re_allocated): Renamed from allocated.
(re_used): Renamed from used.
(re_syntax): Renamed from syntax.
(re_fastmap): Renamed from fastmap.
(re_translate): Renamed from translate.
(re_can_be_null): Renamed from can_be_null.
(re_regs_allocated): Renamed from regs_allocated.
(re_fastmap_accurate): Renamed from fastmap_accurate.
(re_no_sub): Renamed from no_sub.
(re_not_bol): Renamed from not_bol.
(re_not_eol): Renamed from not_eol.
(re_newline_anchor): Renamed from newline_anchor.
Change the following member names in struct re_registers.
The old names are still supported if _REGEX_SOURCE.
The new names are always supported, regardless of _REGEX_SOURCE.
(rm_num_regs): Renamed from num_regs.
(rm_start): Renamed from start.
(rm_end): Renamed from end.
(re_set_syntax, re_compile_pattern, re_compile_fastmap):
(re_search, re_search_2, re_match, re_match_2, re_set_registers):
Prepend __ to parameter names.
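
As a rough sketch of two techniques named in this entry, assuming
nothing about the real header beyond what is stated above: enum values
get private names while the public names become macros, and flag macros
are written so that none depends on the value of another, which keeps
the rest usable after a program #undefs one of them. Every DEMO_ and
demo_ name below is hypothetical and is not taken from lib/regex.h.

```c
#include <assert.h>

/* Private enumerators carry the values.  The negative one means the
   enum needs a signed representation.  */
typedef enum
{
  DEMO_PRIV_ENOSYS = -1,
  DEMO_PRIV_NOERROR = 0,
  DEMO_PRIV_NOMATCH
} demo_errcode_t;

/* Public names are macros, so a program may #undef them.  */
#define DEMO_NOERROR DEMO_PRIV_NOERROR
#define DEMO_NOMATCH DEMO_PRIV_NOMATCH

/* Each flag is an independent constant, not (DEMO_ICASE << 1) etc.  */
#define DEMO_EXTENDED 1
#define DEMO_ICASE (1 << 1)
#define DEMO_NEWLINE (1 << 2)

/* A program removes one public name; the others must keep working.  */
#undef DEMO_ICASE

static int
demo_flags (void)
{
  return DEMO_EXTENDED | DEMO_NEWLINE;
}
```

Had DEMO_NEWLINE been defined as (DEMO_ICASE << 1), the #undef would
have broken demo_flags at compile time.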
|
|
9828dc4e
|
2005-08-23T20:37:24
|
|
* config/srclist.txt: Add glibc bug 1233.
* lib/regex.h (REG_ENOSYS)
[!defined _XOPEN_SOURCE && 200112L <= _POSIX_C_SOURCE]:
Define, since POSIX requires it as of 2001.
(_REG_ENOSYS) [! (defined _XOPEN_SOURCE || 200112L <= _POSIX_C_SOURCE)]:
New private symbol, used to keep the enum signed in all cases.
|
|
b5fb7004
|
2005-08-23T19:11:45
|
|
* config/srclist.txt: Add glibc bug 1232.
* lib/regex.h (RE_NO_EMPTY_RANGES): Fix doc bug reported by James Youngman
in <http://lists.gnu.org/archive/html/bug-gnulib/2005-07/msg00132.html>.
|
|
087e9e5b
|
2005-08-20T07:42:15
|
|
* config/srclist.txt: Add glibc bugs 1220, 1221, 1222.
* lib/regcomp.c:
(re_compile_pattern, re_set_syntax, re_compile_fastmap):
(re_compile_fastmap_iter, regcomp, regerror, regfree):
(re_compile_internal, init_dfa, init_word_char, free_workarea_compile):
(create_initial_state, optimize_utf8, analyze, postorder, preorder):
(optimize_subexps, lower_subexps, lower_subexp, calc_first, calc_next):
(link_nfa_nodes, duplicate_node_closure, search_duplicated_node):
(duplicate_node, calc_inveclosure, calc_eclosure, calc_eclosure_iter):
(fetch_token, peek_token, peek_token_bracket, parse, parse_reg_exp):
(parse_branch, parse_expression, parse_sub_exp, parse_dup_op):
(build_range_exp, build_collating_symbol, parse_bracket_exp):
(parse_bracket_element, parse_bracket_symbol, build_equiv_class):
(build_charclass, build_charclass_op, fetch_number, create_tree):
(create_token_tree, mark_opt_subexp, duplicate_tree):
Use prototypes rather than old-style definitions.
* lib/regex_internal.c:
(re_string_allocate, re_string_construct, re_string_realloc_buffers):
(re_string_construct_common, build_wcs_buffer, build_wcs_upper_buffer):
(re_string_skip_chars, build_upper_buffer, re_string_translate_buffer):
(re_string_reconstruct, re_string_peek_byte_case):
(re_string_fetch_byte_case, re_string_destruct, re_string_context_at):
(re_node_set_alloc, re_node_set_init_1, re_node_set_init_2):
(re_node_set_init_copy, re_node_set_add_intersect):
(re_node_set_init_union, re_node_set_merge, re_node_set_insert):
(re_node_set_insert_last, re_node_set_compare, re_node_set_contains):
(re_node_set_remove_at, re_dfa_add_node, calc_state_hash):
(re_acquire_state, re_acquire_state_context, register_state):
(create_ci_newstate, create_cd_newstate, free_state):
Likewise.
* lib/regexec.c (regexec, re_match, re_search, re_match_2, re_search_2):
(re_search_2_stub, re_search_stub, re_copy_regs, re_set_registers):
(re_search_internal, prune_impossible_nodes):
(acquire_init_state_context, check_matching):
(check_halt_node_context, check_halt_state_context, proceed_next_node):
(push_fail_stack, pop_fail_stack, set_regs, free_fail_stack_return):
(update_regs, sift_states_backward, build_sifted_states):
(clean_state_log_if_needed, merge_state_array):
(update_cur_sifted_state, add_epsilon_src_nodes):
(sub_epsilon_src_nodes, check_dst_limits, check_dst_limits_calc_pos_1):
(check_dst_limits_calc_pos, check_subexp_limits, sift_states_bkref):
(sift_states_iter_mb, transit_state, merge_state_with_log):
(find_recover_state, check_subexp_matching_top, transit_state_mb):
(transit_state_bkref, get_subexp, get_subexp_sub, find_subexp_node):
(check_arrival, check_arrival_add_next_nodes):
(check_arrival_expand_ecl, check_arrival_expand_ecl_sub):
(expand_bkref_cache, build_trtable, group_nodes_into_DFAstates):
(check_node_accept_bytes, check_node_accept, extend_buffers):
(match_ctx_init, match_ctx_clean, match_ctx_free, match_ctx_add_entry):
(search_cur_bkref_entry, match_ctx_add_subtop, match_ctx_add_sublast):
(sift_ctx_init):
Likewise.
* lib/regex_internal.h:
(re_string_allocate, re_string_construct, re_string_reconstruct):
(re_string_realloc_buffers, build_wcs_buffer, build_wcs_upper_buffer):
(build_upper_buffer, re_string_translate_buffer, re_string_destruct):
(re_string_elem_size_at, re_string_char_size_at, re_string_wchar_at):
(re_string_context_at, re_string_peek_byte_case):
(re_string_fetch_byte_case): Declare even if RE_NO_INTERNAL_PROTOTYPES
is defined, since we now use prototypes always.
* lib/regex.h (_RE_ARGS): Remove. No longer needed, since we assume
C89 or better. All uses removed.
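
For readers unfamiliar with the idiom being retired here, a minimal
sketch follows. DEMO_ARGS is a hypothetical macro in the spirit of
_RE_ARGS and demo_count is an invented function; neither is taken from
the regex sources. The old-style definition appears only inside a
comment, since recent C standards no longer accept it.

```c
#include <assert.h>
#include <stddef.h>

/* Hide parameter lists from compilers that lack prototypes.  */
#if defined __STDC__ && __STDC__
# define DEMO_ARGS(args) args
#else
# define DEMO_ARGS(args) ()
#endif

/* Declaration in the older, macro-wrapped style.  */
extern size_t demo_count DEMO_ARGS ((const char *s, int c));

/* The matching old-style definition would have read:

     size_t
     demo_count (s, c)
          const char *s;
          int c;
     { ... }
 */

/* Prototype-style definition, as assumed once C89 is the baseline.
   Count the occurrences of the character C in the string S.  */
size_t
demo_count (const char *s, int c)
{
  size_t n = 0;
  for (; *s; s++)
    if (*s == c)
      n++;
  return n;
}
```

With prototypes always present, the wrapper macro serves no purpose and
the declaration can simply spell out its parameter list.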
|
|
6caf406f
|
2005-08-18T05:08:05
|
|
Remove useless space-before-tab.
|
|
7277ed5a
|
2005-08-16T00:07:03
|
|
* config/srclist.txt: Comment out $LIBCSRC/posix/regex.h.
Add comments for each pending glibc patch.
* lib/regex.h (__restrict_arr): Don't define to __restrict if
__cplusplus is defined.
|
|
151e40bb
|
2005-07-07T08:08:39
|
|
* modules/regex (Files): Add lib/regex_internal.c,
lib/regex_internal.h, lib/regexec.c, lib/regcomp.c, m4/codeset.m4.
(Depends-on): Add extensions.
(Makefile.am): Remove lib_SOURCES; now done by m4 code.
* config/srclist.txt: Add regcomp.c, regex.c, regex.h, regex_internal.c,
regexec.c.
Add regex_internal.h too, but as a comment, since the libc version
is currently broken in gnulib mode.
* lib/regex.c, lib/regex.h: Sync from libc.
* lib/regcomp.c, lib/regex_internal.c, lib/regex_internal.h, lib/regexec.c:
New files, synced from libc, except that regex_internal.h
currently has a small porting fix.
* m4/regex.m4: Adjust to new libc regex implementation.
(gl_INCLUDED_REGEX): Add AC_LIBSOURCES for
all the .c and .h parts of (the new) regex.
Quote the m4 stuff better.
Check for RE_ICASE bug of old gnulib.
Check for REG_STARTEND of recent libc.
Rename local variables from jm_* to gl_*.
Quote operand of "test -f".
Say "recent enough" version of libc, not "version 2".
(gl_PREREQ_REGEX): Remove AC_FUNC_ALLOCA, since alloca is a
prerequisite module. Remove AC_HEADER_STDC; no longer needed.
Check for locale.h, isblank, mbrtowc, wcrtomb, wcscoll.
Remove check for btowc, isascii.
Require AM_LANGINFO_CODESET.
|
|
6ef9a073
|
2005-05-15T04:45:43
|
|
Sync from coreutils.
* modules/yesno (Depends-on): Add getline.
* gethrxtime.c, gethrxtime.h, getpass.h, mountlist.h, path-concat.c,
regex.h, strtoll.c, unlocked-io.h, xtime.h:
White space changes only.
* makepath.c (make_path): Port to hosts where leading "//" is special.
* yesno.c: Include getline.h, not ctype.h.
(yesno): Don't remove leading white space; POSIX doesn't allow it.
Use getline to remove arbitrary restriction on response length.
|
|
267a39ba
|
2005-05-14T06:03:57
|
|
*** empty log message ***
|
|
d136d930
|
2003-09-10T06:18:22
|
|
Remove K&R cruft.
|
|
9d738dcb
|
2003-08-17T05:30:20
|
|
Undo white space changes of 2003-08-12, allowing us to sync more files
from glibc.
|
|
a2e0479b
|
2003-08-12T23:27:26
|
|
White space fixes from coreutils.
|
|
0438d509
|
2003-08-10T13:54:55
|
|
update regex.h from libc
|
|
202b2af4
|
2003-08-09T08:57:49
|
|
Merge from coreutils.
|
|
c9a65f97
|
2003-04-18T12:04:31
|
|
update from libc
|
|
b7476078
|
2002-11-25T00:17:33
|
|
Change license to GPL.
|
|
ba951bda
|
2001-12-15T16:57:15
|
|
(__restrict_arr): Update from libc.
|
|
28877cd2
|
2001-08-12T12:49:11
|
|
update from libc
|
|
3e1ee4f3
|
2001-04-02T08:31:28
|
|
Update from GNU libc.
|
|
4aeea603
|
2000-10-29T13:49:56
|
|
(__restrict_arr): Move definition out of #ifndef block.
Required because egcs-2.91.66 (aka 1.1.2) defines __restrict, but
doesn't define __restrict_arr.
|
|
fb523b1e
|
2000-10-28T07:15:32
|
|
Update from libc.
|
|
34d6df51
|
2000-05-04T07:06:42
|
|
Update from glibc.
|
|
dd82a0db
|
1999-01-13T05:36:45
|
|
new version from glibc
|
|
9fb5802a
|
1998-08-07T12:54:51
|
|
update from glibc
|
|
54dce95d
|
1998-03-23T07:24:54
|
|
update from libc/copies
|
|
a44bdcc8
|
1997-07-26T02:55:14
|
|
replace with new version from libc
|
|
37f27c1b
|
1996-07-15T02:41:49
|
|
update FSF address in copyright and remove any trailing blanks
|
|
428f2264
|
1995-10-19T14:21:35
|
|
New version from FSF.
|
|
b80c3874
|
1995-05-20T13:28:24
|
|
merge with 1.11.1a
|
|
7bf62aa2
|
1995-02-16T20:25:54
|
|
update from FSF
|
|
87e2683e
|
1992-11-08T02:50:44
|
|
Initial revision
|