lib/regex.h


Log

Author Commit Date Message
Paul Eggert 28492cce 2005-08-31T22:51:09 On 64-bit hosts (where size_t is 64 bits and int is 32 bits), the old glibc regex code mishandles strings longer than 2**31 bytes. This patch fixes this when the regex code is used in gnulib (i.e., outside glibc). * lib/regex.h (_REGEX_LARGE_OFFSETS): New feature-test macro, governing whether the rest of this patch is active. By default, the macro is disabled and the patch has no effect. (regoff_t) [defined _REGEX_LARGE_OFFSETS]: Define to off_t, not int. (__re_idx_t, __re_size_t, __re_long_size_t): New types. (struct re_pattern_buffer, re_search, re_search_2, re_match): (re_match_2, re_set_registers): Use the new types. * lib/regex_internal.h (Idx, re_hashval_t): New types. (REG_MISSING, REG_ERROR, REG_VALID_INDEX, REG_VALID_NONZERO_INDEX): New macros. (re_node_set, re_charset_t, re_token_t, re_string_realloc_buffers): (re_string_context_at, bin_tree_t, re_dfastate_t): (struct re_state_table_entry, state_array_t, re_sub_match_last_t): (re_sub_match_top_t, re_match_context_t, re_sift_context_t): (struct re_fail_stack_ent_t, struct re_fail_stack_t, struct re_dfa_t): (re_string_char_size_at, re_string_wchar_at): (re_string_elem_size_at): Use the new types and macros to port to 64-bit hosts. Use unsigned types for internal values, so that the code mostly works even for arrays larger than SSIZE_MAX. * lib/regcomp.c (re_compile_internal, init_dfa, duplicate_node): (search_duplicated_node, calc_eclosure_iter, fetch_number): (parse_reg_exp, parse_branch, parse_expression, parse_sub_exp): (build_equiv_class, build_charclass, re_compile_fastmap_iter): (free_dfa_content, create_initial_state, optimize_utf8, analyze): (optimize_subexps, calc_first, link_nfa_nodes, duplicate_node_closure): (calc_inveclosure, parse_dup_op, build_range_exp): (build_collating_symbol, parse_bracket_exp, build_charclass_op): (fetch_number, create_token_tree, mark_opt_subexp): Likewise. 
* lib/regex_internal.c (re_string_construct_common, create_ci_newstate): (create_cd_newstate, re_string_allocate, re_string_construct): (re_string_realloc_buffers, build_wcs_upper_buffer): (re_string_skip_chars, build_upper_buffer, re_string_translate_buffer): (re_string_reconstruct, re_string_peek_byte_case): (re_string_fetch_byte_case, re_string_context_at): (re_node_set_alloc, re_node_set_init_1, re_node_set_init_2): (re_node_set_init_copy, re_node_set_add_intersect): (re_node_set_init_union, re_node_set_merge, re_node_set_insert): (re_node_set_insert_last, re_node_set_compare, re_node_set_contains): (re_node_set_remove_at, re_dfa_add_node, calc_state_hash): (re_acquire_state, re_acquire_state_context, register_state): Likewise. * lib/regexec.c (match_ctx_init, match_ctx_add_entry, search_cur_bkref_entry): (match_ctx_add_subtop, match_ctx_add_sublast, sift_ctx_init): (re_search_internal, re_search_2_stub, re_search_stub): (re_copy_regs, check_matching, check_halt_state_context, update_regs): (push_fail_stack, sift_states_iter_mb, build_sifted_states): (update_cur_sifted_state, check_dst_limits): (check_dst_limits_calc_pos_1, check_dst_limits_calc_pos): (check_subexp_limits, sift_states_bkref, merge_state_array): (check_subexp_matching_top, get_subexp, get_subexp_sub): (find_subexp_node, check_arrival, check_arrival_add_next_nodes): (check_arrival_expand_ecl, check_arrival_expand_ecl_sub): (expand_bkref_cache, check_node_accept_bytes): (group_nodes_into_DFAstates, check_node_accept, regexec, re_match): (re_search, re_match_2, re_search_2, prune_impossible_nodes): (acquire_init_state_context, check_halt_node_context): (proceed_next_node, pop_fail_stack, set_regs, free_fail_stack_return): (sift_states_backward, clean_state_log_if_needed): (sub_epsilon_src_nodes, add_epsilon_src_nodes, merge_state_with_log): (find_recover_state, transit_state_sb, transit_state_mb): (transit_state_bkref, build_trtable, match_ctx_clean): Likewise.
* lib/regcomp.c (parse_dup_op): Add an extra test if Idx is unsigned, to work around an assumption that REG_MISSING is negative. * m4/regex.m4 (gl_REGEX): Require AC_SYS_LARGEFILE. Define _REGEX_LARGE_OFFSETS. Test for regoff_t/off_t bug in 64-bit and large-file glibc and in 32-bit large-file Solaris. * config/srclist.txt: Add glibc bug 1281.
Paul Eggert 083768e3 2005-08-25T05:08:59 * config/srclist.txt: Add glibc bug 1240. * lib/regcomp.c (regerror): 2nd arg is 'restrict', as per POSIX. * lib/regex.h (regerror): Likewise.
Paul Eggert dad0bacf 2005-08-24T23:09:33 [_REGEX_SOURCE]: Define re_fastmap_accurate too; this was inadvertently omitted from the previous patch.
Paul Eggert d6103c5e 2005-08-24T22:29:38 * config/srclist.txt: Remove glibc bug 1233 and add 1236, which supersedes it. * lib/regex.h: Fix a multitude of POSIX name space violations. These changes have an effect only for programs that define _POSIX_C_SOURCE, _POSIX_SOURCE, or _XOPEN_SOURCE; they do not change anything for programs compiled in the normal way. Also, there is no effect on the ABI. (_REGEX_SOURCE): New macro. Do not include <stddef.h> if _XOPEN_SOURCE and VMS are both defined and _GNU_SOURCE is not; this fixes a name space violation. Rename the following macros to obey POSIX requirements. The old names are still visible as macros if _REGEX_SOURCE is defined. (REG_BACKSLASH_ESCAPE_IN_LISTS): renamed from RE_BACKSLASH_ESCAPE_IN_LISTS. (REG_BK_PLUS_QM): renamed from RE_BK_PLUS_QM. (REG_CHAR_CLASSES): renamed from RE_CHAR_CLASSES. (REG_CONTEXT_INDEP_ANCHORS): renamed from RE_CONTEXT_INDEP_ANCHORS. (REG_CONTEXT_INDEP_OPS): renamed from RE_CONTEXT_INDEP_OPS. (REG_CONTEXT_INVALID_OPS): renamed from RE_CONTEXT_INVALID_OPS. (REG_DOT_NEWLINE): renamed from RE_DOT_NEWLINE. (REG_DOT_NOT_NULL): renamed from RE_DOT_NOT_NULL. (REG_HAT_LISTS_NOT_NEWLINE): renamed from RE_HAT_LISTS_NOT_NEWLINE. (REG_INTERVALS): renamed from RE_INTERVALS. (REG_LIMITED_OPS): renamed from RE_LIMITED_OPS. (REG_NEWLINE_ALT): renamed from RE_NEWLINE_ALT. (REG_NO_BK_BRACES): renamed from RE_NO_BK_BRACES. (REG_NO_BK_PARENS): renamed from RE_NO_BK_PARENS. (REG_NO_BK_REFS): renamed from RE_NO_BK_REFS. (REG_NO_BK_VBAR): renamed from RE_NO_BK_VBAR. (REG_NO_EMPTY_RANGES): renamed from RE_NO_EMPTY_RANGES. (REG_UNMATCHED_RIGHT_PAREN_ORD): renamed from RE_UNMATCHED_RIGHT_PAREN_ORD. (REG_NO_POSIX_BACKTRACKING): renamed from RE_NO_POSIX_BACKTRACKING. (REG_NO_GNU_OPS): renamed from RE_NO_GNU_OPS. (REG_DEBUG): renamed from RE_DEBUG. (REG_INVALID_INTERVAL_ORD): renamed from RE_INVALID_INTERVAL_ORD. (REG_IGNORE_CASE): renamed from RE_ICASE. 
This renaming is a bit unusual, since we can't clash with the POSIX REG_ICASE. (REG_CARET_ANCHORS_HERE): renamed from RE_CARET_ANCHORS_HERE. (REG_CONTEXT_INVALID_DUP): renamed from RE_CONTEXT_INVALID_DUP. (REG_NO_SUB): renamed from RE_NO_SUB. (REG_SYNTAX_EMACS): renamed from RE_SYNTAX_EMACS. (REG_SYNTAX_AWK): renamed from RE_SYNTAX_AWK. (REG_SYNTAX_GNU_AWK): renamed from RE_SYNTAX_GNU_AWK. (REG_SYNTAX_POSIX_AWK): renamed from RE_SYNTAX_POSIX_AWK. (REG_SYNTAX_GREP): renamed from RE_SYNTAX_GREP. (REG_SYNTAX_EGREP): renamed from RE_SYNTAX_EGREP. (REG_SYNTAX_POSIX_EGREP): renamed from RE_SYNTAX_POSIX_EGREP. (REG_SYNTAX_ED): renamed from RE_SYNTAX_ED. (REG_SYNTAX_SED): renamed from RE_SYNTAX_SED. (_REG_SYNTAX_POSIX_COMMON): renamed from _RE_SYNTAX_POSIX_COMMON. (REG_SYNTAX_POSIX_BASIC): renamed from RE_SYNTAX_POSIX_BASIC. (REG_SYNTAX_POSIX_MINIMAL_BASIC): renamed from RE_SYNTAX_POSIX_MINIMAL_BASIC. (REG_SYNTAX_POSIX_EXTENDED): renamed from RE_SYNTAX_POSIX_EXTENDED. (REG_SYNTAX_POSIX_MINIMAL_EXTENDED): renamed from RE_SYNTAX_POSIX_MINIMAL_EXTENDED. (REG_DUP_MAX): renamed from RE_DUP_MAX. No need to undef it. (REG_UNALLOCATED): Renamed from REGS_UNALLOCATED. (REG_REALLOCATE): Renamed from REGS_REALLOCATE. (REG_FIXED): Renamed from REGS_FIXED. (REG_NREGS): Renamed from RE_NREGS. (REG_ICASE, REG_NEWLINE, REG_NOSUB): Do not depend on the values of other REG_* macros, since POSIX says the user is allowed to #undef these macros selectively. (reg_errcode_t): Update comment stating what other tables need to be consistent. Rename the following enum values to obey POSIX requirements. The old names are still visible as macros. (_REG_ENOSYS): Renamed from REG_ENOSYS. Define even if _XOPEN_SOURCE is not defined, since GNU is supposed to be a superset of POSIX as much as possible, and since we want reg_errcode_t to be a signed type for implementation consistency. (_REG_NOERROR): Renamed from REG_NOERROR. (_REG_NOMATCH): Renamed from REG_NOMATCH. (_REG_BADPAT): Renamed from REG_BADPAT. 
(_REG_ECOLLATE): Renamed from REG_ECOLLATE. (_REG_ECTYPE): Renamed from REG_ECTYPE. (_REG_EESCAPE): Renamed from REG_EESCAPE. (_REG_ESUBREG): Renamed from REG_ESUBREG. (_REG_EBRACK): Renamed from REG_EBRACK. (_REG_EPAREN): Renamed from REG_EPAREN. (_REG_EBRACE): Renamed from REG_EBRACE. (_REG_BADBR): Renamed from REG_BADBR. (_REG_ERANGE): Renamed from REG_ERANGE. (_REG_ESPACE): Renamed from REG_ESPACE. (_REG_BADRPT): Renamed from REG_BADRPT. (_REG_EEND): Renamed from REG_EEND. (_REG_ESIZE): Renamed from REG_ESIZE. (_REG_ERPAREN): Renamed from REG_ERPAREN. (REG_ENOSYS, REG_NOERROR, REG_NOMATCH, REG_BADPAT, REG_ECOLLATE): (REG_ECTYPE, REG_EESCAPE, REG_ESUBREG, REG_EBRACK, REG_EPAREN): (REG_EBRACE, REG_BADBR, REG_ERANGE, REG_ESPACE, REG_BADRPT, REG_EEND): (REG_ESIZE, REG_ERPAREN): Now macros, not enum constants. (_REG_RE_NAME, _REG_RM_NAME): New macros. (REG_TRANSLATE_TYPE): Renamed from RE_TRANSLATE_TYPE. All uses changed. But support the old name if the new one is not defined and if _REGEX_SOURCE. Change the following member names in struct re_pattern_buffer. The old names are still supported if !_REGEX_SOURCE. The new names are always supported, regardless of _REGEX_SOURCE. (re_buffer): Renamed from buffer. (re_allocated): Renamed from allocated. (re_used): Renamed from used. (re_syntax): Renamed from syntax. (re_fastmap): Renamed from fastmap. (re_translate): Renamed from translate. (re_can_be_null): Renamed from can_be_null. (re_regs_allocated): Renamed from regs_allocated. (re_fastmap_accurate): Renamed from fastmap_accurate. (re_no_sub): Renamed from no_sub. (re_not_bol): Renamed from not_bol. (re_not_eol): Renamed from not_eol. (re_newline_anchor): Renamed from newline_anchor. Change the following member names in struct re_registers. The old names are still supported if !_REGEX_SOURCE. The new names are always supported, regardless of _REGEX_SOURCE. (rm_num_regs): Renamed from num_regs. (rm_start): Renamed from start. (rm_end): Renamed from end. 
(re_set_syntax, re_compile_pattern, re_compile_fastmap): (re_search, re_search_2, re_match, re_match_2, re_set_registers): Prepend __ to parameter names.
Paul Eggert 9828dc4e 2005-08-23T20:37:24 * config/srclist.txt: Add glibc bug 1233. * lib/regex.h (REG_ENOSYS) [!defined _XOPEN_SOURCE && 200112L <= _POSIX_C_SOURCE]: Define, since POSIX requires it as of 2001. (_REG_ENOSYS) [! (defined _XOPEN_SOURCE || 200112L <= _POSIX_C_SOURCE)]: New private symbol, used to keep the enum signed in all cases.
Paul Eggert b5fb7004 2005-08-23T19:11:45 * config/srclist.txt: Add glibc bug 1232. * lib/regex.h (RE_NO_EMPTY_RANGES): Fix doc bug reported by James Youngman in <http://lists.gnu.org/archive/html/bug-gnulib/2005-07/msg00132.html>.
Paul Eggert 087e9e5b 2005-08-20T07:42:15 * config/srclist.txt: Add glibc bugs 1220, 1221, 1222. * lib/regcomp.c: (re_compile_pattern, re_set_syntax, re_compile_fastmap): (re_compile_fastmap_iter, regcomp, regerror, regfree): (re_compile_internal, init_dfa, init_word_char, free_workarea_compile): (create_initial_state, optimize_utf8, analyze, postorder, preorder): (optimize_subexps, lower_subexps, lower_subexp, calc_first, calc_next): (link_nfa_nodes, duplicate_node_closure, search_duplicated_node): (duplicate_node, calc_inveclosure, calc_eclosure, calc_eclosure_iter): (fetch_token, peek_token, peek_token_bracket, parse, parse_reg_exp): (parse_branch, parse_expression, parse_sub_exp, parse_dup_op): (build_range_exp, build_collating_symbol, parse_bracket_exp): (parse_bracket_element, parse_bracket_symbol, build_equiv_class): (build_charclass, build_charclass_op, fetch_number, create_tree): (create_token_tree, mark_opt_subexp, duplicate_tree): Use prototypes rather than old-style definitions. * lib/regex_internal.c: (re_string_allocate, re_string_construct, re_string_realloc_buffers): (re_string_construct_common, build_wcs_buffer, build_wcs_upper_buffer): (re_string_skip_chars, build_upper_buffer, re_string_translate_buffer): (re_string_reconstruct, re_string_peek_byte_case): (re_string_fetch_byte_case, re_string_destruct, re_string_context_at): (re_node_set_alloc, re_node_set_init_1, re_node_set_init_2): (re_node_set_init_copy, re_node_set_add_intersect): (re_node_set_init_union, re_node_set_merge, re_node_set_insert): (re_node_set_insert_last, re_node_set_compare, re_node_set_contains): (re_node_set_remove_at, re_dfa_add_node, calc_state_hash): (re_acquire_state, re_acquire_state_context, register_state): (create_ci_newstate, create_cd_newstate, free_state): Likewise. 
* lib/regexec.c (regexec, re_match, re_search, re_match_2, re_search_2): (re_search_2_stub, re_search_stub, re_copy_regs, re_set_registers): (re_search_internal, prune_impossible_nodes): (acquire_init_state_context, check_matching, static): (check_halt_node_context, check_halt_state_context, proceed_next_node): (push_fail_stack, pop_fail_stack, set_regs, free_fail_stack_return): (update_regs, sift_states_backward, build_sifted_states): (clean_state_log_if_needed, merge_state_array): (update_cur_sifted_state, add_epsilon_src_nodes): (sub_epsilon_src_nodes, check_dst_limits, check_dst_limits_calc_pos_1): (check_dst_limits_calc_pos, check_subexp_limits, sift_states_bkref): (sift_states_iter_mb, transit_state, merge_state_with_log, static): (find_recover_state, check_subexp_matching_top, transit_state_mb): (transit_state_bkref, get_subexp, get_subexp_sub, find_subexp_node): (check_arrival, check_arrival_add_next_nodes): (check_arrival_expand_ecl, check_arrival_expand_ecl_sub): (expand_bkref_cache, build_trtable, group_nodes_into_DFAstates): (check_node_accept_bytes, check_node_accept, extend_buffers): (match_ctx_init, match_ctx_clean, match_ctx_free, match_ctx_add_entry): (search_cur_bkref_entry, match_ctx_add_subtop, match_ctx_add_sublast): (sift_ctx_init): Likewise. * lib/regex_internal.h: (re_string_allocate, re_string_construct, re_string_reconstruct): (re_string_realloc_buffers, build_wcs_buffer, build_wcs_upper_buffer): (build_upper_buffer, re_string_translate_buffer, re_string_destruct): (re_string_elem_size_at, re_string_char_size_at, re_string_wchar_at): (re_string_context_at, re_string_peek_byte_case): (re_string_fetch_byte_case): Declare even if RE_NO_INTERNAL_PROTOTYPES is defined, since we now use prototypes always. * lib/regex.h (_RE_ARGS): Remove. No longer needed, since we assume C89 or better. All uses removed.
Paul Eggert 6caf406f 2005-08-18T05:08:05 Remove useless space-before-tab.
Paul Eggert 7277ed5a 2005-08-16T00:07:03 * config/srclist.txt: Comment out $LIBCSRC/posix/regex.h. Add comments for each pending glibc patch. * lib/regex.h (__restrict_arr): Don't define to __restrict if __cplusplus is defined.
Paul Eggert 151e40bb 2005-07-07T08:08:39 * modules/regex (Files): Add lib/regex_internal.c, lib/regex_internal.h, lib/regexec.c, lib/regcomp.c, m4/codeset.m4. (Depends-on): Add extensions. (Makefile.am): Remove lib_SOURCES; now done by m4 code. * config/srclist.txt: Add regcomp.c, regex.c, regex.h, regex_internal.c, regexec.c. Add regex_internal.h too, but as a comment, since the libc version is currently broken in gnulib mode. * lib/regex.c, lib/regex.h: Sync from libc. * lib/regcomp.c, lib/regex_internal.c, lib/regex_internal.h, lib/regexec.c: New files, synced from libc, except that regex_internal.h currently has a small porting fix. * m4/regex.m4: Adjust to new libc regex implementation. (gl_INCLUDED_REGEX): Add AC_LIBSOURCES for all the .c and .h parts of (the new) regex. Quote the m4 stuff better. Check for RE_ICASE bug of old gnulib. Check for REG_STARTEND of recent libc. Rename local variables from jm_* to gl_*. Quote operand of "test -f". Say "recent enough" version of libc, not "version 2". (gl_PREREQ_REGEX): Remove AC_FUNC_ALLOCA, since alloca is a prerequisite module. Remove AC_HEADER_STDC; no longer needed. Check for locale.h, isblank, mbrtowc, wcrtomb, wcscoll. Remove check for btowc, isascii. Require AM_LANGINFO_CODESET.
Paul Eggert 6ef9a073 2005-05-15T04:45:43 Sync from coreutils. * modules/yesno (Depends-on): Add getline. * gethrxtime.c, gethrxtime.h, getpass.h, mountlist.h, path-concat.c, regex.h, strtoll.c, unlocked-io.h, xtime.h: White space changes only. * makepath.c (make_path): Port to hosts where leading "//" is special. * yesno.c: Include getline.h, not ctype.h. (yesno): Don't remove leading white space; POSIX doesn't allow it. Use getline to remove arbitrary restriction on response length.
Paul Eggert 267a39ba 2005-05-14T06:03:57 *** empty log message ***
Paul Eggert d136d930 2003-09-10T06:18:22 Remove K&R cruft.
Paul Eggert 9d738dcb 2003-08-17T05:30:20 Undo white space changes of 2003-08-12, allowing us to sync more files from glibc.
Paul Eggert a2e0479b 2003-08-12T23:27:26 White space fixes from coreutils.
Karl Berry 0438d509 2003-08-10T13:54:55 update regex.h from libc
Paul Eggert 202b2af4 2003-08-09T08:57:49 Merge from coreutils.
Karl Berry c9a65f97 2003-04-18T12:04:31 update from libc
Karl Berry b7476078 2002-11-25T00:17:33 change license to gpl.
Jim Meyering ba951bda 2001-12-15T16:57:15 (__restrict_arr): Update from libc.
Jim Meyering 28877cd2 2001-08-12T12:49:11 update from libc
Jim Meyering 3e1ee4f3 2001-04-02T08:31:28 Update from GNU libc.
Jim Meyering 4aeea603 2000-10-29T13:49:56 (__restrict_arr): Move definition out of #ifndef block. Required because egcs-2.91.66 (aka 1.1.2) defines __restrict, but doesn't define __restrict_arr.
Jim Meyering fb523b1e 2000-10-28T07:15:32 Update from libc.
Jim Meyering 34d6df51 2000-05-04T07:06:42 Update from glibc.
Jim Meyering dd82a0db 1999-01-13T05:36:45 new version from glibc
Jim Meyering 9fb5802a 1998-08-07T12:54:51 update from glibc
Jim Meyering 54dce95d 1998-03-23T07:24:54 update from libc/copies
Jim Meyering a44bdcc8 1997-07-26T02:55:14 replace with new version from libc
Jim Meyering 37f27c1b 1996-07-15T02:41:49 update FSF address in copyright and remove any trailing blanks
Jim Meyering 428f2264 1995-10-19T14:21:35 New version from FSF.
Jim Meyering b80c3874 1995-05-20T13:28:24 merge with 1.11.1a
Jim Meyering 7bf62aa2 1995-02-16T20:25:54 update from FSF
Jim Meyering 87e2683e 1992-11-08T02:50:44 Initial revision