|
1e3866b6
|
2005-09-16T00:23:36
|
|
* regcomp.c, regexec.c, regex_internal.c: Back out previous
changes, consolidating in...
* regex_internal.h: ...this file.
|
|
594190cb
|
2005-09-15T19:14:23
|
|
* regex_internal.h: Blank `pure' for GNUC < 3.
* regex_internal.c: Ditto, using this...
(__GNUC_PREREQ): ...new macro.
* regcomp.c, regexec.c: Blank `always_inline' for GNUC < 3.1 using...
(__GNUC_PREREQ): ...this new macro.
|
|
c4f640f1
|
2005-09-06T07:36:48
|
|
Change bitset word type from unsigned int to unsigned long int,
as this has better performance on typical 64-bit hosts.
Port bitset code to hosts with unusual word sizes.
* lib/regcomp.c (build_equiv_class, build_charclass):
(build_range_exp, build_collating_symbol):
Prefer bitset to re_bitset_ptr_t in prototypes, when the actual
argument is a bitset. This is merely a style issue, but it makes
it clearer that an entire array is expected.
(re_compile_fastmap_iter, init_dfa, init_word_char, optimize_subexps):
* lib/regcomp.c (lower_subexp, parse_bracket_exp):
(build_charclass_op):
Port to the case where bitset_word is not the same as unsigned int.
* lib/regex_internal.h (bitset_set, bitset_clear, bitset_contain):
(bitset_not, bitset_merge, bitset_set_all, bitset_mask):
Likewise.
* lib/regexec.c (check_dst_limits_calc_pos_1):
(check_subexp_matching_top):
(build_trtable, group_nodes_into_DFAstates):
Likewise.
* lib/regcomp.c (re_compile_fastmap_iter, utf8_sb_map):
(optimize_utf8):
Don't assume that SBC_MAX is a multiple of BITSET_WORD_BITS.
* lib/regex_internal.h (bitset_set_all, bitset_not): Likewise.
* lib/regexec.c (group_nodes_into_DFAstates): Likewise.
* lib/regcomp.c (utf8_sb_map): Don't assume UINT_MAX == 0xffffffff.
* lib/regcomp.c (optimize_subexps, lower_subexp):
Work even if bitset_word has holes in its bitwise representation.
* lib/regex_internal.h (BITSET_WORD_BITS): Likewise.
* lib/regexec.c (check_dst_limits_calc_pos_1):
(check_subexp_matching_top): Likewise.
* lib/regex_internal.c (re_string_reconstruct):
Don't assume UCHAR_MAX == 255.
* lib/regex_internal.h (bitset_set_all): Likewise.
* lib/regex_internal.h (BITSET_WORD_BITS): Renamed from UINT_BITS.
All uses changed.
(BITSET_WORDS): Renamed from BITSET_UINTS. All uses changed.
(bitset_word): New type, replacing 'unsigned int' for bitset uses.
All uses changed.
(BITSET_WORD_MAX): New macro.
(bitset_set, bitset_clear, bitset_contain, bitset_empty):
(bitset_set_all, bitset_copy): Now inline functions, not macros.
(bitset_empty, bitset_copy):
Prefer sizeof (bitset) to multiplying it out ourselves.
(bitset_not_merge): Remove; unused.
(bitset_contain): Return bool, not unsigned int with one bit on.
All callers changed.
* lib/regexec.c (build_trtable): Don't assume bitset has no stricter
alignment than re_node_set; do this by defining a new internal
type struct dests_alloc and using it to allocate memory.
* config/srclist.txt: Add glibc bug 1302.
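
The bitset changes in this entry can be illustrated with a minimal sketch. It is not the gnulib code: the definitions below are simplified stand-ins that reuse the entry's names, and `BITSET_WORD_BITS` is computed with `sizeof`, which assumes the word type has no padding bits (the actual change avoids that assumption).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for the names in the entry; assumptions, not the real code. */
typedef unsigned long int bitset_word;
#define SBC_MAX (UCHAR_MAX + 1)
/* Assumes bitset_word has no padding bits. */
#define BITSET_WORD_BITS (sizeof (bitset_word) * CHAR_BIT)
/* Round up, so SBC_MAX need not be a multiple of BITSET_WORD_BITS. */
#define BITSET_WORDS ((SBC_MAX + BITSET_WORD_BITS - 1) / BITSET_WORD_BITS)
typedef bitset_word bitset[BITSET_WORDS];

/* Clear every bit; sizeof (bitset) replaces a hand-multiplied size. */
static inline void
bitset_empty (bitset set)
{
  memset (set, 0, sizeof (bitset));
}

/* Turn on bit I. */
static inline void
bitset_set (bitset set, unsigned int i)
{
  assert (i < SBC_MAX);
  set[i / BITSET_WORD_BITS] |= (bitset_word) 1 << i % BITSET_WORD_BITS;
}

/* Turn off bit I. */
static inline void
bitset_clear (bitset set, unsigned int i)
{
  assert (i < SBC_MAX);
  set[i / BITSET_WORD_BITS] &= ~((bitset_word) 1 << i % BITSET_WORD_BITS);
}

/* Return a bool, not a word with one bit on. */
static inline bool
bitset_contain (const bitset set, unsigned int i)
{
  assert (i < SBC_MAX);
  return (set[i / BITSET_WORD_BITS] >> i % BITSET_WORD_BITS) & 1;
}
```

Casting the constant 1 to `bitset_word` before shifting matters once the word is wider than `int`; a plain `1 << n` would overflow for large shift counts.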
|
|
812cbebe
|
2005-09-02T22:54:59
|
|
Check for arithmetic overflow when calculating sizes, to prevent
some buffer-overflow issues. These patches are conservative, in the
sense that when I couldn't determine whether an overflow was possible,
I inserted a run-time check.
* regex_internal.h (re_xmalloc, re_xrealloc, re_x2realloc): New macros.
(SIZE_MAX) [!defined SIZE_MAX]: New macro.
(re_alloc_oversized, re_x2alloc_oversized, re_xnmalloc):
(re_xnrealloc, re_x2nrealloc): New inline functions.
* lib/regcomp.c (init_dfa, analyze, build_range_exp, parse_bracket_exp):
(build_equiv_class, build_charclass): Check for arithmetic overflow
in size expression calculations.
* lib/regex_internal.c (re_string_realloc_buffers):
(build_wcs_upper_buffer, re_node_set_add_intersect):
(re_node_set_init_union, re_node_set_insert, re_node_set_insert_last):
(re_dfa_add_node, register_state): Likewise.
* lib/regexec.c (re_search_stub, re_copy_regs, re_search_internal):
(prune_impossible_nodes, push_fail_stack, set_regs, check_arrival):
(build_trtable, extend_buffers, match_ctx_init, match_ctx_add_entry):
(match_ctx_add_subtop, match_ctx_add_sublast): Likewise.
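
As a rough illustration of the kind of check this entry describes, the sketch below tests whether a count-times-size product fits in `size_t` before allocating. The names `alloc_oversized` and `checked_nmalloc` are hypothetical, chosen for this example; they are not the `re_*` macros and functions listed above, whose exact definitions are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if N * S would not fit in size_t. */
static inline bool
alloc_oversized (size_t n, size_t s)
{
  return s != 0 && n > SIZE_MAX / s;
}

/* Hypothetical helper: allocate N objects of size S each, or return
   NULL if the size calculation would overflow or malloc fails. */
static inline void *
checked_nmalloc (size_t n, size_t s)
{
  if (alloc_oversized (n, s))
    return NULL;
  return malloc (n * s);
}

/* Small self-check showing the intended behavior. */
static void
checked_nmalloc_demo (void)
{
  int *v = checked_nmalloc (4, sizeof *v);
  assert (v != NULL);
  free (v);
  /* The product wraps around in size_t, so the request is refused. */
  assert (checked_nmalloc (SIZE_MAX / 2 + 1, 2) == NULL);
}
```

Dividing `SIZE_MAX` by the element size sidesteps the overflow that the multiplication itself would cause, which is why the check is done before computing `n * s`.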
|
|
9581d12c
|
2005-09-01T22:10:59
|
|
* regex_internal.c (re_string_context_at): Fix bug where the
code assumed that Idx is signed.
* config/srclist.txt: Add glibc bug 1287.
|
|
7094f1ff
|
2005-09-01T21:01:26
|
|
* lib/regex_internal.c (build_wcs_upper_buffer): Fix portability
bugs in int versus size_t comparisons.
* config/srclist.txt: Add glibc bugs 1285, 1286.
|
|
1e5cfc92
|
2005-09-01T19:41:07
|
|
Use bool where appropriate.
* lib/regcomp.c (re_set_fastmap): ICASE arg is bool, not int.
All callers changed.
(calc_eclosure_iter): Likewise, for ROOT arg.
(parse_bracket_element): Likewise, for ACCEPT_HYPHEN arg.
(build_charclass_op): Likewise, for NON_MATCH arg.
* lib/regex_internal.c (re_string_allocate, re_string_construct):
(re_string_construct_common): Likewise, for ICASE arg.
* lib/regexec.c (re_search_2_stub, re_search_stub):
Likewise, for RET_LEN arg.
(check_matching): Likewise, for FL_LONGEST_MATCH arg.
(set_regs): Likewise, for FL_BACKTRACK arg.
* lib/regcomp.c (re_compile_fastmap_iter, optimize_utf8):
(duplicate_node_closure, calc_inveclosure, calc_eclosure):
(calc_eclosure_iter, parse_bracket_exp):
Use bool for internal variables that are booleans.
* lib/regexec.c (re_search_internal, check_matching):
(proceed_next_node):
(set_regs, build_sifted_states, sift_states_bkref):
(check_arrival_add_next_nodes, check_arrival_expand_ecl_sub):
(expand_bkref_cache, build_trtable, group_nodes_into_DFAstates):
(find_collation_sequence_value):
Likewise.
* lib/regex_internal.c (re_node_set_insert, re_node_set_insert_last):
(re_node_set_compare):
Return bool, not int. All callers changed.
* lib/regexec.c (check_halt_node_context, check_dst_limits):
(build_trtable, check_node_accept): Likewise.
* lib/regex_internal.h: Include stdbool.h.
Fix bugs uncovered when converting to bool.
* lib/regcomp.c (calc_eclosure_iter): Check for storage allocation
failure instead of charging ahead blindly.
* lib/regex_internal.c (register_state): Likewise.
* lib/regexec.c (re_search_2_stub): Use simpler method than boolean
for freeing internal storage.
(group_nodes_into_DFAstates): Use unsigned int, not int, for
bitset pieces used as boolean, to avoid undefined behavior
on hosts that do int overflow checking.
* config/srclist.txt: Add glibc bug 1285.
|
|
fec9ced8
|
2005-09-01T07:03:01
|
|
* lib/regex_internal.c (re_string_reconstruct): Don't assume buffer
lengths fit in regoff_t; this isn't true if regoff_t is the same
width as size_t.
* lib/regex.c (re_search_internal): 5th arg is LAST_START
(= START + RANGE) instead of RANGE. This avoids overflow
problems when regoff_t is the same width as size_t.
All callers changed.
(re_search_2_stub): Check for overflow when adding the
sizes of the two strings.
(re_search_stub): Check for overflow when adding START
to RANGE; if it occurs, substitute the extreme value.
* config/srclist.txt: Add glibc bug 1284.
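
The START + RANGE check can be sketched as a saturating addition. This is only an illustration of the idea under stated assumptions: `offset_t` is a hypothetical stand-in for `regoff_t` (assumed here to be `long int`), and `last_start` is a name invented for the example.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for regoff_t; assumed to be long int here. */
typedef long int offset_t;
#define OFFSET_MAX LONG_MAX
#define OFFSET_MIN LONG_MIN

/* Return START + RANGE, substituting the extreme value when the sum
   would overflow.  The comparisons are arranged so that no
   intermediate expression can itself overflow.  */
static offset_t
last_start (offset_t start, offset_t range)
{
  if (range > 0 && start > OFFSET_MAX - range)
    return OFFSET_MAX;
  if (range < 0 && start < OFFSET_MIN - range)
    return OFFSET_MIN;
  return start + range;
}

/* Small self-check showing the intended behavior. */
static void
last_start_demo (void)
{
  assert (last_start (5, 10) == 15);
  assert (last_start (OFFSET_MAX - 1, 10) == OFFSET_MAX);
  assert (last_start (OFFSET_MIN + 1, -10) == OFFSET_MIN);
}
```

Passing the clamped end point rather than the raw RANGE means later code compares positions directly and never has to redo the addition.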
|
|
ea626b10
|
2005-08-31T23:36:42
|
|
* lib/regcomp.c (search_duplicated_node): Make first pointer arg
a pointer-to-const.
* lib/regex_internal.c (create_ci_newstate, create_cd_newstate):
(register_state): Likewise.
* lib/regexec.c (search_cur_bkref_entry, check_dst_limits):
(check_dst_limits_calc_pos_1, check_dst_limits_calc_pos):
(group_nodes_into_DFAstates): Likewise.
* config/srclist.txt: Add glibc bug 1282.
|
|
28492cce
|
2005-08-31T22:51:09
|
|
On 64-bit hosts (where size_t is 64 bits and int is 32 bits), the
old glibc regex code mishandles strings longer than 2**31 bytes.
This patch fixes this when the regex code is used in gnulib
(i.e., outside glibc).
* lib/regex.h (_REGEX_LARGE_OFFSETS): New feature-test macro,
governing whether the rest of this patch is active. By default,
the macro is disabled and the patch has no effect.
(regoff_t) [defined _REGEX_LARGE_OFFSETS]: Define to off_t, not int.
(__re_idx_t, __re_size_t, __re_long_size_t): New types.
(struct re_pattern_buffer, re_search, re_search_2, re_match):
(re_match_2, re_set_registers): Use the new types.
* lib/regex_internal.h (Idx, re_hashval_t): New types.
(REG_MISSING, REG_ERROR, REG_VALID_INDEX, REG_VALID_NONZERO_INDEX):
New macros.
(re_node_set, re_charset_t, re_token_t, re_string_realloc_buffers):
(re_string_context_at, bin_tree_t, re_dfastate_t):
(struct re_state_table_entry, state_array_t, re_sub_match_last_t):
(re_sub_match_top_t, re_match_context_t, re_sift_context_t):
(struct re_fail_stack_ent_t, struct re_fail_stack_t, struct re_dfa_t):
(re_string_char_size_at, re_string_wchar_at):
(re_string_elem_size_at):
Use the new types and macros to port to 64-bit hosts.
Use unsigned types for internal values, so that the code
mostly works even for arrays larger than SSIZE_MAX.
* lib/regcomp.c (re_compile_internal, init_dfa, duplicate_node):
(search_duplicated_node, calc_eclosure_iter, fetch_number):
(parse_reg_exp, parse_branch, parse_expression, parse_sub_exp):
(build_equiv_class, build_charclass, re_compile_fastmap_iter):
(free_dfa_content, create_initial_state, optimize_utf8, analyze):
(optimize_subexps, calc_first, link_nfa_nodes, duplicate_node_closure):
(calc_inveclosure, parse_dup_op, build_range_exp):
(build_collating_symbol, parse_bracket_exp, build_charclass_op):
(fetch_number, create_token_tree, mark_opt_subexp):
Likewise.
* lib/regex_internal.c
(re_string_construct_common, create_ci_newstate):
(create_cd_newstate, re_string_allocate, re_string_construct):
(re_string_realloc_buffers, build_wcs_upper_buffer):
(re_string_skip_chars, build_upper_buffer, re_string_translate_buffer):
(re_string_reconstruct, re_string_peek_byte_case):
(re_string_fetch_byte_case, re_string_context_at):
(re_node_set_alloc, re_node_set_init_1, re_node_set_init_2):
(re_node_set_init_copy, re_node_set_add_intersect):
(re_node_set_init_union, re_node_set_merge, re_node_set_insert):
(re_node_set_insert_last, re_node_set_compare, re_node_set_contains):
(re_node_set_remove_at, re_dfa_add_node, calc_state_hash):
(re_acquire_state, re_acquire_state_context, register_state):
Likewise.
* lib/regex.c
(match_ctx_init, match_ctx_add_entry, search_cur_bkref_entry):
(match_ctx_add_subtop, match_ctx_add_sublast, sift_ctx_init):
(re_search_internal, re_search_2_stub, re_search_stub):
(re_copy_regs, check_matching, check_halt_state_context, update_regs):
(push_fail_stack, sift_states_iter_mb, build_sifted_states):
(update_cur_sifted_state, check_dst_limits):
(check_dst_limits_calc_pos_1, check_dst_limits_calc_pos):
(check_subexp_limits, sift_states_bkref, merge_state_array):
(check_subexp_matching_top, get_subexp, get_subexp_sub):
(find_subexp_node, check_arrival, check_arrival_add_next_nodes):
(check_arrival_expand_ecl, check_arrival_expand_ecl_sub):
(expand_bkref_cache, check_node_accept_bytes):
(group_nodes_into_DFAstates, check_node_accept, regexec, re_match):
(re_search, re_match_2, re_search_2, prune_impossible_nodes):
(acquire_init_state_context, check_halt_node_context):
(proceed_next_node, pop_fail_stack, set_regs, free_fail_stack_return):
(sift_states_backward, clean_state_log_if_needed):
(sub_epsilon_src_nodes, add_epsilon_src_nodes, merge_state_with_log):
(find_recover_state, transit_state_sb, transit_state_mb):
(transit_state_bkref, build_trtable, match_ctx_clean):
Likewise.
* lib/regcomp.c (parse_dup_op): Add an extra test if Idx is unsigned,
to work around an assumption that REG_MISSING is negative.
* m4/regex.m4 (gl_REGEX): Require AC_SYS_LARGEFILE. Define
_REGEX_LARGE_OFFSETS. Test for regoff_t/off_t bug in 64-bit
and large-file glibc and in 32-bit large-file Solaris.
* config/srclist.txt: Add glibc bug 1281.
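
One consequence of making the index type unsigned is that "not found" can no longer be detected with a `< 0` test. The sketch below shows the sentinel-comparison style instead. The definitions of `Idx` and `REG_MISSING` here are assumptions made for the example (the entry names them but does not give their definitions), and `valid_index` and `find_byte` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed definitions, for illustration only. */
typedef size_t Idx;
#define REG_MISSING ((Idx) -1)   /* sentinel meaning "no such index" */

/* Hypothetical helper: compare against the sentinel rather than
   testing for a negative value, which cannot occur in an unsigned type. */
static inline bool
valid_index (Idx n)
{
  return n != REG_MISSING;
}

/* Hypothetical helper: return the index of the first C in BUF[0..LEN-1],
   or REG_MISSING if there is none. */
static Idx
find_byte (const char *buf, Idx len, char c)
{
  for (Idx i = 0; i < len; i++)
    if (buf[i] == c)
      return i;
  return REG_MISSING;
}

/* Small self-check showing the intended behavior. */
static void
find_byte_demo (void)
{
  assert (find_byte ("abc", 3, 'b') == 1);
  assert (!valid_index (find_byte ("abc", 3, 'z')));
}
```

Code that was written for a signed index and tests `idx < 0` compiles without error against an unsigned `Idx` but never takes the branch, which is the class of assumption the `parse_dup_op` workaround above addresses.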
|
|
3af956ae
|
2005-08-26T21:47:51
|
|
* config/srclist.txt: Add glibc bug 1248.
* lib/regex_internal.h: Remove all references to
RE_NO_INTERNAL_PROTOTYPES; no longer needed now that we assume C89
or better.
(bitset_not, bitset_merge, bitset_not_merge):
(bitset_mask, re_string_allocate, re_string_construct):
(re_string_reconstruct, re_string_destruct, re_string_elem_size_at):
(re_string_char_size_at, re_string_wchar_at, re_string_peek_byte_case):
(re_string_fetch_byte_case, re_node_set_alloc, re_node_set_init_1):
(re_node_set_init_2, re_node_set_init_copy, re_node_set_add_intersect):
(re_node_set_init_union, re_node_set_merge, re_node_set_insert):
(re_node_set_insert_last, re_node_set_compare, re_node_set_contains):
(re_node_set_remove_at, re_dfa_add_node, re_acquire_state):
(re_acquire_state_context):
Remove unnecessary forward decls.
(re_string_char_size_at, re_string_wchar_at, re_string_elem_size_at):
Put __attribute at function definition,
now that the function decl has been removed.
* lib/regex_internal.c (re_string_peek_byte_case):
(re_string_fetch_byte_case, re_node_set_compare, re_node_set_contains):
Likewise.
|
|
cad71bd9
|
2005-08-25T20:39:57
|
|
Make regex safe for g++. This fixes one real bug (an "err"
that should have been "*err").
* config/srclist.txt: Add glibc bug 1241.
* lib/regex_internal.h (re_calloc): New macro, consistent with
re_malloc etc. All callers of calloc changed to use re_calloc.
* lib/regex_internal.c (build_wcs_upper_buffer): Return reg_errcode_t,
not int. All callers changed.
* lib/regcomp.c (re_compile_fastmap_iter): Don't use alloca
(mb_cur_max); just use an array of size MB_LEN_MAX.
* lib/regexec.c (push_fail_stack): Use re_realloc, not realloc.
(find_recover_state): Change "err" to "*err"; this fixes what
appears to be a real bug.
(check_arrival_expand_ecl_sub): Be consistent about reg_errcode_t
versus int.
|
|
e6d7b6da
|
2005-08-24T23:29:39
|
|
* config/srclist.txt: Add glibc bug 1237.
* lib/regcomp.c, lib/regex_internal.c, lib/regex_internal.h:
* lib/regexec.c:
All uses of recently-renamed identifiers changed to use the new,
POSIX-compliant names. The code will build and run just fine
without these changes, but it's better to eat our own dog food
and use the standard-conforming names.
* m4/regex.m4 (gl_REGEX): Use POSIX-compliant spellings when testing
for GNU regex features.
|
|
576ad385
|
2005-08-23T18:55:44
|
|
* config/srclist.txt: Add glibc bug 1231.
* lib/regex_internal.c (re_string_skip_chars, register_state):
(calc_state_hash):
Remove forward decls; no longer needed now that we use prototypes.
* lib/regexec.c (acquire_init_state_context, check_halt_node_context):
(proceed_next_node, pop_fail_stack, sub_epsilon_src_nodes):
(clean_state_log_if_needed): Likewise.
|
|
9c0a244e
|
2005-08-21T03:31:45
|
|
* config/srclist.txt: Add glibc bug 1226.
* lib/regex_internal.c (calc_state_hash): Put 'inline' before type, since
some compilers warn about it otherwise.
|
|
087e9e5b
|
2005-08-20T07:42:15
|
|
* config/srclist.txt: Add glibc bugs 1220, 1221, 1222.
* lib/regcomp.c:
(re_compile_pattern, re_set_syntax, re_compile_fastmap):
(re_compile_fastmap_iter, regcomp, regerror, regfree):
(re_compile_internal, init_dfa, init_word_char, free_workarea_compile):
(create_initial_state, optimize_utf8, analyze, postorder, preorder):
(optimize_subexps, lower_subexps, lower_subexp, calc_first, calc_next):
(link_nfa_nodes, duplicate_node_closure, search_duplicated_node):
(duplicate_node, calc_inveclosure, calc_eclosure, calc_eclosure_iter):
(fetch_token, peek_token, peek_token_bracket, parse, parse_reg_exp):
(parse_branch, parse_expression, parse_sub_exp, parse_dup_op):
(build_range_exp, build_collating_symbol, parse_bracket_exp):
(parse_bracket_element, parse_bracket_symbol, build_equiv_class):
(build_charclass, build_charclass_op, fetch_number, create_tree):
(create_token_tree, mark_opt_subexp, duplicate_tree):
Use prototypes rather than old-style definitions.
* lib/regex_internal.c:
(re_string_allocate, re_string_construct, re_string_realloc_buffers):
(re_string_construct_common, build_wcs_buffer, build_wcs_upper_buffer):
(re_string_skip_chars, build_upper_buffer, re_string_translate_buffer):
(re_string_reconstruct, re_string_peek_byte_case):
(re_string_fetch_byte_case, re_string_destruct, re_string_context_at):
(re_node_set_alloc, re_node_set_init_1, re_node_set_init_2):
(re_node_set_init_copy, re_node_set_add_intersect):
(re_node_set_init_union, re_node_set_merge, re_node_set_insert):
(re_node_set_insert_last, re_node_set_compare, re_node_set_contains):
(re_node_set_remove_at, re_dfa_add_node, calc_state_hash):
(re_acquire_state, re_acquire_state_context, register_state):
(create_ci_newstate, create_cd_newstate, free_state):
Likewise.
* lib/regexec.c (regexec, re_match, re_search, re_match_2, re_search_2):
(re_search_2_stub, re_search_stub, re_copy_regs, re_set_registers):
(re_search_internal, prune_impossible_nodes):
(acquire_init_state_context, check_matching, static):
(check_halt_node_context, check_halt_state_context, proceed_next_node):
(push_fail_stack, pop_fail_stack, set_regs, free_fail_stack_return):
(update_regs, sift_states_backward, build_sifted_states):
(clean_state_log_if_needed, merge_state_array):
(update_cur_sifted_state, add_epsilon_src_nodes):
(sub_epsilon_src_nodes, check_dst_limits, check_dst_limits_calc_pos_1):
(check_dst_limits_calc_pos, check_subexp_limits, sift_states_bkref):
(sift_states_iter_mb, transit_state, merge_state_with_log, static):
(find_recover_state, check_subexp_matching_top, transit_state_mb):
(transit_state_bkref, get_subexp, get_subexp_sub, find_subexp_node):
(check_arrival, check_arrival_add_next_nodes):
(check_arrival_expand_ecl, check_arrival_expand_ecl_sub):
(expand_bkref_cache, build_trtable, group_nodes_into_DFAstates):
(check_node_accept_bytes, check_node_accept, extend_buffers):
(match_ctx_init, match_ctx_clean, match_ctx_free, match_ctx_add_entry):
(search_cur_bkref_entry, match_ctx_add_subtop, match_ctx_add_sublast):
(sift_ctx_init):
Likewise.
* lib/regex_internal.h:
(re_string_allocate, re_string_construct, re_string_reconstruct):
(re_string_realloc_buffers, build_wcs_buffer, build_wcs_upper_buffer):
(build_upper_buffer, re_string_translate_buffer, re_string_destruct):
(re_string_elem_size_at, re_string_char_size_at, re_string_wchar_at):
(re_string_context_at, re_string_peek_byte_case):
(re_string_fetch_byte_case): Declare even if RE_NO_INTERNAL_PROTOTYPES
is defined, since we now use prototypes always.
* lib/regex.h (_RE_ARGS): Remove. No longer needed, since we assume
C89 or better. All uses removed.
|
|
6ce32a50
|
2005-08-20T00:58:13
|
|
(re_acquire_state, re_acquire_state_context) [defined lint]:
Suppress bogus uninitialized-variable warnings.
|
|
b89e8d75
|
2005-08-19T23:00:55
|
|
(re_string_realloc_buffers, re_node_set_insert):
(re_node_set_insert_last, re_dfa_add_node):
Rename local variables to avoid GCC shadowing warnings.
|
|
151e40bb
|
2005-07-07T08:08:39
|
|
* modules/regex (Files): Add lib/regex_internal.c,
lib/regex_internal.h, lib/regexec.c, lib/regcomp.c, m4/codeset.m4.
(Depends-on): Add extensions.
(Makefile.am): Remove lib_SOURCES; now done by m4 code.
* config/srclist.txt: Add regcomp.c, regex.c, regex.h, regex_internal.c,
regexec.c.
Add regex_internal.h too, but as a comment, since the libc version
is currently broken in gnulib mode.
* lib/regex.c, lib/regex.h: Sync from libc.
* lib/regcomp.c, lib/regex_internal.c, lib/regex_internal.h, lib/regexec.c:
New files, synced from libc, except that regex_internal.h
currently has a small porting fix.
* m4/regex.m4: Adjust to new libc regex implementation.
(gl_INCLUDED_REGEX): Add AC_LIBSOURCES for
all the .c and .h parts of (the new) regex.
Quote the m4 stuff better.
Check for RE_ICASE bug of old gnulib.
Check for REG_STARTEND of recent libc.
Rename local variables from jm_* to gl_*.
Quote operand of "test -f".
Say "recent enough" version of libc, not "version 2".
(gl_PREREQ_REGEX): Remove AC_FUNC_ALLOCA, since alloca is a
prerequisite module. Remove AC_HEADER_STDC; no longer needed.
Check for locale.h, isblank, mbrtowc, wcrtomb, wcscoll.
Remove check for btowc, isascii.
Require AM_LANGINFO_CODESET.
|