lib/regex_internal.c


Log

Author Commit Date CI Message
Derek R. Price 1e3866b6 2005-09-16T00:23:36 * regcomp.c, regexec.c, regex_internal.c: Back out previous changes, consolidating in... * regex_internal.h: ...this file.
Derek R. Price 594190cb 2005-09-15T19:14:23 * regex_internal.h: Blank `pure' for GNUC < 3. * regex_internal.c: Ditto, using this... (__GNUC_PREREQ): ...new macro. * regcomp.c, regexec.c: Blank `always_inline' for GNUC < 3.1 using... (__GNUC_PREREQ): ...this new macro.
Paul Eggert c4f640f1 2005-09-06T07:36:48 Change bitset word type from unsigned int to unsigned long int, as this has better performance on typical 64-bit hosts. Port bitset code to hosts with unusual word sizes. * lib/regcomp.c (build_equiv_class, build_charclass): (build_range_exp, build_collating_symbol): Prefer bitset to re_bitset_ptr_t in prototypes, when the actual argument is a bitset. This is merely a style issue, but it makes it clearer that an entire array is expected. (re_compile_fastmap_iter, init_dfa, init_word_char, optimize_subexps): * lib/regcomp.c (lower_subexp, parse_bracket_exp): (build_charclass_op): Port to the case where bitset_word is not the same as unsigned int. * lib/regex_internal.h (bitset_set, bitset_clear, bitset_contain): (bitset_not, bitset_merge, bitset_set_all, bitset_mask): Likewise. * lib/regexec.c (check_dst_limits_calc_pos_1): (check_subexp_matching_top): (build_trtable, group_nodes_into_DFAstates): Likewise. * lib/regcomp.c (re_compile_fastmap_iter, utf8_sb_map): (optimize_utf8): Don't assume that SBC_MAX is a multiple of BITSET_WORD_BITS. * lib/regex_internal.h (bitset_set_all, bitset_not): Likewise. * lib/regexec.c (group_nodes_into_DFAstates): Likewise. * lib/regcomp.c (utf8_sb_map): Don't assume UINT_MAX == 0xffffffff. * lib/regcomp.c (optimize_subexps, lower_subexp): Work even if bitset_word has holes in its bitwise representation. * lib/regex_internal.h (BITSET_WORD_BITS): Likewise. * lib/regexec.c (check_dst_limits_calc_pos_1): (check_subexp_matching_top): Likewise. * lib/regex_internal.c (re_string_reconstruct): Don't assume UCHAR_MAX == 255. * lib/regex_internal.h (bitset_set_all): Likewise. * lib/regex_internal.h (BITSET_WORD_BITS): Renamed from UINT_BITS. All uses changed. (BITSET_WORDS): Renamed from BITSET_UINTS. All uses changed. (bitset_word): New type, replacing 'unsigned int' for bitset uses. All uses changed. (BITSET_WORD_MAX): New macro.
(bitset_set, bitset_clear, bitset_contain, bitset_empty): (bitset_set_all, bitset_copy): Now inline functions, not macros. (bitset_empty, bitset_copy): Prefer sizeof (bitset) to multiplying it out ourselves. (bitset_not_merge): Remove; unused. (bitset_contain): Return bool, not unsigned int with one bit on. All callers changed. * lib/regexec.c (build_trtable): Don't assume bitset has no stricter alignment than re_node_set; do this by defining a new internal type struct dests_alloc and using it to allocate memory. * config/srclist.txt: Add glibc bug 1302.
Paul Eggert 812cbebe 2005-09-02T22:54:59 Check for arithmetic overflow when calculating sizes, to prevent some buffer-overflow issues. These patches are conservative, in the sense that when I couldn't determine whether an overflow was possible, I inserted a run-time check. * regex_internal.h (re_xmalloc, re_xrealloc, re_x2realloc): New macros. (SIZE_MAX) [!defined SIZE_MAX]: New macro. (re_alloc_oversized, re_x2alloc_oversized, re_xnmalloc): (re_xnrealloc, re_x2nrealloc): New inline functions. * lib/regcomp.c (init_dfa, analyze, build_range_exp, parse_bracket_exp): (build_equiv_class, build_charclass): Check for arithmetic overflow in size expression calculations. * lib/regex_internal.c (re_string_realloc_buffers): (build_wcs_upper_buffer, re_node_set_add_intersect): (re_node_set_init_union, re_node_set_insert, re_node_set_insert_last): (re_dfa_add_node, register_state): Likewise. * lib/regexec.c (re_search_stub, re_copy_regs, re_search_internal): (prune_impossible_nodes, push_fail_stack, set_regs, check_arrival): (build_trtable, extend_buffers, match_ctx_init, match_ctx_add_entry): (match_ctx_add_subtop, match_ctx_add_sublast): Likewise.
Paul Eggert 9581d12c 2005-09-01T22:10:59 * regex_internal.c (re_string_context_at): Fix bug where the code assumed that Idx is signed. * config/srclist.txt: Add glibc bug 1287.
Paul Eggert 7094f1ff 2005-09-01T21:01:26 * lib/regex_internal.c (build_wcs_upper_buffer): Fix portability bugs in int versus size_t comparisons. * config/srclist.txt: Add glibc bug 1285, 1286.
Paul Eggert 1e5cfc92 2005-09-01T19:41:07 Use bool where appropriate. * lib/regcomp.c (re_set_fastmap): ICASE arg is bool, not int. All callers changed. (calc_eclosure_iter): Likewise, for ROOT arg. (parse_bracket_element): Likewise, for ACCEPT_HYPHEN arg. (build_charclass_op): Likewise, for NON_MATCH arg. * lib/regex_internal.c (re_string_allocate, re_string_construct): (re_string_construct_common): Likewise, for ICASE arg. * lib/regexec.c (re_search_2_stub, re_search_stub): Likewise, for RET_LEN arg. (check_matching): Likewise, for FL_LONGEST_MATCH arg. (set_regs): Likewise, for FL_BACKTRACK arg. * lib/regcomp.c (re_compile_fastmap_iter, optimize_utf8): (duplicate_node_closure, calc_inveclosure, calc_eclosure): (calc_eclosure_iter, parse_bracket_exp): Use bool for internal variables that are booleans. * lib/regexec.c (re_search_internal, check_matching): (proceed_next_node): (set_regs, build_sifted_states, sift_states_bkref): (check_arrival_add_next_nodes, check_arrival_expand_ecl_sub): (expand_bkref_cache, build_trtable, group_nodes_into_DFAstates): (find_collation_sequence_value): Likewise. * lib/regex_internal.c (re_node_set_insert, re_node_set_insert_last): (re_node_set_compare): Return bool, not int. All callers changed. * lib/regexec.c (check_halt_node_context, check_dst_limits): (build_trtable, check_node_accept): Likewise. * lib/regex_internal.h: Include stdbool.h. Fix bugs uncovered when converting to bool. * lib/regcomp.c (calc_eclosure_iter): Check for storage allocation failure instead of charging ahead blindly. * lib/regex_internal.c (register_state): Likewise. * lib/regexec.c (re_search_2_stub): Use simpler method than boolean for freeing internal storage. (group_nodes_into_DFAstates): Use unsigned int, not int, for bitset pieces used as boolean, to avoid undefined behavior on hosts that do int overflow checking. * config/srclist.txt: Add glibc bug 1285.
Paul Eggert fec9ced8 2005-09-01T07:03:01 * lib/regex_internal.c (re_string_reconstruct): Don't assume buffer lengths fit in regoff_t; this isn't true if regoff_t is the same width as size_t. * lib/regex.c (re_search_internal): 5th arg is LAST_START (= START + RANGE) instead of RANGE. This avoids overflow problems when regoff_t is the same width as size_t. All callers changed. (re_search_2_stub): Check for overflow when adding the sizes of the two strings. (re_search_stub): Check for overflow when adding START to RANGE; if it occurs, substitute the extreme value. * config/srclist.txt: Add glibc bug 1284.
Paul Eggert ea626b10 2005-08-31T23:36:42 * lib/regcomp.c (search_duplicated_node): Make first pointer arg a pointer-to-const. * lib/regex_internal.c (create_ci_newstate, create_cd_newstate): (register_state): Likewise. * lib/regexec.c (search_cur_bkref_entry, check_dst_limits): (check_dst_limits_calc_pos_1, check_dst_limits_calc_pos): (group_nodes_into_DFAstates): Likewise. * config/srclist.txt: Add glibc bug 1282.
Paul Eggert 28492cce 2005-08-31T22:51:09 On 64-bit hosts (where size_t is 64 bits and int is 32 bits), the old glibc regex code mishandles strings longer than 2**31 bytes. This patch fixes this when the regex code is used in gnulib (i.e., outside glibc). * lib/regex.h (_REGEX_LARGE_OFFSETS): New feature-test macro, governing whether the rest of this patch is active. By default, the macro is disabled and the patch has no effect. (regoff_t) [defined _REGEX_LARGE_OFFSETS]: Define to off_t, not int. (__re_idx_t, __re_size_t, __re_long_size_t): New types. (struct re_pattern_buffer, re_search, re_search_2, re_match): (re_match_2, re_set_registers): Use the new types. * lib/regex_internal.h (Idx, re_hashval_t): New types. (REG_MISSING, REG_ERROR, REG_VALID_INDEX, REG_VALID_NONZERO_INDEX): New macros. (re_node_set, re_charset_t, re_token_t, re_string_realloc_buffers): (re_string_context_at, bin_tree_t, re_dfastate_t): (struct re_state_table_entry, state_array_t, re_sub_match_last_t): (re_sub_match_top_t, re_match_context_t, re_sift_context_t): (struct re_fail_stack_ent_t, struct re_fail_stack_t, struct re_dfa_t): (re_string_char_size_at, re_string_wchar_at): (re_string_elem_size_at): Use the new types and macros to port to 64-bit hosts. Use unsigned types for internal values, so that the code mostly works even for arrays larger than SSIZE_MAX. * lib/regcomp.c (re_compile_internal, init_dfa, duplicate_node): (search_duplicated_node, calc_eclosure_iter, fetch_number): (parse_reg_exp, parse_branch, parse_expression, parse_sub_exp): (build_equiv_class, build_charclass, re_compile_fastmap_iter): (free_dfa_content, create_initial_state, optimize_utf8, analyze): (optimize_subexps, calc_first, link_nfa_nodes, duplicate_node_closure): (calc_inveclosure, parse_dup_op, build_range_exp): (build_collating_symbol, parse_bracket_exp, build_charclass_op): (fetch_number, create_token_tree, mark_opt_subexp): Likewise. 
* lib/regex_internal.c (re_string_construct_common, create_ci_newstate): (create_cd_newstate, re_string_allocate, re_string_construct): (re_string_realloc_buffers, build_wcs_upper_buffer): (re_string_skip_chars, build_upper_buffer, re_string_translate_buffer): (re_string_reconstruct, re_string_peek_byte_case): (re_string_fetch_byte_case, re_string_context_at): (re_node_set_alloc, re_node_set_init_1, re_node_set_init_2): (re_node_set_init_copy, re_node_set_add_intersect): (re_node_set_init_union, re_node_set_merge, re_node_set_insert): (re_node_set_insert_last, re_node_set_compare, re_node_set_contains): (re_node_set_remove_at, re_dfa_add_node, calc_state_hash): (re_acquire_state, re_acquire_state_context, register_state): Likewise. * lib/regex.c (match_ctx_init, match_ctx_add_entry, search_cur_bkref_entry): (match_ctx_add_subtop, match_ctx_add_sublast, sift_ctx_init): (re_search_internal, re_search_2_stub, re_search_stub): (re_copy_regs, check_matching, check_halt_state_context, update_regs): (push_fail_stack, sift_states_iter_mb, build_sifted_states): (update_cur_sifted_state, check_dst_limits): (check_dst_limits_calc_pos_1, check_dst_limits_calc_pos): (check_subexp_limits, sift_states_bkref, merge_state_array): (check_subexp_matching_top, get_subexp, get_subexp_sub): (find_subexp_node, check_arrival, check_arrival_add_next_nodes): (check_arrival_expand_ecl, check_arrival_expand_ecl_sub): (expand_bkref_cache, check_node_accept_bytes): (group_nodes_into_DFAstates, check_node_accept, regexec, re_match): (re_search, re_match_2, re_search_2, prune_impossible_nodes): (acquire_init_state_context, check_halt_node_context): (proceed_next_node, pop_fail_stack, set_regs, free_fail_stack_return): (sift_states_backward, clean_state_log_if_needed): (sub_epsilon_src_nodes, add_epsilon_src_nodes, merge_state_with_log): (find_recover_state, transit_state_sb, transit_state_mb): (transit_state_bkref, build_trtable, match_ctx_clean): Likewise.
* lib/regcomp.c (parse_dup_op): Add an extra test if Idx is unsigned, to work around an assumption that REG_MISSING is negative. * m4/regex.m4 (gl_REGEX): Require AC_SYS_LARGEFILE. Define _REGEX_LARGE_OFFSETS. Test for regoff_t/off_t bug in 64-bit and large-file glibc and in 32-bit large-file Solaris. * config/srclist.txt: Add glibc bug 1281.
Paul Eggert 3af956ae 2005-08-26T21:47:51 * config/srclist.txt: Add glibc bug 1248. * lib/regex_internal.h: Remove all references to RE_NO_INTERNAL_PROTOTYPES; no longer needed now that we assume C89 or better. (bitset_not, bitset_merge, bitset_not_merge): (bitset_mask, re_string_allocate, re_string_construct): (re_string_reconstruct, re_string_destruct, re_string_elem_size_at): (re_string_char_size_at, re_string_wchar_at, re_string_peek_byte_case): (re_string_fetch_byte_case, re_node_set_alloc, re_node_set_init_1): (re_node_set_init_2, re_node_set_init_copy, re_node_set_add_intersect): (re_node_set_init_union, re_node_set_merge, re_node_set_insert): (re_node_set_insert_last, re_node_set_compare, re_node_set_contains): (re_node_set_remove_at, re_dfa_add_node, re_acquire_state): (re_acquire_state_context): Remove unnecessary forward decls. (re_string_char_size_at, re_string_wchar_at, re_string_elem_size_at): Put __attribute at function definition, now that the function decl has been removed. * lib/regex_internal.c (re_string_peek_byte_case): (re_string_fetch_byte_case, re_node_set_compare, re_node_set_contains): Likewise.
Paul Eggert cad71bd9 2005-08-25T20:39:57 Make regex safe for g++. This fixes one real bug (an "err" that should have been "*err"). * config/srclist.txt: Add glibc bug 1241. * lib/regex_internal.h (re_calloc): New macro, consistent with re_malloc etc. All callers of calloc changed to use re_calloc. * lib/regex_internal.c (build_wcs_upper_buffer): Return reg_errcode_t, not int. All callers changed. * lib/regcomp.c (re_compile_fastmap_iter): Don't use alloca (mb_cur_max); just use an array of size MB_LEN_MAX. * lib/regexec.c (push_fail_stack): Use re_realloc, not realloc. (find_recover_state): Change "err" to "*err"; this fixes what appears to be a real bug. (check_arrival_expand_ecl_sub): Be consistent about reg_errcode_t versus int.
Paul Eggert e6d7b6da 2005-08-24T23:29:39 * config/srclist.txt: Add glibc bug 1237. * lib/regcomp.c, lib/regex_internal.c, lib/regex_internal.h: * lib/regexec.c: All uses of recently-renamed identifiers changed to use the new, POSIX-compliant names. The code will build and run just fine without these changes, but it's better to eat our own dog food and use the standard-conforming names. * m4/regex.m4 (gl_REGEX): Use POSIX-compliant spellings when testing for GNU regex features.
Paul Eggert 576ad385 2005-08-23T18:55:44 * config/srclist.txt: Add glibc bug 1231. * lib/regex_internal.c (re_string_skip_chars, register_state): (calc_state_hash): Remove forward decls; no longer needed now that we use prototypes. * lib/regexec.c (acquire_init_state_context, check_halt_node_context): (proceed_next_node, pop_fail_stack, sub_epsilon_src_nodes): (clean_state_log_if_needed): Likewise.
Paul Eggert 9c0a244e 2005-08-21T03:31:45 * config/srclist.txt: Add glibc bug 1226. * lib/regex_internal.c (calc_state_hash): Put 'inline' before type, since some compilers warn about it otherwise.
Paul Eggert 087e9e5b 2005-08-20T07:42:15 * config/srclist.txt: Add glibc bugs 1220, 1221, 1222. * lib/regcomp.c: (re_compile_pattern, re_set_syntax, re_compile_fastmap): (re_compile_fastmap_iter, regcomp, regerror, regfree): (re_compile_internal, init_dfa, init_word_char, free_workarea_compile): (create_initial_state, optimize_utf8, analyze, postorder, preorder): (optimize_subexps, lower_subexps, lower_subexp, calc_first, calc_next): (link_nfa_nodes, duplicate_node_closure, search_duplicated_node): (duplicate_node, calc_inveclosure, calc_eclosure, calc_eclosure_iter): (fetch_token, peek_token, peek_token_bracket, parse, parse_reg_exp): (parse_branch, parse_expression, parse_sub_exp, parse_dup_op): (build_range_exp, build_collating_symbol, parse_bracket_exp): (parse_bracket_element, parse_bracket_symbol, build_equiv_class): (build_charclass, build_charclass_op, fetch_number, create_tree): (create_token_tree, mark_opt_subexp, duplicate_tree): Use prototypes rather than old-style definitions. * lib/regex_internal.c: (re_string_allocate, re_string_construct, re_string_realloc_buffers): (re_string_construct_common, build_wcs_buffer, build_wcs_upper_buffer): (re_string_skip_chars, build_upper_buffer, re_string_translate_buffer): (re_string_reconstruct, re_string_peek_byte_case): (re_string_fetch_byte_case, re_string_destruct, re_string_context_at): (re_node_set_alloc, re_node_set_init_1, re_node_set_init_2): (re_node_set_init_copy, re_node_set_add_intersect): (re_node_set_init_union, re_node_set_merge, re_node_set_insert): (re_node_set_insert_last, re_node_set_compare, re_node_set_contains): (re_node_set_remove_at, re_dfa_add_node, calc_state_hash): (re_acquire_state, re_acquire_state_context, register_state): (create_ci_newstate, create_cd_newstate, free_state): Likewise. 
* lib/regexec.c (regexec, re_match, re_search, re_match_2, re_search_2): (re_search_2_stub, re_search_stub, re_copy_regs, re_set_registers): (re_search_internal, prune_impossible_nodes): (acquire_init_state_context, check_matching, static): (check_halt_node_context, check_halt_state_context, proceed_next_node): (push_fail_stack, pop_fail_stack, set_regs, free_fail_stack_return): (update_regs, sift_states_backward, build_sifted_states): (clean_state_log_if_needed, merge_state_array): (update_cur_sifted_state, add_epsilon_src_nodes): (sub_epsilon_src_nodes, check_dst_limits, check_dst_limits_calc_pos_1): (check_dst_limits_calc_pos, check_subexp_limits, sift_states_bkref): (sift_states_iter_mb, transit_state, merge_state_with_log, static): (find_recover_state, check_subexp_matching_top, transit_state_mb): (transit_state_bkref, get_subexp, get_subexp_sub, find_subexp_node): (check_arrival, check_arrival_add_next_nodes): (check_arrival_expand_ecl, check_arrival_expand_ecl_sub): (expand_bkref_cache, build_trtable, group_nodes_into_DFAstates): (check_node_accept_bytes, check_node_accept, extend_buffers): (match_ctx_init, match_ctx_clean, match_ctx_free, match_ctx_add_entry): (search_cur_bkref_entry, match_ctx_add_subtop, match_ctx_add_sublast): (sift_ctx_init): Likewise. * lib/regex_internal.h: (re_string_allocate, re_string_construct, re_string_reconstruct): (re_string_realloc_buffers, build_wcs_buffer, build_wcs_upper_buffer): (build_upper_buffer, re_string_translate_buffer, re_string_destruct): (re_string_elem_size_at, re_string_char_size_at, re_string_wchar_at): (re_string_context_at, re_string_peek_byte_case): (re_string_fetch_byte_case): Declare even if RE_NO_INTERNAL_PROTOTYPES is defined, since we now use prototypes always. * lib/regex.h (_RE_ARGS): Remove. No longer needed, since we assume C89 or better. All uses removed.
Paul Eggert 6ce32a50 2005-08-20T00:58:13 (re_acquire_state, re_acquire_state_context) [defined lint]: Suppress bogus uninitialized-variable warnings.
Paul Eggert b89e8d75 2005-08-19T23:00:55 (re_string_realloc_buffers, re_node_set_insert): (re_node_set_insert_last, re_dfa_add_node): Rename local variables to avoid GCC shadowing warnings.
Paul Eggert 151e40bb 2005-07-07T08:08:39 * modules/regex (Files): Add lib/regex_internal.c, lib/regex_internal.h, lib/regexec.c, lib/regcomp.c, m4/codeset.m4. (Depends-on): Add extensions. (Makefile.am): Remove lib_SOURCES; now done by m4 code. * config/srclist.txt: Add regcomp.c, regex.c, regex.h, regex_internal.c, regexec.c. Add regex_internal.h too, but as a comment, since the libc version is currently broken in gnulib mode. * lib/regex.c, lib/regex.h: Sync from libc. * lib/regcomp.c, lib/regex_internal.c, lib/regex_internal.h, lib/regexec.c: New files, synced from libc, except that regex_internal.h currently has a small porting fix. * m4/regex.m4: Adjust to new libc regex implementation. (gl_INCLUDED_REGEX): Add AC_LIBSOURCES for all the .c and .h parts of (the new) regex. Quote the m4 stuff better. Check for RE_ICASE bug of old gnulib. Check for REG_STARTEND of recent libc. Rename local variables from jm_* to gl_*. Quote operand of "test -f". Say "recent enough" version of libc, not "version 2". (gl_PREREQ_REGEX): Remove AC_FUNC_ALLOCA, since alloca is a prerequisite module. Remove AC_HEADER_STDC; no longer needed. Check for locale.h, isblank, mbrtowc, wcrtomb, wcscoll. Remove check for btowc, isascii. Require AM_LANGINFO_CODESET.