From 95cd4c403d8dd85f8371bc92bca328cd690b84d5 Mon Sep 17 00:00:00 2001
From: Stefan Monnier
Date: Wed, 8 Mar 2000 23:26:00 +0000
Subject: [PATCH] *** empty log message ***

---
 etc/NEWS        |  2 +-
 man/search.texi | 22 ++++++++-----
 src/ChangeLog   | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index eeb04e11410..80f4d96a785 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1087,7 +1087,7 @@ what BODY returns.
 
 +++
 ** Regular expressions now support intervals \{n,m\} as well as
-Perl's non-greedy *? +? and ?? operators.
+Perl's shy-groups \(?:...\) and non-greedy *? +? and ?? operators.
 
 +++
 ** The optional argument BUFFER of function file-local-copy has been
diff --git a/man/search.texi b/man/search.texi
index de6cd92849e..96f984b053c 100644
--- a/man/search.texi
+++ b/man/search.texi
@@ -432,16 +432,16 @@ are non-greedy variants of the operators above. The normal operators
 as they can, while if you append a @samp{?} after them, it makes them
 non-greedy: they will match as little as possible.
 
-@item \@{n,m\@}
+@item \@{@var{n},@var{m}\@}
 is another postfix operator that specifies an interval of iteration:
-the preceding regular expression must match between @samp{n} and
-@samp{m} times. If @samp{m} is omitted, then there is no upper bound
-and if @samp{,m} is omitted, then the regular expression must match
-exactly @samp{n} times. @*
+the preceding regular expression must match between @var{n} and
+@var{m} times. If @var{m} is omitted, then there is no upper bound
+and if @var{,m} is omitted, then the regular expression must match
+exactly @var{n} times. @*
 @samp{\@{0,1\@}} is equivalent to @samp{?}. @*
 @samp{\@{0,\@}} is equivalent to @samp{*}. @*
 @samp{\@{1,\@}} is equivalent to @samp{+}. @*
-@samp{\@{n\@}} is equivalent to @samp{\@{n,n\@}}.
+@samp{\@{@var{n}\@}} is equivalent to @samp{\@{@var{n},@var{n}\@}}.
 
 @item [ @dots{} ]
 is a @dfn{character set}, which begins with @samp{[} and is terminated
@@ -560,7 +560,15 @@ To record a matched substring for future reference.
 This last application is not a consequence of the idea of a
 parenthetical grouping; it is a separate feature that is assigned as a
 second meaning to the same @samp{\( @dots{} \)} construct. In practice
-there is no conflict between the two meanings.
+there is almost no conflict between the two meanings.
+
+@item \(?: @dots{} \)
+is another grouping construct (often called ``shy'') that serves the same
+first two purposes, but not the third:
+it cannot be referred to later on by number. This is only useful
+for mechanically constructed regular expressions where grouping
+constructs need to be introduced implicitly and hence risk changing the
+numbering of subsequent groups.
 
 @item \@var{d}
 matches the same text that matched the @var{d}th occurrence of a
diff --git a/src/ChangeLog b/src/ChangeLog
index 847ac8c7748..3d7584433b1 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,85 @@
+2000-03-08 Stefan Monnier
+
+	This is a big redesign of failure-stack and register handling, prompted
+	by bugs revealed when trying to add shy-groups. Overall, what happened
+	is that loops are now structured a little differently, groups can be
+	shy and the code is a little simpler.
+
+	* regex.h: Update the copyright.
+	(RE_SHY_GROUPS): New value.
+	(RE_UNMATCHED_RIGHT_PAREN_ORD): Renumber.
+	(RE_SYNTAX_EMACS): Add RE_SHY_GROUPS.
+
+	* regex.c (enum re_opcode_t): Remove jump_past_alt, maybe_pop_jump,
+	push_dummy_failure and dummy_failure_jump.
+	Add on_failure_jump_(exclusive, loop and smart).
+	Also fix the comment for (start|stop)_memory since they now only take
+	one argument (the second has become unnecessary).
+	(print_partial_compiled_pattern): Adjust for changes in re_opcode_t.
+	(print_compiled_pattern): Use %ld to printf long ints and flush to make
+	debugging a little easier.
+	(union fail_stack_elt): Make the integer unsigned.
+	(struct fail_stack_type): Add a `frame' element.
+	(INIT_FAIL_STACK): Init `frame' as well.
+	(POP_PATTERN_OP): New macro for re_compile_fastmap.
+	(DEBUG_PUSH, DEBUG_POP): Remove.
+	(NUM_REG_ITEMS): Remove.
+	(NUM_NONREG_ITEMS): Adjust.
+	(FAILURE_PAT, FAILURE_STR, NEXT_FAILURE_HANDLE, TOP_FAILURE_HANDLE):
+	New macros for the cycle detection.
+	(ENSURE_FAIL_STACK): New macro for PUSH_FAILURE_(REG|POINT).
+	(PUSH_FAILURE_REG, POP_FAILURE_REG, CHECK_INFINITE_LOOP): New macros.
+	(PUSH_FAILURE_POINT): Don't push registers any more.
+	The pattern address pushed is not the destination of the jump
+	but the source of it instead.
+	(NUM_FAILURE_ITEMS): Remove.
+	(POP_FAILURE_POINT): Adapt to the new stack structure (i.e. pop
+	registers before the actual failure point).
+	Don't hardcode any meaning for str==NULL anymore.
+	(union register_info_type, REG_MATCH_NULL_STRING_P, IS_ACTIVE)
+	(MATCHED_SOMETHING, EVER_MATCHED_SOMETHING, SET_REGS_MATCHED): Remove.
+	(REG_UNSET_VALUE): Use NULL (why not?).
+	(compile_range): Remove declaration since it doesn't exist.
+	(struct compile_stack_elt_t): Remove inner_group_offset.
+	(old_reg(start|end), reg_info, reg_dummy, reg_info_dummy): Remove.
+	(regex_grow_registers): Remove dead code.
+	(FIXUP_ALT_JUMP): New macro.
+	(regex_compile): Add shy-groups.
+	Change loops to use on_failure_jump_smart&jump instead of
+	on_failure_jump&maybe_pop_jump.
+	Change + loops to eliminate the initial (dummy_failure_)jump.
+	Remove c1_base (looks like unused variable to me).
+	Use `jump' instead of `jump_past_alt' and don't bother with
+	push_dummy_failure in alternatives since it is now unnecessary.
+	Use FIXUP_ALT_JUMP.
+	Eliminate a useless `#ifdef emacs' for (re)allocating the stack.
+	(re_compile_fastmap): Remove dead variables i and num_regs.
+	Exit from loop when bufp->can_be_null rather than jumping to `done'.
+	Avoid jumping backwards so as to ensure termination.
+	Use PATTERN_STACK_EMPTY and POP_PATTERN_OP.
+	Improved handling of backreferences.
+	Remove dead code in handling of `anychar'.
+	(skip_noops, mutually_exclusive_p): New functions taken from the
+	handling of `maybe_pop_jump' in re_match_2_internal.
+	Slightly improve mutually_exclusive_p to handle ".+\n".
+	((lowest|highest)_active_reg, NO_(LOWEST|HIGHEST)_ACTIVE_REG):
+	Remove.
+	(re_match_2_internal): Use %p instead of 0x%x when printf'ing ptrs.
+	Don't SET_REGS_MATCHED anymore. Remove many dead variables.
+	Push register (in `start_memory') on the stack rather than storing it
+	in old_reg(start|end).
+	Remove the cycle detection from `stop_memory', replaced by the use
+	of on_failure_jump_loop for greedy loops.
+	Add code for the new on_failure_jump_.
+	Remove ad-hoc code in `on_failure_jump' to push more registers
+	in the case of a loop.
+	Take out code from `maybe_pop_jump' into separate functions and
+	adapt it to the semantics of `on_failure_jump_smart'.
+	Remove jump_past_alt, dummy_failure_jump and push_dummy_failure.
+	Remove dummy_failure handling and handling of `failures to jump
+	to on_failure_jump' (this last one was already dead code, it seems).
+	((group|alt|common_op)_match_null_string_p): Remove.
+
 2000-03-08 Dave Love
 
 	* config.in: Don't depend on __STDC__ for volatile.
-- 
2.39.5
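
Illustrative note, not part of the patch: the interval equivalences and
the shy-group numbering behaviour documented in man/search.texi can be
sketched with Python's `re` module. This is an analogy under stated
assumptions, not Emacs code: Python writes these constructs as `{n,m}`
and `(?:...)` where Emacs uses `\{n,m\}` and `\(?:...\)`, and the helper
`same_matches` and the sample strings are hypothetical names chosen for
the illustration.

```python
import re


def same_matches(pattern_a, pattern_b, samples):
    """Return True if both patterns accept exactly the same sample strings."""
    return all(
        (re.fullmatch(pattern_a, s) is None) == (re.fullmatch(pattern_b, s) is None)
        for s in samples
    )


# Hypothetical sample strings used to compare each pair of patterns.
samples = ["", "a", "aa", "aaa", "b"]

# The four equivalences listed in the manual text, in Python's spelling.
interval_checks = {
    "a{0,1} vs a?": same_matches(r"a{0,1}", r"a?", samples),
    "a{0,} vs a*": same_matches(r"a{0,}", r"a*", samples),
    "a{1,} vs a+": same_matches(r"a{1,}", r"a+", samples),
    "a{2} vs a{2,2}": same_matches(r"a{2}", r"a{2,2}", samples),
}

# A capturing group inserted for grouping shifts the number of every later
# group; a shy (non-capturing) group leaves the numbering alone.
text = "foobar-42"
capturing = re.match(r"(foo|bar)+-(\d+)", text)
shy = re.match(r"(?:foo|bar)+-(\d+)", text)
digits_capturing = capturing.group(2)  # digits are group 2 here
digits_shy = shy.group(1)              # digits are group 1 here

# Greedy versus non-greedy repetition on the same input.
greedy = re.match(r"a+", "aaa").group(0)
lazy = re.match(r"a+?", "aaa").group(0)
```

The numbering point is the one the manual makes about mechanically
constructed expressions: code that wraps a fragment in a group for
precedence can use the shy form so that callers' group numbers stay put.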