From: Luc Teirlinck
Date: Tue, 7 Mar 2006 23:28:33 +0000 (+0000)
Subject: (Syntax of Regexps): More accurately describe
X-Git-Tag: emacs-pretest-22.0.90~3744
X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=179a6f216dc7a2f4dc7f490ea5c84953201d43d8;p=emacs.git

(Syntax of Regexps): More accurately describe
which characters are special in which situations.
(Regexp Special): Recommend _not_ to quote `]' or `-' when they are
not special. Describe in detail when `[' and `]' are special.
(Regexp Backslash): Plenty of regexps with unbalanced square brackets
are valid, so reword that statement.
---
diff --git a/lispref/searching.texi b/lispref/searching.texi
index 7c10ed6881b..b45467fbf83 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
@@ -235,12 +235,15 @@ it easier to verify even very complex regexps.

 Regular expressions have a syntax in which a few characters are
 special constructs and the rest are @dfn{ordinary}. An ordinary
-character is a simple regular expression that matches that character and
-nothing else. The special characters are @samp{.}, @samp{*}, @samp{+},
-@samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new
-special characters will be defined in the future. Any other character
-appearing in a regular expression is ordinary, unless a @samp{\}
-precedes it.
+character is a simple regular expression that matches that character
+and nothing else. The special characters are @samp{.}, @samp{*},
+@samp{+}, @samp{?}, @samp{[}, @samp{^}, @samp{$}, and @samp{\}; no new
+special characters will be defined in the future. The character
+@samp{]} is special if it ends a character alternative (see later).
+The character @samp{-} is special inside a character alternative. A
+@samp{[:} and balancing @samp{:]} enclose a character class inside a
+character alternative. Any other character appearing in a regular
+expression is ordinary, unless a @samp{\} precedes it.

 For example, @samp{f} is not a special character, so it is ordinary, and
 therefore @samp{f} is a regular expression that matches the string
@@ -468,6 +471,34 @@ ordinary since there is no preceding expression on which the
 @samp{*} can act. It is poor practice to depend on this behavior; quote
 the special character anyway, regardless of where it appears.@refill

+As a @samp{\} is not special inside a character alternative, it can
+never remove the special meaning of @samp{-} or @samp{]}. So you
+should not quote these characters when they have no special meaning
+either. This would not clarify anything, since backslashes can
+legitimately precede these characters where they @emph{have} special
+meaning, as in @code{[^\]} (@code{"[^\\]"} for Lisp string syntax),
+which matches any single character except a backslash.
+
+In practice, most @samp{]} that occur in regular expressions close a
+character alternative and hence are special. However, occasionally a
+regular expression may try to match a complex pattern of literal
+@samp{[} and @samp{]}. In such situations, it sometimes may be
+necessary to carefully parse the regexp from the start to determine
+which square brackets enclose a character alternative. For example,
+@code{[^][]]} consists of the complemented character alternative
+@code{[^][]}, which matches any single character that is not a square
+bracket, followed by a literal @samp{]}.
+
+The exact rules are that at the beginning of a regexp, @samp{[} is
+special and @samp{]} not. This lasts until the first unquoted
+@samp{[}, after which we are in a character alternative; @samp{[} is
+no longer special (except when it starts a character class) but @samp{]}
+is special, unless it immediately follows the special @samp{[} or that
+@samp{[} followed by a @samp{^}. This lasts until the next special
+@samp{]} that does not end a character class. This ends the character
+alternative and restores the ordinary syntax of regular expressions;
+an unquoted @samp{[} is special again and a @samp{]} not.
+
 @node Char Classes
 @subsubsection Character Classes
 @cindex character classes in regexp
@@ -740,8 +771,8 @@ with a symbol-constituent character.

 @kindex invalid-regexp
 Not every string is a valid regular expression. For example, a string
-with unbalanced square brackets is invalid (with a few exceptions, such
-as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
+that ends inside a character alternative without terminating @samp{]}
+is invalid, and so is a string that ends with a single @samp{\}. If
 an invalid regular expression is passed to any of the search functions,
 an @code{invalid-regexp} error is signaled.

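The bracket rules added to (Regexp Special) amount to a small left-to-right scan. The Python sketch below illustrates that scan under stated assumptions; it is not Emacs's own C implementation. The function name `find_char_alternatives` is hypothetical. It only tracks which `[` and `]` delimit character alternatives and ignores ranges, syntax classes, and validation of character-class names. It reports each alternative as an inclusive (start, end) index pair and raises ValueError for the two invalid cases named in the (Regexp Backslash) hunk.

```python
# Hypothetical helper (not part of Emacs): locate the character
# alternatives in an Emacs-style regexp using the bracket rules above.
# Simplified sketch: ranges, syntax classes and character-class name
# validation are ignored.


def find_char_alternatives(regexp: str) -> list[tuple[int, int]]:
    """Return inclusive (start, end) indices of each character alternative."""
    spans: list[tuple[int, int]] = []
    i, n = 0, len(regexp)
    while i < n:
        c = regexp[i]
        if c == "\\":
            # Outside an alternative, `\` quotes the next character.
            if i + 1 >= n:
                raise ValueError("regexp ends with a single backslash")
            i += 2
            continue
        if c != "[":
            # Everything else, including a lone `]`, is ordinary here.
            i += 1
            continue
        # An unquoted `[` opens a character alternative.
        start = i
        i += 1
        if i < n and regexp[i] == "^":
            i += 1  # complemented alternative
        first = i  # a `]` at this position is literal, not a terminator
        while True:
            if i >= n:
                raise ValueError("regexp ends inside a character alternative")
            c = regexp[i]
            if c == "]" and i != first:
                break  # the special `]` that closes the alternative
            if regexp.startswith("[:", i):
                # `[:` starts a character class if a `:]` follows; skip
                # to just past it.  Otherwise this `[` is ordinary.
                close = regexp.find(":]", i + 2)
                if close != -1:
                    i = close + 2
                    continue
            # `\` and all other characters are ordinary in here.
            i += 1
        spans.append((start, i))
        i += 1
    return spans


if __name__ == "__main__":
    print(find_char_alternatives("[^][]]"))  # [(0, 4)]
```

For example, `find_char_alternatives("[^][]]")` returns `[(0, 4)]`: the complemented alternative `[^][]` followed by an ordinary `]`, as the patch text describes. Python string escaping applies in the calls, so `"[^\\]"` is the four-character regexp `[^\]`, whose `\` does not quote the closing `]`.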