New multi-line regexp and new regexp syntax.

author Francesco Potortì <pot@gnu.org>

Thu, 13 Jun 2002 11:15:46 +0000 (11:15 +0000)

committer Francesco Potortì <pot@gnu.org>

Thu, 13 Jun 2002 11:15:46 +0000 (11:15 +0000)
diff --git a/etc/NEWS b/etc/NEWS

index 441316e675701ad7216b8ce526e39914301771ce..7d993150cd67490904725deceaff736fe0428fdb 100644 (file)
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -569,6 +569,23 @@ comparison.
  
  ** Etags changes.
  
+*** New syntax for regular expressions, multi-line regular expressions.
+The syntax --ignore-case-regex=/REGEX/NAME/ is now undocumented and
+retained only for backward compatibility.  The new equivalent syntax is
+--regex=/REGEX/NAME/i.  More generally, it is --regex=/REGEX/NAME/MODS,
+where `/NAME' is optional, as usual, and MODS is a string of 0 or more
+characters among `i' (ignore case), `m' (multi-line) and `s'
+(single-line).  The `m' and `s' modifiers behave as in Perl regular
+expressions: `m' allows regexps to match more than one line, while `s'
+(which implies `m') means that `.' matches newlines.  The ability to
+span newlines allows writing of much more powerful regular expressions
+and rapid prototyping for tagging new languages.
+
+*** Regular expressions can use char escape sequences as in GCC.
+The escape sequences \a, \b, \d, \e, \f, \n, \r, \t, \v stand,
+respectively, for the ASCII characters BEL, BS, DEL, ESC, FF, NL,
+CR, TAB, VT.
+
  *** In Prolog, etags creates tags for rules in addition to predicates.
  
  *** In Perl, packages are tags.
@@ -596,9 +613,6 @@ be used (only once) in place of a file name on the command line.  Etags
  will read from standard input and mark the produced tags as belonging to
  the file FILE.
  
-*** Regular expressions can use char escape sequences as in Gcc
-These are the escapes \a, \b, \d, \e, \f, \n, \r, \t, \v.
-
  +++
  ** The command line option --no-windows has been changed to
  --no-window-system.  The old one still works, but is deprecated.
diff --git a/etc/etags.1 b/etc/etags.1

index ffa937750bc930bead7e3c6ee66571d1938de219..75af1ef437a74e750b14deea798dd8875fd5bf7c 100644 (file)
--- a/etc/etags.1
+++ b/etc/etags.1
@@ -22,7 +22,6 @@ etags, ctags \- generate tag file for Emacs, vi
  [\|\-\-ignore\-indentation\|] [\|\-\-language=\fIlanguage\fP\|]
  [\|\-\-members\|] [\|\-\-output=\fItagfile\fP\|]
  [\|\-\-regex=\fIregexp\fP\|] [\|\-\-no\-regex\|]
-[\|\-\-ignore\-case\-regex=\fIregexp\fP\|]
  [\|\-\-help\|] [\|\-\-version\|]
  \fIfile\fP .\|.\|.
  
@@ -36,7 +35,6 @@ etags, ctags \- generate tag file for Emacs, vi
  [\|\-\-globals\|] [\|\-\-ignore\-indentation\|]
  [\|\-\-language=\fIlanguage\fP\|] [\|\-\-members\|]
  [\|\-\-output=\fItagfile\fP\|] [\|\-\-regex=\fIregexp\fP\|]
-[\|\-\-ignore\-case\-regex=\fIregexp\fP\|]
  [\|\-\-typedefs\|] [\|\-\-typedefs\-and\-c++\|]
  [\|\-\-update\|] [\|\-\-no\-warn\|]
  [\|\-\-help\|] [\|\-\-version\|]
@@ -149,27 +147,32 @@ Explicit name of file for tag table; overrides default \fBTAGS\fP or
  \fBtags\fP.   (But ignored with \fB\-v\fP or \fB\-x\fP.)
  .TP
  \fB\-r\fP \fIregexp\fP, \fB\-\-regex=\fIregexp\fP
-.TP
-\fB\-\-ignore\-case\-regex=\fIregexp\fP
-Make tags based on regexp matching for each line of the files
-following this option, in addition to the tags made with the standard
-parsing based on language.  When using \fB\-\-regex\fP, case is
-significant, while it is not with \fB\-\-ignore\-case\-regex\fP. May
-be freely intermixed with filenames and the \fB\-R\fP option.  The
-regexps are cumulative, i.e. each option will add to the previous
-ones.  The regexps are of the form:
+
+Make tags based on regexp matching for the files following this option,
+in addition to the tags made with the standard parsing based on
+language. May be freely intermixed with filenames and the \fB\-R\fP
+option.  The regexps are cumulative, i.e. each such option will add to
+the previous ones.  The regexps are of the form:
  .br
-       \fB/\fP\fItagregexp\fP[\fB/\fP\fInameregexp\fP]\fB/\fP
+       \fB/\fP\fItagregexp/\fP[\fInameregexp\fP\fB/\fP]\fImodifiers\fP
  .br
  
-where \fItagregexp\fP is used to match the lines that must be tagged.
-It should not match useless characters.  If the match is
-such that more characters than needed are unavoidably matched by
-\fItagregexp\fP, it may be useful to add a \fInameregexp\fP, to
-narrow down the tag scope.  \fBctags\fP ignores regexps without a
-\fInameregexp\fP.  The syntax of regexps is the same as in emacs.
-The following character escape sequences are supported:
-\\a, \\b, \\d, \\e, \\f, \\n, \\r, \\t, \\v.
+where \fItagregexp\fP is used to match the tag.  It should not match
+useless characters.  If the match is such that more characters than
+needed are unavoidably matched by \fItagregexp\fP, it may be useful to
+add a \fInameregexp\fP, to narrow down the tag scope.  \fBctags\fP
+ignores regexps without a \fInameregexp\fP.  The syntax of regexps is
+the same as in emacs.  The following character escape sequences are
+supported: \\a, \\b, \\d, \\e, \\f, \\n, \\r, \\t, \\v, which
+respectively stand for the ASCII characters BEL, BS, DEL, ESC, FF, NL,
+CR, TAB, VT.
+.br
+The \fImodifiers\fP are a sequence of 0 or more characters among
+\fIi\fP, which means to ignore case when matching; \fIm\fP, which means
+that the \fItagregexp\fP will be matched against the whole file contents
+at once, rather than line by line, and the matching sequence can match
+multiple lines; and \fIs\fP, which implies \fIm\fP and means that the
+dot character in \fItagregexp\fP matches the newline char as well.
  
  .br
  Here are some examples.  All the regexps are quoted to protect them
diff --git a/lib-src/ChangeLog b/lib-src/ChangeLog

index 85a0a332ba6ea56e724e89666a0a8153f8e22679..c870164f7a5ba795681ed94c52057b15dd3f0684 100644 (file)
--- a/lib-src/ChangeLog
+++ b/lib-src/ChangeLog
@@ -1,3 +1,31 @@
+2002-06-12  Francesco Potorti`  <pot@gnu.org>
+
+       * etags.c: New multi-line regexp and new regexp syntax.
+       (arg_type): at_icregexp label removed (obsolete).
+       (pattern): New member multi_line for multi-line regexps.
+       (filebuf): A global buffer containing the whole file as a string
+       for multi-line regexp matching.
+       (need_filebuf): Global flag raised if multi-line regexps used.
+       (print_help): Document new regexp modifiers, remove references to
+       obsolete option --ignore-case-regex.
+       (main): Do not set regexp syntax and translation table here.
+       (main): Treat -c option as a backward compatibility hack.
+       (main, find_entries): Init and free filebuf.
+       (find_entries): Call regex_tag_multiline after the regular parser.
+       (scan_separators): Check for unterminated regexp and return NULL.
+       (analyse_regex, add_regex): Remove the ignore_case argument, which
+       is now a modifier to the regexp.  All callers changed.
+       (add_regex): Manage the regexp modifiers.
+       (regex_tag_multiline): New function.  Reads from filebuf.
+       (readline_internal): If necessary, copy the whole file into filebuf.
+       (readline): Skip multi-line regexps, leave them to regex_tag_multiline.
+
+2002-06-11  Francesco Potorti`  <pot@gnu.org>
+
+       * etags.c (add_regex): Better check for null regexps.
+       (readline): Check for regex matching null string.
+       (find_entries): Reorganisation.
+
  2002-06-07  Francesco Potorti`  <pot@gnu.org>
  
         * etags.c (scan_separators): Support all character escape
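
The NEWS and man page text above describes MODS as a string of zero or more characters among `i', `m' and `s', with `s' implying `m'.  The following is a minimal sketch of how such a modifier string could be parsed.  It is not the code from etags.c: the struct regex_mods type, its field names, and the function parse_regex_mods are hypothetical, and rejecting unknown modifier characters is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag set; the names are illustrative and are not
   taken from etags.c.  */
struct regex_mods
{
  bool ignore_case;   /* `i': ignore case when matching */
  bool multi_line;    /* `m': match against the whole file contents */
  bool single_line;   /* `s': `.' also matches newline; implies `m' */
};

/* Parse MODS, the text after the final `/' in
   --regex=/REGEX/NAME/MODS.  Return true on success, or false if an
   unknown modifier character is found (an assumption of this sketch).  */
static bool
parse_regex_mods (const char *mods, struct regex_mods *out)
{
  assert (mods != NULL && out != NULL);
  out->ignore_case = false;
  out->multi_line = false;
  out->single_line = false;

  for (; *mods != '\0'; mods++)
    switch (*mods)
      {
      case 'i':
        out->ignore_case = true;
        break;
      case 'm':
        out->multi_line = true;
        break;
      case 's':
        /* `s' implies `m', so raise both flags.  */
        out->single_line = true;
        out->multi_line = true;
        break;
      default:
        return false;
      }
  return true;
}
```

With this sketch, an empty MODS leaves all three flags clear, "i" sets only ignore_case, and "s" sets both single_line and multi_line.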
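
The documentation changes in this commit list nine character escape sequences and the ASCII characters they stand for.  As a sketch of that mapping only, the code below translates them in place in a string.  The function names escape_to_ascii and unescape are hypothetical and this is not the scan_separators code from etags.c.  Leaving any other backslash pair untouched, so that regexp constructs such as \( survive, is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Return the ASCII code that the escape \C stands for, or -1 if C is
   not one of the nine listed escape letters.  */
static int
escape_to_ascii (char c)
{
  switch (c)
    {
    case 'a': return 0x07;	/* BEL */
    case 'b': return 0x08;	/* BS */
    case 'd': return 0x7F;	/* DEL */
    case 'e': return 0x1B;	/* ESC */
    case 'f': return 0x0C;	/* FF */
    case 'n': return 0x0A;	/* NL */
    case 'r': return 0x0D;	/* CR */
    case 't': return 0x09;	/* TAB */
    case 'v': return 0x0B;	/* VT */
    default:  return -1;
    }
}

/* Replace the listed escapes in S, in place.  Any other backslash
   pair is copied unchanged (an assumption of this sketch).  */
static void
unescape (char *s)
{
  assert (s != NULL);
  char *out = s;

  for (const char *in = s; *in != '\0'; in++)
    {
      if (in[0] == '\\' && in[1] != '\0')
	{
	  int code = escape_to_ascii (in[1]);
	  if (code >= 0)
	    *out++ = (char) code;
	  else
	    {
	      /* Not a listed escape: keep both characters.  */
	      *out++ = in[0];
	      *out++ = in[1];
	    }
	  in++;			/* skip the escaped character */
	}
      else
	*out++ = *in;
    }
  *out = '\0';
}
```

Copying unknown pairs as a unit also keeps an escaped backslash followed by `n' from being misread as a newline escape.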