rx: Better translation of char-matching patterns
author Mattias Engdegård <mattiase@acm.org>
Sat, 12 Aug 2023 15:39:58 +0000 (17:39 +0200)
committer Mattias Engdegård <mattiase@acm.org>
Sat, 12 Aug 2023 15:40:36 +0000 (17:40 +0200)
commit de6c1c4d5c92b92d5b280e157c2a5bc3228749f2
tree 8f3f73dc1e94dd2f30c305019e7968e4a7c9c06a
parent 7b1eb9d753bed5f2891d10efe164eb40ed3ab4fc

Translate or-patterns that (even partially) match single characters
into character alternatives which are more efficient in matching,
sometimes algorithmically so.  Example:

  (or "%" (in "a-z") space)

was previously translated to

  "%\\|[a-z]\\|[[:space:]]"

but now becomes

  "[%a-z[:space:]]"
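
As a quick sanity check, the two regexp strings quoted above can be compared on a few one-character inputs. This is only a sketch: the helper name `demo-whole-match-p` is made up for illustration and is not part of rx.el, and the sample strings are arbitrary.

```elisp
;; Hypothetical helper, not part of rx.el.
(defun demo-whole-match-p (regexp string)
  "Return t if REGEXP matches all of STRING, case-sensitively."
  (let ((case-fold-search nil))
    ;; Wrap REGEXP in a shy group so the anchors apply to every branch.
    (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string) t)))

(let ((old "%\\|[a-z]\\|[[:space:]]")   ; translation before this change
      (new "[%a-z[:space:]]"))          ; translation after this change
  ;; Signal an error if the two regexps ever disagree on a sample.
  (dolist (s '("%" "a" "z" " " "A" "0" "-" "ab"))
    (unless (eq (demo-whole-match-p old s) (demo-whole-match-p new s))
      (error "Regexps disagree on %S" s))))
```

Evaluating the `let` form should finish without signalling an error, since both strings accept the same single characters; the second one simply does so with a single character alternative instead of three branches.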

Single-char patterns include `nonl` and `anychar`, which now can also
be used in set operations (union, complement and intersection), and
character classes.  For example, `(or nonl "\n")` is now equivalent to
`anychar`.
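
The equivalence can be checked by matching rather than by comparing the generated regexp strings, which keeps the check independent of the exact translation. A minimal sketch, assuming an Emacs where `anychar` is available; the function name `demo-same-single-char-match-p` is hypothetical:

```elisp
;; Hypothetical helper, not part of rx.el.
(defun demo-same-single-char-match-p (string)
  "Return t if `(or nonl \"\\n\")' and `anychar' agree on STRING.
Both patterns are anchored, so they match one-character strings only."
  (let ((union (rx bos (or nonl "\n") eos))
        (any   (rx bos anychar eos)))
    (eq (null (string-match-p union string))
        (null (string-match-p any string)))))

;; Ordinary char, newline, non-ASCII char, empty string, two chars.
(mapcar #'demo-same-single-char-match-p '("a" "\n" "é" "" "ab"))
;; → (t t t t t)
```

The first three strings are matched by both patterns and the last two by neither, so the two forms agree on every sample.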

* lisp/emacs-lisp/rx.el (rx--expand-def): Remove, split into...
(rx--expand-def-form, rx--expand-def-symbol): ...these.
(rx--translate-compat-symbol-entry)
(rx--translate-compat-form-entry): New functions for handling the
legacy extension mechanism.
(rx--normalise-or-arg): Rename to...
(rx--normalise-char-pattern): ...this, and rewrite.
(rx--all-string-or-args): Remove, split into...
(rx--all-string-branches-p, rx--collect-or-strings): ...these.
(rx--char-alt-union, rx--intersection-intervals)
(rx--reduce-to-char-alt, rx--optimise-or-args)
(rx--translate-char-alt, rx--human-readable): New.
(rx--translate-or, rx--translate-not, rx--translate-intersection):
Rewrite.
(rx--charset-p, rx--intervals-to-alt, rx--charset-intervals)
(rx--charset-union, rx--charset-intersection, rx--charset-all)
(rx--translate-union): Remove.
(rx--generate-alt): Decide whether to generate a negated character
alternative.
(rx--complement-intervals, rx--intersect-intervals)
(rx--union-intervals): Rename to...
(rx--interval-set-complement, rx--interval-set-intersection)
(rx--interval-set-union): ...these.
(rx--translate-symbol, rx--translate-form): Refactor extension
processing.  Handle synthetic `rx--char-alt` form.
* test/lisp/emacs-lisp/rx-tests.el (rx-or, rx-char-any-raw-byte)
(rx-any, rx-charset-or): Adapt to changes and extend.
* test/lisp/emacs-lisp/rx-tests.el (rx--complement-intervals)
(rx--union-intervals, rx--intersect-intervals): Rename to...
(rx--interval-set-complement, rx--interval-set-union)
(rx--interval-set-intersection): ...these.
lisp/emacs-lisp/rx.el
test/lisp/emacs-lisp/rx-tests.el