git.eshelyaron.com Git - emacs.git/commit
Don't distort character ranges in rx translation
author     Mattias Engdegård <mattiase@acm.org>
           Mon, 17 Jul 2023 11:05:21 +0000 (13:05 +0200)
committer  Mattias Engdegård <mattiase@acm.org>
           Mon, 17 Jul 2023 15:56:54 +0000 (17:56 +0200)
commit     157e735ce89ede9cc939f4ed0f72c5af7ae60735
tree       58c17cfd219647bec129ccbc77ced41233d65844
parent     7446a8c34e2b793df52dbf56b630e20f8c10568c

The Emacs regexp engine interprets character ranges from ASCII to raw
bytes, such as [a-\xfe], as not including non-ASCII Unicode at all;
ranges from non-ASCII Unicode to raw bytes, such as [ü-\x91], are
ignored entirely.

To make rx produce a translation that works as intended, split ranges
that go from ordinary characters to raw bytes. Such ranges may
appear from set manipulation and regexp optimisation.
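
The splitting step described above can be sketched roughly as follows.
This is an illustration only, not the code in rx.el: the helper name
`my-split-char-raw-interval' is hypothetical, and the sketch assumes
intervals are conses (FROM . TO) of character codes. It relies on the
Emacs convention that raw bytes #x80..#xff are represented by the
character codes #x3fff80..#x3fffff, so #x3fff7f is the last ordinary
character.

```elisp
;; Hypothetical helper for illustration; not taken from rx.el.
(defun my-split-char-raw-interval (interval)
  "Split INTERVAL at the char-raw boundary if it straddles it.
INTERVAL is a cons (FROM . TO) of character codes.  Return a list of
one or two intervals, none of which mixes ordinary characters with
raw bytes."
  (let ((from (car interval))
        (to (cdr interval))
        (last-char #x3fff7f)    ; highest ordinary (non-raw) character
        (first-raw #x3fff80))   ; character code of raw byte #x80
    (if (and (<= from last-char) (>= to first-raw))
        ;; Straddles the boundary: one part on each side.
        (list (cons from last-char) (cons first-raw to))
      ;; Entirely on one side: keep as is.
      (list interval))))

;; An interval from ?a to raw byte #xfe is split in two:
(equal (my-split-char-raw-interval '(?a . #x3ffffe))
       '((?a . #x3fff7f) (#x3fff80 . #x3ffffe)))
;; ⇒ t

;; An interval that stays within ordinary characters is left alone:
(equal (my-split-char-raw-interval '(?a . ?z))
       '((?a . ?z)))
;; ⇒ t
```

Each resulting interval can then be rendered as its own range inside
the bracket expression, so that no single range crosses from ordinary
characters into raw bytes.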

* lisp/emacs-lisp/rx.el (rx--generate-alt): Split intervals that
straddle the char-raw boundary when rendering a string regexp from an
interval set.
* test/lisp/emacs-lisp/rx-tests.el (rx-char-any-raw-byte):
Add test cases.
lisp/emacs-lisp/rx.el
test/lisp/emacs-lisp/rx-tests.el