From: Dave Love
Date: Fri, 13 Oct 2000 16:36:35 +0000 (+0000)
Subject: Non-ASCII in regexp ranges.
X-Git-Tag: emacs-pretest-21.0.90~900
X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=6cc089d2ade30c1d8dfc71d2d239b99a056834cd;p=emacs.git

Non-ASCII in regexp ranges.
---

diff --git a/lispref/searching.texi b/lispref/searching.texi
index 0b54fcd2fe8..7274209adb7 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
@@ -311,10 +311,17 @@ matches both @samp{]} and @samp{-}.
 To include @samp{^} in a character alternative, put it anywhere but at
 the beginning.
 
-The beginning and end of a range must be in the same character set
-(@pxref{Character Sets}). Thus, @samp{[a-\x8e0]} is invalid because
-@samp{a} is in the @sc{ascii} character set but the character 0x8e0
-(@samp{a} with grave accent) is in the Emacs character set for Latin-1.
+The beginning and end of a range of multibyte characters must be in the
+same character set (@pxref{Character Sets}). Thus, @samp{[\x8e0-\x97c]}
+is invalid because character 0x8e0 (@samp{a} with grave accent) is in
+the Emacs character set for Latin-1 but the character 0x97c (@samp{u}
+with diaeresis) is in the Emacs character set for Latin-2.
+
+If a range starts with a unibyte character @var{c} and ends with a
+multibyte character @var{c2}, the range is divided into two parts: one
+is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
+@var{c1} is the first character of the charset to which @var{c2}
+belongs.
 
 You cannot always match all non-@sc{ascii} characters with the regular
 expression @samp{[\200-\377]}. This works when searching a unibyte
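
Reviewer note: the two rules added by this hunk can be modelled with a small sketch. This is only an illustration in Python, not the regexp compiler's actual code. The `CHARSETS` table, its bounds, and the names `charset_of` and `split_range` are hypothetical; the bounds are assumptions chosen only so that 0x8e0 lands in Latin-1 and 0x97c in Latin-2, as the patched text states.

```python
# Hypothetical charset table. The bounds are illustrative assumptions, picked
# only so that 0x8e0 falls in Latin-1 and 0x97c in Latin-2 as the text says.
CHARSETS = {
    "latin-1": (0x8A0, 0x8FF),
    "latin-2": (0x920, 0x97F),
}
UNIBYTE_MAX = 0o377  # the ?\377 upper bound named in the patched text


def charset_of(ch):
    """Return the name of the charset containing ch, or None if ch is unibyte."""
    if ch <= UNIBYTE_MAX:
        return None
    for name, (first, last) in CHARSETS.items():
        if first <= ch <= last:
            return name
    raise ValueError(f"character {ch:#x} is not in the illustrative table")


def split_range(start, end):
    """Return the (low, high) sub-ranges that the range start-end stands for."""
    start_set, end_set = charset_of(start), charset_of(end)
    if start_set is None and end_set is None:
        # Plain unibyte range: nothing to divide.
        return [(start, end)]
    if start_set is None:
        # Unibyte start, multibyte end: divide into c..?\377 and c1..c2,
        # where c1 is the first character of the charset of c2.
        c1 = CHARSETS[end_set][0]
        return [(start, UNIBYTE_MAX), (c1, end)]
    if start_set != end_set:
        # Multibyte start with an end outside its charset: treated as invalid.
        raise ValueError("range endpoints are in different character sets")
    return [(start, end)]
```

Under these assumptions, `split_range(ord("a"), 0x8e0)` yields the two parts `a..?\377` and `0x8a0..0x8e0`, while `split_range(0x8e0, 0x97c)` raises an error, mirroring the invalid `[\x8e0-\x97c]` example in the new text.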