From 2b31e667be95731d7e9ee328c8331eecf69b3831 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii
Date: Thu, 21 Jul 2022 09:53:45 +0300
Subject: [PATCH] ;Improve documentation of locale-specific string comparison

* doc/lispref/strings.texi (Text Comparison): Mention the Unicode
collation rules and buffer-local case-tables.
---
 doc/lispref/strings.texi | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index c9612e598a3..89120575f52 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -564,11 +564,19 @@ equal with respect to collation rules. A collation rule is not only
 determined by the lexicographic order of the characters contained in
 @var{string1} and @var{string2}, but also further rules about
 relations between these characters. Usually, it is defined by the
-@var{locale} environment Emacs is running with.
-
-For example, characters with different coding points but
-the same meaning might be considered as equal, like different grave
-accent Unicode characters:
+@var{locale} environment Emacs is running with and by the Standard C
+library against which Emacs was linked@footnote{
+For more information about collation rules and their locale
+dependencies, see @uref{https://unicode.org/reports/tr10/, The Unicode
+Collation Algorithm}. Some Standard C libraries, such as the
+@acronym{GNU} C Library (a.k.a.@: @dfn{glibc}) implement large
+portions of the Unicode Collation Algorithm and use the associated
+locale data, Common Locale Data Repository, or @acronym{CLDR}.
+}.
+
+For example, characters with different code points but the same
+meaning, like different grave accent Unicode characters, might, in
+some locales, be considered as equal:
 
 @example
 @group
@@ -756,7 +764,8 @@ The strings are compared by the numeric values of their characters.
 For instance, @var{str1} is considered less than @var{str2} if its
 first differing character has a smaller numeric value.
 If @var{ignore-case} is non-@code{nil}, characters are converted to
-upper-case before comparing them. Unibyte strings are converted to
+upper-case, using the current buffer's case-table (@pxref{Case
+Tables}), before comparing them. Unibyte strings are converted to
 multibyte for comparison (@pxref{Text Representations}), so that a
 unibyte string and its conversion to multibyte are always regarded as
 equal.
-- 
2.39.5