From aa73aecb9eb6207b46bf9c00daebfef4e9820292 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii
Date: Fri, 1 Nov 2024 16:39:39 +0200
Subject: [PATCH] Improve documentation of letter-case conversions

* doc/lispref/nonascii.texi (Character Properties):
* doc/lispref/strings.texi (Case Conversion, Case Tables): Document
that special-casing rules override the case-table conversions.
(Bug#74155)

(cherry picked from commit f7b85fe986e74b649f8148ee407cf7a2327ff4a9)
---
 doc/lispref/nonascii.texi | 21 +++++++++++++--------
 doc/lispref/strings.texi  | 31 ++++++++++++++++++++++++++-----
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index 145d55690c3..cb32c7671e7 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -632,8 +632,10 @@ is @code{nil}, which means the character itself.
 Corresponds to Unicode language- and context-independent special
 upper-casing rules.  The value of this property is a string (which may
 be empty).  For example mapping for U+00DF @sc{latin small letter sharp s} is
-@code{"SS"}.  For characters with no special mapping, the value is @code{nil}
-which means @code{uppercase} property needs to be consulted instead.
+@code{"SS"}.  This mapping overrides the @code{uppercase} property, and
+thus the current case table.  For characters with no special mapping,
+the value is @code{nil}, which means @code{uppercase} property needs to
+be consulted instead.
 
 @item special-lowercase
 Corresponds to Unicode language- and context-independent special
@@ -641,16 +643,19 @@ lower-casing rules.  The value of this property is a string (which may
 be empty).  For example mapping for U+0130 @sc{latin capital letter i
 with dot above} the value is @code{"i\u0307"} (i.e. 2-character string
 consisting of @sc{latin small letter i} followed by U+0307
-@sc{combining dot above}).  For characters with no special mapping,
-the value is @code{nil} which means @code{lowercase} property needs to
-be consulted instead.
+@sc{combining dot above}).  This mapping overrides the @code{lowercase}
+property, and thus the current case table.  For characters with no
+special mapping, the value is @code{nil}, which means @code{lowercase}
+property needs to be consulted instead.
 
 @item special-titlecase
 Corresponds to Unicode unconditional special title-casing rules.  The value of
 this property is a string (which may be empty).  For example mapping for
-U+FB01 @sc{latin small ligature fi} the value is @code{"Fi"}.  For
-characters with no special mapping, the value is @code{nil} which means
-@code{titlecase} property needs to be consulted instead.
+U+FB01 @sc{latin small ligature fi} the value is @code{"Fi"}.  This
+mapping overrides the @code{titlecase} property, and thus the current
+case table.  For characters with no special mapping, the value is
+@code{nil}, which means @code{titlecase} property needs to be consulted
+instead.
 @end table
 
 @defun get-char-code-property char propname
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index 71d85acb37c..c2b18387ad5 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -1591,9 +1591,12 @@ using @code{string} function, before being passed to one of the casing
 functions.  Of course, no assumptions on the length of the result may
 be made.
 
-  Mapping for such special cases are taken from
-@code{special-uppercase}, @code{special-lowercase} and
-@code{special-titlecase} @xref{Character Properties}.
+  Other characters can also have special case-conversion rules.  They
+all have non-@code{nil} character properties @code{special-uppercase},
+@code{special-lowercase} or @code{special-titlecase} (@pxref{Character
+Properties}) defined by the Unicode Standard.  These properties define
+special case-conversion rules which override the current case table
+(@pxref{Case Tables}).
 
   @xref{Text Comparison}, for functions that compare strings; some of
 them ignore case differences, or can optionally ignore case differences.
@@ -1634,14 +1637,32 @@ correspondence.  There may be two different lower case letters with the
 same upper case equivalent.  In these cases, you need to specify the
 maps for both lower case and upper case.
 
-  The extra table @var{canonicalize} maps each character to a canonical
+  Some characters have special case-conversion rules defined for them,
+which by default override the current case table.  These characters have
+non-@code{nil} character properties @code{special-uppercase},
+@code{special-lowercase} or @code{special-titlecase} (@pxref{Character
+Properties}) defined by the Unicode Standard.  An example is U+00DF
+LATIN SMALL LETTER SHARP S, @ss{}, which by default up-cases to the
+string @code{"SS"}, not to U+1E9E LATIN CAPITAL LETTER SHARP S@.  To
+force these characters follow the case-table conversions, set the
+corresponding Unicode property to @code{nil}:
+
+@example
+  (upcase "@ss{}")
+    => "SS"
+  (put-char-code-property ?@ss{} 'special-uppercase nil)
+  (upcase "@ss{}")
+    => "ẞ"
+@end example
+
+  The extra slot @var{canonicalize} of a case table maps each character to a canonical
 equivalent; any two characters that are related by case-conversion
 have the same canonical equivalent character.  For example, since
 @samp{a} and @samp{A} are related by case-conversion, they should have
 the same canonical equivalent character (which should be either
 @samp{a} for both of them, or @samp{A} for both of them).
 
-  The extra table @var{equivalences} is a map that cyclically permutes
+  The extra slot @var{equivalences} is a map that cyclically permutes
 each equivalence class (of characters with the same canonical
 equivalent).  (For ordinary @acronym{ASCII}, this would map @samp{a} into
 @samp{A} and @samp{A} into @samp{a}, and likewise for each set of
-- 
2.39.5