Improve documentation of letter-case conversions

author Eli Zaretskii <eliz@gnu.org>

Fri, 1 Nov 2024 14:39:39 +0000 (16:39 +0200)

committer Eshel Yaron <me@eshelyaron.com>

Tue, 5 Nov 2024 11:16:32 +0000 (12:16 +0100)
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi

index 145d55690c3daec8b2aa689893880852ce74b10b..cb32c7671e750422679c6e6ca1ca69eb365335bd 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -632,8 +632,10 @@ is @code{nil}, which means the character itself.
  Corresponds to Unicode language- and context-independent special upper-casing
  rules.  The value of this property is a string (which may be empty).  For
  example mapping for U+00DF @sc{latin small letter sharp s} is
-@code{"SS"}.  For characters with no special mapping, the value is @code{nil}
-which means @code{uppercase} property needs to be consulted instead.
+@code{"SS"}.  This mapping overrides the @code{uppercase} property, and
+thus the current case table.  For characters with no special mapping,
+the value is @code{nil}, which means @code{uppercase} property needs to
+be consulted instead.
  
  @item special-lowercase
  Corresponds to Unicode language- and context-independent special
@@ -641,16 +643,19 @@ lower-casing rules.  The value of this property is a string (which may
  be empty).  For example mapping for U+0130 @sc{latin capital letter i
  with dot above} the value is @code{"i\u0307"} (i.e. 2-character string
  consisting of @sc{latin small letter i} followed by U+0307
-@sc{combining dot above}).  For characters with no special mapping,
-the value is @code{nil} which means @code{lowercase} property needs to
-be consulted instead.
+@sc{combining dot above}).  This mapping overrides the @code{lowercase}
+property, and thus the current case table.  For characters with no
+special mapping, the value is @code{nil}, which means @code{lowercase}
+property needs to be consulted instead.
  
  @item special-titlecase
  Corresponds to Unicode unconditional special title-casing rules.  The value of
  this property is a string (which may be empty).  For example mapping for
-U+FB01 @sc{latin small ligature fi} the value is @code{"Fi"}.  For
-characters with no special mapping, the value is @code{nil} which means
-@code{titlecase} property needs to be consulted instead.
+U+FB01 @sc{latin small ligature fi} the value is @code{"Fi"}.  This
+mapping overrides the @code{titlecase} property, and thus the current
+case table.  For characters with no special mapping, the value is
+@code{nil}, which means @code{titlecase} property needs to be consulted
+instead.
  @end table
  
  @defun get-char-code-property char propname
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi

index 71d85acb37c02f704995b35d59c485e60828c747..c2b18387ad5e20b9667bb30200ade541f6760c16 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -1591,9 +1591,12 @@ using @code{string} function, before being passed to one of the casing
  functions.  Of course, no assumptions on the length of the result may
  be made.
  
-  Mapping for such special cases are taken from
-@code{special-uppercase}, @code{special-lowercase} and
-@code{special-titlecase} @xref{Character Properties}.
+  Other characters can also have special case-conversion rules.  They
+all have non-@code{nil} character properties @code{special-uppercase},
+@code{special-lowercase} or @code{special-titlecase} (@pxref{Character
+Properties}) defined by the Unicode Standard.  These properties define
+special case-conversion rules which override the current case table
+(@pxref{Case Tables}).
  
    @xref{Text Comparison}, for functions that compare strings; some of
  them ignore case differences, or can optionally ignore case differences.
@@ -1634,14 +1637,32 @@ correspondence.  There may be two different lower case letters with the
  same upper case equivalent.  In these cases, you need to specify the
  maps for both lower case and upper case.
  
-  The extra table @var{canonicalize} maps each character to a canonical
+  Some characters have special case-conversion rules defined for them,
+which by default override the current case table.  These characters have
+non-@code{nil} character properties @code{special-uppercase},
+@code{special-lowercase} or @code{special-titlecase} (@pxref{Character
+Properties}) defined by the Unicode Standard.  An example is U+00DF
+LATIN SMALL LETTER SHARP S, @ss{}, which by default up-cases to the
+string @code{"SS"}, not to U+1E9E LATIN CAPITAL LETTER SHARP S@.  To
+force these characters follow the case-table conversions, set the
+corresponding Unicode property to @code{nil}:
+
+@example
+ (upcase "@ss{}")
+  => "SS"
+ (put-char-code-property ?@ss{} 'special-uppercase nil)
+ (upcase "@ss{}")
+  => "ẞ"
+@end example
+
+  The extra slot @var{canonicalize} of a case table maps each character to a canonical
  equivalent; any two characters that are related by case-conversion have
  the same canonical equivalent character.  For example, since @samp{a}
  and @samp{A} are related by case-conversion, they should have the same
  canonical equivalent character (which should be either @samp{a} for both
  of them, or @samp{A} for both of them).
  
-  The extra table @var{equivalences} is a map that cyclically permutes
+  The extra slot @var{equivalences} is a map that cyclically permutes
  each equivalence class (of characters with the same canonical
  equivalent).  (For ordinary @acronym{ASCII}, this would map @samp{a} into
  @samp{A} and @samp{A} into @samp{a}, and likewise for each set of