From: Eli Zaretskii
Date: Thu, 18 Aug 2011 10:53:55 +0000 (+0300)
Subject: Improve documentation of bidi in ELisp manual.
X-Git-Tag: emacs-pretest-24.0.90~104^2~124^2~17
X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=c094bb0cf7eee9defdd76b8432dcbc24a7c6856d;p=emacs.git

Improve documentation of bidi in ELisp manual.

 doc/lispref/nonascii.texi (Character Properties): Document use of
 `bidi-class' and `mirroring' properties as part of reordering.
 Provide cross-references to "Bidirectional Display".
 doc/lispref/display.texi (Bidirectional Display): Document the
 pitfalls of concatenating strings with bidirectional content, with
 possible solutions. Document string-mark-left-to-right. Mention
 paragraph direction in modes that inherit from prog-mode. Document
 use of `bidi-class' and `mirroring' properties as part of reordering.
 etc/NEWS: Mark string-mark-left-to-right as documented.
---

diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 56175a34eee..03a20ba5830 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,15 @@
+2011-08-18 Eli Zaretskii
+
+ * nonascii.texi (Character Properties): Document use of
+ `bidi-class' and `mirroring' properties as part of reordering.
+ Provide cross-references to "Bidirectional Display".
+
+ * display.texi (Bidirectional Display): Document the pitfalls of
+ concatenating strings with bidirectional content, with possible
+ solutions. Document string-mark-left-to-right. Mention paragraph
+ direction in modes that inherit from prog-mode. Document use of
+ `bidi-class' and `mirroring' properties as part of reordering.
+
 2011-08-16 Eli Zaretskii
 
 * modes.texi (Major Mode Conventions): Improve the documentation
diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index 64a9054f596..7e7851452d8 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -5992,6 +5992,7 @@ left-to-right and right-to-left characters.
 for editing and displaying bidirectional text.
 
 @cindex logical order
+@cindex reading order
 @cindex visual order
 @cindex unicode bidirectional algorithm
  Emacs stores right-to-left and bidirectional text in the so-called
@@ -6006,17 +6007,16 @@ for display. Reordering of bidirectional text for display in Emacs is
 a ``Full bidirectionality'' class implementation of the @acronym{UBA}.
 
 @defvar bidi-display-reordering
- The buffer-local variable @code{bidi-display-reordering} controls
-whether text in the buffer is reordered for display. If its value is
-non-@code{nil}, Emacs reorders characters that have right-to-left
-directionality when they are displayed. The default value is
-@code{t}. Text in overlay strings (@pxref{Overlay
-Properties,,before-string}), display strings (@pxref{Overlay
-Properties,,display}), and @code{display} text properties
-(@pxref{Display Property}) is also reordered if the buffer whose text
-includes these strings is reordered for display. Turning off
-@code{bidi-display-reordering} for a buffer turns off reordering of
-all the overlay and display strings in that buffer.
+ This buffer-local variable controls whether text in the buffer is
+reordered for display. If its value is non-@code{nil}, Emacs reorders
+characters that have right-to-left directionality when they are
+displayed. The default value is @code{t}. Text in overlay strings
+(@pxref{Overlay Properties,,before-string}), display strings
+(@pxref{Overlay Properties,,display}), and @code{display} text
+properties (@pxref{Display Property}) is also reordered for display if
+the buffer whose text includes these strings is reordered. Turning
+off @code{bidi-display-reordering} for a buffer turns off reordering
+of all the overlay and display strings in that buffer.
 
  Reordering of strings that are unrelated to any buffer, such as text
 displayed on the mode line (@pxref{Mode Line Format}) or header line
@@ -6056,7 +6056,7 @@ it is reordered for display. That is, the entire chunk of text
 covered by these properties is reordered together. Moreover, the
 bidirectional properties of the characters in this chunk of text are
 ignored, and Emacs reorders them as if they were replaced with a
-single character @code{u+FFFC}, known as the @dfn{Object Replacement
+single character @code{U+FFFC}, known as the @dfn{Object Replacement
 Character}. This means that placing a display property over a portion
 of text may change the way that the surrounding text is reordered for
 display. To prevent this unexpected effect, always place such
@@ -6073,9 +6073,9 @@ begins at the right margin and is continued or truncated at the left
 margin.
 
 @defvar bidi-paragraph-direction
- Emacs determines the base direction of each paragraph dynamically,
-based on the text at the beginning of the paragraph. The precise
-method of determining the base direction is specified by the
+ By default, Emacs determines the base direction of each paragraph
+dynamically, based on the text at the beginning of the paragraph. The
+precise method of determining the base direction is specified by the
 @acronym{UBA}; in a nutshell, the first character in a paragraph that
 has an explicit directionality determines the base direction of the
 paragraph. However, sometimes a buffer may need to force a certain
@@ -6087,6 +6087,13 @@ dynamic determination of the base direction, and instead forces all
 paragraphs in the buffer to have the direction specified by its
 buffer-local value. The value can be either @code{right-to-left} or
 @code{left-to-right}. Any other value is interpreted as @code{nil}.
+The default is @code{nil}.
+
+@cindex @code{prog-mode}, and @code{bidi-paragraph-direction}
+Modes that are meant to display program source code should force a
+@code{left-to-right} paragraph direction. The easiest way of doing so
+is to derive the mode from Prog Mode, which already sets
+@code{bidi-paragraph-direction} to that value.
 @end defvar
 
 @defun current-bidi-paragraph-direction &optional buffer
@@ -6099,3 +6106,70 @@ non-@code{nil}, the returned value will be identical to that value;
 otherwise, the returned value reflects the paragraph direction
 determined dynamically by Emacs.
 @end defun
+
+@cindex layout on display, and bidirectional text
+@cindex jumbled display of bidirectional text
+@cindex concatenating bidirectional strings
+ Reordering of bidirectional text for display can have surprising and
+unpleasant effects when two strings with bidirectional content are
+juxtaposed in a buffer, or otherwise programmatically concatenated
+into a string of text. A typical example is a buffer whose lines are
+actually sequences of items, or fields, separated by whitespace or
+punctuation characters. This is used in specialized modes such as
+Buffer-menu Mode or various email summary modes, like Rmail Summary
+Mode. Because these separator characters are @dfn{weak}, i.e.@: have
+no strong directionality, they take on the directionality of
+surrounding text. As a result, a numeric field that follows a field
+with bidirectional content can be displayed @emph{to the left} of the
+preceding field, producing a jumbled display and messing up the
+expected layout.
+
+ To countermand this, you can use one of the following techniques for
+forcing correct order of fields on display:
+
+@itemize @minus
+@item
+Append the special character @code{U+200E}, LEFT-TO-RIGHT MARK, or
+@acronym{LRM}, to the end of each field that may have bidirectional
+content, or prepend it to the beginning of the following field. The
+function @code{string-mark-left-to-right}, described below, comes in
+handy for this purpose. (In a right-to-left paragraph, use
+@code{U+200F}, RIGHT-TO-LEFT MARK, or @acronym{RLM}, instead.) This
+is one of the solutions recommended by
+@uref{http://www.unicode.org/reports/tr9/#Separators, the
+@acronym{UBA}}.
+
+@item
+Include the tab character in the field separator. The tab character
+plays the role of @dfn{segment separator} in the @acronym{UBA}
+reordering, whose effect is to make each field a separate segment, and
+thus reorder them separately.
+@end itemize
+
+@defun string-mark-left-to-right string
+This subroutine returns its argument @var{string}, possibly modified,
+such that the result can be safely concatenated with another string,
+or juxtaposed with another string in a buffer, without disrupting the
+relative layout of this string and the next one on display. If the
+string returned by this function is displayed as part of a
+left-to-right paragraph, it will always appear on display to the left
+of the text that follows it. The function works by examining the
+characters of its argument, and if any of those characters could cause
+reordering on display, the function appends the @acronym{LRM}
+character to the string. The appended @acronym{LRM} character is made
+@emph{invisible} (@pxref{Invisible Text}), to hide it on display.
+@end defun
+
+ The reordering algorithm uses the bidirectional properties of the
+characters stored as their @code{bidi-class} property
+(@pxref{Character Properties}). Lisp programs can change these
+properties by calling the @code{put-char-code-property} function.
+However, doing this requires a thorough understanding of the
+@acronym{UBA}, and is therefore not recommended. Any changes to the
+bidirectional properties of a character have global effect: they
+affect all Emacs frames and windows.
+
+ Similarly, the @code{mirroring} property is used to display the
+appropriate mirrored character in the reordered text. Lisp programs
+can affect the mirrored display by changing this property. Again, any
+such changes affect all of Emacs display.
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index 83f9f424834..7b6d665b2ac 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -392,7 +392,8 @@ The value is an integer number.
 @item bidi-class
 Corresponds to the Unicode @code{Bidi_Class} property. The value is a
 symbol whose name is the Unicode @dfn{directional type} of the
-character.
+character. Emacs uses this property when it reorders bidirectional
+text for display (@pxref{Bidirectional Display}).
 
 @item decomposition
 Corresponds to the Unicode @code{Decomposition_Type} and
@@ -440,7 +441,9 @@ defined mirroring glyph. All the characters whose @code{mirrored}
 property is @code{N} have @code{nil} as their @code{mirroring}
 property; however, some characters whose @code{mirrored} property is
 @code{Y} also have @code{nil} for @code{mirroring}, because no
-appropriate characters exist with mirrored glyphs.
+appropriate characters exist with mirrored glyphs. Emacs uses this
+property to display mirror images of characters when appropriate
+(@pxref{Bidirectional Display}).
 
 @item old-name
 Corresponds to the Unicode @code{Unicode_1_Name} property. The value
diff --git a/etc/NEWS b/etc/NEWS
index 8707a8b0adc..ed472a5668a 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1043,6 +1043,7 @@ of function value which looks like (closure ENV ARGS &rest BODY).
 *** New function `special-variable-p' to check whether a variable is
 declared as dynamically bound.
 
++++
 ** New function `string-mark-left-to-right'.
 Given a string containing right-to-left (RTL) script, this function
 returns another string with a terminating LRM (left-to-right mark)