well as Cyrillic, Devanagari (for Hindi and Marathi), Ethiopic, Greek,
Han (for Chinese and Japanese), Hangul (for Korean), Hebrew, IPA,
Kannada, Lao, Malayalam, Tamil, Thai, Tibetan, and Vietnamese scripts.
-Emacs also supports various encodings of these characters used by
+Emacs also supports various encodings of these characters that are used by
other internationalized software, such as word processors and mailers.
Emacs allows editing text with international characters by supporting
@item
You can insert non-@acronym{ASCII} characters or search for them. To do that,
you can specify an input method (@pxref{Select Input Method}) suitable
-for your language, or use the default input method set up when you set
+for your language, or use the default input method set up when you chose
your language environment. If
your keyboard can produce non-@acronym{ASCII} characters, you can select an
appropriate keyboard coding system (@pxref{Terminal Coding}), and Emacs
will accept those characters. Latin-1 characters can also be input by
using the @kbd{C-x 8} prefix, see @ref{Unibyte Mode}.
-On the X Window System, your locale should be set to an appropriate
+With the X Window System, your locale should be set to an appropriate
value to make sure Emacs interprets keyboard input correctly; see
@ref{Language Environments, locales}.
@end itemize
@menu
* International Chars:: Basic concepts of multibyte characters.
-* Enabling Multibyte:: Controlling whether to use multibyte characters.
+* Disabling Multibyte:: Controlling whether to use multibyte characters.
* Language Environments:: Setting things up for the language you use.
* Input Methods:: Entering text characters not on your keyboard.
* Select Input Method:: Specifying your choice of input methods.
in a buffer whose coding system is @code{utf-8-unix}:
@smallexample
- character: @`A (192, #o300, #xc0)
-preferred charset: unicode (Unicode (ISO10646))
- code point: 0xC0
- syntax: w which means: word
- category: j:Japanese l:Latin v:Vietnamese
- buffer code: #xC3 #x80
- file code: not encodable by coding system undecided-unix
- display: by this font (glyph code)
+ position: 1 of 1 (0%), column: 0
+ character: @`A (displayed as @`A) (codepoint 192, #o300, #xc0)
+ preferred charset: unicode (Unicode (ISO10646))
+code point in charset: 0xC0
+ syntax: w which means: word
+ category: .:Base, L:Left-to-right (strong),
+ j:Japanese, l:Latin, v:Viet
+ buffer code: #xC3 #x80
+ file code: #xC3 #x80 (encoded by coding system utf-8-unix)
+ display: by this font (glyph code)
xft:-unknown-DejaVu Sans Mono-normal-normal-
normal-*-13-*-*-*-m-0-iso10646-1 (#x82)
Character code properties: customize what to show
name: LATIN CAPITAL LETTER A WITH GRAVE
+ old-name: LATIN CAPITAL LETTER A GRAVE
general-category: Lu (Letter, Uppercase)
decomposition: (65 768) ('A' '`')
- old-name: LATIN CAPITAL LETTER A GRAVE
-
-There are text properties here:
- auto-composed t
@end smallexample
-@node Enabling Multibyte
-@section Enabling Multibyte Characters
+@c FIXME? Does this section even belong in the user manual?
+@c Seems more appropriate to the lispref?
+@node Disabling Multibyte
+@section Disabling Multibyte Characters
By default, Emacs starts in multibyte mode: it stores the contents
of buffers and strings using an internal encoding that represents
@samp{raw-text} doesn't disable format conversion, uncompression, or
auto mode selection.
+@c Not a single file in Emacs uses this feature. Is it really worth
+@c mentioning in the _user_ manual? Also, this duplicates somewhat
+@c "Loading Non-ASCII" from the lispref.
@cindex Lisp files, and multibyte operation
@cindex multibyte operation, and Lisp files
@cindex unibyte operation, and Lisp files
@cindex init file, and non-@acronym{ASCII} characters
Emacs normally loads Lisp files as multibyte.
This includes the Emacs initialization
-file, @file{.emacs}, and the initialization files of Emacs packages
+file, @file{.emacs}, and the initialization files of packages
such as Gnus. However, you can specify unibyte loading for a
-particular Lisp file, by putting @w{@samp{-*-unibyte: t;-*-}} in a
-comment on the first line (@pxref{File Variables}). Then that file is
-always loaded as unibyte text. The motivation for these conventions
-is that it is more reliable to always load any particular Lisp file in
-the same way. However, you can load a Lisp file as unibyte, on any
-one occasion, by typing @kbd{C-x @key{RET} c raw-text @key{RET}}
-immediately before loading it.
-
- The mode line indicates whether multibyte character support is
-enabled in the current buffer. If it is, there are two or more
-characters (most often two dashes) near the beginning of the mode
-line, before the indication of the visited file's end-of-line
-convention (colon, backslash, etc.). When multibyte characters
-are not enabled, nothing precedes the colon except a single dash.
-@xref{Mode Line}, for more details about this.
+particular Lisp file by adding an entry @samp{unibyte: t} to the file's
+local variables section (@pxref{File Variables}). Then that file is
+always loaded as unibyte text. Note that this does not correspond to a
+real @code{unibyte} variable; rather, it just acts as an indicator to
+Emacs, in the same way that @code{coding} does (@pxref{Specify Coding}).
+@ignore
+@c I don't see the point of this statement:
+The motivation for these conventions is that it is more reliable to
+always load any particular Lisp file in the same way.
+@end ignore
+This feature applies only to @emph{loading} Lisp files for
+evaluation, not to visiting them for editing. You can also load a
+Lisp file as unibyte, on any one occasion, by typing @kbd{C-x
+@key{RET} c raw-text @key{RET}} immediately before loading it.
+
+@c See http://debbugs.gnu.org/11226 for lack of unibyte tooltip.
+@vindex enable-multibyte-characters
+The buffer-local variable @code{enable-multibyte-characters} is
+non-@code{nil} in multibyte buffers, and @code{nil} in unibyte ones.
+The mode line also indicates whether a buffer is multibyte or not.
+@xref{Mode Line}. With a graphical display, in a multibyte buffer,
+the portion of the mode line that indicates the character set has a
+tooltip that (amongst other things) says that the buffer is multibyte.
+In a unibyte buffer, the character set indicator is absent. Thus, in
+a unibyte buffer (when using a graphical display) there is normally
+nothing before the indication of the visited file's end-of-line
+convention (colon, backslash, etc.), unless you are using an input
+method.
@findex toggle-enable-multibyte-characters
-You can turn on multibyte support in a specific buffer by invoking the
+You can turn off multibyte support in a specific buffer by invoking the
command @code{toggle-enable-multibyte-characters} in that buffer.
@node Language Environments
set-language-environment} and specify a suitable language environment
such as @samp{Latin-@var{n}}.
- For more information about unibyte operation, see @ref{Enabling
+ For more information about unibyte operation, see @ref{Disabling
Multibyte}. Note particularly that you probably want to ensure that
your initialization files are read as unibyte if they contain
non-@acronym{ASCII} characters.
library is loaded, the @key{ALT} modifier key, if the keyboard has
one, serves the same purpose as @kbd{C-x 8}: use @key{ALT} together
with an accent character to modify the following letter. In addition,
-if the keyboard has keys for the Latin-1 ``dead accent characters,''
+if the keyboard has keys for the Latin-1 ``dead accent characters'',
they too are defined to compose with the following character, once
@code{iso-transl} is loaded.