@code{nil} otherwise.
@example
+@group
(characterp 65)
@result{} t
+@end group
+@group
(characterp 4194303)
@result{} t
+@end group
+@group
(characterp 4194304)
@result{} nil
+@end group
+@end example
+@end defun
+
+@cindex maximum value of character codepoint
+@cindex codepoint, largest value
+@defun max-char
+This function returns the largest value that a valid character
+codepoint can have.
+
+@example
+@group
+(characterp (max-char))
+ @result{} t
+@end group
+@group
+(characterp (1+ (max-char)))
+ @result{} nil
+@end group
@end example
@end defun
@subsection Basic Concepts of Coding Systems
@cindex character code conversion
- @dfn{Character code conversion} involves conversion between the encoding
-used inside Emacs and some other encoding. Emacs supports many
-different encodings, in that it can convert to and from them. For
-example, it can convert text to or from encodings such as Latin 1, Latin
-2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
-cases, Emacs supports several alternative encodings for the same
-characters; for example, there are three coding systems for the Cyrillic
-(Russian) alphabet: ISO, Alternativnyj, and KOI8.
-
+ @dfn{Character code conversion} involves conversion between the
+internal representation of characters used inside Emacs and some other
+encoding. Emacs supports many different encodings, in that it can
+convert to and from them. For example, it can convert text to or from
+encodings such as Latin 1, Latin 2, Latin 3, Latin 4, Latin 5, and
+several variants of ISO 2022. In some cases, Emacs supports several
+alternative encodings for the same characters; for example, there are
+three coding systems for the Cyrillic (Russian) alphabet: ISO,
+Alternativnyj, and KOI8.
+
+@c I think this paragraph is no longer correct.
+@ignore
Most coding systems specify a particular character code for
conversion, but some of them leave the choice unspecified---to be chosen
heuristically for each file, based on the data.
+@end ignore
In general, a coding system doesn't guarantee roundtrip identity:
decoding a byte sequence using a coding system, then encoding the
resulting text in the same coding system, can produce a different byte
-sequence. However, the following coding systems do guarantee that the
-byte sequence will be the same as what you originally decoded:
+sequence. But some coding systems do guarantee that the byte sequence
+will be the same as what you originally decoded. Here are a few
+examples:
@quotation
-chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
-greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
-iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
-japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
+iso-8859-1, utf-8, big5, shift_jis, euc-jp
@end quotation
Encoding buffer text and then decoding the result can also fail to
-reproduce the original text. For instance, if you encode Latin-2
-characters with @code{utf-8} and decode the result using the same
-coding system, you'll get Unicode characters (of charset
-@code{mule-unicode-0100-24ff}). If you encode Unicode characters with
-@code{iso-latin-2} and decode the result with the same coding system,
-you'll get Latin-2 characters.
+reproduce the original text. For instance, if you encode a character
+with a coding system which does not support that character, the result
+is unpredictable, and thus decoding it using the same coding system
+may produce a different text. Currently, Emacs can't report errors
+that result from encoding unsupported characters.
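+
+As a rough sketch of one way to check text before encoding it, the
+function @code{unencodable-char-position} (not described in this
+section) can locate the first character that a coding system cannot
+encode.  The example assumes its optional fifth argument, which makes
+the function scan a string instead of the current buffer:
+
+@example
+@group
+;; @r{GREEK SMALL LETTER ALPHA (#x3b1) has no Latin-1 encoding.}
+(unencodable-char-position 0 2 'iso-latin-1 nil
+                           (string ?a #x3b1))
+     @result{} 1
+@end group
+@end example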
@cindex EOL conversion
@cindex end-of-line conversion
@cindex line end conversion
- @dfn{End of line conversion} handles three different conventions used
-on various systems for representing end of line in files. The Unix
-convention is to use the linefeed character (also called newline). The
-DOS convention is to use a carriage-return and a linefeed at the end of
-a line. The Mac convention is to use just carriage-return.
+ @dfn{End of line conversion} handles three different conventions
+used on various systems for representing end of line in files. The
+Unix convention, used on GNU and Unix systems, is to use the linefeed
+character (also called newline). The DOS convention, used on
+MS-Windows and MS-DOS systems, is to use a carriage-return and a
+linefeed at the end of a line. The Mac convention is to use just
+carriage-return.
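+
+For illustration, the following sketch encodes the same two-line
+string with the @code{-unix}, @code{-dos} and @code{-mac} variants of
+@code{utf-8}.  @code{encode-coding-string} is used here only as a
+convenient way to inspect the bytes that each convention produces:
+
+@example
+@group
+(equal (encode-coding-string "a\nb" 'utf-8-unix) "a\nb")
+     @result{} t
+(equal (encode-coding-string "a\nb" 'utf-8-dos) "a\r\nb")
+     @result{} t
+(equal (encode-coding-string "a\nb" 'utf-8-mac) "a\rb")
+     @result{} t
+@end group
+@end example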
@cindex base coding system
@cindex variant coding system
conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:
it specifies no conversion of either character codes or end-of-line.
- The coding system @code{emacs-mule} specifies that the data is
+@vindex emacs-internal@r{ coding system}
+ The coding system @code{emacs-internal} specifies that the data is
represented in the internal Emacs encoding. This is like
@code{raw-text} in that no code conversion happens, but different in
that the result is multibyte data.
@defun coding-system-get coding-system property
This function returns the specified property of the coding system
@var{coding-system}. Most coding system properties exist for internal
-purposes, but one that you might find useful is @code{mime-charset}.
+purposes, but one that you might find useful is @code{:mime-charset}.
That property's value is the name used in MIME for the character coding
which this coding system can read and write. Examples:
@example
-(coding-system-get 'iso-latin-1 'mime-charset)
+(coding-system-get 'iso-latin-1 :mime-charset)
@result{} iso-8859-1
-(coding-system-get 'iso-2022-cn 'mime-charset)
+(coding-system-get 'iso-2022-cn :mime-charset)
@result{} iso-2022-cn
-(coding-system-get 'cyrillic-koi8 'mime-charset)
+(coding-system-get 'cyrillic-koi8 :mime-charset)
@result{} koi8-r
@end example
-The value of the @code{mime-charset} property is also defined
+The value of the @code{:mime-charset} property is also defined
as an alias for the coding system.
@end defun
@end defun
@defun check-coding-system coding-system
-This function checks the validity of @var{coding-system}.
-If that is valid, it returns @var{coding-system}.
-Otherwise it signals an error with condition @code{coding-system-error}.
+This function checks the validity of @var{coding-system}. If that is
+valid, it returns @var{coding-system}. If @var{coding-system} is
+@code{nil}, the function returns @code{nil}.  For any other values, it
+signals an error whose @code{error-symbol} is @code{coding-system-error}
+(@pxref{Signaling Errors, signal}).
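+
+A brief sketch of the three cases follows.  The symbol
+@code{no-such-coding-system} is a made-up name, assumed not to be
+defined as a coding system:
+
+@example
+@group
+(check-coding-system 'utf-8)
+     @result{} utf-8
+@end group
+@group
+(check-coding-system nil)
+     @result{} nil
+@end group
+@group
+(condition-case nil
+    (check-coding-system 'no-such-coding-system)
+  (coding-system-error 'invalid))
+     @result{} invalid
+@end group
+@end example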
@end defun
@defun coding-system-eol-type coding-system
@defun detect-coding-region start end &optional highest
This function chooses a plausible coding system for decoding the text
-from @var{start} to @var{end}. This text should be a byte sequence
-(@pxref{Explicit Encoding}).
+from @var{start} to @var{end}. This text should be a byte sequence,
+i.e.@: unibyte text or multibyte text with only @acronym{ASCII} and
+eight-bit characters (@pxref{Explicit Encoding}).
Normally this function returns a list of coding systems that could
handle decoding the text that was scanned. They are listed in order of
The result of encoding, and the input to decoding, are not ordinary
text. They logically consist of a series of byte values; that is, a
-series of characters whose codes are in the range 0 through 255. In a
-multibyte buffer or string, character codes 128 through 159 are
-represented by multibyte sequences, but this is invisible to Lisp
-programs.
+series of @acronym{ASCII} and eight-bit characters. In unibyte
+buffers and strings, these characters have codes in the range 0
+through 255. In a multibyte buffer or string, eight-bit characters
+have character codes higher than 255 (@pxref{Text Representations}),
+but Emacs transparently converts them to their single-byte values when
+you encode or decode such text.
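+
+The following sketch illustrates the difference, assuming
+@code{string-to-multibyte} is used to obtain the multibyte form of a
+raw byte:
+
+@example
+@group
+;; @r{In a unibyte string, the byte is its own character code.}
+(aref "\351" 0)
+     @result{} 233
+@end group
+@group
+;; @r{In a multibyte string, it becomes an eight-bit character.}
+(aref (string-to-multibyte "\351") 0)
+     @result{} 4194281
+@end group
+@end example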
The usual way to read a file into a buffer as a sequence of bytes, so
you can decode the contents explicitly, is with
Here are the functions to perform explicit encoding or decoding. The
encoding functions produce sequences of bytes; the decoding functions
are meant to operate on sequences of bytes. All of these functions
-discard text properties.
+discard text properties. They also set @code{last-coding-system-used}
+to the precise coding system they used.
-@deffn Command encode-coding-region start end coding-system
+@deffn Command encode-coding-region start end coding-system &optional destination
This command encodes the text from @var{start} to @var{end} according
-to coding system @var{coding-system}. The encoded text replaces the
-original text in the buffer. The result of encoding is logically a
-sequence of bytes, but the buffer remains multibyte if it was multibyte
-before.
-
-This command returns the length of the encoded text.
+to coding system @var{coding-system}. Normally, the encoded text
+replaces the original text in the buffer, but the optional argument
+@var{destination} can change that. If @var{destination} is a buffer,
+the encoded text is inserted in that buffer after point (point does
+not move); if it is @code{t}, the command returns the encoded text as
+a unibyte string without inserting it.
+
+If encoded text is inserted in some buffer, this command returns the
+length of the encoded text.
+
+The result of encoding is logically a sequence of bytes, but the
+buffer remains multibyte if it was multibyte before, and any 8-bit
+bytes are converted to their multibyte representation (@pxref{Text
+Representations}).
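+
+As a minimal sketch of the @var{destination} argument, the following
+form encodes the contents of a temporary buffer and returns the
+resulting bytes as a string, leaving the buffer text untouched:
+
+@example
+@group
+(with-temp-buffer
+  (insert (string #xe9))   ; @r{LATIN SMALL LETTER E WITH ACUTE}
+  (encode-coding-region (point-min) (point-max) 'utf-8 t))
+     @result{} "\303\251"
+@end group
+@end example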
@end deffn
-@defun encode-coding-string string coding-system &optional nocopy
+@defun encode-coding-string string coding-system &optional nocopy buffer
This function encodes the text in @var{string} according to coding
system @var{coding-system}. It returns a new string containing the
encoded text, except when @var{nocopy} is non-@code{nil}, in which
operation is trivial. The result of encoding is a unibyte string.
@end defun
-@deffn Command decode-coding-region start end coding-system
+@deffn Command decode-coding-region start end coding-system &optional destination
This command decodes the text from @var{start} to @var{end} according
-to coding system @var{coding-system}. The decoded text replaces the
-original text in the buffer. To make explicit decoding useful, the text
-before decoding ought to be a sequence of byte values, but both
-multibyte and unibyte buffers are acceptable.
-
-This command returns the length of the decoded text.
+to coding system @var{coding-system}. To make explicit decoding
+useful, the text before decoding ought to be a sequence of byte
+values, but both multibyte and unibyte buffers are acceptable (in the
+multibyte case, the raw byte values should be represented as eight-bit
+characters). Normally, the decoded text replaces the original text in
+the buffer, but the optional argument @var{destination} can change
+that. If @var{destination} is a buffer, the decoded text is inserted
+in that buffer after point (point does not move); if it is @code{t},
+the command returns the decoded text as a multibyte string without
+inserting it.
+
+If decoded text is inserted in some buffer, this command returns the
+length of the decoded text.
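+
+For instance, this sketch inserts two raw bytes into a temporary
+unibyte buffer and decodes them as UTF-8, asking for the result as a
+string; @code{string-to-char} is used only to show the code of the
+single decoded character:
+
+@example
+@group
+(with-temp-buffer
+  (set-buffer-multibyte nil)
+  (insert "\303\251")
+  (string-to-char
+   (decode-coding-region (point-min) (point-max) 'utf-8 t)))
+     @result{} 233
+@end group
+@end example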
@end deffn
-@defun decode-coding-string string coding-system &optional nocopy
-This function decodes the text in @var{string} according to coding
-system @var{coding-system}. It returns a new string containing the
-decoded text, except when @var{nocopy} is non-@code{nil}, in which
-case the function may return @var{string} itself if the decoding
-operation is trivial. To make explicit decoding useful, the contents
-of @var{string} ought to be a sequence of byte values, but a multibyte
-string is acceptable.
+@defun decode-coding-string string coding-system &optional nocopy buffer
+This function decodes the text in @var{string} according to
+@var{coding-system}. It returns a new string containing the decoded
+text, except when @var{nocopy} is non-@code{nil}, in which case the
+function may return @var{string} itself if the decoding operation is
+trivial. To make explicit decoding useful, the contents of
+@var{string} ought to be a unibyte string with a sequence of byte
+values, but a multibyte string is also acceptable (assuming it
+contains 8-bit bytes in their multibyte form).
+
+If optional argument @var{buffer} specifies a buffer, the decoded text
+is inserted in that buffer after point (point does not move). In this
+case, the return value is the length of the decoded text.
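+
+The following sketch shows a simple round trip through
+@code{encode-coding-string} and @code{decode-coding-string}, using
+@code{utf-8} as an example coding system:
+
+@example
+@group
+(encode-coding-string (string #xe9) 'utf-8)
+     @result{} "\303\251"
+@end group
+@group
+(equal (decode-coding-string "\303\251" 'utf-8)
+       (string #xe9))
+     @result{} t
+@end group
+@end example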
@end defun
@defun decode-coding-inserted-region from to filename &optional visit beg end replace
@subsection Terminal I/O Encoding
Emacs can decode keyboard input using a coding system, and encode
-terminal output. This is useful for terminals that transmit or display
-text using a particular encoding such as Latin-1. Emacs does not set
-@code{last-coding-system-used} for encoding or decoding for the
-terminal.
+terminal output. This is useful for terminals that transmit or
+display text using a particular encoding such as Latin-1. Emacs does
+not set @code{last-coding-system-used} for encoding or decoding of
+terminal I/O.
@defun keyboard-coding-system
This function returns the coding system that is in use for decoding