From 6fa886202fdca940dd16e9f0b863347c4f565e8a Mon Sep 17 00:00:00 2001
From: Kenichi Handa
Date: Fri, 1 Apr 2005 00:29:51 +0000
Subject: [PATCH] (Coding System Basics): Describe roundtrip identity of coding systems.

---
 lispref/nonascii.texi | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi
index 70e77e0a837..91a47ea50f9 100644
--- a/lispref/nonascii.texi
+++ b/lispref/nonascii.texi
@@ -628,6 +628,28 @@ characters; for example, there are three coding systems for the Cyrillic
 conversion, but some of them leave the choice unspecified---to be
 chosen heuristically for each file, based on the data.
 
+In general, a coding system doesn't guarantee roundtrip identity:
+decoding followed by encoding in the same coding system can
+result in a different byte sequence.  But there are several coding
+systems that do guarantee that the result will be the same as what you
+originally decoded.  They are:
+
+@quotation
+chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
+greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
+iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
+japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
+@end quotation
+
+Likewise, a coding system doesn't guarantee roundtrip identity in the
+other direction: encoding buffer text into a coding system and then
+decoding the result with the same coding system can produce different
+buffer text.  For instance, when you encode Latin-2 characters with
+@code{utf-8} and decode them back with the same coding system, you get
+Unicode characters (of charset @code{mule-unicode-0100-24ff}), and when
+you encode Unicode characters with @code{iso-latin-2} and decode them
+back with the same coding system, you get Latin-2 characters.
+
 @cindex end of line conversion
 @dfn{End of line conversion} handles three different conventions
 used on various systems for representing end of line in files.  The Unix
-- 
2.39.2
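
The two roundtrip directions described in the patch can be checked by hand. The following is a minimal sketch, not part of the patch: the helper names `my-bytes-roundtrip-p` and `my-text-roundtrip-p` are hypothetical, and it assumes an Emacs that provides `unibyte-string` alongside the standard `decode-coding-string` and `encode-coding-string`. Results for a given coding system may differ between Emacs versions, so no expected values are claimed here.

```elisp
;; Hypothetical helpers for illustration only; not part of Emacs.

(defun my-bytes-roundtrip-p (bytes coding-system)
  "Return non-nil if decoding BYTES and re-encoding reproduces BYTES.
BYTES is a unibyte string; CODING-SYSTEM is used for both steps."
  (let* ((text (decode-coding-string bytes coding-system))
         (back (encode-coding-string text coding-system)))
    (string= bytes back)))

(defun my-text-roundtrip-p (text coding-system)
  "Return non-nil if encoding TEXT and decoding again reproduces TEXT.
TEXT is ordinary (multibyte) text; CODING-SYSTEM is used for both steps."
  (let* ((bytes (encode-coding-string text coding-system))
         (back (decode-coding-string bytes coding-system)))
    (string= text back)))

;; Decode-then-encode check against a coding system from the list above.
(my-bytes-roundtrip-p (unibyte-string 65 #xe9 #xff) 'iso-latin-1)

;; Encode-then-decode check; the outcome depends on whether the
;; coding system can represent every character in the text.
(my-text-roundtrip-p (string ?c ?a ?f #xe9) 'us-ascii)
```

Evaluating the two calls in `*scratch*` or with `M-:` shows whether a particular coding system preserves the data in each direction for the sample input. A single sample cannot prove the guarantee in general; it can only expose a counterexample.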