From: Richard M. Stallman Date: Sat, 28 Feb 1998 01:49:58 +0000 (+0000) Subject: Initial revision X-Git-Tag: emacs-20.3~2068 X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=cc6d0d2c9435d5d065121468b3655f4941403685;p=emacs.git Initial revision --- diff --git a/lispref/customize.texi b/lispref/customize.texi new file mode 100644 index 00000000000..738734fe7c4 --- /dev/null +++ b/lispref/customize.texi @@ -0,0 +1,765 @@ +@c -*-texinfo-*- +@c This is part of the GNU Emacs Lisp Reference Manual. +@c Copyright (C) 1997, 1998 Free Software Foundation, Inc. +@c See the file elisp.texi for copying conditions. +@setfilename ../info/customize +@node Customization, Loading, Macros, Top +@chapter Writing Customization Definitions + +This chapter describes how to declare customization groups, variables, +and faces. We use the term @dfn{customization item} to include all +three of those. This has few examples, but please look at the file +@file{cus-edit.el}, which contains many declarations you can learn from. + +@menu +* Common Keywords:: +* Group Definitions:: +* Variable Definitions:: +* Face Definitions:: +* Customization Types:: +@end menu + +@node Common Keywords +@section Common Keywords for All Kinds of Items + +All three kinds of customization declarations (for groups, variables, +and faces) accept keyword arguments for specifying various information. +This section describes some keywords that apply to all three. + +All of these keywords, except @code{:tag}, can be used more than once in +a given item. Each use of the keyword has an independent effect. The +keyword @code{:tag} is an exception because any given item can only +display one name item. + +@table @code +@item :group @var{group} +Put this customization item in group @var{group}. When you use +@code{:group} in a @code{defgroup}, it makes the new group a subgroup of +@var{group}. + +If you use this keyword more than once, you can put a single item into +more than one group. 
Displaying any of those groups will show this +item. Be careful not to overdo this! + +@item :link @var{link-data} +Include an external link after the documentation string for this item. +This is a sentence containing an active field which references some +other documentation. + +There are three alternatives you can use for @var{link-data}: + +@table @code +@item (custom-manual @var{info-node}) +Link to an Info node; @var{info-node} is a string which specifies the +node name, as in @code{"(emacs)Top"}. The link appears as +@samp{[manual]} in the customization buffer. + +@item (info-link @var{info-node}) +Like @code{custom-manual} except that the link appears +in the customization buffer with the Info node name. + +@item (url-link @var{url}) +Link to a web page; @var{url} is a string which specifies the URL. The +link appears in the customization buffer as @var{url}. +@end table + +You can specify the text to use in the customization buffer by adding +@code{:tag @var{name}} after the first element of the @var{link-data}; +for example, @code{(info-link :tag "foo" "(emacs)Top")} makes a link to +the Emacs manual which appears in the buffer as @samp{foo}. + +An item can have more than one external link; however, most items have +none at all. + +@item :load @var{file} +Load file @var{file} (a string) before displaying this customization +item. Loading is done with @code{load-library}, and only if the file is +not already loaded. + +@item :require @var{feature} +Require feature @var{feature} (a symbol) when installing a value for +this item (an option or a face) that was saved using the customization +feature. This is done by calling @code{require}. + +The most common reason to use @code{:require} is when a variable enables +a feature such as a minor mode, and just setting the variable won't have +any effect unless the code which implements the mode is loaded. 
+ +@item :tag @var{name} +Use @var{name}, a string, instead of the item's name, to label the item +in customization menus and buffers. +@end table + +@node Group Definitions +@section Defining Custom Groups + +Each Emacs Lisp package should have one main customization group which +contains all the options, faces and other groups in the package. If the +package has a small number of options and faces, use just one group and +put everything in it. When there are more than twelve or so options and +faces, then you should structure them into subgroups, and put the +subgroups under the package's main customization group. It is OK to +have some of the options and faces in the package's main group alongside +the subgroups. + +The package's main or only group should be a member of one or more of +the standard customization groups. Press @kbd{C-h p} to display a +list of finder keywords; then choose some of them and add your group to each +of them, using the @code{:group} keyword. + +The way to declare new customization groups is with @code{defgroup}. + +@tindex defgroup +@defmac defgroup group members doc [keyword value]... +Declare @var{group} as a customization group containing @var{members}. +Do not quote the symbol @var{group}. The argument @var{doc} specifies +the documentation string for the group. + +The argument @var{members} can be an alist whose elements specify +members of the group; however, normally @var{members} is @code{nil}, and +you specify the group's members by using the @code{:group} keyword when +defining those members. + +@ignore +@code{(@var{name} @var{widget})}. Here @var{name} is a symbol, and +@var{widget} is a widget for editing that symbol. Useful widgets are +@code{custom-variable} for editing variables, @code{custom-face} for +editing faces, and @code{custom-group} for editing groups. 
+@end ignore + +In addition to the common keywords (@pxref{Common Keywords}), you can +use this keyword in @code{defgroup}: + +@table @code +@item :prefix @var{prefix} +If the name of an item in the group starts with @var{prefix}, then the +tag for that item is constructed (by default) by omitting @var{prefix}. + +One group can have any number of prefixes. +@end table +@end defmac + +The @code{:prefix} feature is currently turned off, which means that +@code{:prefix} currently has no effect. We did this because we found +that discarding the specified prefixes often led to confusing names for +options. This happened because the people who wrote the @code{defgroup} +definitions for various groups added @code{:prefix} keywords whenever +they made logical sense---that is, whenever they saw that there was a +common prefix for the option names in a library. + +In order to obtain good results with @code{:prefix}, it is necessary to +check the specific effects of discarding a particular prefix, given the +specific items in a group and their names and documentation. If the +resulting text is not clear, then @code{:prefix} should not be used in +that case. + +It should be possible to recheck all the customization groups, delete +the @code{:prefix} specifications which give unclear results, and then +turn this feature back on, if someone would like to do the work. + +@node Variable Definitions +@section Defining Customization Variables + + Use @code{defcustom} to declare user-editable variables. + +@tindex defcustom +@defmac defcustom option value doc [keyword value]... +Declare @var{option} as a customizable user option variable that +defaults to @var{value}. Do not quote @var{option}. @var{value} should +be an expression to compute the value; it will be evaluated on more +than one occasion. + +If @var{option} is void, @code{defcustom} initializes it to @var{value}. + +The argument @var{doc} specifies the documentation string for the variable. 
+ +The following additional keywords are defined: + +@table @code +@item :type @var{type} +Use @var{type} as the data type for this option. It specifies which +values are legitimate, and how to display the value. +@xref{Customization Types}, for more information. + +@item :options @var{list} +Specify @var{list} as the list of reasonable values for use in this +option. + +Currently this is meaningful only when type is @code{hook}. The +elements of @var{list} are functions that you might likely want to use +as elements of the hook value. The user is not actually restricted to +using only these functions, but they are offered as convenient +alternatives. + +@item :version @var{version} +This option specifies that the variable's default value was changed in +Emacs version @var{version}. For example, + +@example +(defcustom foo-max 34 + "*Maximum number of foo's allowed." + :type 'integer + :group 'foo + :version "20.3") +@end example + +@item :set @var{setfunction} +Specify @var{setfunction} as the way to change the value of this option. +The function @var{setfunction} should take two arguments, a symbol and +the new value, and should do whatever is necessary to update the value +properly for this option (which may not mean simply setting the option +as a Lisp variable). The default for @var{setfunction} is +@code{set-default}. + +@item :get @var{getfunction} +Specify @var{getfunction} as the way to extract the value of this +option. The function @var{getfunction} should take one argument, a +symbol, and should return the ``current value'' for that symbol (which +need not be the symbol's Lisp value). The default is +@code{default-value}. + +@item :initialize @var{function} +@var{function} should be a function used to initialize the variable when +the @code{defcustom} is evaluated. It should take two arguments, the +symbol and value. 
Here are some predefined functions meant for use in +this way: + +@table @code +@item custom-initialize-set +Use the variable's @code{:set} function to initialize the variable. Do +not reinitialize it if it is already non-void. This is the default +@code{:initialize} function. + +@item custom-initialize-default +Always use @code{set-default} to initialize the variable, even if some +other @code{:set} function has been specified. + +@item custom-initialize-reset +Even if the variable is already non-void, reset it by calling the +@code{:set} function using the current value (returned by the +@code{:get} method). + +@item custom-initialize-changed +Like @code{custom-initialize-reset}, except use @code{set-default} +(rather than the @code{:set} function) to initialize the variable if it +is not bound and has not been set already. +@end table + +@item :require @var{feature} +If the user saves a customized value for this item, then Emacs should do +@code{(require @var{feature})} after installing the saved value. + +The place to use this feature is for an option that turns on the +operation of a certain feature. Assuming that the package is coded to +check the value of the option, you still need to arrange for the package +to be loaded. That is what @code{:require} is for. +@end table +@end defmac + +@ignore +Use @code{custom-add-option} to specify that a specific function is +useful as a member of a hook. + +@defun custom-add-option symbol option +To the variable @var{symbol} add @var{option}. + +If @var{symbol} is a hook variable, @var{option} should be a hook +member. For other types of variables, the effect is undefined. +@end defun +@end ignore + +Internally, @code{defcustom} uses the symbol property +@code{standard-value} to record the expression for the default value, +and @code{saved-value} to record the value saved by the user with the +customization buffer. 
The @code{saved-value} property is actually a +list whose car is an expression which evaluates to the value. + +@node Face Definitions +@section Defining Faces + +Faces are declared with @code{defface}. + +@tindex defface +@defmac defface face spec doc [keyword value]... +Declare @var{face} as a customizable face that defaults according to +@var{spec}. Do not quote the symbol @var{face}. + +@var{doc} is the face documentation. + +@var{spec} should be an alist whose elements have the form +@code{(@var{display} @var{atts})} (see below). When @code{defface} +executes, it defines the face according to @var{spec}, then uses any +customizations saved in the @file{.emacs} file to override that +specification. + +In each element of @var{spec}, @var{atts} is a list of face attributes +and their values. The possible attributes are defined in the variable +@code{custom-face-attributes}. + +The @var{display} part of an element of @var{spec} determines which +frames the element applies to. If more than one element of @var{spec} +matches a given frame, the first matching element is the only one used +for that frame. + +If @var{display} is @code{t} in a @var{spec} element, that element +matches all frames. (This means that any subsequent elements of +@var{spec} are never used.) + +Alternatively, @var{display} can be an alist whose elements have the +form @code{(@var{characteristic} @var{value}@dots{})}. Here +@var{characteristic} specifies a way of classifying frames, and the +@var{value}s are possible classifications which @var{display} should +apply to. Here are the possible values of @var{characteristic}: + +@table @code +@item type +The kind of window system the frame uses---either @code{x}, @code{pc} +(for the MS-DOS console), @code{w32} (for MS Windows 9X/NT), or +@code{tty}. + +@item class +What kinds of colors the frame supports---either @code{color}, +@code{grayscale}, or @code{mono}. + +@item background +The kind of background---either @code{light} or @code{dark}. 
+@end table + +If an element of @var{display} specifies more than one +@var{value} for a given @var{characteristic}, any of those values +is acceptable. If an element of @var{display} has elements for +more than one @var{characteristic}, then @emph{each} characteristic +of the frame must match one of the values specified for it. +@end defmac + +Internally, @code{defface} uses the symbol property +@code{face-defface-spec} to record the face attributes specified in +@code{defface}, @code{saved-face} for the attributes saved by the user +with the customization buffer, and @code{face-documentation} for the +documentation string. + +@node Customization Types +@section Customization Types + + When you define a user option with @code{defcustom}, you must specify +its @dfn{customization type}. That is a Lisp object which indicates (1) +which values are legitimate and (2) how to display the value in the +customization buffer for editing. + + You specify the customization type in @code{defcustom} with the +@code{:type} keyword. The argument of @code{:type} is evaluated; since +types that vary at run time are rarely useful, normally it is a quoted +constant. For example: + +@example +(defcustom diff-command "diff" + "*The command to use to run diff." + :type 'string + :group 'diff) +@end example + + In general, a customization type is a list whose first element +is a symbol, one of the customization type names defined in the +following sections. After this symbol come a number of arguments, +depending on the symbol. Some of the type symbols do not use any +arguments; those are called @dfn{simple types}. + + In between the type symbol and its arguments, you can optionally +write keyword-value pairs. @xref{Type Keywords}. + + For a simple type, if you do not use any keyword-value pairs, you can +omit the parentheses around the type symbol. The above example does +this, using just @code{string} as the customization type. +But @code{(string)} would mean the same thing. 
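+
+  As a minimal sketch of the forms just described, the following
+declarations use hypothetical names (@code{my-diff},
+@code{my-diff-program}, @code{my-diff-shell} and
+@code{my-diff-switches}) that are assumptions made up for this
+illustration and are not defined by any real package.  The first two
+options use the same simple type, written once as a bare symbol and
+once in parentheses; the third needs the parentheses because it adds
+a keyword-value pair.
+
+@example
+;; Hypothetical group, used only by the options below.
+(defgroup my-diff nil
+  "Illustrative group for comparing files."
+  :group 'tools)
+
+;; Simple type written as a bare symbol.
+(defcustom my-diff-program "diff"
+  "*Program used to compare files (hypothetical option)."
+  :type 'string
+  :group 'my-diff)
+
+;; The same simple type written as a list; it means the same thing.
+(defcustom my-diff-shell "sh"
+  "*Shell used to run the program (hypothetical option)."
+  :type '(string)
+  :group 'my-diff)
+
+;; A keyword-value pair follows the type symbol, so the
+;; parentheses cannot be omitted here.
+(defcustom my-diff-switches "-c"
+  "*Switches passed to the program (hypothetical option)."
+  :type '(string :tag "Switches")
+  :group 'my-diff)
+@end example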
+ +@menu +* Simple Types:: +* Composite Types:: +* Splicing into Lists:: +* Type Keywords:: +@end menu + +@node Simple Types +@subsection Simple Types + + This section describes all the simple customization types. + +@table @code +@item sexp +The value may be any Lisp object that can be printed and read back. You +can use @code{sexp} as a fall-back for any option, if you don't want to +take the time to work out a more specific type to use. + +@item integer +The value must be an integer, and is represented textually +in the customization buffer. + +@item number +The value must be a number, and is represented textually in the +customization buffer. + +@item string +The value must be a string, and the customization buffer shows just the +contents, with no @samp{"} characters or quoting with @samp{\}. + +@item regexp +The value must be a string which is a valid regular expression. + +@item character +The value must be a character code. A character code is actually an +integer, but this type shows the value by inserting the character in the +buffer, rather than by showing the number. + +@item file +The value must be a file name, and you can do completion with +@kbd{M-@key{TAB}}. + +@item (file :must-match t) +The value must be a file name for an existing file, and you can do +completion with @kbd{M-@key{TAB}}. + +@item directory +The value must be a directory name, and you can do completion with +@kbd{M-@key{TAB}}. + +@item symbol +The value must be a symbol. It appears in the customization buffer as +the name of the symbol. + +@item function +The value must be either a lambda expression or a function name. When +it is a function name, you can do completion with @kbd{M-@key{TAB}}. + +@item variable +The value must be a variable name, and you can do completion with +@kbd{M-@key{TAB}}. + +@item boolean +The value is boolean---either @code{nil} or @code{t}. 
+@end table + +@node Composite Types +@subsection Composite Types + + When none of the simple types is appropriate, you can use composite +types, which build from simple types. Here are several ways of doing +that: + +@table @code +@item (restricted-sexp :match-alternatives @var{criteria}) +The value may be any Lisp object that satisfies one of @var{criteria}. +@var{criteria} should be a list, and each element should be +one of these possibilities: + +@itemize @bullet +@item +A predicate---that is, a function of one argument that returns non-@code{nil} +if the argument fits a certain type. This means that objects of that type +are acceptable. + +@item +A quoted constant---that is, @code{'@var{object}}. This means that +@var{object} is an acceptable value. +@end itemize + +For example, + +@example +(restricted-sexp :match-alternatives (integerp 't 'nil)) +@end example + +@noindent +allows integers, @code{t} and @code{nil} as legitimate values. + +The customization buffer shows all legitimate values using their read +syntax, and the user edits them textually. + +@item (cons @var{car-type} @var{cdr-type}) +The value must be a cons cell, its @sc{car} must fit @var{car-type}, and +its @sc{cdr} must fit @var{cdr-type}. For example, @code{(cons string +symbol)} is a customization type which matches values such as +@code{("foo" . foo)}. + +In the customization buffer, the @sc{car} and the @sc{cdr} are +displayed and edited separately, each according to the type +that you specify for it. + +@item (list @var{element-types}@dots{}) +The value must be a list with exactly as many elements as the +@var{element-types} you have specified; and each element must fit the +corresponding @var{element-type}. + +For example, @code{(list integer string function)} describes a list of +three elements; the first element must be an integer, the second a +string, and the third a function. 
+ +In the customization buffer, each element is displayed and edited +separately, according to the type specified for it. + +@item (vector @var{element-types}@dots{}) +Like @code{list} except that the value must be a vector instead of a +list. The elements work the same as in @code{list}. + +@item (choice @var{alternative-types}@dots{}) +The value must fit at least one of @var{alternative-types}. +For example, @code{(choice integer string)} allows either an +integer or a string. + +In the customization buffer, the user selects one of the alternatives +using a menu, and can then edit the value in the usual way for that +alternative. + +Normally the strings in this menu are determined automatically from the +choices; however, you can specify different strings for the menu by +including the @code{:tag} keyword in the alternatives. For example, if +an integer stands for a number of spaces, while a string is text to use +verbatim, you might write the customization type this way, + +@smallexample +(choice (integer :tag "Number of spaces") + (string :tag "Literal text")) +@end smallexample + +@noindent +so that the menu offers @samp{Number of spaces} and @samp{Literal text}. + +@item (const @var{value}) +The value must be @var{value}---nothing else is allowed. + +The main use of @code{const} is inside of @code{choice}. For example, +@code{(choice integer (const nil))} allows either an integer or +@code{nil}. @code{:tag} is often used with @code{const}. + +@item (function-item @var{function}) +Like @code{const}, but used for values which are functions. This +displays the documentation string of the function @var{function} +as well as its name. + +@item (variable-item @var{variable}) +Like @code{const}, but used for values which are variable names. This +displays the documentation string of the variable @var{variable} as well +as its name. 
+ +@item (set @var{elements}@dots{}) +The value must be a list and each element of the list must be one of the +@var{elements} specified. This appears in the customization buffer as a +checklist. + +@item (repeat @var{element-type}) +The value must be a list and each element of the list must fit the type +@var{element-type}. This appears in the customization buffer as a +list of elements, with @samp{[INS]} and @samp{[DEL]} buttons for adding +more elements or removing elements. +@end table + +@node Splicing into Lists +@subsection Splicing into Lists + + The @code{:inline} feature lets you splice a variable number of +elements into the middle of a list or vector. You use it in a +@code{set}, @code{choice} or @code{repeat} type which appears among the +element-types of a @code{list} or @code{vector}. + + Normally, each of the element-types in a @code{list} or @code{vector} +describes one and only one element of the list or vector. Thus, if an +element-type is a @code{repeat}, that specifies a list of unspecified +length which appears as one element. + + But when the element-type uses @code{:inline}, the value it matches is +merged directly into the containing sequence. For example, if it +matches a list with three elements, those become three elements of the +overall sequence. This is analogous to using @samp{,@@} in the backquote +construct. + + For example, to specify a list whose first element must be @code{t} +and whose remaining arguments should be zero or more of @code{foo} and +@code{bar}, use this customization type: + +@example +(list (const t) (set :inline t foo bar)) +@end example + +@noindent +This matches values such as @code{(t)}, @code{(t foo)}, @code{(t bar)} +and @code{(t foo bar)}. + + When the element-type is a @code{choice}, you use @code{:inline} not +in the @code{choice} itself, but in (some of) the alternatives of the +@code{choice}. 
For example, to match a list which must start with a +file name, followed either by the symbol @code{t} or two strings, use +this customization type: + +@example +(list file + (choice (const t) + (list :inline t string string))) +@end example + +@noindent +If the user chooses the first alternative in the choice, then the +overall list has two elements and the second element is @code{t}. If +the user chooses the second alternative, then the overall list has three +elements and the second and third must be strings. + +@node Type Keywords +@subsection Type Keywords + +You can specify keyword-argument pairs in a customization type after the +type name symbol. Here are the keywords you can use, and their +meanings: + +@table @code +@item :value @var{default} +This is used for a type that appears as an alternative inside of +@code{choice}; it specifies the default value to use, at first, if and +when the user selects this alternative with the menu in the +customization buffer. + +Of course, if the actual value of the option fits this alternative, it +will appear showing the actual value, not @var{default}. + +@item :format @var{format-string} +This string will be inserted in the buffer to represent the value +corresponding to the type. The following @samp{%} escapes are available +for use in @var{format-string}: + +@table @samp +@ignore +@item %[@var{button}%] +Display the text @var{button} marked as a button. The @code{:action} +attribute specifies what the button will do if the user invokes it; +its value is a function which takes two arguments---the widget which +the button appears in, and the event. + +There is no way to specify two different buttons with different +actions; but perhaps there is no need for one. +@end ignore + +@item %@{@var{sample}%@} +Show @var{sample} in a special face specified by @code{:sample-face}. + +@item %v +Substitute the item's value. 
How the value is represented depends on +the kind of item, and (for variables) on the customization type. + +@item %d +Substitute the item's documentation string. + +@item %h +Like @samp{%d}, but if the documentation string is more than one line, +add an active field to control whether to show all of it or just the +first line. + +@item %t +Substitute the tag here. You specify the tag with the @code{:tag} +keyword. + +@item %% +Display a literal @samp{%}. +@end table + +@item :button-face @var{face} +Use face @var{face} for text displayed with @samp{%[@dots{}%]}. + +@item :button-prefix +@itemx :button-suffix +These specify the text to display before and after a button. +Each can be: + +@table @asis +@item @code{nil} +No text is inserted. + +@item a string +The string is inserted literally. + +@item a symbol +The symbol's value is used. +@end table + +@item :doc @var{doc} +Use @var{doc} as the documentation string for this item. + +@item :tag @var{tag} +Use @var{tag} (a string) as the tag for this item. + +@item :help-echo @var{motion-doc} +When you move to this item with @code{widget-forward} or +@code{widget-backward}, it will display the string @var{motion-doc} +in the echo area. + +@item :match @var{function} +Specify how to decide whether a value matches the type. @var{function} +should be a function that accepts two arguments, a widget and a value; +it should return non-@code{nil} if the value is acceptable. + +@ignore +@item :indent @var{columns} +Indent this item by @var{columns} columns. The indentation is used for +@samp{%n}, and automatically for group names, for checklists and radio +buttons, and for editable lists. It affects the whole of the +item except for the first line. + +@item :offset @var{columns} +An integer indicating how many extra spaces to indent the subitems of +this item. By default, subitems are indented the same as their parent. 
+ +@item :extra-offset +An integer indicating how many extra spaces to add to this item's +indentation, compared to its parent. + +@item :notify +A function called each time the item or a subitem is changed. The +function is called with two or three arguments. The first argument is +the item itself, the second argument is the item that was changed, and +the third argument is the event leading to the change, if any. + +@item :menu-tag +Tag used in the menu when the widget is used as an option in a +@code{menu-choice} widget. + +@item :menu-tag-get +Function used for finding the tag when the widget is used as an option +in a @code{menu-choice} widget. By default, the tag used will be either the +@code{:menu-tag} or @code{:tag} property if present, or the @code{princ} +representation of the @code{:value} property if not. + +@item :validate +A function which takes a widget as an argument, and return nil if the +widgets current value is valid for the widget. Otherwise, it should +return the widget containing the invalid data, and set that widgets +@code{:error} property to a string explaining the error. + +You can use the function @code{widget-children-validate} for this job; +it tests that all children of @var{widget} are valid. + +@item :tab-order +Specify the order in which widgets are traversed with +@code{widget-forward} or @code{widget-backward}. This is only partially +implemented. + +@enumerate a +@item +Widgets with tabbing order @code{-1} are ignored. + +@item +(Unimplemented) When on a widget with tabbing order @var{n}, go to the +next widget in the buffer with tabbing order @var{n+1} or @code{nil}, +whichever comes first. + +@item +When on a widget with no tabbing order specified, go to the next widget +in the buffer with a positive tabbing order, or @code{nil} +@end enumerate + +@item :parent +The parent of a nested widget (e.g. a @code{menu-choice} item or an +element of a @code{editable-list} widget). 
+ +@item :sibling-args +This keyword is only used for members of a @code{radio-button-choice} or +@code{checklist}. The value should be a list of extra keyword +arguments, which will be used when creating the @code{radio-button} or +@code{checkbox} associated with this item. +@end ignore +@end table diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi new file mode 100644 index 00000000000..16a22f2c443 --- /dev/null +++ b/lispref/nonascii.texi @@ -0,0 +1,691 @@ +@c -*-texinfo-*- +@c This is part of the GNU Emacs Lisp Reference Manual. +@c Copyright (C) 1998 Free Software Foundation, Inc. +@c See the file elisp.texi for copying conditions. +@setfilename ../info/characters +@node Non-ASCII Characters, Searching and Matching, Text, Top +@chapter Non-ASCII Characters +@cindex multibyte characters +@cindex non-ASCII characters + + This chapter covers the special issues relating to non-@sc{ASCII} +characters and how they are stored in strings and buffers. + +@menu +* Text Representations:: +* Converting Representations:: +* Selecting a Representation:: +* Character Codes:: +* Character Sets:: +* Scanning Charsets:: +* Chars and Bytes:: +* Coding Systems:: +* Default Coding Systems:: +* Specifying Coding Systems:: +* Explicit Encoding:: +@end menu + +@node Text Representations +@section Text Representations +@cindex text representations + + Emacs has two @dfn{text representations}---two ways to represent text +in a string or buffer. These are called @dfn{unibyte} and +@dfn{multibyte}. Each string, and each buffer, uses one of these two +representations. For most purposes, you can ignore the issue of +representations, because Emacs converts text between them as +appropriate. Occasionally in Lisp programming you will need to pay +attention to the difference. + +@cindex unibyte text + In unibyte representation, each character occupies one byte and +therefore the possible character codes range from 0 to 255. 
Codes 0 +through 127 are @sc{ASCII} characters; the codes from 128 through 255 +are used for one non-@sc{ASCII} character set (you can choose which one +by setting the variable @code{nonascii-insert-offset}). + +@cindex leading code +@cindex multibyte text + In multibyte representation, a character may occupy more than one +byte, and as a result, the full range of Emacs character codes can be +stored. The first byte of a multibyte character is always in the range +128 through 159 (octal 0200 through 0237). These values are called +@dfn{leading codes}. The first byte determines which character set the +character belongs to (@pxref{Character Sets}); in particular, it +determines how many bytes long the sequence is. The second and +subsequent bytes of a multibyte character are always in the range 160 +through 255 (octal 0240 through 0377). + + In a buffer, the buffer-local value of the variable +@code{enable-multibyte-characters} specifies the representation used. +The representation for a string is determined based on the string +contents when the string is constructed. + +@tindex enable-multibyte-characters +@defvar enable-multibyte-characters +This variable specifies the current buffer's text representation. +If it is non-@code{nil}, the buffer contains multibyte text; otherwise, +it contains unibyte text. + +@strong{Warning:} do not set this variable directly; instead, use the +function @code{set-buffer-multibyte} to change a buffer's +representation. +@end defvar + +@tindex default-enable-multibyte-characters +@defvar default-enable-multibyte-characters +This variable's value is entirely equivalent to @code{(default-value +'enable-multibyte-characters)}, and setting this variable changes that +default value. Although setting the local binding of +@code{enable-multibyte-characters} in a specific buffer is dangerous, +changing the default value is safe, and it is a reasonable thing to do. 
+ +The @samp{--unibyte} command line option does its job by setting the +default value to @code{nil} early in startup. +@end defvar + +@tindex multibyte-string-p +@defun multibyte-string-p string +Return @code{t} if @var{string} contains multibyte characters. +@end defun + +@node Converting Representations +@section Converting Text Representations + + Emacs can convert unibyte text to multibyte; it can also convert +multibyte text to unibyte, though this conversion loses information. In +general these conversions happen when inserting text into a buffer, or +when putting text from several strings together in one string. You can +also explicitly convert a string's contents to either representation. + + Emacs chooses the representation for a string based on the text that +it is constructed from. The general rule is to convert unibyte text to +multibyte text when combining it with other multibyte text, because the +multibyte representation is more general and can hold whatever +characters the unibyte text has. + + When inserting text into a buffer, Emacs converts the text to the +buffer's representation, as specified by +@code{enable-multibyte-characters} in that buffer. In particular, when +you insert multibyte text into a unibyte buffer, Emacs converts the text +to unibyte, even though this conversion cannot in general preserve all +the characters that might be in the multibyte text. The other natural +alternative, to convert the buffer contents to multibyte, is not +acceptable because the buffer's representation is a choice made by the +user that cannot simply be overridden. + + Converting unibyte text to multibyte text leaves @sc{ASCII} characters +unchanged. It converts the non-@sc{ASCII} codes 128 through 255 by +adding the value @code{nonascii-insert-offset} to each character code. +By setting this variable, you specify which character set the unibyte +characters correspond to.
For example, if @code{nonascii-insert-offset} +is 2048, which is @code{(- (make-char 'latin-iso8859-1 0) 128)}, then +the unibyte non-@sc{ASCII} characters correspond to Latin 1. If it is +2688, which is @code{(- (make-char 'greek-iso8859-7 0) 128)}, then they +correspond to Greek letters. + + Converting multibyte text to unibyte is simpler: it performs +logical-and of each character code with 255. If +@code{nonascii-insert-offset} has a reasonable value, corresponding to +the beginning of some character set, this conversion is the inverse of +the other: converting unibyte text to multibyte and back to unibyte +reproduces the original unibyte text. + +@tindex nonascii-insert-offset +@defvar nonascii-insert-offset +This variable specifies the amount to add to a non-@sc{ASCII} character +when converting unibyte text to multibyte. It also applies when +@code{insert-char} or @code{self-insert-command} inserts a character in +the unibyte non-@sc{ASCII} range, 128 through 255. + +The right value to use to select character set @var{cs} is @code{(- +(make-char @var{cs} 0) 128)}. If the value of +@code{nonascii-insert-offset} is zero, then conversion actually uses the +value for the Latin 1 character set, rather than zero. +@end defvar + +@tindex nonascii-translate-table +@defvar nonascii-translate-table +This variable provides a more general alternative to +@code{nonascii-insert-offset}. You can use it to specify independently +how to translate each code in the range of 128 through 255 into a +multibyte character. The value should be a vector, or @code{nil}. +@end defvar + +@tindex string-make-unibyte +@defun string-make-unibyte string +This function converts the text of @var{string} to unibyte +representation, if it isn't already, and returns the result. If +conversion does not change the contents, the value may be @var{string} +itself.
+@end defun + +@tindex string-make-multibyte +@defun string-make-multibyte string +This function converts the text of @var{string} to multibyte +representation, if it isn't already, and returns the result. If +conversion does not change the contents, the value may be @var{string} +itself. +@end defun + +@node Selecting a Representation +@section Selecting a Representation + + Sometimes it is useful to examine an existing buffer or string as +multibyte when it was unibyte, or vice versa. + +@tindex set-buffer-multibyte +@defun set-buffer-multibyte multibyte +Set the representation type of the current buffer. If @var{multibyte} +is non-@code{nil}, the buffer becomes multibyte. If @var{multibyte} +is @code{nil}, the buffer becomes unibyte. + +This function leaves the buffer contents unchanged when viewed as a +sequence of bytes. As a consequence, it can change the contents viewed +as characters; a sequence of two bytes which is treated as one character +in multibyte representation will count as two characters in unibyte +representation. + +This function sets @code{enable-multibyte-characters} to record which +representation is in use. It also adjusts various data in the buffer +(including its overlays, text properties and markers) so that they +cover or fall between the same text as they did before. +@end defun + +@tindex string-as-unibyte +@defun string-as-unibyte string +This function returns a string with the same bytes as @var{string} but +treating each byte as a character. This means that the value may have +more characters than @var{string} has. + +If @var{string} is unibyte already, then the value may be @var{string} +itself. +@end defun + +@tindex string-as-multibyte +@defun string-as-multibyte string +This function returns a string with the same bytes as @var{string} but +treating each multibyte sequence as one character. This means that the +value may have fewer characters than @var{string} has.
+ +If @var{string} is multibyte already, then the value may be @var{string} +itself. +@end defun + +@node Character Codes +@section Character Codes +@cindex character codes + + The unibyte and multibyte text representations use different character +codes. The valid character codes for unibyte representation range from +0 to 255---the values that can fit in one byte. The valid character +codes for multibyte representation range from 0 to 524287, but not all +values in that range are valid. In particular, the values 128 through +255 are not valid in multibyte text. Only the @sc{ASCII} codes 0 +through 127 are used in both representations. + +@defun char-valid-p charcode +This returns @code{t} if @var{charcode} is valid for either one of the two +text representations. + +@example +(char-valid-p 65) + @result{} t +(char-valid-p 256) + @result{} nil +(char-valid-p 2248) + @result{} t +@end example +@end defun + +@node Character Sets +@section Character Sets +@cindex character sets + + Emacs classifies characters into various @dfn{character sets}, each of +which has a name which is a symbol. Each character belongs to one and +only one character set. + + In general, there is one character set for each distinct script. For +example, @code{latin-iso8859-1} is one character set, +@code{greek-iso8859-7} is another, and @code{ascii} is another. An +Emacs character set can hold at most 9025 characters; therefore, in some +cases, a set of characters that would logically be grouped together is +split into several character sets. For example, one set of Chinese +characters is divided into seven Emacs character sets, +@code{chinese-cns11643-1} through @code{chinese-cns11643-7}. + +@tindex charsetp +@defun charsetp object +Return @code{t} if @var{object} is a character set name symbol, +@code{nil} otherwise. +@end defun + +@tindex charset-list +@defun charset-list +This function returns a list of all defined character set names.
+@end defun + +@tindex char-charset +@defun char-charset character +This function returns the name of the character +set that @var{character} belongs to. +@end defun + +@node Scanning Charsets +@section Scanning for Character Sets + + Sometimes it is useful to find out which character sets appear in a +part of a buffer or a string. One use for this is in determining which +coding systems (@pxref{Coding Systems}) are capable of representing all +of the text in question. + +@tindex find-charset-region +@defun find-charset-region beg end &optional unification +This function returns a list of the character sets +that appear in the current buffer between positions @var{beg} +and @var{end}. +@end defun + +@tindex find-charset-string +@defun find-charset-string string &optional unification +This function returns a list of the character sets +that appear in the string @var{string}. +@end defun + +@node Chars and Bytes +@section Characters and Bytes +@cindex bytes and characters + + In multibyte representation, each character occupies one or more +bytes. The functions in this section convert between characters and the +byte values used to represent them. + +@tindex char-bytes +@defun char-bytes character +This function returns the number of bytes used to represent the +character @var{character}. In most cases, this is the same as +@code{(length (split-char @var{character}))}; the only exception is for +@sc{ASCII} characters, which use just one byte. + +@example +(char-bytes 2248) + @result{} 2 +(char-bytes 65) + @result{} 1 +@end example + +This function's values are correct for both multibyte and unibyte +representations, because the non-@sc{ASCII} character codes used in +those two representations do not overlap.
+ +@example +(char-bytes 192) + @result{} 1 +@end example +@end defun + +@tindex split-char +@defun split-char character +Return a list containing the name of the character set of +@var{character}, followed by one or two byte-values which identify +@var{character} within that character set. + +@example +(split-char 2248) + @result{} (latin-iso8859-1 72) +(split-char 65) + @result{} (ascii 65) +@end example + +Unibyte non-@sc{ASCII} characters are considered as part of +the @code{ascii} character set: + +@example +(split-char 192) + @result{} (ascii 192) +@end example +@end defun + +@tindex make-char +@defun make-char charset &rest byte-values +This function returns the character in character set @var{charset} +identified by @var{byte-values}. This is roughly the opposite of +@code{split-char}. + +@example +(make-char 'latin-iso8859-1 72) + @result{} 2248 +@end example +@end defun + +@node Coding Systems +@section Coding Systems + +@cindex coding system + When Emacs reads or writes a file, and when Emacs sends text to a +subprocess or receives text from a subprocess, it normally performs +character code conversion and end-of-line conversion as specified +by a particular @dfn{coding system}. + +@cindex character code conversion + @dfn{Character code conversion} involves conversion between the encoding +used inside Emacs and some other encoding. Emacs supports many +different encodings, in that it can convert to and from them. For +example, it can convert text to or from encodings such as Latin 1, Latin +2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some +cases, Emacs supports several alternative encodings for the same +characters; for example, there are three coding systems for the Cyrillic +(Russian) alphabet: ISO, Alternativnyj, and KOI8. + +@cindex end of line conversion + @dfn{End of line conversion} handles three different conventions used +on various systems for end of line.
The Unix convention is to use the +linefeed character (also called newline). The DOS convention is to use +the two character sequence, carriage-return linefeed, at the end of a +line. The Mac convention is to use just carriage-return. + + Most coding systems specify a particular character code for +conversion, but some of them leave this unspecified---to be chosen +heuristically based on the data. + +@cindex base coding system +@cindex variant coding system + @dfn{Base coding systems} such as @code{latin-1} leave the end-of-line +conversion unspecified, to be chosen based on the data. @dfn{Variant +coding systems} such as @code{latin-1-unix}, @code{latin-1-dos} and +@code{latin-1-mac} specify the end-of-line conversion explicitly as +well. Each base coding system has three corresponding variants whose +names are formed by adding @samp{-unix}, @samp{-dos} and @samp{-mac}. + + Here are Lisp facilities for working with coding systems: + +@tindex coding-system-list +@defun coding-system-list &optional base-only +This function returns a list of all coding system names (symbols). If +@var{base-only} is non-@code{nil}, the value includes only the +base coding systems. Otherwise, it includes variant coding systems as well. +@end defun + +@tindex coding-system-p +@defun coding-system-p object +This function returns @code{t} if @var{object} is a coding system +name. +@end defun + +@tindex check-coding-system +@defun check-coding-system coding-system +This function checks the validity of @var{coding-system}. +If that is valid, it returns @var{coding-system}. +Otherwise it signals an error with condition @code{coding-system-error}. +@end defun + +@tindex detect-coding-region +@defun detect-coding-region start end highest +This function chooses a plausible coding system for decoding the text +from @var{start} to @var{end}. This text should be ``raw bytes'' +(@pxref{Specifying Coding Systems}).
+ +Normally this function returns a list of coding systems that could +handle decoding the text that was scanned. They are listed in order of +decreasing priority, based on the priority specified by the user with +@code{prefer-coding-system}. But if @var{highest} is non-@code{nil}, +then the return value is just one coding system, the one that is highest +in priority. +@end defun + +@tindex detect-coding-string +@defun detect-coding-string string highest +This function is like @code{detect-coding-region} except that it +operates on the contents of @var{string} instead of bytes in the buffer. +@end defun + +@defun find-operation-coding-system operation &rest arguments +This function returns the coding system to use (by default) for +performing @var{operation} with @var{arguments}. The value has this +form: + +@example +(@var{decoding-system} @var{encoding-system}) +@end example + +The first element, @var{decoding-system}, is the coding system to use +for decoding (in case @var{operation} does decoding), and +@var{encoding-system} is the coding system for encoding (in case +@var{operation} does encoding). + +The argument @var{operation} should be an Emacs I/O primitive: +@code{insert-file-contents}, @code{write-region}, @code{call-process}, +@code{call-process-region}, @code{start-process}, or +@code{open-network-stream}. + +The remaining arguments should be the same arguments that might be given +to that I/O primitive. Depending on which primitive, one of those +arguments is selected as the @dfn{target}. For example, if +@var{operation} does file I/O, whichever argument specifies the file +name is the target. For subprocess primitives, the process name is the +target. For @code{open-network-stream}, the target is the service name +or port number. + +This function looks up the target in @code{file-coding-system-alist}, +@code{process-coding-system-alist}, or +@code{network-coding-system-alist}, depending on @var{operation}. +@xref{Default Coding Systems}.
+@end defun + +@node Default Coding Systems +@section Default Coding Systems + + These variables specify which coding system to use by default for +certain files or when running certain subprograms. The idea of these +variables is that you set them once and for all to the defaults you +want, and then do not change them again. To specify a particular coding +system for a particular operation, don't change these variables; +instead, override them using @code{coding-system-for-read} and +@code{coding-system-for-write} (@pxref{Specifying Coding Systems}). + +@tindex file-coding-system-alist +@defvar file-coding-system-alist +This variable is an alist that specifies the coding systems to use for +reading and writing particular files. Each element has the form +@code{(@var{pattern} . @var{coding})}, where @var{pattern} is a regular +expression that matches certain file names. The element applies to file +names that match @var{pattern}. + +The @sc{cdr} of the element, @var{coding}, should be either a coding +system, a cons cell containing two coding systems, or a function symbol. +If @var{coding} is a coding system, that coding system is used for both +reading the file and writing it. If @var{coding} is a cons cell containing +two coding systems, its @sc{car} specifies the coding system for +decoding, and its @sc{cdr} specifies the coding system for encoding. + +If @var{coding} is a function symbol, the function must return a coding +system or a cons cell containing two coding systems. This value is used +as described above. +@end defvar + +@tindex process-coding-system-alist +@defvar process-coding-system-alist +This variable is an alist specifying which coding systems to use for a +subprocess, depending on which program is running in the subprocess. It +works like @code{file-coding-system-alist}, except that @var{pattern} is +matched against the program name used to start the subprocess.
The coding +system or systems specified in this alist are used to initialize the +coding systems used for I/O to the subprocess, but you can specify +other coding systems later using @code{set-process-coding-system}. +@end defvar + +@tindex network-coding-system-alist +@defvar network-coding-system-alist +This variable is an alist that specifies the coding system to use for +network streams. It works much like @code{file-coding-system-alist}, +with the difference that the @var{pattern} in an element may be either a +port number or a regular expression. If it is a regular expression, it +is matched against the network service name used to open the network +stream. +@end defvar + +@tindex default-process-coding-system +@defvar default-process-coding-system +This variable specifies the coding systems to use for subprocess (and +network stream) input and output, when nothing else specifies what to +do. + +The value should be a cons cell of the form @code{(@var{input-coding} +. @var{output-coding})}. Here @var{input-coding} applies to input from +the subprocess, and @var{output-coding} applies to output to it. +@end defvar + +@node Specifying Coding Systems +@section Specifying a Coding System for One Operation + + You can specify the coding system for a specific operation by binding +the variables @code{coding-system-for-read} and/or +@code{coding-system-for-write}. + +@tindex coding-system-for-read +@defvar coding-system-for-read +If this variable is non-@code{nil}, it specifies the coding system to +use for reading a file, or for input from a synchronous subprocess. + +It also applies to any asynchronous subprocess or network stream, but in +a different way: the value of @code{coding-system-for-read} when you +start the subprocess or open the network stream specifies the input +decoding method for that subprocess or network stream. It remains in +use for that subprocess or network stream unless and until overridden.
+ +The right way to use this variable is to bind it with @code{let} for a +specific I/O operation. Its global value is normally @code{nil}, and +you should not globally set it to any other value. Here is an example +of the right way to use the variable: + +@example +;; @r{Read the file with no character code conversion.} +;; @r{Assume CRLF represents end-of-line.} +(let ((coding-system-for-read 'emacs-mule-dos)) + (insert-file-contents filename)) +@end example + +When its value is non-@code{nil}, @code{coding-system-for-read} takes +precedence over all other methods of specifying a coding system to use for +input, including @code{file-coding-system-alist}, +@code{process-coding-system-alist} and +@code{network-coding-system-alist}. +@end defvar + +@tindex coding-system-for-write +@defvar coding-system-for-write +This works much like @code{coding-system-for-read}, except that it +applies to output rather than input. It affects writing to files, +subprocesses, and net connections. + +When a single operation does both input and output, as do +@code{call-process-region} and @code{start-process}, both +@code{coding-system-for-read} and @code{coding-system-for-write} +affect it. +@end defvar + +@tindex last-coding-system-used +@defvar last-coding-system-used +All operations that use a coding system set this variable +to the coding system name that was used. +@end defvar + +@tindex inhibit-eol-conversion +@defvar inhibit-eol-conversion +When this variable is non-@code{nil}, no end-of-line conversion is done, +no matter which coding system is specified. This applies to all the +Emacs I/O and subprocess primitives, and to the explicit encoding and +decoding functions (@pxref{Explicit Encoding}). +@end defvar + +@tindex keyboard-coding-system +@defun keyboard-coding-system +This function returns the coding system that is in use for decoding +keyboard input---or @code{nil} if no coding system is to be used.
+@end defun + +@tindex set-keyboard-coding-system +@defun set-keyboard-coding-system coding-system +This function specifies @var{coding-system} as the coding system to +use for decoding keyboard input. If @var{coding-system} is @code{nil}, +that means do not decode keyboard input. +@end defun + +@tindex terminal-coding-system +@defun terminal-coding-system +This function returns the coding system that is in use for encoding +terminal output---or @code{nil} for no encoding. +@end defun + +@tindex set-terminal-coding-system +@defun set-terminal-coding-system coding-system +This function specifies @var{coding-system} as the coding system to use +for encoding terminal output. If @var{coding-system} is @code{nil}, +that means do not encode terminal output. +@end defun + + See also the functions @code{process-coding-system} and +@code{set-process-coding-system}. @xref{Process Information}. + + See also @code{read-coding-system} in @ref{High-Level Completion}. + +@node Explicit Encoding +@section Explicit Encoding and Decoding +@cindex encoding text +@cindex decoding text + + All the operations that transfer text in and out of Emacs have the +ability to use a coding system to encode or decode the text. +You can also explicitly encode and decode text using the functions +in this section. + +@cindex raw bytes + The result of encoding, and the input to decoding, are not ordinary +text. They are ``raw bytes''---bytes that represent text in the same +way that an external file would. When a buffer contains raw bytes, it +is most natural to mark that buffer as using unibyte representation, +using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}), +but this is not required. + + The usual way to get raw bytes in a buffer, for explicit decoding, is +to read them from a file with @code{insert-file-contents-literally} +(@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile} +argument when visiting a file with @code{find-file-noselect}.
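As a rough sketch of the approach just described, the following reads a file's raw bytes into a temporary unibyte buffer and then decodes them explicitly. The helper name @code{my-decode-file} is hypothetical and not part of Emacs; the sketch assumes that @code{with-temp-buffer} is available and that the caller passes a valid coding system name.

```elisp
;; Hypothetical helper, not part of Emacs: return the text of FILE
;; decoded according to CODING-SYSTEM.
(defun my-decode-file (file coding-system)
  (with-temp-buffer
    ;; Mark the temporary buffer as unibyte so that it holds raw bytes.
    (set-buffer-multibyte nil)
    ;; Insert the file's bytes with no decoding or format conversion.
    (insert-file-contents-literally file)
    ;; Decode the raw bytes into ordinary text and return it.
    (decode-coding-string (buffer-string) coding-system)))
```

A call might look like @code{(my-decode-file "notes.txt" 'latin-1)}, where the file name is only a placeholder. Doing the work in a temporary buffer keeps the raw bytes out of any buffer the user is editing.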
+ + The usual way to use the raw bytes that result from explicitly +encoding text is to copy them to a file or process---for example, to +write them with @code{write-region} (@pxref{Writing to Files}), and +suppress encoding for that @code{write-region} call by binding +@code{coding-system-for-write} to @code{no-conversion}. + +@tindex encode-coding-region +@defun encode-coding-region start end coding-system +This function encodes the text from @var{start} to @var{end} according +to coding system @var{coding-system}. The encoded text replaces +the original text in the buffer. The result of encoding is +``raw bytes.'' +@end defun + +@tindex encode-coding-string +@defun encode-coding-string string coding-system +This function encodes the text in @var{string} according to coding +system @var{coding-system}. It returns a new string containing the +encoded text. The result of encoding is ``raw bytes.'' +@end defun + +@tindex decode-coding-region +@defun decode-coding-region start end coding-system +This function decodes the text from @var{start} to @var{end} according +to coding system @var{coding-system}. The decoded text replaces the +original text in the buffer. To make explicit decoding useful, the text +before decoding ought to be ``raw bytes.'' +@end defun + +@tindex decode-coding-string +@defun decode-coding-string string coding-system +This function decodes the text in @var{string} according to coding +system @var{coding-system}. It returns a new string containing the +decoded text. To make explicit decoding useful, the contents of +@var{string} ought to be ``raw bytes.'' +@end defun
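To tie the string variants together, here is a minimal round trip through @code{encode-coding-string} and @code{decode-coding-string}. It assumes that @code{latin-1} names a coding system in the running Emacs. The sample character is built with @code{make-char} rather than typed literally, so the sketch does not depend on how the source file itself is encoded.

```elisp
(let* ((text (char-to-string (make-char 'latin-iso8859-1 72)))
       ;; Encoding produces "raw bytes", as they would appear in a file.
       (raw (encode-coding-string text 'latin-1))
       ;; Decoding those raw bytes yields ordinary text again.
       (decoded (decode-coding-string raw 'latin-1)))
  ;; Non-nil when the round trip preserved the text.
  (string= text decoded))
```

The same pattern should carry over to the region functions, with the difference that they replace the text in the buffer instead of returning a new string.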