From 483ab23014e2879d1f83620cd27e1c5f7b3c3d46 Mon Sep 17 00:00:00 2001 From: Chong Yidong Date: Thu, 8 Mar 2012 13:27:03 +0800 Subject: [PATCH] More updates to Text chapter of Lisp manual. * doc/lispref/text.texi (Mode-Specific Indent): Document new behavior of indent-for-tab-command. Document tab-always-indent. (Special Properties): Copyedits. (Checksum/Hash): Improve secure-hash doc. Do not recommend MD5. (Parsing HTML/XML): Rename from Parsing HTML. Update doc of libxml-parse-html-region. --- doc/lispref/ChangeLog | 9 ++ doc/lispref/elisp.texi | 3 +- doc/lispref/text.texi | 295 ++++++++++++++++++++++------------------- doc/lispref/vol1.texi | 3 +- doc/lispref/vol2.texi | 3 +- etc/NEWS | 11 +- 6 files changed, 182 insertions(+), 142 deletions(-) diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog index 42ec24fac5f..16291e144d3 100644 --- a/doc/lispref/ChangeLog +++ b/doc/lispref/ChangeLog @@ -1,3 +1,12 @@ +2012-03-08 Chong Yidong + + * text.texi (Mode-Specific Indent): Document new behavior of + indent-for-tab-command. Document tab-always-indent. + (Special Properties): Copyedits. + (Checksum/Hash): Improve secure-hash doc. Do not recommend MD5. + (Parsing HTML/XML): Rename from Parsing HTML. Update doc of + libxml-parse-html-region. + 2012-03-07 Glenn Morris * markers.texi (The Region): Briefly mention use-empty-active-region diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi index 7a444ee4039..ea304292497 100644 --- a/doc/lispref/elisp.texi +++ b/doc/lispref/elisp.texi @@ -1054,7 +1054,8 @@ Text * Registers:: How registers are implemented. Accessing the text or position stored in a register. * Base 64:: Conversion to or from base 64 encoding. -* Checksum/Hash:: Computing "message digests"/"checksums"/"hashes". +* Checksum/Hash:: Computing cryptographic hashes. +* Parsing HTML/XML:: Parsing HTML and XML. * Atomic Changes:: Installing several buffer changes "atomically". * Change Hooks:: Supplying functions to be run when text is changed. 
diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi index 88cb6a157f8..c60150cc061 100644 --- a/doc/lispref/text.texi +++ b/doc/lispref/text.texi @@ -56,8 +56,8 @@ the character after point. * Registers:: How registers are implemented. Accessing the text or position stored in a register. * Base 64:: Conversion to or from base 64 encoding. -* Checksum/Hash:: Computing "message digests"/"checksums"/"hashes". -* Parsing HTML:: Parsing HTML and XML. +* Checksum/Hash:: Computing cryptographic hashes. +* Parsing HTML/XML:: Parsing HTML and XML. * Atomic Changes:: Installing several buffer changes "atomically". * Change Hooks:: Supplying functions to be run when text is changed. @end menu @@ -2203,14 +2203,48 @@ key to indent properly for the language being edited. This section describes the mechanism of the @key{TAB} key and how to control it. The functions in this section return unpredictable values. -@defvar indent-line-function -This variable's value is the function to be used by @key{TAB} (and -various commands) to indent the current line. The command -@code{indent-according-to-mode} does little more than call this function. +@deffn Command indent-for-tab-command &optional rigid +This is the command bound to @key{TAB} in most editing modes. Its +usual action is to indent the current line, but it can alternatively +insert a tab character or indent a region. + +Here is what it does: -In Lisp mode, the value is the symbol @code{lisp-indent-line}; in C -mode, @code{c-indent-line}; in Fortran mode, @code{fortran-indent-line}. -The default value is @code{indent-relative}. @xref{Auto-Indentation}. +@itemize +@item +First, it checks whether Transient Mark mode is enabled and the region +is active. If so, it calls @code{indent-region} to indent all the +text in the region (@pxref{Region Indent}).
+ +@item +Otherwise, if the indentation function in @code{indent-line-function} +is @code{indent-to-left-margin} (a trivial command that inserts a tab +character), or if the variable @code{tab-always-indent} specifies that +a tab character ought to be inserted (see below), then it inserts a +tab character. + +@item +Otherwise, it indents the current line; this is done by calling the +function in @code{indent-line-function}. If the line is already +indented, and the value of @code{tab-always-indent} is @code{complete} +(see below), it tries completing the text at point. +@end itemize + +If @var{rigid} is non-@code{nil} (interactively, with a prefix +argument), then after this command indents a line or inserts a tab, it +also rigidly indents the entire balanced expression which starts at +the beginning of the current line, in order to reflect the new +indentation. This argument is ignored if the command indents the +region. +@end deffn + +@defvar indent-line-function +This variable's value is the function to be used by +@code{indent-for-tab-command}, and various other indentation commands, +to indent the current line. It is usually assigned by the major mode; +for instance, Lisp mode sets it to @code{lisp-indent-line}, C mode +sets it to @code{c-indent-line}, and so on. The default value is +@code{indent-relative}. @xref{Auto-Indentation}. @end defvar @deffn Command indent-according-to-mode @@ -2218,41 +2252,31 @@ This command calls the function in @code{indent-line-function} to indent the current line in a way appropriate for the current major mode. @end deffn -@deffn Command indent-for-tab-command &optional rigid -This command calls the function in @code{indent-line-function} to -indent the current line; however, if that function is -@code{indent-to-left-margin}, @code{insert-tab} is called instead. -(That is a trivial command that inserts a tab character.) 
If -@var{rigid} is non-@code{nil}, this function also rigidly indents the -entire balanced expression that starts at the beginning of the current -line, to reflect change in indentation of the current line. -@end deffn - @deffn Command newline-and-indent This function inserts a newline, then indents the new line (the one -following the newline just inserted) according to the major mode. - -It does indentation by calling the current @code{indent-line-function}. -In programming language modes, this is the same thing @key{TAB} does, -but in some text modes, where @key{TAB} inserts a tab, -@code{newline-and-indent} indents to the column specified by -@code{left-margin}. +following the newline just inserted) according to the major mode. It +does indentation by calling @code{indent-according-to-mode}. @end deffn @deffn Command reindent-then-newline-and-indent -@comment !!SourceFile simple.el This command reindents the current line, inserts a newline at point, and then indents the new line (the one following the newline just -inserted). - -This command does indentation on both lines according to the current -major mode, by calling the current value of @code{indent-line-function}. -In programming language modes, this is the same thing @key{TAB} does, -but in some text modes, where @key{TAB} inserts a tab, -@code{reindent-then-newline-and-indent} indents to the column specified -by @code{left-margin}. +inserted). It does indentation on both lines by calling +@code{indent-according-to-mode}. @end deffn +@defopt tab-always-indent +This variable can be used to customize the behavior of the @key{TAB} +(@code{indent-for-tab-command}) command. If the value is @code{t} +(the default), the command normally just indents the current line. If +the value is @code{nil}, the command indents the current line only if +point is at the left margin or in the line's indentation; otherwise, +it inserts a tab character. 
If the value is @code{complete}, the +command first tries to indent the current line, and if the line was +already indented, it calls @code{completion-at-point} to complete the +text at point (@pxref{Completion in Buffers}). +@end defopt + @node Region Indent @subsection Indenting an Entire Region @@ -2827,7 +2851,7 @@ faster to process chunks of text that have the same property value. comparing property values. In all cases, @var{object} defaults to the current buffer. - For high performance, it's very important to use the @var{limit} + For good performance, it's very important to use the @var{limit} argument to these functions, especially the ones that search for a single property---otherwise, they may spend a long time scanning to the end of the buffer, if the property you are interested in does not change. @@ -2839,15 +2863,15 @@ different properties. @defun next-property-change pos &optional object limit The function scans the text forward from position @var{pos} in the -string or buffer @var{object} till it finds a change in some text +string or buffer @var{object} until it finds a change in some text property, then returns the position of the change. In other words, it returns the position of the first character beyond @var{pos} whose properties are not identical to those of the character just after @var{pos}. If @var{limit} is non-@code{nil}, then the scan ends at position -@var{limit}. If there is no property change before that point, -@code{next-property-change} returns @var{limit}. +@var{limit}. If there is no property change before that point, this +function returns @var{limit}. The value is @code{nil} if the properties remain unchanged all the way to the end of @var{object} and @var{limit} is @code{nil}. If the value @@ -2980,10 +3004,9 @@ character. @item face @cindex face codes of text @kindex face @r{(text property)} -You can use the property @code{face} to control the font and color of -text. @xref{Faces}, for more information. 
- -@code{face} can be the following: +The @code{face} property controls the appearance of the character, +such as its font and color. @xref{Faces}. The value of the property +can be the following: @itemize @bullet @item @@ -2996,10 +3019,10 @@ face attribute name and @var{value} is a meaningful value for that attribute. With this feature, you do not need to create a face each time you want to specify a particular attribute for certain text. @xref{Face Attributes}. -@end itemize -@code{face} can also be a list, where each element uses one of the -forms listed above. +@item +A list, where each element uses one of the two forms listed above. +@end itemize Font Lock mode (@pxref{Font Lock Mode}) works in most buffers by dynamically updating the @code{face} property of characters based on @@ -3354,15 +3377,15 @@ of the text. Self-inserting characters normally take on the same properties as the preceding character. This is called @dfn{inheritance} of properties. - In a Lisp program, you can do insertion with inheritance or without, -depending on your choice of insertion primitive. The ordinary text -insertion functions such as @code{insert} do not inherit any properties. -They insert text with precisely the properties of the string being -inserted, and no others. This is correct for programs that copy text -from one context to another---for example, into or out of the kill ring. -To insert with inheritance, use the special primitives described in this -section. Self-inserting characters inherit properties because they work -using these primitives. + A Lisp program can do insertion with inheritance or without, +depending on the choice of insertion primitive. The ordinary text +insertion functions, such as @code{insert}, do not inherit any +properties. They insert text with precisely the properties of the +string being inserted, and no others. This is correct for programs +that copy text from one context to another---for example, into or out +of the kill ring. 
To insert with inheritance, use the special +primitives described in this section. Self-inserting characters +inherit properties because they work using these primitives. When you do insertion with inheritance, @emph{which} properties are inherited, and from where, depends on which properties are @dfn{sticky}. @@ -4063,46 +4086,64 @@ The decoding functions ignore newline characters in the encoded text. @node Checksum/Hash @section Checksum/Hash @cindex MD5 checksum -@cindex hashing, secure -@cindex SHA-1 -@cindex message digest computation - - MD5 cryptographic checksums, or @dfn{message digests}, are 128-bit -``fingerprints'' of a document or program. They are used to verify -that you have an exact and unaltered copy of the data. The algorithm -to calculate the MD5 message digest is defined in Internet -RFC@footnote{ -For an explanation of what is an RFC, see the footnote in @ref{Base -64}. -}1321. This section describes the Emacs facilities for computing -message digests and other forms of ``secure hash''. +@cindex SHA hash +@cindex hash, cryptographic +@cindex cryptographic hash + + Emacs has built-in support for computing @dfn{cryptographic hashes}. +A cryptographic hash, or @dfn{checksum}, is a digital ``fingerprint'' +of a piece of data (e.g.@: a block of text) which can be used to check +that you have an unaltered copy of that data. + +@cindex message digest + Emacs supports several common cryptographic hash algorithms: MD5, +SHA-1, SHA-2, SHA-224, SHA-256, SHA-384 and SHA-512. MD5 is the +oldest of these algorithms, and is commonly used in @dfn{message +digests} to check the integrity of messages transmitted over a +network. MD5 is not ``collision resistant'' (i.e.@: it is possible to +deliberately design different pieces of data which have the same MD5 +hash), so you should not used it for anything security-related. A +similar theoretical weakness also exists in SHA-1. 
Therefore, for +security-related applications you should use the other hash types, +such as SHA-2. -@defun md5 object &optional start end coding-system noerror -This function returns the MD5 message digest of @var{object}, which -should be a buffer or a string. +@defun secure-hash algorithm object &optional start end binary +This function returns a hash for @var{object}. The argument +@var{algorithm} is a symbol stating which hash to compute: one of +@code{md5}, @code{sha1}, @code{sha224}, @code{sha256}, @code{sha384} +or @code{sha512}. The argument @var{object} should be a buffer or a +string. -The two optional arguments @var{start} and @var{end} are character +The optional arguments @var{start} and @var{end} are character positions specifying the portion of @var{object} to compute the -message digest for. If they are @code{nil} or omitted, the digest is +message digest for. If they are @code{nil} or omitted, the hash is computed for the whole of @var{object}. -The function @code{md5} does not compute the message digest directly -from the internal Emacs representation of the text (@pxref{Text -Representations}). Instead, it encodes the text using a coding -system, and computes the message digest from the encoded text. The -optional fourth argument @var{coding-system} specifies which coding -system to use for encoding the text. It should be the same coding -system that you used to read the text, or that you used or will use -when saving or sending the text. @xref{Coding Systems}, for more -information about coding systems. - -If @var{coding-system} is @code{nil} or omitted, the default depends -on @var{object}. If @var{object} is a buffer, the default for -@var{coding-system} is whatever coding system would be chosen by -default for writing this text into a file. If @var{object} is a -string, the user's most preferred coding system (@pxref{Recognize -Coding, prefer-coding-system, the description of -@code{prefer-coding-system}, emacs, GNU Emacs Manual}) is used. 
+If the argument @var{binary} is omitted or @code{nil}, the function +returns the @dfn{text form} of the hash, as an ordinary Lisp string. +If @var{binary} is non-@code{nil}, it returns the hash in @dfn{binary +form}, as a sequence of bytes stored in a unibyte string. + +This function does not compute the hash directly from the internal +representation of @var{object}'s text (@pxref{Text Representations}). +Instead, it encodes the text using a coding system (@pxref{Coding +Systems}), and computes the hash from that encoded text. If +@var{object} is a buffer, the coding system used is the one which +would be chosen by default for writing the text into a file. If +@var{object} is a string, the user's preferred coding system is used +(@pxref{Recognize Coding,,, emacs, GNU Emacs Manual}). +@end defun + +@defun md5 object &optional start end coding-system noerror +This function returns an MD5 hash. It is semi-obsolete, since for +most purposes it is equivalent to calling @code{secure-hash} with +@code{md5} as the @var{algorithm} argument. The @var{object}, +@var{start} and @var{end} arguments have the same meanings as in +@code{secure-hash}. + +If @var{coding-system} is non-@code{nil}, it specifies a coding system +to use to encode the text; if omitted or @code{nil}, the default +coding system is used, as in @code{secure-hash}. Normally, @code{md5} signals an error if the text can't be encoded using the specified or chosen coding system. However, if @@ -4110,65 +4151,53 @@ using the specified or chosen coding system. However, if coding instead. @end defun -@defun secure-hash algorithm object &optional start end binary -This function provides a general interface to a variety of secure -hashing algorithms. As well as the MD5 algorithm, it supports SHA-1, -SHA-2, SHA-224, SHA-256, SHA-384 and SHA-512. The argument -@var{algorithm} is a symbol stating which hash to compute. The -arguments @var{object}, @var{start}, and @var{end} are as for the -@code{md5} function.
If the optional argument @var{binary} is -non-@code{nil}, returns a string in binary form. -@end defun - -@node Parsing HTML -@section Parsing HTML +@node Parsing HTML/XML +@section Parsing HTML and XML @cindex parsing html +When Emacs is compiled with libxml2 support, the following functions +are available to parse HTML or XML text into Lisp object trees. + @defun libxml-parse-html-region start end &optional base-url -This function provides HTML parsing via the @code{libxml2} library. -It parses ``real world'' HTML and tries to return a sensible parse tree -regardless. +This function parses the text between @var{start} and @var{end} as +HTML, and returns a list representing the HTML @dfn{parse tree}. It +attempts to handle ``real world'' HTML by robustly coping with syntax +mistakes. -In addition to @var{start} and @var{end} (specifying the start and end -of the region to act on), it takes an optional parameter, -@var{base-url}, which is used to expand relative URLs in the document, -if any. +The optional argument @var{base-url}, if non-@code{nil}, should be a +string specifying the base URL for relative URLs occurring in links. -Here's an example demonstrating the structure of the parsed data you -get out. Given this HTML document: +In the parse tree, each HTML node is represented by a list in which +the first element is a symbol representing the node name, the second +element is an alist of node attributes, and the remaining elements are +the subnodes. + +The following example demonstrates this. Given this (malformed) HTML +document: @example -
Foo
Yes +
Foo
Yes @end example -You get this parse tree: +@noindent +A call to @code{libxml-parse-html-region} returns this: @example -(html - (head) - (body - (:width . "101") - (div - (:class . "thing") - (text . "Foo") - (div - (text . "Yes\n"))))) +(html () + (head ()) + (body ((width . "101")) + (div ((class . "thing")) + "Foo" + (div () + "Yes")))) @end example - -It's a simple tree structure, where the @code{car} for each node is -the name of the node, and the @code{cdr} is the value, or the list of -values. - -Attributes are coded the same way as child nodes, but with @samp{:} as -the first character. @end defun @cindex parsing xml @defun libxml-parse-xml-region start end &optional base-url - -This is much the same as @code{libxml-parse-html-region} above, but -operates on XML instead of HTML, and is correspondingly stricter about -syntax. +This function is the same as @code{libxml-parse-html-region}, except +that it parses the text as XML rather than HTML (so it is stricter +about syntax). @end defun @node Atomic Changes diff --git a/doc/lispref/vol1.texi b/doc/lispref/vol1.texi index a92a807b747..58092f23157 100644 --- a/doc/lispref/vol1.texi +++ b/doc/lispref/vol1.texi @@ -1076,7 +1076,8 @@ Text * Registers:: How registers are implemented. Accessing the text or position stored in a register. * Base 64:: Conversion to or from base 64 encoding. -* Checksum/Hash:: Computing "message digests"/"checksums"/"hashes". +* Checksum/Hash:: Computing cryptographic hashes. +* Parsing HTML/XML:: Parsing HTML and XML. * Atomic Changes:: Installing several buffer changes "atomically". * Change Hooks:: Supplying functions to be run when text is changed. diff --git a/doc/lispref/vol2.texi b/doc/lispref/vol2.texi index 97b21aba10b..a42b70d77a4 100644 --- a/doc/lispref/vol2.texi +++ b/doc/lispref/vol2.texi @@ -1075,7 +1075,8 @@ Text * Registers:: How registers are implemented. Accessing the text or position stored in a register. * Base 64:: Conversion to or from base 64 encoding. 
-* Checksum/Hash:: Computing "message digests"/"checksums"/"hashes". +* Checksum/Hash:: Computing cryptographic hashes. +* Parsing HTML/XML:: Parsing HTML and XML. * Atomic Changes:: Installing several buffer changes "atomically". * Change Hooks:: Supplying functions to be run when text is changed. diff --git a/etc/NEWS b/etc/NEWS index fd4f7afa863..4e4a551a9d1 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -1482,13 +1482,12 @@ These require Emacs to be built with ImageMagick support. image-transform-fit-to-height, image-transform-fit-to-width, image-transform-set-rotation, image-transform-set-scale. ++++ ** XML and HTML parsing -If Emacs is compiled with libxml2 support, there are two new functions: -`libxml-parse-html-region' (which parses "real world" HTML) and -`libxml-parse-xml-region' (which parses XML). Both return an Emacs -Lisp parse tree. - -FIXME: These should be front-ended by xml.el. +If Emacs is compiled with libxml2 support, there are two new +functions: `libxml-parse-html-region' (which parses "real world" HTML) +and `libxml-parse-xml-region' (which parses XML). Both return an +Emacs Lisp parse tree. ** GnuTLS -- 2.39.2
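Here is a minimal Emacs Lisp sketch of the `secure-hash' behavior documented in this patch. It compares the text form with the binary form, and treats `md5' as a special case of `secure-hash'. The sketch assumes an Emacs that provides `secure-hash' with the BINARY argument described above. The helper `my-hash-demo' is hypothetical and exists only for illustration. The input "abc" is pure ASCII, so the coding system chosen to encode it before hashing does not affect the result.

```elisp
;; Hypothetical helper, for illustration only: compare the text and
;; binary forms returned by `secure-hash', and check that `md5' agrees
;; with (secure-hash 'md5 ...) for the same string.
(defun my-hash-demo (text)
  "Return a plist describing the SHA-256 and MD5 hashes of TEXT."
  (let ((hex (secure-hash 'sha256 text))            ; text form: hex string
        (raw (secure-hash 'sha256 text nil nil t))) ; binary form: unibyte string
    (list :sha256-hex hex
          :hex-length (length hex)     ; 64 hex digits for SHA-256
          :sha256-bytes (length raw)   ; 32 raw bytes for SHA-256
          ;; For most purposes `md5' is equivalent to calling
          ;; `secure-hash' with `md5' as the algorithm argument.
          :md5-matches (string= (md5 text) (secure-hash 'md5 text)))))

(my-hash-demo "abc")
```

Evaluating `(my-hash-demo "abc")' should return a 64-character hex digest, a 32-byte binary digest, and `:md5-matches' set to t.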