From 5dcb4c4e5d757099766f7147eace43f0b00c7fe4 Mon Sep 17 00:00:00 2001 From: Stefan Monnier Date: Tue, 7 Dec 2010 14:44:38 -0500 Subject: [PATCH] * doc/lispref/modes.texi (Auto-Indentation): New section to document SMIE. (Major Mode Conventions): * doc/lispref/text.texi (Mode-Specific Indent): Refer to it. --- doc/lispref/ChangeLog | 206 +++++++------ doc/lispref/modes.texi | 668 ++++++++++++++++++++++++++++++++++++++++- doc/lispref/text.texi | 2 +- 3 files changed, 771 insertions(+), 105 deletions(-) diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog index 6efa8466563..b27efdda941 100644 --- a/doc/lispref/ChangeLog +++ b/doc/lispref/ChangeLog @@ -1,3 +1,9 @@ +2010-12-07 Stefan Monnier + + * modes.texi (Auto-Indentation): New section to document SMIE. + (Major Mode Conventions): + * text.texi (Mode-Specific Indent): Refer to it. + 2010-12-04 Eli Zaretskii * display.texi (Other Display Specs): Document left-fringe and @@ -12,8 +18,8 @@ 2010-11-21 Chong Yidong - * nonascii.texi (Converting Representations): Document - byte-to-string. + * nonascii.texi (Converting Representations): + Document byte-to-string. * strings.texi (Creating Strings): Don't mention semi-obsolete function char-to-string. @@ -44,8 +50,8 @@ * customize.texi (Composite Types): Lower-case index entry. - * loading.texi (How Programs Do Loading): Document - load-file-name. (Bug#7346) + * loading.texi (How Programs Do Loading): + Document load-file-name. (Bug#7346) 2010-11-10 Glenn Morris @@ -152,8 +158,8 @@ * keymaps.texi (Menu Bar): Document :advertised-binding property. - * functions.texi (Obsolete Functions): Document - set-advertised-calling-convention. + * functions.texi (Obsolete Functions): + Document set-advertised-calling-convention. * minibuf.texi (Basic Completion): Document completion-in-region. (Programmed Completion): Document completion-annotate-function. @@ -278,8 +284,8 @@ * minibuf.texi (Basic Completion): 4th arg to all-completions is obsolete. 
- * processes.texi (Process Buffers): Document - process-kill-buffer-query-function. + * processes.texi (Process Buffers): + Document process-kill-buffer-query-function. 2009-12-05 Glenn Morris @@ -726,8 +732,8 @@ (Suspending Emacs): Mark suspend-emacs as a command. (Processor Run Time): Mark emacs-uptime and emacs-init-time as commands. - (Terminal Output): Remove obsolete function baud-rate. Document - TERMINAL arg for send-string-to-terminal. + (Terminal Output): Remove obsolete function baud-rate. + Document TERMINAL arg for send-string-to-terminal. * nonascii.texi (Terminal I/O Encoding): Document TERMINAL arg for terminal-coding-system and set-terminal-coding-system. @@ -831,8 +837,8 @@ 2009-05-09 Eli Zaretskii * nonascii.texi (Default Coding Systems): Document - find-auto-coding, set-auto-coding, and auto-coding-alist. Add - indexing. + find-auto-coding, set-auto-coding, and auto-coding-alist. + Add indexing. (Lisp and Coding Systems): Add index entries. 2009-05-09 Martin Rudalics @@ -874,8 +880,8 @@ 2009-04-22 Chong Yidong - * os.texi (Command-Line Arguments): Document - command-line-args-left. + * os.texi (Command-Line Arguments): + Document command-line-args-left. (Suspending Emacs): Adapt text to multi-tty case. Document use of terminal objects for tty arguments. (Startup Summary): Add xref to Session Management. @@ -951,13 +957,13 @@ 2009-04-09 Chong Yidong * text.texi (Yank Commands): Note that yank uses push-mark. - (Filling): Clarify REGION argument of fill-paragraph. Document - fill-forward-paragraph-function. + (Filling): Clarify REGION argument of fill-paragraph. + Document fill-forward-paragraph-function. (Special Properties): Remove "new in Emacs 22" declaration. (Clickable Text): Merge with Links and Mouse-1 node. - * display.texi (Button Properties, Button Buffer Commands): Change - xref to Clickable Text. + * display.texi (Button Properties, Button Buffer Commands): + Change xref to Clickable Text. 
* tips.texi (Key Binding Conventions): Change xref to Clickable Text. @@ -1019,8 +1025,8 @@ 2009-03-29 Chong Yidong - * help.texi (Accessing Documentation, Help Functions): Remove - information about long-obsolete Emacs versions. + * help.texi (Accessing Documentation, Help Functions): + Remove information about long-obsolete Emacs versions. * modes.texi (Mode Line Variables): The default values of the mode line variables are now more complicated. @@ -1063,8 +1069,8 @@ 2009-03-23 Chong Yidong * minibuf.texi (Intro to Minibuffers): Remove long-obsolete info - about minibuffers in old Emacs versions. Copyedits. Emphasize - that enable-recursive-minibuffers defaults to nil. + about minibuffers in old Emacs versions. Copyedits. + Emphasize that enable-recursive-minibuffers defaults to nil. (Text from Minibuffer): Simplify introduction. 2009-03-22 Alan Mackenzie @@ -1118,8 +1124,8 @@ * customize.texi (Common Keywords): It's not necessary to use :tag to remove hyphens, as custom-unlispify-tag-name does it automatically. - (Variable Definitions): Link to File Local Variables. Document - customized-value symbol property. + (Variable Definitions): Link to File Local Variables. + Document customized-value symbol property. (Customization Types): Move menu to end of node. 2009-03-10 Chong Yidong @@ -1230,8 +1236,8 @@ * text.texi (Commands for Insertion): * commands.texi (Event Mod): * keymaps.texi (Searching Keymaps): - * nonascii.texi (Translation of Characters): Reinstate - documentation of translation-table-for-input. + * nonascii.texi (Translation of Characters): + Reinstate documentation of translation-table-for-input. (Explicit Encoding): Document the `charset' text property produced by decode-coding-region and decode-coding-string. @@ -1260,8 +1266,8 @@ 2009-01-22 Chong Yidong * files.texi (Format Conversion Piecemeal): Clarify behavior of - write-region-annotate-functions. Document - write-region-post-annotation-function. + write-region-annotate-functions. 
+ Document write-region-post-annotation-function. 2009-01-19 Chong Yidong @@ -1328,8 +1334,8 @@ * processes.texi (Serial Ports): Improve wording, suggested by RMS. - * nonascii.texi (Lisp and Coding Systems): Document - inhibit-null-byte-detection and inhibit-iso-escape-detection. + * nonascii.texi (Lisp and Coding Systems): + Document inhibit-null-byte-detection and inhibit-iso-escape-detection. (Character Properties): Improve wording. 2009-01-09 Chong Yidong @@ -1337,8 +1343,8 @@ * display.texi (Font Lookup): Remove obsolete function x-font-family-list. x-list-fonts accepts Fontconfig/GTK syntax. (Low-Level Font): Rename from Fonts, move to end of Faces section. - (Font Selection): Reorder order of variable descriptions. Minor - clarifications. + (Font Selection): Reorder order of variable descriptions. + Minor clarifications. * elisp.texi (Top): Update node listing. @@ -1359,8 +1365,8 @@ * elisp.texi: Update node listing. * display.texi (Faces): Put Font Selection node after Auto Faces. - (Face Attributes): Don't link to Font Lookup. Document - font-family-list. + (Face Attributes): Don't link to Font Lookup. + Document font-family-list. (Fonts): New node. 2009-01-08 Jason Rumney @@ -1588,8 +1594,8 @@ * windows.texi (Window Hooks): Remove *-end-trigger-functions vars, which are obsolete. Mention jit-lock-register. - * modes.texi (Other Font Lock Variables): Document - jit-lock-register and jit-lock-unregister. + * modes.texi (Other Font Lock Variables): + Document jit-lock-register and jit-lock-unregister. * frames.texi (Color Parameters): Document alpha parameter. @@ -1661,8 +1667,8 @@ 2008-11-01 Eli Zaretskii * nonascii.texi (Text Representations): Rewrite to make consistent - with Emacs 23 internal representation of characters. Document - `unibyte-string'. + with Emacs 23 internal representation of characters. + Document `unibyte-string'. 2008-10-28 Chong Yidong @@ -1775,8 +1781,8 @@ * processes.texi (Synchronous Processes): Document `process-lines'. 
- * customize.texi (Variable Definitions): Document - `custom-reevaluate-setting'. + * customize.texi (Variable Definitions): + Document `custom-reevaluate-setting'. 2008-10-18 Martin Rudalics @@ -1792,13 +1798,13 @@ * maps.texi (Standard Keymaps): Document `multi-query-replace-map' and `search-map'. - * searching.texi (Search and Replace): Document - `replace-search-function' and `replace-re-search-function'. + * searching.texi (Search and Replace): + Document `replace-search-function' and `replace-re-search-function'. Document `multi-query-replace-map'. * minibuf.texi (Text from Minibuffer): Document `read-regexp'. - (Completion Commands, Reading File Names): Rename - `minibuffer-local-must-match-filename-map' to + (Completion Commands, Reading File Names): + Rename `minibuffer-local-must-match-filename-map' to `minibuffer-local-filename-must-match-map'. (Minibuffer Completion): The `require-match' argument to `completing-read' can now have the value `confirm-only'. @@ -2143,7 +2149,7 @@ 2007-12-30 Richard Stallman - * commands.texi (Accessing Mouse): Renamed from Accessing Events. + * commands.texi (Accessing Mouse): Rename from Accessing Events. (Accessing Scroll): New node broken out of Accessing Mouse. 2007-12-28 Richard Stallman @@ -2187,8 +2193,8 @@ 2007-11-29 Glenn Morris - * functions.texi (Declaring Functions): Add findex. Mention - `external' files. + * functions.texi (Declaring Functions): Add findex. + Mention `external' files. 2007-11-26 Juanma Barranquero @@ -2315,8 +2321,8 @@ * display.texi (Display Property): Explain some display specs don't let you move point in. - * frames.texi (Cursor Parameters): Describe - cursor-in-non-selected-windows here. Explain more values. + * frames.texi (Cursor Parameters): + Describe cursor-in-non-selected-windows here. Explain more values. * windows.texi (Basic Windows): Don't describe cursor-in-non-selected-windows here. 
@@ -2395,8 +2401,8 @@ 2007-08-16 Richard Stallman - * processes.texi (Asynchronous Processes): Clarify - doc of start-file-process. + * processes.texi (Asynchronous Processes): + Clarify doc of start-file-process. 2007-08-08 Martin Rudalics @@ -2463,8 +2469,8 @@ 2007-06-27 Richard Stallman - * files.texi (Format Conversion Piecemeal): Clarify - `after-insert-file-functions' calling convention. + * files.texi (Format Conversion Piecemeal): + Clarify `after-insert-file-functions' calling convention. 2007-06-27 Michael Albinus @@ -2519,8 +2525,8 @@ 2007-05-30 Nick Roberts - * commands.texi (Click Events): Layout more logically. Describe - width and height. + * commands.texi (Click Events): Layout more logically. + Describe width and height. (Drag Events, Motion Events): Update to new format for position. 2007-06-02 Richard Stallman @@ -2926,8 +2932,8 @@ 2007-03-05 Richard Stallman - * variables.texi (File Local Variables): Update - enable-local-variables values. + * variables.texi (File Local Variables): + Update enable-local-variables values. 2007-03-04 Richard Stallman @@ -2998,8 +3004,8 @@ 2007-02-03 Eli Zaretskii * elisp.texi (Top): Make the detailed menu headers compliant with - Texinfo guidelines and with what texnfo-upd.el expects. Add - comments to prevent people from inadvertently modifying the key + Texinfo guidelines and with what texnfo-upd.el expects. + Add comments to prevent people from inadvertently modifying the key parts needed by `texinfo-multiple-files-update'. 2007-02-02 Eli Zaretskii @@ -3086,8 +3092,8 @@ 2006-12-24 Richard Stallman - * customize.texi (Variable Definitions): Document - new name custom-add-frequent-value. + * customize.texi (Variable Definitions): + Document new name custom-add-frequent-value. 2006-12-19 Kim F. Storm @@ -3386,8 +3392,8 @@ 2006-09-01 Chong Yidong - * buffers.texi (Buffer Modification): Document - buffer-chars-modified-tick. + * buffers.texi (Buffer Modification): + Document buffer-chars-modified-tick. 
2006-08-31 Richard Stallman @@ -3449,7 +3455,7 @@ 2006-08-12 Chong Yidong * text.texi (Near Point): Say "cursor" not "terminal cursor". - (Commands for Insertion): Removed split-line since it's not + (Commands for Insertion): Remove split-line since it's not relevant for Lisp programming. (Yank Commands): Rewrite introduction. (Undo): Clarify. @@ -3480,7 +3486,7 @@ (Major Mode Basics): Mention define-derived-mode explicitly. (Major Mode Conventions): Rebinding RET is OK for some modes. Mention change-major-mode-hook and after-change-major-mode-hook. - (Example Major Modes): Moved to end of Modes section. + (Example Major Modes): Move to end of Modes section. (Mode Line Basics): Clarify. (Mode Line Data): Mention help-echo and local-map in strings. Explain reason for treatment of non-risky variables. @@ -3979,7 +3985,7 @@ 2006-05-25 Chong Yidong - * keymaps.texi (Key Sequences): Renamed from Keymap Terminology. + * keymaps.texi (Key Sequences): Rename from Keymap Terminology. Explain string and vector representations of key sequences. * keymaps.texi (Changing Key Bindings): @@ -4028,8 +4034,8 @@ 2006-05-15 Oliver Scholz (tiny change) - * nonascii.texi (Explicit Encoding): Fix - typo (encoding<->decoding). + * nonascii.texi (Explicit Encoding): + Fix typo (encoding<->decoding). 2006-05-14 Richard Stallman @@ -4079,8 +4085,8 @@ 2006-05-09 Richard Stallman - * variables.texi (File Local Variables): Document - safe-local-eval-forms and safe-local-eval-function. + * variables.texi (File Local Variables): + Document safe-local-eval-forms and safe-local-eval-function. 2006-05-07 Kim F. Storm @@ -4564,8 +4570,8 @@ 2005-12-03 Eli Zaretskii - * hooks.texi (Standard Hooks): Add index entries. Mention - `compilation-finish-functions'. + * hooks.texi (Standard Hooks): Add index entries. + Mention `compilation-finish-functions'. 2005-11-27 Richard M. Stallman @@ -4788,8 +4794,8 @@ buffer-local. (Undo): Note that buffer-undo-list is buffer-local. 
- * windows.texi (Buffers and Windows): Document - buffer-display-count. + * windows.texi (Buffers and Windows): + Document buffer-display-count. 2005-09-06 Richard M. Stallman @@ -5030,7 +5036,7 @@ * display.texi (Displaying Messages): New node, with most of what was in The Echo Area. - (Progress): Moved under The Echo Area. + (Progress): Move under The Echo Area. (Logging Messages): New node with new text. (Echo Area Customization): New node, the rest of what was in The Echo Area. Document message-truncate-lines with @defvar. @@ -6345,8 +6351,8 @@ (Scroll Bars): Add scroll-bar-mode and scroll-bar-width. (Usual Display): Move tab-width up. - * customize.texi (Variable Definitions): Replace - show-paren-mode example with tooltip-mode. + * customize.texi (Variable Definitions): + Replace show-paren-mode example with tooltip-mode. (Simple Types, Composite Types, Defining New Types): Minor cleanups. @@ -6582,8 +6588,8 @@ (Display Fringe Bitmaps): New node. (Images): Add 'Image Slices' to menu. (Image Descriptors): Add `:pointer' and `:map' properties. - (Showing Images): Add slice arg to `insert-image'. Add - 'insert-sliced-image'. + (Showing Images): Add slice arg to `insert-image'. + Add 'insert-sliced-image'. 2004-09-20 Richard M. Stallman @@ -6596,8 +6602,8 @@ 2004-09-07 Luc Teirlinck - * locals.texi (Standard Buffer-Local Variables): Add - `buffer-auto-save-file-format'. + * locals.texi (Standard Buffer-Local Variables): + Add `buffer-auto-save-file-format'. * internals.texi (Buffer Internals): Describe new auto_save_file_format field of the buffer structure. * files.texi (Format Conversion): `auto-save-file-format' has been @@ -6985,8 +6991,8 @@ 2004-04-05 Jesper Harder - * variables.texi (Variable Aliases): Mention - cyclic-variable-indirection. + * variables.texi (Variable Aliases): + Mention cyclic-variable-indirection. * errors.texi (Standard Errors): Ditto. @@ -7165,7 +7171,7 @@ 2004-02-07 Jan Djärv - * positions.texi (Text Lines): Added missing end defun. 
+ * positions.texi (Text Lines): Add missing end defun. 2004-02-07 Kim F. Storm @@ -7188,12 +7194,12 @@ read-minibuffer. (Minibuffer History): Clarify description of cons values for HISTORY arguments. - (Basic Completion): Various corrections and clarifications. Add - completion-regexp-list. + (Basic Completion): Various corrections and clarifications. + Add completion-regexp-list. (Minibuffer Completion): Correct and clarify description of completing-read. - (Completion Commands): Mention Partial Completion mode. Various - other minor changes. + (Completion Commands): Mention Partial Completion mode. + Various other minor changes. (High-Level Completion): Various corrections and clarifications. (Reading File Names): Ditto. (Minibuffer Misc): Ditto. @@ -7268,8 +7274,8 @@ * functions.texi: Various small changes in addition to the following. - (What Is a Function): `functionp' returns nil for macros. Clarify - behavior of this and following functions for symbol arguments. + (What Is a Function): `functionp' returns nil for macros. + Clarify behavior of this and following functions for symbol arguments. (Function Documentation): Add `\' in front of (fn @var{arglist}) and explain why. (Defining Functions): Mention DOCSTRING argument to `defalias'. @@ -8065,7 +8071,7 @@ 2003-01-31 Joe Buehler - * os.texi (System Environment): Added cygwin system-type. + * os.texi (System Environment): Add cygwin system-type. 2003-01-25 Richard M. Stallman @@ -8098,7 +8104,7 @@ * README: Target for Info file is `make info'. - * files.texi (File Name Components): Fixed typos in + * files.texi (File Name Components): Fix typos in `file-name-sans-extension'. (Magic File Names): Complete list of operations for magic file name handlers. @@ -8114,7 +8120,7 @@ 2002-08-05 Per Abrahamsen - * customize.texi (Splicing into Lists): Fixed example. + * customize.texi (Splicing into Lists): Fix example. Reported by Fabrice Bauzac . 
2002-06-17 Juanma Barranquero @@ -8154,8 +8160,8 @@ 2001-11-17 Eli Zaretskii - * permute-index: Don't depend on csh-specific features. Replace - the interpreter name with /bin/sh. + * permute-index: Don't depend on csh-specific features. + Replace the interpreter name with /bin/sh. * two-volume-cross-refs.txt: New file. * two.el: New file. @@ -8293,8 +8299,8 @@ * numbers.texi (Integer Basics): Document CL style read syntax for integers in bases other than 10. - * positions.texi (List Motion): Document - open-paren-in-column-0-is-defun-start. + * positions.texi (List Motion): + Document open-paren-in-column-0-is-defun-start. * lists.texi (Sets And Lists): Document member-ignore-case. @@ -8489,7 +8495,7 @@ 1995-06-19 Richard Stallman * Makefile (VERSION): Update version number. - (maintainer-clean): Renamed from realclean. + (maintainer-clean): Rename from realclean. 1995-06-07 Karl Heuer @@ -8561,11 +8567,11 @@ 1991-11-26 Richard Stallman (rms@mole.gnu.ai.mit.edu) - * Makefile (srcs): Added index.perm. + * Makefile (srcs): Add index.perm. (elisp.dvi): Remove erroneous shell comment. Expect output of permute-index in permuted.fns. Save old elisp.aux in elisp.oaux. - (clean): Added index.texi to be deleted. + (clean): Add index.texi to be deleted. 1990-08-11 Richard Stallman (rms@sugar-bombs.ai.mit.edu) diff --git a/doc/lispref/modes.texi b/doc/lispref/modes.texi index 0ccb4ae04ed..0b6547177e0 100644 --- a/doc/lispref/modes.texi +++ b/doc/lispref/modes.texi @@ -24,10 +24,11 @@ user. For related topics such as keymaps and syntax tables, see * Major Modes:: Defining major modes. * Minor Modes:: Defining minor modes. * Mode Line Format:: Customizing the text that appears in the mode line. -* Imenu:: How a mode can provide a menu +* Imenu:: How a mode can provide a menu of definitions in the buffer. -* Font Lock Mode:: How modes can highlight text according to syntax. 
-* Desktop Save Mode:: How modes can have buffer state saved between +* Font Lock Mode:: How modes can highlight text according to syntax. +* Auto-Indentation:: How to teach Emacs to indent for a major mode. +* Desktop Save Mode:: How modes can have buffer state saved between Emacs sessions. @end menu @@ -332,7 +333,7 @@ In a major mode for editing some kind of structured text, such as a programming language, indentation of text according to structure is probably useful. So the mode should set @code{indent-line-function} to a suitable function, and probably customize other variables -for indentation. +for indentation. @xref{Auto-Indentation}. @item @cindex keymaps in modes @@ -3214,6 +3215,665 @@ Since this function is called after every buffer change, it should be reasonably fast. @end defvar +@node Auto-Indentation +@section Auto-indentation of code + +For programming languages, an important feature of a major mode is to +provide automatic indentation. This is controlled in Emacs by +@code{indent-line-function} (@pxref{Mode-Specific Indent}). +Writing a good indentation function can be difficult and to a large +extent it is still a black art. + +Many major mode authors will start by writing a simple indentation +function that works for simple cases, for example by comparing with the +indentation of the previous text line. For most programming languages +that are not really line-based, this tends to scale very poorly: +improving such a function to let it handle more diverse situations tends +to become more and more difficult, resulting in the end with a large, +complex, unmaintainable indentation function which nobody dares to touch. + +A good indentation function will usually need to actually parse the +text, according to the syntax of the language.
Luckily, it is not +necessary to parse the text in as much detail as would be needed +for a compiler, but on the other hand, the parser embedded in the +indentation code will want to be somewhat friendly to syntactically +incorrect code. + +Good maintainable indentation functions usually fall into 2 categories: +either parsing forward from some ``safe'' starting point until the +position of interest, or parsing backward from the position of interest. +Neither of the two is a clearly better choice than the other: parsing +backward is often more difficult than parsing forward because +programming languages are designed to be parsed forward, but for the +purpose of indentation it has the advantage of not needing to +guess a ``safe'' starting point, and it generally enjoys the property +that only a minimum of text will be analyzed to decide the indentation +of a line, so indentation will tend to be unaffected by syntax errors in +some earlier unrelated piece of code. Parsing forward on the other hand +is usually easier and has the advantage of making it possible to +reindent efficiently a whole region at a time, with a single parse. + +Rather than write your own indentation function from scratch, it is +often preferable to try and reuse some existing ones or to rely +on a generic indentation engine. There are sadly few such +engines. The CC-mode indentation code (used with C, C++, Java, Awk +and a few other such modes) has been made more generic over the years, +so if your language seems somewhat similar to one of those languages, +you might try to use that engine. @c FIXME: documentation? +Another one is SMIE which takes an approach in the spirit +of Lisp sexps and adapts it to non-Lisp languages. + +@menu +* SMIE:: A simple minded indentation engine +@end menu + +@node SMIE +@subsection Simple Minded Indentation Engine + +SMIE is a package that provides a generic navigation and indentation +engine. 
Based on a very simple parser using an ``operator precedence +grammar'', it lets major modes extend the sexp-based navigation of Lisp +to non-Lisp languages as well as provide a simple-to-use but reliable +auto-indentation. + +Operator precedence grammar is a very primitive technology for parsing +compared to some of the more common techniques used in compilers. +It has the following characteristics: its parsing power is very limited, +and it is largely unable to detect syntax errors, but it has the +advantage of being algorithmically efficient and able to parse forward +just as well as backward. In practice that means that SMIE can use it +for indentation based on backward parsing, that it can provide both +@code{forward-sexp} and @code{backward-sexp} functionality, and that it +will naturally work on syntactically incorrect code without any extra +effort. The downside is that it also means that most programming +languages cannot be parsed correctly using SMIE, at least not without +resorting to some special tricks (@pxref{SMIE Tricks}). + +@menu +* SMIE setup:: SMIE setup and features +* Operator Precedence Grammars:: A very simple parsing technique +* SMIE Grammar:: Defining the grammar of a language +* SMIE Lexer:: Defining tokens +* SMIE Tricks:: Working around the parser's limitations +* SMIE Indentation:: Specifying indentation rules +* SMIE Indentation Helpers:: Helper functions for indentation rules +* SMIE Indentation Example:: Sample indentation rules +@end menu + +@node SMIE setup +@subsubsection SMIE Setup and Features + +SMIE is meant to be a one-stop shop for structural navigation and +various other features which rely on the syntactic structure of code, in +particular automatic indentation. The main entry point is +@code{smie-setup} which is a function typically called while setting +up a major mode. + +@defun smie-setup grammar rules-function &rest keywords +Set up SMIE navigation and indentation.
+@var{grammar} is a grammar table generated by @code{smie-prec2->grammar}. +@var{rules-function} is a set of indentation rules for use on +@code{smie-rules-function}. +@var{keywords} are additional arguments, which can include the following +keywords: +@itemize +@item +@code{:forward-token} @var{fun}: Specify the forward lexer to use. +@item +@code{:backward-token} @var{fun}: Specify the backward lexer to use. +@end itemize +@end defun + +Calling this function is sufficient to make commands such as +@code{forward-sexp}, @code{backward-sexp}, and @code{transpose-sexps} be +able to properly handle structural elements other than just the paired +parentheses already handled by syntax tables. For example, if the +provided grammar is precise enough, @code{transpose-sexps} can correctly +transpose the two arguments of a @code{+} operator, taking into account +the precedence rules of the language. + +Calling @code{smie-setup} is also sufficient to make TAB indentation work in +the expected way, and provides some commands that you can bind in the +major mode keymap. + +@deffn Command smie-close-block +This command closes the most recently opened (and not yet closed) block. +@end deffn + +@deffn Command smie-down-list &optional arg +This command is like @code{down-list} but it also pays attention to +nesting of tokens other than parentheses, such as @code{begin...end}. +@end deffn + +@node Operator Precedence Grammars +@subsubsection Operator Precedence Grammars + +SMIE's precedence grammars simply give each token a pair of +precedences: the left-precedence and the right-precedence. We say +@code{T1 < T2} if the right-precedence of token @code{T1} is less than +the left-precedence of token @code{T2}. A good way to read this +@code{<} is as a kind of parenthesis: if we find @code{... T1 something +T2 ...} then that should be parsed as @code{... T1 (something T2 ...} +rather than as @code{... T1 something) T2 ...}.
The latter +interpretation would be the case if we had @code{T1 > T2}. If we have +@code{T1 = T2}, it means that token T2 follows token T1 in the same +syntactic construction, so typically we have @code{"begin" = "end"}. +Such pairs of precedences are sufficient to express left-associativity +or right-associativity of infix operators, nesting of tokens like +parentheses and many other cases. + +@c ¡Let's leave this undocumented to leave it more open for change! +@c @defvar smie-grammar +@c The value of this variable is an alist specifying the left and right +@c precedence of each token. It is meant to be initialized by using one of +@c the functions below. +@c @end defvar + +@defun smie-prec2->grammar table +This function takes a @emph{prec2} grammar @var{table} and returns an +alist suitable for use in @code{smie-setup}. The @emph{prec2} +@var{table} is itself meant to be built by one of the functions below. +@end defun + +@defun smie-merge-prec2s &rest tables +This function takes several @emph{prec2} @var{tables} and merges them +into a new @emph{prec2} table. +@end defun + +@defun smie-precs->prec2 precs +This function builds a @emph{prec2} table from a table of precedences +@var{precs}. @var{precs} should be a list, sorted by precedence (for +example @code{"+"} will come before @code{"*"}), of elements of the form +@code{(@var{assoc} @var{op} ...)}, where each @var{op} is a token that +acts as an operator; @var{assoc} is their associativity, which can be +either @code{left}, @code{right}, @code{assoc}, or @code{nonassoc}. +All operators in a given element share the same precedence level +and associativity. +@end defun + +@defun smie-bnf->prec2 bnf &rest resolvers +This function lets you specify the grammar using a BNF notation. +It accepts a @var{bnf} description of the grammar along with a set of +conflict resolution rules @var{resolvers}, and +returns a @emph{prec2} table. 
+ +@var{bnf} is a list of nonterminal definitions of the form +@code{(@var{nonterm} @var{rhs1} @var{rhs2} ...)} where each @var{rhs} +is a (non-empty) list of terminals (aka tokens) or non-terminals. + +Not all grammars are accepted: +@itemize +@item +An @var{rhs} cannot be an empty list (an empty list is never needed, +since SMIE allows all non-terminals to match the empty string anyway). +@item +An @var{rhs} cannot have 2 consecutive non-terminals: each pair of +non-terminals needs to be separated by a terminal (aka token). +This is a fundamental limitation of operator precedence grammars. +@end itemize + +Additionally, conflicts can occur: +@itemize +@item +The returned @emph{prec2} table holds constraints between pairs of tokens, and +for any given pair only one constraint can be present: T1 < T2, +T1 = T2, or T1 > T2. +@item +A token can be an @code{opener} (something similar to an open-paren), +a @code{closer} (like a close-paren), or @code{neither} of the two +(e.g. an infix operator, or an inner token like @code{"else"}). +@end itemize + +Precedence conflicts can be resolved via @var{resolvers}, which +is a list of @emph{precs} tables (see @code{smie-precs->prec2}): for +each precedence conflict, if those @code{precs} tables +specify a particular constraint, then the conflict is resolved by using +this constraint instead, else a conflict is reported and one of the +conflicting constraints is picked arbitrarily and the others are +simply ignored. +@end defun + +@node SMIE Grammar +@subsubsection Defining the Grammar of a Language + +The usual way to define the SMIE grammar of a language is by +defining a new global variable that holds the precedence table by +giving a set of BNF rules. 
+For example, the grammar definition for a small Pascal-like language +could look like: +@example +@group +(require 'smie) +(defvar sample-smie-grammar + (smie-prec2->grammar + (smie-bnf->prec2 +@end group +@group + '((id) + (inst ("begin" insts "end") + ("if" exp "then" inst "else" inst) + (id ":=" exp) + (exp)) + (insts (insts ";" insts) (inst)) + (exp (exp "+" exp) + (exp "*" exp) + ("(" exps ")")) + (exps (exps "," exps) (exp))) +@end group +@group + '((assoc ";")) + '((assoc ",")) + '((assoc "+") (assoc "*"))))) +@end group +@end example + +@noindent +A few things to note: + +@itemize +@item +The above grammar does not explicitly mention the syntax of function +calls: SMIE will automatically allow any sequence of sexps, such as +identifiers, balanced parentheses, or @code{begin ... end} blocks +to appear anywhere anyway. +@item +The grammar category @code{id} has no right-hand side: this does not +mean that it can match only the empty string, since as mentioned any +sequence of sexps can appear anywhere anyway. +@item +Because non-terminals cannot appear consecutively in the BNF grammar, it +is difficult to correctly handle tokens that act as terminators, so the +above grammar treats @code{";"} as a statement @emph{separator} instead, +which SMIE can handle very well. +@item +Separators used in sequences (such as @code{","} and @code{";"} above) +are best defined with BNF rules such as @code{(foo (foo "separator" foo) ...)} +which generate precedence conflicts which are then resolved by giving +them an explicit @code{(assoc "separator")}. +@item +The @code{("(" exps ")")} rule was not needed to pair up parens, since +SMIE will pair up any characters that are marked as having paren syntax +in the syntax table. What this rule does instead (together with the +definition of @code{exps}) is to make it clear that @code{","} should +not appear outside of parentheses.
+@item
+Rather than have a single @emph{precs} table to resolve conflicts, it is
+preferable to have several tables, so as to let the BNF part of the
+grammar specify relative precedences where possible.
+@item
+Unless there is a very good reason to prefer @code{left} or
+@code{right}, it is usually preferable to mark operators as associative,
+using @code{assoc}.  For that reason @code{"+"} and @code{"*"} are
+defined above as @code{assoc}, although the language defines them
+formally as left associative.
+@end itemize
+
+@node SMIE Lexer
+@subsubsection Defining Tokens
+
+SMIE comes with a predefined lexical analyzer which uses syntax tables
+in the following way: any sequence of characters that have word or
+symbol syntax is considered a token, and so is any sequence of
+characters that have punctuation syntax.  This default lexer is
+often a good starting point but is rarely actually correct for any given
+language.  For example, it will consider @code{"2,+3"} to be composed
+of 3 tokens: @code{"2"}, @code{",+"}, and @code{"3"}.
+
+To describe the lexing rules of your language to SMIE, you need
+two functions: one to fetch the next token, and another to fetch the
+previous token.  Those functions will usually first skip whitespace and
+comments and then look at the next chunk of text to see if it
+is a special token.  If so, the function should skip the token and
+return a description of this token.  Usually this is simply the string
+extracted from the buffer, but it can be anything you want.
+For example:
+@example
+@group
+(defvar sample-keywords-regexp
+  (regexp-opt '("+" "*" "," ";" ">" ">=" "<" "<=" ":=" "=")))
+@end group
+@group
+(defun sample-smie-forward-token ()
+  (forward-comment (point-max))
+  (cond
+   ((looking-at sample-keywords-regexp)
+    (goto-char (match-end 0))
+    (match-string-no-properties 0))
+   (t (buffer-substring-no-properties
+       (point)
+       (progn (skip-syntax-forward "w_")
+              (point))))))
+@end group
+@group
+(defun sample-smie-backward-token ()
+  (forward-comment (- (point)))
+  (cond
+   ((looking-back sample-keywords-regexp (- (point) 2) t)
+    (goto-char (match-beginning 0))
+    (match-string-no-properties 0))
+   (t (buffer-substring-no-properties
+       (point)
+       (progn (skip-syntax-backward "w_")
+              (point))))))
+@end group
+@end example
+
+Notice how those lexers return the empty string when in front of
+parentheses.  This is because SMIE automatically takes care of the
+parentheses defined in the syntax table.  More specifically, if the lexer
+returns @code{nil} or an empty string, SMIE tries to handle the corresponding
+text as a sexp according to syntax tables.
+
+@node SMIE Tricks
+@subsubsection Living With a Weak Parser
+
+The parsing technique used by SMIE does not allow tokens to behave
+differently in different contexts.  For most programming languages, this
+manifests itself by precedence conflicts when converting the
+BNF grammar.
+
+Sometimes, those conflicts can be worked around by expressing the
+grammar slightly differently.  For example, for Modula-2 it might seem
+natural to have a BNF grammar that looks like this:
+
+@example
+   ...
+   (inst ("IF" exp "THEN" insts "ELSE" insts "END")
+         ("CASE" exp "OF" cases "END")
+         ...)
+   (cases (cases "|" cases) (caselabel ":" insts) ("ELSE" insts))
+   ...
+@end example
+
+But this will create conflicts for @code{"ELSE"}: on the one hand, the
+IF rule implies (among many other things) that @code{"ELSE" = "END"};
+but on the other hand, since @code{"ELSE"} appears within @code{cases},
+which appears left of @code{"END"}, we also have @code{"ELSE" > "END"}.
+We can solve the conflict either by using:
+@example
+   ...
+   (inst ("IF" exp "THEN" insts "ELSE" insts "END")
+         ("CASE" exp "OF" cases "END")
+         ("CASE" exp "OF" cases "ELSE" insts "END")
+         ...)
+   (cases (cases "|" cases) (caselabel ":" insts))
+   ...
+@end example
+or
+@example
+   ...
+   (inst ("IF" exp "THEN" else "END")
+         ("CASE" exp "OF" cases "END")
+         ...)
+   (else (insts "ELSE" insts))
+   (cases (cases "|" cases) (caselabel ":" insts) (else))
+   ...
+@end example
+
+Reworking the grammar to try and solve conflicts has its downsides, though,
+because SMIE assumes that the grammar reflects the logical structure of
+the code, so it is preferable to keep the BNF closer to the intended
+abstract syntax tree.
+
+Other times, after careful consideration you may conclude that those
+conflicts are not serious and simply resolve them via the
+@var{resolvers} argument of @code{smie-bnf->prec2}.  Usually this is
+because the grammar is simply ambiguous: the conflict does not affect
+the set of programs described by the grammar, but only the way those
+programs are parsed.  This is typically the case for separators and
+associative infix operators, where you want to add a resolver like
+@code{'((assoc "|"))}.  Another case where this can happen is for the
+classic @emph{dangling else} problem, where you will use @code{'((assoc
+"else" "then"))}.  It can also happen for cases where the conflict is
+real and cannot really be resolved, but it is unlikely to pose a problem
+in practice.
+
+Finally, in many cases some conflicts will remain despite all efforts to
+restructure the grammar.  Do not despair: while the parser cannot be
+made more clever, you can make the lexer as smart as you want.  So, the
+solution is then to look at the tokens involved in the conflict and to
+split one of those tokens into 2 (or more) different tokens.  E.g. if
+the grammar needs to distinguish between two incompatible uses of the
+token @code{"begin"}, make the lexer return different tokens (say
+@code{"begin-fun"} and @code{"begin-plain"}) depending on which kind of
+@code{"begin"} it finds.  This pushes the work of distinguishing the
+different cases to the lexer, which will thus have to look at the
+surrounding text to find ad-hoc clues.
+
+@node SMIE Indentation
+@subsubsection Specifying Indentation Rules
+
+Based on the provided grammar, SMIE will be able to provide automatic
+indentation without any extra effort.  But in practice, this default
+indentation style will probably not be good enough.  You will want to
+tweak it in many different cases.
+
+SMIE indentation is based on the idea that indentation rules should be
+as local as possible.  To this end, it relies on the idea of
+@emph{virtual} indentation, which is the indentation that a particular
+program point would have if it were at the beginning of a line.
+Of course, if that program point is indeed at the beginning of a line,
+its virtual indentation is its current indentation.  But if not, then
+SMIE uses the indentation algorithm to compute the virtual indentation
+of that point.  Now in practice, the virtual indentation of a program
+point does not have to be identical to the indentation it would have if
+we inserted a newline before it.  To see how this works, the SMIE rule
+for indentation after a @code{@{} in C does not care whether the
+@code{@{} is standing on a line of its own or is at the end of the
+preceding line.  Instead, these different cases are handled in the
+indentation rule that decides how to indent before a @code{@{}.
+
+Another important concept is the notion of @emph{parent}: the
+@emph{parent} of a token is the head token of the nearest enclosing
+syntactic construct.  For example, the parent of an @code{else} is the
+@code{if} to which it belongs, and the parent of an @code{if}, in turn,
+is the lead token of the surrounding construct.  The command
+@code{backward-sexp} jumps from a token to its parent, but there are
+some caveats: for @emph{openers} (tokens which start a construct, like
+@code{if}), you need to start with point before the token, while for
+others you need to start with point after the token.
+@code{backward-sexp} stops with point before the parent token if that is
+the @emph{opener} of the token of interest, and otherwise it stops with
+point after the parent token.
+
+SMIE indentation rules are specified using a function that takes two
+arguments @var{method} and @var{arg} where the meaning of @var{arg} and the
+expected return value depend on @var{method}.
+
+@var{method} can be:
+@itemize
+@item
+@code{:after}, in which case @var{arg} is a token and the function
+should return the @var{offset} to use for indentation after @var{arg}.
+@item
+@code{:before}, in which case @var{arg} is a token and the function
+should return the @var{offset} to use to indent @var{arg} itself.
+@item
+@code{:elem}, in which case the function should return either the offset
+to use to indent function arguments (if @var{arg} is the symbol
+@code{arg}) or the basic indentation step (if @var{arg} is the symbol
+@code{basic}).
+@item
+@code{:list-intro}, in which case @var{arg} is a token and the function
+should return non-@code{nil} if the token is followed by a list of
+expressions (not separated by any token) rather than an expression.
+@end itemize
+
+When @var{arg} is a token, the function is called with point just before
+that token.  A return value of @code{nil} always means to fall back on the
+default behavior, so the function should return @code{nil} for arguments it
+does not expect.
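+
+As a rough sketch of this calling convention, a rules function that
+handles only two cases and returns @code{nil} for everything else could
+look like the following.  The names @code{sample-minimal-smie-rules}
+and @code{sample-minimal-indent-step} are hypothetical and exist only
+for this illustration, and the token @code{":="} is borrowed from the
+sample grammar.
+
+@example
+@group
+;; Hypothetical indentation step, assumed only for this sketch.
+(defvar sample-minimal-indent-step 4)
+
+(defun sample-minimal-smie-rules (method arg)
+  (cond
+   ;; :elem with ARG `basic': return the basic indentation step.
+   ((and (eq method :elem) (eq arg 'basic))
+    sample-minimal-indent-step)
+   ;; :after with a token: numeric offset for what follows ":=".
+   ((and (eq method :after) (equal arg ":="))
+    sample-minimal-indent-step)
+   ;; Anything else: nil, so the default behavior applies.
+   (t nil)))
+@end group
+@end example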
+
+@var{offset} can be:
+@itemize
+@item
+@code{nil}: use the default indentation rule.
+@item
+@code{(column . @var{column})}: indent to column @var{column}.
+@item
+@var{number}: offset by @var{number}, relative to a base token which is
+the current token for @code{:after} and its parent for @code{:before}.
+@end itemize
+
+@node SMIE Indentation Helpers
+@subsubsection Helper Functions for Indentation Rules
+
+SMIE provides various functions designed specifically for use in the
+indentation rules function (several of those functions break if used in
+another context).  These functions all start with the prefix
+@code{smie-rule-}.
+
+@defun smie-rule-bolp
+Return non-@code{nil} if the current token is the first on the line.
+@end defun
+
+@defun smie-rule-hanging-p
+Return non-@code{nil} if the current token is @emph{hanging}.
+A token is @emph{hanging} if it is the last token on the line
+and if it is preceded by other tokens: a lone token on a line is not
+hanging.
+@end defun
+
+@defun smie-rule-next-p &rest tokens
+Return non-@code{nil} if the next token is among @var{tokens}.
+@end defun
+
+@defun smie-rule-prev-p &rest tokens
+Return non-@code{nil} if the previous token is among @var{tokens}.
+@end defun
+
+@defun smie-rule-parent-p &rest parents
+Return non-@code{nil} if the current token's parent is among @var{parents}.
+@end defun
+
+@defun smie-rule-sibling-p
+Return non-@code{nil} if the current token's parent is actually a sibling.
+This is the case for example when the parent of a @code{","} is just the
+previous @code{","}.
+@end defun
+
+@defun smie-rule-parent &optional offset
+Return the proper offset to align the current token with the parent.
+If non-@code{nil}, @var{offset} should be an integer giving an
+additional offset to apply.
+@end defun
+
+@defun smie-rule-separator method
+Indent current token as a @emph{separator}.
+
+By @emph{separator}, we mean here a token whose sole purpose is to
+separate various elements within some enclosing syntactic construct, and
+which does not have any semantic significance in itself (i.e. it would
+typically not exist as a node in an abstract syntax tree).
+
+Such a token is expected to have an associative syntax and be closely
+tied to its syntactic parent.  Typical examples are @code{","} in lists
+of arguments (enclosed inside parentheses), or @code{";"} in sequences
+of instructions (enclosed in a @code{@{...@}} or @code{begin...end}
+block).
+
+@var{method} should be the method name that was passed to
+@code{smie-rules-function}.
+@end defun
+
+@node SMIE Indentation Example
+@subsubsection Sample Indentation Rules
+
+Here is an example of an indentation function:
+
+@example
+(eval-when-compile (require 'cl)) ;For the `case' macro.
+(defun sample-smie-rules (kind token)
+  (case kind
+    (:elem (case token
+             (basic sample-indent-basic)))
+    (:after
+     (cond
+      ((equal token ",") (smie-rule-separator kind))
+      ((equal token ":=") sample-indent-basic)))
+    (:before
+     (cond
+      ((equal token ",") (smie-rule-separator kind))
+      ((member token '("begin" "(" "@{"))
+       (if (smie-rule-hanging-p) (smie-rule-parent)))
+      ((equal token "if")
+       (and (not (smie-rule-bolp)) (smie-rule-prev-p "else")
+            (smie-rule-parent)))))))
+@end example
+
+@noindent
+A few things to note:
+
+@itemize
+@item
+The first case indicates the basic indentation increment to use.
+If @code{sample-indent-basic} is @code{nil}, then SMIE uses the global
+setting @code{smie-indent-basic}.  The major mode could have set
+@code{smie-indent-basic} buffer-locally instead, but that
+is discouraged.
+
+@item
+The two (identical) rules for the token @code{","} make SMIE try to be
+more clever when the comma separator is placed at the beginning of
+lines.  It tries to outdent the separator so as to align the code after
+the comma; for example:
+
+@example
+x = longfunctionname (
+        arg1
+      , arg2
+    );
+@end example
+
+@item
+The rule for indentation after @code{":="} exists because otherwise
+SMIE would treat @code{":="} as an infix operator and would align the
+right argument with the left one.
+
+@item
+The rule for indentation before @code{"begin"} is an example of the use
+of virtual indentation: this rule is used only when @code{"begin"} is
+hanging, which can happen only when @code{"begin"} is not at the
+beginning of a line.  So this is not used when indenting
+@code{"begin"} itself but only when indenting something relative to this
+@code{"begin"}.  Concretely, this rule changes the indentation from:
+
+@example
+    if x > 0 then begin
+            dosomething(x);
+        end
+@end example
+to
+@example
+    if x > 0 then begin
+        dosomething(x);
+    end
+@end example
+
+@item
+The rule for indentation before @code{"if"} is similar to the one for
+@code{"begin"}, but where the purpose is to treat @code{"else if"}
+as a single unit, so as to align a sequence of tests rather than indent
+each test further to the right.  This function does this only in the
+case where the @code{"if"} is not placed on a separate line, hence the
+@code{smie-rule-bolp} test.
+
+If we know that the @code{"else"} is always aligned with its @code{"if"}
+and is always at the beginning of a line, we can use a more efficient
+rule:
+@example
+((equal token "if")
+ (and (not (smie-rule-bolp)) (smie-rule-prev-p "else")
+      (save-excursion
+        (sample-smie-backward-token)  ;Jump before the "else".
+        (cons 'column (current-column)))))
+@end example
+
+The advantage of this formulation is that it reuses the indentation of
+the previous @code{"else"}, rather than going all the way back to the
+first @code{"if"} of the sequence.
+@end itemize
+
 @node Desktop Save Mode
 @section Desktop Save Mode
 @cindex desktop save mode
diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 4da94dacd71..3b08b472b06 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -2209,7 +2209,7 @@ various commands) to indent the current line.  The command
 
 In Lisp mode, the value is the symbol @code{lisp-indent-line}; in C
 mode, @code{c-indent-line}; in Fortran mode, @code{fortran-indent-line}.
-The default value is @code{indent-relative}.
+The default value is @code{indent-relative}.  @xref{Auto-Indentation}.
 @end defvar
 
 @deffn Command indent-according-to-mode
-- 
2.39.2