From 197f994384cb37ae4ae7a771815bbe565d4ae242 Mon Sep 17 00:00:00 2001 From: Eli Zaretskii Date: Sun, 29 Jan 2023 15:22:20 +0200 Subject: [PATCH] Document tree-sitter features in the user manual * lisp/progmodes/c-ts-mode.el (c-ts-mode-map): Bind "C-c .", for consistency with CC mode. * lisp/treesit.el (treesit-font-lock-level): Doc fix. * doc/emacs/programs.texi (C Indent, Custom C Indent): Document the indentation features of 'c-ts-mode'. (Moving by Defuns): Document 'treesit-defun-tactic'. * doc/emacs/files.texi (Visiting): Document 'treesit-max-buffer-size'. * doc/emacs/display.texi (Traditional Font Lock) (Parser-based Font Lock): New subsections. * doc/emacs/emacs.texi (Top): Update top-level menu. --- doc/emacs/display.texi | 131 ++++++++++++++++++++++++++++++------ doc/emacs/emacs.texi | 4 ++ doc/emacs/files.texi | 11 +++ doc/emacs/programs.texi | 42 +++++++++--- lisp/progmodes/c-ts-mode.el | 3 +- lisp/treesit.el | 15 +++-- 6 files changed, 170 insertions(+), 36 deletions(-) diff --git a/doc/emacs/display.texi b/doc/emacs/display.texi index f77ab569483..97732b65e32 100644 --- a/doc/emacs/display.texi +++ b/doc/emacs/display.texi @@ -1024,17 +1024,65 @@ customize-group @key{RET} font-lock-faces @key{RET}}. You can then use that customization buffer to customize the appearance of these faces. @xref{Face Customization}. +@cindex just-in-time (JIT) font-lock +@cindex background syntax highlighting + Fontifying very large buffers can take a long time. To avoid large +delays when a file is visited, Emacs initially fontifies only the +visible portion of a buffer. As you scroll through the buffer, each +portion that becomes visible is fontified as soon as it is displayed; +this type of Font Lock is called @dfn{Just-In-Time} (or @dfn{JIT}) +Lock. You can control how JIT Lock behaves, including telling it to +perform fontification while idle, by customizing variables in the +customization group @samp{jit-lock}. @xref{Specific Customization}. 
+ + The information that major modes use for determining which parts of +buffer text to fontify and what faces to use can be based on several +different ways of analyzing the text: + +@itemize @bullet +@item +Search for keywords and other textual patterns based on regular +expressions (@pxref{Regexp Search,, Regular Expression Search}). + +@item +Find syntactically distinct parts of text based on built-in syntax +tables (@pxref{Syntax Tables,,, elisp, The Emacs Lisp Reference +Manual}). + +@item +Use a syntax tree produced by a full-blown parser, via a special-purpose +library, such as the tree-sitter library (@pxref{Parsing Program +Source,,, elisp, The Emacs Lisp Reference Manual}), or an external +program. +@end itemize + +@menu +* Traditional Font Lock:: Font Lock based on regexps and syntax tables. +* Parser-based Font Lock:: Font Lock based on external parser. +@end menu + +@node Traditional Font Lock +@subsection Traditional Font Lock +@cindex traditional font-lock + + ``Traditional'' methods of providing font-lock information are based +on regular-expression search and on syntactic analysis using syntax +tables built into Emacs. This subsection describes the use and +customization of font-lock for major modes which use these traditional +methods. + @vindex font-lock-maximum-decoration - You can customize the variable @code{font-lock-maximum-decoration} -to alter the amount of fontification applied by Font Lock mode, for -major modes that support this feature. The value should be a number -(with 1 representing a minimal amount of fontification; some modes -support levels as high as 3); or @code{t}, meaning ``as high as -possible'' (the default). To be effective for a given file buffer, -the customization of @code{font-lock-maximum-decoration} should be -done @emph{before} the file is visited; if you already have the file -visited in a buffer when you customize this variable, kill the buffer -and visit the file again after the customization.
+ You can control the amount of fontification applied by Font Lock +mode by customizing the variable @code{font-lock-maximum-decoration}, +for major modes that support this feature. The value of this variable +should be a number (with 1 representing a minimal amount of +fontification; some modes support levels as high as 3); or @code{t}, +meaning ``as high as possible'' (the default). To be effective for a +given file buffer, the customization of +@code{font-lock-maximum-decoration} should be done @emph{before} the +file is visited; if you already have the file visited in a buffer when +you customize this variable, kill the buffer and visit the file again +after the customization. You can also specify different numbers for particular major modes; for example, to use level 1 for C/C++ modes, and the default level @@ -1082,16 +1130,59 @@ keywords by customizing the @code{font-lock-ignore} option, @pxref{Customizing Keywords,,, elisp, The Emacs Lisp Reference Manual}. -@cindex just-in-time (JIT) font-lock -@cindex background syntax highlighting - Fontifying large buffers can take a long time. To avoid large -delays when a file is visited, Emacs initially fontifies only the -visible portion of a buffer. As you scroll through the buffer, each -portion that becomes visible is fontified as soon as it is displayed; -this type of Font Lock is called @dfn{Just-In-Time} (or @dfn{JIT}) -Lock. You can control how JIT Lock behaves, including telling it to -perform fontification while idle, by customizing variables in the -customization group @samp{jit-lock}. @xref{Specific Customization}. +@node Parser-based Font Lock +@subsection Parser-based Font Lock +@cindex font-lock via tree-sitter +@cindex parser-based font-lock + If your Emacs was built with the tree-sitter library, it can use the +results of parsing the buffer text by that library for the purposes of +fontification. 
This is usually faster and more accurate than the +``traditional'' methods described in the previous subsection, since +the tree-sitter library provides full-blown parsers for programming +languages and other kinds of formatted text which it supports. Major +modes which utilize the tree-sitter library are named +@code{@var{foo}-ts-mode}, with the @samp{-ts-} part indicating the use +of the library. This subsection documents the Font Lock support based +on the tree-sitter library. + +@vindex treesit-font-lock-level + You can control the amount of fontification applied by Font Lock +mode of major modes based on tree-sitter by customizing the variable +@code{treesit-font-lock-level}. Its value is a number between 1 and +4: + +@table @asis +@item Level 1 +This level usually fontifies only comments and function names in +function definitions. +@item Level 2 +This level adds fontification of keywords, strings, and data types. +@item Level 3 +This is the default level; it adds fontification of assignments, +numbers, properties, etc. +@item Level 4 +This level adds everything else that can be fontified: operators, +delimiters, brackets, other punctuation, function names in function +calls, variables, etc. +@end table + +@vindex treesit-font-lock-feature-list +@noindent +What exactly constitutes each of the syntactical categories mentioned +above depends on the major mode and the parser grammar used by +tree-sitter for the major-mode's language. However, in general the +categories follow the conventions of the programming language or the +file format supported by the major mode. The buffer-local value of +the variable @code{treesit-font-lock-feature-list} holds the +fontification features supported by a tree-sitter based major mode, +where each sub-list shows the features provided by the corresponding +fontification level. 
+ + Once you change the value of @code{treesit-font-lock-level} via +@w{@kbd{M-x customize-variable}} (@pxref{Specific Customization}), it +takes effect immediately in all the existing buffers and for files you +visit in the future in the same session. + @node Highlight Interactively @section Interactive Highlighting diff --git a/doc/emacs/emacs.texi b/doc/emacs/emacs.texi index b6d149eb3ef..7071ea44edd 100644 --- a/doc/emacs/emacs.texi +++ b/doc/emacs/emacs.texi @@ -383,6 +383,10 @@ Controlling the Display * Visual Line Mode:: Word wrap and screen line-based editing. * Display Custom:: Information on variables for customizing display. +Font Lock +* Traditional Font Lock:: Font Lock based on regexps and syntax tables. +* Parser-based Font Lock:: Font Lock based on external parser. + Searching and Replacement * Incremental Search:: Search happens as you type the string. diff --git a/doc/emacs/files.texi b/doc/emacs/files.texi index 6d666831612..c0e702da947 100644 --- a/doc/emacs/files.texi +++ b/doc/emacs/files.texi @@ -215,6 +215,17 @@ by the integers that Emacs can represent (@pxref{Buffers}). If you try, Emacs displays an error message saying that the maximum buffer size has been exceeded. +@vindex treesit-max-buffer-size + If you try to visit a file whose major mode (@pxref{Major Modes}) +uses the tree-sitter parsing library, Emacs will display a warning if +the file's size in bytes is larger than the value of the variable +@code{treesit-max-buffer-size}. The default value is 40 megabytes for +64-bit Emacs and 15 megabytes for 32-bit Emacs. This avoids the +danger of having Emacs run out of memory by preventing the activation +of major modes based on tree-sitter in such large buffers, because a +typical tree-sitter parser needs about 10 times as much memory as the +text it parses. 
+ @cindex wildcard characters in file names @vindex find-file-wildcards If the file name you specify contains shell-style wildcard diff --git a/doc/emacs/programs.texi b/doc/emacs/programs.texi index 4aac150934b..e9268ff2a0d 100644 --- a/doc/emacs/programs.texi +++ b/doc/emacs/programs.texi @@ -254,6 +254,17 @@ they do their standard jobs in a way better fitting a particular language. Other major modes may replace any or all of these key bindings for that purpose. +@cindex nested defuns +@vindex treesit-defun-tactic + Some programming languages support @dfn{nested defuns}, whereby a +defun (such as a function or a method or a class) can be defined +inside (i.e., as part of the body of) another defun. The commands +described above by default find the beginning and the end of the +@emph{innermost} defun around point. Major modes based on the +tree-sitter library provide control of this behavior: if the variable +@code{treesit-defun-tactic} is set to the value @code{top-level}, the +defun commands will find the @emph{outermost} defuns instead. + @node Imenu @subsection Imenu @cindex index of buffer definitions @@ -520,15 +531,19 @@ then indent it like this: @item C-c C-q @kindex C-c C-q @r{(C mode)} @findex c-indent-defun +@findex c-ts-mode-indent-defun Reindent the current top-level function definition or aggregate type -declaration (@code{c-indent-defun}). +declaration (@code{c-indent-defun} in CC mode, +@code{c-ts-mode-indent-defun} in @code{c-ts-mode} based on tree-sitter). @item C-M-q @kindex C-M-q @r{(C mode)} @findex c-indent-exp -Reindent each line in the balanced expression that follows point -(@code{c-indent-exp}). A prefix argument inhibits warning messages -about invalid syntax. +@findex prog-indent-sexp +Reindent each line in the balanced expression that follows point. In +CC mode, this invokes @code{c-indent-exp}; in tree-sitter based +@code{c-ts-mode} this invokes a more general @code{prog-indent-sexp}.
+A prefix argument inhibits warning messages about invalid syntax. @item @key{TAB} @findex c-indent-line-or-region @@ -568,7 +583,8 @@ onto the indentation of the @dfn{anchor statement}. @table @kbd @item C-c . @var{style} @key{RET} -Select a predefined style @var{style} (@code{c-set-style}). +Select a predefined style @var{style} (@code{c-set-style} in CC mode, +@code{c-ts-mode-set-style} in @code{c-ts-mode} based on tree-sitter). @end table A @dfn{style} is a named collection of customizations that can be @@ -584,6 +600,7 @@ typing @kbd{C-M-q} at the start of a function definition. @kindex C-c . @r{(C mode)} @findex c-set-style +@findex c-ts-mode-set-style To choose a style for the current buffer, use the command @w{@kbd{C-c .}}. Specify a style name as an argument (case is not significant). This command affects the current buffer only, and it affects only @@ -592,11 +609,11 @@ the code already in the buffer. To reindent the whole buffer in the new style, you can type @kbd{C-x h C-M-\}. @vindex c-default-style - You can also set the variable @code{c-default-style} to specify the -default style for various major modes. Its value should be either the -style's name (a string) or an alist, in which each element specifies -one major mode and which indentation style to use for it. For -example, + When using CC mode, you can also set the variable +@code{c-default-style} to specify the default style for various major +modes. Its value should be either the style's name (a string) or an +alist, in which each element specifies one major mode and which +indentation style to use for it. For example, @example (setq c-default-style @@ -613,6 +630,11 @@ one of the C-like major modes; thus, if you specify a new default style for Java mode, you can make it take effect in an existing Java mode buffer by typing @kbd{M-x java-mode} there. 
+@vindex c-ts-mode-indent-style + When using the tree-sitter based @code{c-ts-mode}, you can set the +default indentation style by customizing the variable +@code{c-ts-mode-indent-style}. + The @code{gnu} style specifies the formatting recommended by the GNU Project for C; it is the default, so as to encourage use of our recommended style. diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el index b2f92b93193..612c41bf073 100644 --- a/lisp/progmodes/c-ts-mode.el +++ b/lisp/progmodes/c-ts-mode.el @@ -700,7 +700,8 @@ the semicolon. This function skips the semicolon." (defvar-keymap c-ts-mode-map :doc "Keymap for the C language with tree-sitter" :parent prog-mode-map - "C-c C-q" #'c-ts-mode-indent-defun) + "C-c C-q" #'c-ts-mode-indent-defun + "C-c ." #'c-ts-mode-set-style) ;;;###autoload (define-derived-mode c-ts-base-mode prog-mode "C" diff --git a/lisp/treesit.el b/lisp/treesit.el index 5fb6a2eef6e..92833fb007c 100644 --- a/lisp/treesit.el +++ b/lisp/treesit.el @@ -580,16 +580,21 @@ from 1 which is the absolute minimum, to 4 that yields the maximum fontifications. Level 1 usually contains only comments and definitions. -Level 2 usually adds keywords, strings, constants, types, etc. -Level 3 usually represents a full-blown fontification, including -assignment, constants, numbers, properties, etc. +Level 2 usually adds keywords, strings, data types, etc. +Level 3 usually represents full-blown fontifications, including +assignments, constants, numbers and literals, properties, etc. Level 4 adds everything else that can be fontified: delimiters, -operators, brackets, all functions and variables, etc. +operators, brackets, punctuation, all functions and variables, etc. In addition to the decoration level, individual features can be turned on/off by calling `treesit-font-lock-recompute-features'. Changing the decoration level requires calling -`treesit-font-lock-recompute-features' to have an effect." 
+`treesit-font-lock-recompute-features' to have an effect, unless +done via `customize-variable'. + +To see which syntactical categories are fontified by each level +in a particular major mode, examine the buffer-local value of the +variable `treesit-font-lock-feature-list'." :type 'integer :set #'treesit--font-lock-level-setter :version "29.1") -- 2.39.2
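The fragment below shows how the options this patch documents might be set from an init file. It is a minimal sketch that assumes Emacs 29.1 or later, built with tree-sitter support. `setopt` (new in Emacs 29) applies a user option through its custom `:set` function, the same path `customize-variable` takes. For `treesit-font-lock-level`, that is what makes the change reach existing buffers.

The specific values are illustrative assumptions, not recommendations from the patch:

- level 4;
- a 100 MiB size limit;
- the `linux` style, assumed to be one of the styles your `c-ts-mode` accepts (check with `C-h v c-ts-mode-indent-style`).

Setting `treesit-defun-tactic` from `c-ts-mode-hook` is one assumed way to apply it per buffer.

```elisp
;; Sketch of an init-file fragment; assumes Emacs 29.1+ with tree-sitter.

;; Fontify everything tree-sitter modes can (the default level is 3).
;; `setopt' invokes the option's :set function, so existing
;; tree-sitter buffers are refontified as well.
(setopt treesit-font-lock-level 4)

;; Permit tree-sitter major modes in buffers up to 100 MiB.
;; The value is in bytes; 100 MiB is an illustrative choice.
(setopt treesit-max-buffer-size (* 100 1024 1024))

;; Default indentation style for `c-ts-mode' (assumed valid value).
(setopt c-ts-mode-indent-style 'linux)

;; Make the defun-movement commands find outermost defuns in C buffers.
(add-hook 'c-ts-mode-hook
          (lambda () (setq-local treesit-defun-tactic 'top-level)))
```

With this fragment loaded, `C-M-a` and `C-M-e` in a `c-ts-mode` buffer should move by top-level defuns. `C-c .` can still switch the style of the current buffer interactively.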