From: Eli Zaretskii Date: Sat, 22 Oct 2022 15:48:42 +0000 (+0300) Subject: Clean up tree-sitter sections of the ELisp manual X-Git-Tag: emacs-29.0.90~1805 X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=6f28810f6ba112059c09bc3bfbb2ef8e5c3f15ee;p=emacs.git Clean up tree-sitter sections of the ELisp manual * doc/lispref/parsing.texi (Parsing Program Source): * doc/lispref/modes.texi (Font Lock Mode) (Parser-based Font Lock): Fix wording, punctuation, and markup. Add index entries. * lisp/treesit.el (treesit-node-at, treesit-language-at): Rename argument POINT to POS. --- diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi index 09e7aad714e..e79c8cef180 100644 --- a/doc/lispref/elisp.texi +++ b/doc/lispref/elisp.texi @@ -938,6 +938,7 @@ Font Lock Mode * Syntactic Font Lock:: Fontification based on syntax tables. * Multiline Font Lock:: How to coerce Font Lock into properly highlighting multiline constructs. +* Parser-based Font Lock:: Use parse data for fontification. Multiline Font Lock Constructs @@ -948,6 +949,7 @@ Multiline Font Lock Constructs Automatic Indentation of code * SMIE:: A simple minded indentation engine. +* Parser-based Indentation:: Parser-based indentation engine. Simple Minded Indentation Engine @@ -1365,9 +1367,10 @@ Parsing Program Source * Language Definitions:: Loading tree-sitter language definitions. * Using Parser:: Introduction to parsers. * Retrieving Node:: Retrieving node from syntax tree. -* Accessing Node:: Accessing node information. +* Accessing Node Information:: Accessing node information. * Pattern Matching:: Pattern matching with query patterns. * Multiple Languages:: Parse text written in multiple languages. +* Tree-sitter major modes:: Develop major modes using tree-sitter. * Tree-sitter C API:: Compare the C API and the ELisp API. 
Syntax Descriptors diff --git a/doc/lispref/modes.texi b/doc/lispref/modes.texi index ed232556b28..24892077d1a 100644 --- a/doc/lispref/modes.texi +++ b/doc/lispref/modes.texi @@ -2852,12 +2852,13 @@ in which contexts. This section explains how to customize Font Lock for a particular major mode. Font Lock mode finds text to highlight in three ways: through -syntactic parsing based on the syntax table, through searching -(usually for regular expressions), and through parsing based on a -full-blown parser. Syntactic fontification happens first; it finds -comments and string constants and highlights them. Search-based -fontification happens second. Parser-based fontification can be -optionally enabled and it will precede the other two fontifications. +parsing based on a full-blown parser (usually, via an external library +or program), through syntactic parsing based on the Emacs's built-in +syntax table, or through searching (usually for regular expressions). +If enabled, parser-based fontification happens first +(@pxref{Parser-based Font Lock}). Syntactic fontification happens +next; it finds comments and string constants and highlights them. +Search-based fontification happens last. @menu * Font Lock Basics:: Overview of customizing Font Lock. @@ -2872,7 +2873,7 @@ optionally enabled and it will precede the other two fontifications. * Syntactic Font Lock:: Fontification based on syntax tables. * Multiline Font Lock:: How to coerce Font Lock into properly highlighting multiline constructs. -* Parser-based Font Lock:: Use a parser for fontification. +* Parser-based Font Lock:: Use parse data for fontification. @end menu @node Font Lock Basics @@ -3878,34 +3879,40 @@ reasonably fast. 
@node Parser-based Font Lock @subsection Parser-based Font Lock +@cindex parser-based font-lock -@c This node is written when the only parser Emacs has is tree-sitter, -@c if in the future more parser are supported, feel free to reorganize -@c and rewrite this node to describe multiple parsers in parallel. +@c This node is written when the only parser Emacs has is tree-sitter; +@c if in the future more parsers are supported, this should be +@c reorganized and rewritten to describe multiple parsers in parallel. Besides simple syntactic font lock and regexp-based font lock, Emacs -also provides complete syntactic font lock with the help of a parser, -currently provided by the tree-sitter library (@pxref{Parsing Program -Source}). +also provides complete syntactic font lock with the help of a parser. +Currently, Emacs uses the tree-sitter library (@pxref{Parsing Program +Source}) for this purpose. @defun treesit-font-lock-enable This function enables parser-based font lock in the current buffer. @end defun -Parser-based font lock and other font lock mechanism are not mutually +Parser-based font lock and other font lock mechanisms are not mutually exclusive. By default, if enabled, parser-based font lock runs first, -then the simple syntactic font lock (if enabled), then regexp-based +then the syntactic font lock (if enabled), then the regexp-based font lock. Although parser-based font lock doesn't share the same customization -variables with regexp-based font lock, parser-based font lock uses -similar customization schemes. The tree-sitter counterpart of -@var{font-lock-keywords} is @var{treesit-font-lock-settings}. - +variables with regexp-based font lock, it uses similar customization +schemes. The tree-sitter counterpart of @var{font-lock-keywords} is +@var{treesit-font-lock-settings}. + +@c FIXME: The ``query'' part here and thereafter comes ``out of the +@c blue''. 
There should be some text here explaining what those +@c ``queries'' are and how they are related to fontifications, or a +@c cross-reference to another place with such an explanation. @defun treesit-font-lock-rules :keyword value query... This function is used to set @var{treesit-font-lock-settings}. It -takes care of compiling queries and other post-processing and outputs -a value that @var{treesit-font-lock-settings} accepts. An example: +takes care of compiling queries and other post-processing, and outputs +a value that @var{treesit-font-lock-settings} accepts. Here's an +example: @example @group @@ -3922,8 +3929,8 @@ a value that @var{treesit-font-lock-settings} accepts. An example: @end example This function takes a list of text or s-exp queries. Before each -query, there are @var{:keyword} and @var{value} pairs that configure -that query. The @code{:lang} keyword sets the query’s language and +query, there are @var{:keyword}-@var{value} pairs that configure +that query. The @code{:lang} keyword sets the query's language and every query must specify the language. The @code{:feature} keyword sets the feature name of the query. Users can control which features are enabled with @code{font-lock-maximum-decoration} and @@ -3941,34 +3948,37 @@ Other keywords are optional: @item @tab @code{keep} @tab Fill-in regions without an existing face @end multitable +@c FIXME: The ``capture names'' part should be explained before it is +@c first used: what it is and how it's related to fontifications. Capture names in @var{query} should be face names like @code{font-lock-keyword-face}. The captured node will be fontified with that face. Capture names can also be function names, in which -case the function is called with (@var{start} @var{end} @var{node}), -where @var{start} and @var{end} are the start and end position of the -node in buffer, and @var{node} is the node itself. If a capture name -is both a face and a function, the face takes priority. 
If a capture -name is not a face name nor a function name, it is ignored. +case the function is called with 3 arguments: @var{start}, @var{end}, +and @var{node}, where @var{start} and @var{end} are the start and end +position of the node in buffer, and @var{node} is the node itself. If +a capture name is both a face and a function, the face takes priority. +If a capture name is neither a face nor a function, it is ignored. @end defun @defvar treesit-font-lock-feature-list -This is a list of lists of feature symbols. - -Each sublist represents a decoration level. +This is a list of lists of feature symbols. Each element of the list +is a list that represents a decoration level. @code{font-lock-maximum-decoration} controls which levels are activated. -Inside each sublist are feature symbols, which corresponds to the +@c FIXME: This should be rewritten using our style: ``each element of +@c the list is a list of the form (FOO BAR BAZ), where FOO...'' etc. +Inside each sublist are feature symbols, which correspond to the @code{:feature} value of a query defined in @code{treesit-font-lock-rules}. Removing a feature symbol from this list disables the corresponding query during font-lock. -Common feature names (for general programming language) include -function-name, type, variable-name (LHS of assignments), builtin, -constant, keyword, string-interpolation, comment, doc, string, -operator, preprocessor, escape-sequence, key (in key-value -pairs). Major modes are free to subdivide or extend on these -common features. +Common feature names, for many programming languages, include +function-name, type, variable-name (left-hand-side or @acronym{LHS} of +assignments), builtin, constant, keyword, string-interpolation, +comment, doc, string, operator, preprocessor, escape-sequence, and key +(in key-value pairs). Major modes are free to subdivide or extend +these common features. 
For example, the value of this variable could be: @example @@ -3982,16 +3992,20 @@ For example, the value of this variable could be: Major modes should set this variable before calling @code{treesit-font-lock-enable}. +@c FIXME: ``for further changes''? This should clarify when this +@c function has to be called. @findex treesit-font-lock-recompute-features -In addition, for further changes to this variable to take effect, run +In addition, for further changes to this variable to take effect, call @code{treesit-font-lock-recompute-features}. @end defvar @defvar treesit-font-lock-settings -A list of @var{setting}s for tree-sitter font lock. The exact format +A list of settings for tree-sitter based font lock. The exact format of this variable is considered internal. One should always use @code{treesit-font-lock-rules} to set this variable. +@c FIXME: If the format is considered ``internal'', why do we need to +@c describe it here? Each @var{setting} is of form @example @@ -4001,15 +4015,17 @@ Each @var{setting} is of form @var{query} must be a compiled query (@pxref{Pattern Matching}). For @var{setting} to be activated for font-lock, @var{enable} must be -t. To disable this @var{setting}, set @var{enable} to nil. +@code{t}. To disable this @var{setting}, set @var{enable} to +@code{nil}. @var{feature} is the ``feature name'' of the query, users can control which features are enabled with @code{font-lock-maximum-decoration} and @code{treesit-font-lock-feature-list}. @var{override} is the override flag for this query. Its value can be -t, nil, append, prepend, keep. See more in -@code{treesit-font-lock-rules}. +@code{t}, @code{nil}, @code{append}, @code{prepend}, or @code{keep}. +@c FIXME: See where? +See more in @code{treesit-font-lock-rules}. @end defvar Multi-language major modes should provide range functions in @@ -4077,7 +4093,7 @@ to rely on a full-blown parser, for example, the tree-sitter library. @menu * SMIE:: A simple minded indentation engine. 
-* Parser-based indentation:: Parser-based indentation engine. +* Parser-based Indentation:: Parser-based indentation engine. @end menu @node SMIE @@ -4739,108 +4755,100 @@ to the file's local variables of the form: @node Parser-based Indentation @subsection Parser-based Indentation +@cindex parser-based indentation -@c This node is written when the only parser Emacs has is tree-sitter, -@c if in the future more parser are supported, feel free to reorganize -@c and rewrite this node to describe multiple parsers in parallel. +@c This node is written when the only parser Emacs has is tree-sitter; +@c if in the future more parsers are supported, this should be +@c reorganized and rewritten to describe multiple parsers in parallel. When built with the tree-sitter library (@pxref{Parsing Program -Source}), Emacs could parse program source and produce a syntax tree. -And this syntax tree can be used for indentation. For maximum -flexibility, we could write a custom indent function that queries the -syntax tree and indents accordingly for each language, but that would -be a lot of work. It is more convenient to use the simple indentation -engine described below: we only need to write some indentation rules +Source}), Emacs is capable of parsing the program source and producing +a syntax tree. This syntax tree can be used for guiding the program +source indentation commands. For maximum flexibility, it is possible +to write a custom indentation function that queries the syntax tree +and indents accordingly for each language, but that is a lot of work. +It is more convenient to use the simple indentation engine described +below: then the major mode needs only to write some indentation rules and the engine takes care of the rest. -To enable the indentation engine, set the value of +To enable the parser-based indentation engine, set the value of @code{indent-line-function} to @code{treesit-indent}. 
@defvar treesit-indent-function This variable stores the actual function called by @code{treesit-indent}. By default, its value is -@code{treesit-simple-indent}. In the future we might add other +@code{treesit-simple-indent}. In the future we might add other, more complex indentation engines. @end defvar @heading Writing indentation rules +@cindex indentation rules, for parser-based indentation @defvar treesit-simple-indent-rules -This local variable stores indentation rules for every language. It is -a list of - -@example -(@var{language} . @var{rules}) -@end example - -where @var{language} is a language symbol, and @var{rules} is a list -of - -@example -(@var{matcher} @var{anchor} @var{offset}) -@end example - -First Emacs passes the node at point to @var{matcher}, if it return -non-nil, this rule applies. Then Emacs passes the node to -@var{anchor}, it returns a point. Emacs takes the column number of -that point, add @var{offset} to it, and the result is the indent for -the current line. +This local variable stores indentation rules for every language. It is +a list of the form: @w{@code{(@var{language} . @var{rules})}}, where +@var{language} is a language symbol, and @var{rules} is a list of the +form @w{@code{(@var{matcher} @var{anchor} @var{offset})}}. + +@c FIXME: ``node''? +First, Emacs passes the node at point to @var{matcher}; if it returns +non-@code{nil}, this rule is applicable. Then Emacs passes the node +to @var{anchor}, which returns a buffer position. Emacs takes the +column number of that position, adds @var{offset} to it, and the +result is the indentation column for the current line. The @var{matcher} and @var{anchor} are functions, and Emacs provides -convenient presets for them. You can skip over to -@code{treesit-simple-indent-presets} below, those presets should be -more than enough. - -A @var{matcher} or an @var{anchor} is a function that takes three -arguments (@var{node} @var{parent} @var{bol}). 
Argument @var{bol} is -the point at where we are indenting: the position of the first -non-whitespace character from the beginning of line; @var{node} is the -largest (highest-in-tree) node that starts at that point; @var{parent} -is the parent of @var{node}. A @var{matcher} returns nil/non-nil, and -@var{anchor} returns a point. +convenient defaults for them. + +@c FIXME: Clarify the following description. In particular, how to +@c find/compute ``the largest node'' and its ``parent''? +Each @var{matcher} or @var{anchor} is a function that takes three +arguments: @var{node}, @var{parent}, and @var{bol}. The argument +@var{bol} is the buffer position whose indentation is required: the +position of the first non-whitespace character after the beginning of +the line. The argument @var{node} is the largest (highest-in-tree) +node that starts at that position; and @var{parent} is the parent of +@var{node}. @var{matcher} should return non-@code{nil} if the rule is +applicable, and @var{anchor} should return a buffer position that is +the basis of the indentation. @end defvar @defvar treesit-simple-indent-presets -This is a list of presets for @var{matcher}s and @var{anchor}s in -@code{treesit-simple-indent-rules}. Each of them represent a function -that takes @var{node}, @var{parent} and @var{bol} as arguments. - -@example -no-node -@end example - -This matcher matches the case where @var{node} is nil, i.e., there is -no node that starts at @var{bol}. This is the case when @var{bol} is -at an empty line or inside a multi-line string, etc. - -@example -(parent-is @var{type}) -@end example - -This matcher matches if @var{parent}'s type is @var{type}. - -@example -(node-is @var{type}) -@end example - -This matcher matches if @var{node}'s type is @var{type}. - -@example -(query @var{query}) -@end example +This is a list of defaults for @var{matcher}s and @var{anchor}s in +@code{treesit-simple-indent-rules}. 
Each of them represents a function +that takes 3 arguments: @var{node}, @var{parent} and @var{bol}. The +available default functions are: -This matcher matches if querying @var{parent} with @var{query} -captures @var{node}. The capture name does not matter. - -@example -(match @var{node-type} @var{parent-type} - @var{node-field} @var{node-index-min} @var{node-index-max}) -@end example - -This matcher checks if @var{node}'s type is @var{node-type}, +@ftable @code +@item no-node +This matcher is a symbol that matches the case where @var{node} is +@code{nil}, i.e., there is no node that starts at @var{bol}. This is +the case when @var{bol} is on an empty line or inside a multi-line +string, etc. + +@item parent-is +This matcher is a function of one argument, @var{type}; it matches if +the type of the parent node is @var{type}. + +@item node-is +This matcher is a function of one argument, @var{type}; it matches if +the node's type is @var{type}. + +@c FIXME: The description of this matcher is unclear. What is +@c ``parent'' and what does it mean ``captures NODE''? +@item query +This matcher is a function of one argument, @var{query}; it matches if +querying @var{parent} with @var{query} captures @var{node}. The +capture name does not matter. @c Why is this bit important? + +@item match +This matcher is a function of 5 arguments: @var{node-type}, +@var{parent-type}, @var{node-field}, @var{node-index-min}, and +@var{node-index-max}. It matches if @var{node}'s type is @var{node-type}, @var{parent}'s type is @var{parent-type}, @var{node}'s field name in @var{parent} is @var{node-field}, and @var{node}'s index among its siblings is between @var{node-index-min} and @var{node-index-max}. If +@c FIXME: ``constraint''? the value of a constraint is nil, this matcher doesn't check for that constraint. For example, to match the first child where parent is @code{argument_list}, use @@ -4849,60 +4857,48 @@ constraint. 
For example, to match the first child where parent is (match nil "argument_list" nil nil 0 0) @end example -@example -first-sibling -@end example - +@c FIXME: ``PARENT''? is that an argument of the anchor function +@item first-sibling This anchor returns the start of the first child of @var{parent}. -@example -parent -@end example - -This anchor returns the start of @var{parent}. - -@example -parent-bol -@end example +@item parent +This anchor returns the start of @var{parent}. @c FIXME: Likewise. -This anchor returns the beginning of non-space characters on the line -where @var{parent} is on. - -@example -prev-sibling -@end example +@item parent-bol +This anchor returns the first non-space character on the line of +@var{parent}. +@c FIXME: ``NODE''? +@item prev-sibling This anchor returns the start of the previous sibling of @var{node}. -@example -no-indent -@end example - -This anchor returns the start of @var{node}, i.e., no indent. - -@example -prev-line -@end example +@item no-indent +This anchor returns the start of @var{node}, i.e., no indent. @c ??? +@item prev-line This anchor returns the first non-whitespace charater on the previous line. +@end ftable + @end defvar @heading Indentation utilities +@cindex utility functions for parser-based indentation -Here are some utility functions that can help writing indentation -rules. +Here are some utility functions that can help writing parser-based +indentation rules. @defun treesit-check-indent mode -This function checks current buffer's indentation against major mode -@var{mode}. It indents the current buffer in @var{mode} and compares -the indentation with the current indentation. Then it pops up a diff -buffer showing the difference. Correct indentation (target) is in -green, current indentation is in red. +This function checks the current buffer's indentation against major +mode @var{mode}. It indents the current buffer according to +@var{mode} and compares the results with the current indentation. 
+Then it pops up a buffer showing the differences. Correct +indentation (target) is shown in green color, current indentation is +shown in red color. @c Are colors customizable? faces? @end defun -It is also helpful to use @code{treesit-inspect-mode} when writing -indentation rules. +It is also helpful to use @code{treesit-inspect-mode} (@pxref{Language +Definitions}) when writing indentation rules. @node Desktop Save Mode @section Desktop Save Mode diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi index ae3724dd4a8..9079e0f7817 100644 --- a/doc/lispref/parsing.texi +++ b/doc/lispref/parsing.texi @@ -5,40 +5,44 @@ @node Parsing Program Source @chapter Parsing Program Source +@cindex syntax tree, from parsing program source Emacs provides various ways to parse program source text and produce a -@dfn{syntax tree}. In a syntax tree, text is no longer a -one-dimensional stream but a structured tree of nodes, where each node -representing a piece of text. Thus a syntax tree can enable -interesting features like precise fontification, indentation, +@dfn{syntax tree}. In a syntax tree, text is no longer considered a +one-dimensional stream of characters, but a structured tree of nodes, +where each node represents a piece of text. Thus, a syntax tree can +enable interesting features like precise fontification, indentation, navigation, structured editing, etc. Emacs has a simple facility for parsing balanced expressions -(@pxref{Parsing Expressions}). There is also SMIE library for generic -navigation and indentation (@pxref{SMIE}). +(@pxref{Parsing Expressions}). There is also the SMIE library for +generic navigation and indentation (@pxref{SMIE}). -Emacs also provides integration with tree-sitter library -(@uref{https://tree-sitter.github.io/tree-sitter}) if compiled with -it. The tree-sitter library implements an incremental parser and has -support from a wide range of programming languages. 
+In addition to those, Emacs also provides integration with +@uref{https://tree-sitter.github.io/tree-sitter, the tree-sitter +library} if support for it was compiled in. The tree-sitter library +implements an incremental parser and has support from a wide range of +programming languages. @defun treesit-available-p This function returns non-nil if tree-sitter features are available -for this Emacs instance. +for the current Emacs session. @end defun -To access the syntax tree of the text in a buffer, we need to first -load a language definition and create a parser with it. Next, we can -query the parser for specific nodes in the syntax tree. Then, we can -access various information about the node, and we can pattern-match a -node with a powerful syntax. Finally, we explain how to work with -source files that mixes multiple languages. The following sections -explain how to do each of the tasks in detail. +To be able to parse the program source using the tree-sitter library +and access the syntax tree of the program, a Lisp program needs to +load a language definition library, and create a parser for that +language and the current buffer. After that, the Lisp program can +query the parser about specific nodes of the syntax tree. Then, it +can access various kinds of information about each node, and search +for nodes using a powerful pattern-matching syntax. This chapter +explains how to do all this, and also how a Lisp program can work with +source files that mix multiple programming languages. @menu * Language Definitions:: Loading tree-sitter language definitions. * Using Parser:: Introduction to parsers. * Retrieving Node:: Retrieving node from syntax tree. -* Accessing Node:: Accessing node information. +* Accessing Node Information:: Accessing node information. * Pattern Matching:: Pattern matching with query patterns. * Multiple Languages:: Parse text written in multiple languages. * Tree-sitter major modes:: Develop major modes using tree-sitter. 
@@ -47,14 +51,17 @@ explain how to do each of the tasks in detail. @node Language Definitions @section Tree-sitter Language Definitions +@cindex language definitions, for tree-sitter @heading Loading a language definition +@cindex loading language definition for tree-sitter +@cindex language argument, for tree-sitter Tree-sitter relies on language definitions to parse text in that -language. In Emacs, A language definition is represented by a symbol. -For example, C language definition is represented as @code{c}, and -@code{c} can be passed to tree-sitter functions as the @var{language} -argument. +language. In Emacs, a language definition is represented by a symbol. +For example, C language definition is represented as the symbol +@code{c}, and @code{c} can be passed to tree-sitter functions as the +@var{language} argument. @vindex treesit-extra-load-path @vindex treesit-load-language-error @@ -62,63 +69,92 @@ argument. Tree-sitter language definitions are distributed as dynamic libraries. In order to use a language definition in Emacs, you need to make sure that the dynamic library is installed on the system. Emacs looks for -language definitions under load paths in -@code{treesit-extra-load-path}, -@code{user-emacs-directory}/tree-sitter, and system default locations -for dynamic libraries, in that order. Emacs tries each extensions in -@code{treesit-load-suffixes}. If Emacs cannot find the library or has -problem loading it, Emacs signals @code{treesit-load-language-error}. - -The signal data could be @code{(not-found @var{error-msg} ...)} if -Emacs cannot find the language definition, or @code{(symbol-error -@var{error-msg})} if the Emacs cannot find the correct symbol in the -language definition, or @code{(version_mismatch @var{error-msg})} if -the language definition's version does match that of the tree-sitter -library. - -@defun treesit-language-available-p language -This function returns non-nil if @var{language} exists and is -loadable. 
+language definitions in several places, in the following order: + +@itemize @bullet +@item +first, in the list of directories specified by the variable +@code{treesit-extra-load-path}; +@item +then, in the @file{tree-sitter} subdirectory of the directory +specified by @code{user-emacs-directory} (@pxref{Init File}); +@item +and finally, in the system's default locations for dynamic libraries. +@end itemize + +In each of these directories, Emacs looks for a file with file-name +extensions specified by the variable @code{treesit-load-suffixes}. + +If Emacs cannot find the library or has problems loading it, Emacs +signals the @code{treesit-load-language-error} error. The data of +that signal could be one of the following: + +@table @code +@item (not-found @var{error-msg} @dots{}) +This means Emacs could not find the language definition library. +@item (symbol-error @var{error-msg}) +This means Emacs could not find in the library the expected function +that every language definition library should export. +@item (version_mismatch @var{error-msg}) +This means the version of the language definition library is incompatible +with that of the tree-sitter library. +@end table + +@noindent +In all of these cases, @var{error-msg} might provide additional +details about the failure. + +@defun treesit-language-available-p language &optional detail +This function returns non-nil if the language definitions for +@var{language} exist and can be loaded. If @var{detail} is non-nil, return @code{(t . nil)} when -@var{language} is available, @code{(nil . DATA)} when unavailable. -@var{data} is the signal data of @code{treesit-load-language-error}. +@var{language} is available, and @code{(nil . @var{data})} when it's +unavailable. @var{data} is the signal data of +@code{treesit-load-language-error}. 
@end defun @vindex treesit-load-name-override-list -By convention, the dynamic library for @var{language} is -@code{libtree-sitter-@var{language}.@var{ext}}, where @var{ext} is the -system-specific extension for dynamic libraries. Also by convention, +By convention, the file name of the dynamic library for @var{language} is +@file{libtree-sitter-@var{language}.@var{ext}}, where @var{ext} is the +system-specific extension for dynamic libraries. Also by convention, the function provided by that library is named -@code{tree_sitter_@var{language}}. If a language definition doesn't -follow this convention, you should add an entry +@code{tree_sitter_@var{language}}. If a language definition library +doesn't follow this convention, you should add an entry @example (@var{language} @var{library-base-name} @var{function-name}) @end example -to @code{treesit-load-name-override-list}, where -@var{library-base-name} is the base filename for the dynamic library -(conventionally @code{libtree-sitter-@var{language}}), and +to the list in the variable @code{treesit-load-name-override-list}, where +@var{library-base-name} is the basename of the dynamic library's file name, +(usually, @file{libtree-sitter-@var{language}}), and @var{function-name} is the function provided by the library -(conventionally @code{tree_sitter_@var{language}}). For example, +(usually, @code{tree_sitter_@var{language}}). For example, @example (cool-lang "libtree-sitter-coool" "tree_sitter_cooool") @end example -for a language too cool to abide by conventions. +@noindent +for a language that considers itself too ``cool'' to abide by +conventions. +@cindex language-definition version, compatibility @defun treesit-language-version &optional min-compatible -Tree-sitter library has a @dfn{language version}, a language -definition's version needs to match this version to be compatible. - -This function returns tree-sitter library’s language version. 
If -@var{min-compatible} is non-nil, it returns the minimal compatible -version. +This function returns the version of the language-definition +Application Binary Interface (@acronym{ABI}) supported by the +tree-sitter library. By default, it returns the latest ABI version +supported by the library, but if @var{min-compatible} is +non-@code{nil}, it returns the oldest ABI version which the library +still can support. Language definition libraries must be built for +ABI versions between the oldest and the latest versions supported by +the tree-sitter library, otherwise the library will be unable to load +them. @end defun @heading Concrete syntax tree +@cindex syntax tree, concrete A syntax tree is what a parser generates. In a syntax tree, each node represents a piece of text, and is connected to each other by a @@ -147,33 +183,37 @@ its syntax tree could be @end group @end example -We can also represent it in s-expression: +We can also represent it as an s-expression: @example (root (expression (number) (operator) (number))) @end example @subheading Node types +@cindex node types, in a syntax tree -@cindex tree-sitter node type +@cindex type of node, tree-sitter @anchor{tree-sitter node type} -@cindex tree-sitter named node +@cindex named node, tree-sitter @anchor{tree-sitter named node} -@cindex tree-sitter anonymous node -Names like @code{root}, @code{expression}, @code{number}, -@code{operator} are nodes' @dfn{type}. However, not all nodes in a -syntax tree have a type. Nodes that don't are @dfn{anonymous nodes}, -and nodes with a type are @dfn{named nodes}. Anonymous nodes are -tokens with fixed spellings, including punctuation characters like -bracket @samp{]}, and keywords like @code{return}. +@cindex anonymous node, tree-sitter +Names like @code{root}, @code{expression}, @code{number}, and +@code{operator} specify the @dfn{type} of the nodes. However, not all +nodes in a syntax tree have a type. 
Nodes that don't have a type are +known as @dfn{anonymous nodes}, and nodes with a type are @dfn{named +nodes}. Anonymous nodes are tokens with fixed spellings, including +punctuation characters like bracket @samp{]}, and keywords like +@code{return}. @subheading Field names +@cindex field name, tree-sitter @cindex tree-sitter node field name -@anchor{tree-sitter node field name} To make the syntax tree easier to -analyze, many language definitions assign @dfn{field names} to child -nodes. For example, a @code{function_definition} node could have a -@code{declarator} and a @code{body}: +@anchor{tree-sitter node field name} +To make the syntax tree easier to analyze, many language definitions +assign @dfn{field names} to child nodes. For example, a +@code{function_definition} node could have a @code{declarator} and a +@code{body}: @example @group @@ -184,39 +224,40 @@ nodes. For example, a @code{function_definition} node could have a @end example @deffn Command treesit-inspect-mode -This minor mode displays the node that @emph{starts} at point in -mode-line. The mode-line will display +This minor mode displays on the mode-line the node that @emph{starts} +at point. The mode-line will display @example -@var{parent} @var{field-name}: (@var{child} (@var{grand-child} (...))) +@var{parent} @var{field}: (@var{child} (@var{grandchild} (@dots{}))) @end example -@var{child}, @var{grand-child}, and @var{grand-grand-child}, etc, are -nodes that have their beginning at point. And @var{parent} is the -parent of @var{child}. +@var{child}, @var{grandchild}, @var{grand-grandchild}, etc., are nodes that +begin at point. @var{parent} is the parent node of @var{child}. If there is no node that starts at point, i.e., point is in the middle of a node, then the mode-line only displays the smallest node that -spans point, and its immediate parent. +spans the position of point, and its immediate parent. This minor mode doesn't create parsers on its own. 
It simply uses the first parser in @code{(treesit-parser-list)} (@pxref{Using Parser}). @end deffn @heading Reading the grammar definition +@cindex reading grammar definition, tree-sitter Authors of language definitions define the @dfn{grammar} of a -language, and this grammar determines how does a parser construct a -concrete syntax tree out of the text. In order to use the syntax -tree effectively, we need to read the @dfn{grammar file}. +programming language, which determines how a parser constructs a +concrete syntax tree out of the program text. In order to use the +syntax tree effectively, you need to consult the @dfn{grammar file}. -The grammar file is usually @code{grammar.js} in a language -definition’s project repository. The link to a language definition’s -home page can be found in tree-sitter’s homepage -(@uref{https://tree-sitter.github.io/tree-sitter}). +The grammar file is usually @file{grammar.js} in a language +definition's project repository. The link to a language definition's +home page can be found on +@uref{https://tree-sitter.github.io/tree-sitter, the tree-sitter +homepage}. -The grammar is written in JavaScript syntax. For example, the rule -matching a @code{function_definition} node looks like +The grammar definition is written in JavaScript. For example, the +rule matching a @code{function_definition} node looks like @example @group @@ -228,12 +269,13 @@ function_definition: $ => seq( @end group @end example -The rule is represented by a function that takes a single argument +@noindent +The rules are represented by functions that take a single argument @var{$}, representing the whole grammar. The function itself is -constructed by other functions: the @code{seq} function puts together a -sequence of children; the @code{field} function annotates a child with -a field name. 
If we write the above definition in BNF syntax, it -would look like +constructed by other functions: the @code{seq} function puts together +a sequence of children; the @code{field} function annotates a child +with a field name. If we write the above definition in the so-called +@dfn{Backus-Naur Form} (@acronym{BNF}) syntax, it would look like @example @group @@ -254,88 +296,74 @@ and the node returned by the parser would look like @end group @end example -Below is a list of functions that one will see in a grammar -definition. Each function takes other rules as arguments and returns -a new rule. - -@itemize @bullet -@item -@code{seq(rule1, rule2, ...)} matches each rule one after another. - -@item -@code{choice(rule1, rule2, ...)} matches one of the rules in its -arguments. +Below is a list of functions that one can see in a grammar definition. +Each function takes other rules as arguments and returns a new rule. -@item -@code{repeat(rule)} matches @var{rule} for @emph{zero or more} times. +@table @code +@item seq(@var{rule1}, @var{rule2}, @dots{}) +matches each rule one after another. +@item choice(@var{rule1}, @var{rule2}, @dots{}) +matches one of the rules in its arguments. +@item repeat(@var{rule}) +matches @var{rule} for @emph{zero or more} times. This is like the @samp{*} operator in regular expressions. - -@item -@code{repeat1(rule)} matches @var{rule} for @emph{one or more} times. +@item repeat1(@var{rule}) +matches @var{rule} for @emph{one or more} times. This is like the @samp{+} operator in regular expressions. - -@item -@code{optional(rule)} matches @var{rule} for @emph{zero or one} time. +@item optional(@var{rule}) +matches @var{rule} for @emph{zero or one} time. This is like the @samp{?} operator in regular expressions. - -@item -@code{field(name, rule)} assigns field name @var{name} to the child -node matched by @var{rule}. 
- -@item -@code{alias(rule, alias)} makes nodes matched by @var{rule} appear as -@var{alias} in the syntax tree generated by the parser. For example, +@item field(@var{name}, @var{rule}) +assigns field name @var{name} to the child node matched by @var{rule}. +@item alias(@var{rule}, @var{alias}) +makes nodes matched by @var{rule} appear as @var{alias} in the syntax +tree generated by the parser. For example, @example alias(preprocessor_call_exp, call_expression) @end example +@noindent makes any node matched by @code{preprocessor_call_exp} to appear as @code{call_expression}. -@end itemize +@end table -Below are grammar functions less interesting for a reader of a +Below are grammar functions of lesser importance for reading a language definition. -@itemize -@item -@code{token(rule)} marks @var{rule} to produce a single leaf node. -That is, instead of generating a parent node with individual child -nodes under it, everything is combined into a single leaf node. - -@item -Normally, grammar rules ignore preceding whitespaces, -@code{token.immediate(rule)} changes @var{rule} to match only when -there is no preceding whitespaces. - -@item -@code{prec(n, rule)} gives @var{rule} a level @var{n} precedence. - -@item -@code{prec.left([n,] rule)} marks @var{rule} as left-associative, -optionally with level @var{n}. - -@item -@code{prec.right([n,] rule)} marks @var{rule} as right-associative, -optionally with level @var{n}. - -@item -@code{prec.dynamic(n, rule)} is like @code{prec}, but the precedence -is applied at runtime instead. -@end itemize - -The tree-sitter project talks about writing a grammar in more detail: -@uref{https://tree-sitter.github.io/tree-sitter/creating-parsers}. -Read especially ``The Grammar DSL'' section. +@table @code +@item token(@var{rule}) +marks @var{rule} to produce a single leaf node. That is, instead of +generating a parent node with individual child nodes under it, +everything is combined into a single leaf node. 
+@item token.immediate(@var{rule}) +Normally, grammar rules ignore preceding whitespace; this +changes @var{rule} to match only when there is no preceding +whitespace. +@item prec(@var{n}, @var{rule}) +gives @var{rule} the level-@var{n} precedence. +@item prec.left([@var{n},] @var{rule}) +marks @var{rule} as left-associative, optionally with level @var{n}. +@item prec.right([@var{n},] @var{rule}) +marks @var{rule} as right-associative, optionally with level @var{n}. +@item prec.dynamic(@var{n}, @var{rule}) +this is like @code{prec}, but the precedence is applied at runtime +instead. +@end table + +The documentation of the tree-sitter project has +@uref{https://tree-sitter.github.io/tree-sitter/creating-parsers, more +about writing a grammar}. Read especially ``The Grammar DSL'' +section. @node Using Parser @section Using Tree-sitter Parser -@cindex Tree-sitter parser +@cindex tree-sitter parser, using This section described how to create and configure a tree-sitter parser. In Emacs, each tree-sitter parser is associated with a -buffer. As we edit the buffer, the associated parser and the syntax -tree is automatically kept up-to-date. +buffer. As the user edits the buffer, the associated parser and the +syntax tree are automatically kept up-to-date. @defvar treesit-max-buffer-size This variable contains the maximum size of buffers in which @@ -349,44 +377,45 @@ activating tree-sitter features. It basically checks @code{treesit-available-p} and @code{treesit-max-buffer-size}. @end defun -@cindex Creating tree-sitter parsers +@cindex creating tree-sitter parsers +@cindex tree-sitter parser, creating @defun treesit-parser-create language &optional buffer no-reuse -To create a parser, we provide a @var{buffer} and the @var{language} -to use (@pxref{Language Definitions}). If @var{buffer} is nil, the -current buffer is used. +Create a parser for the specified @var{buffer} and @var{language} +(@pxref{Language Definitions}). 
If @var{buffer} is omitted or +@code{nil}, it stands for the current buffer. By default, this function reuses a parser if one already exists for -@var{language} in @var{buffer}, if @var{no-reuse} is non-nil, this -function always creates a new parser. +@var{language} in @var{buffer}, but if @var{no-reuse} is +non-@code{nil}, this function always creates a new parser. @end defun -Given a parser, we can query information about it: +Given a parser, we can query information about it. @defun treesit-parser-buffer parser -Returns the buffer associated with @var{parser}. +This function returns the buffer associated with @var{parser}. @end defun @defun treesit-parser-language parser -Returns the language that @var{parser} uses. +This function returns the language used by @var{parser}. @end defun @defun treesit-parser-p object -Checks if @var{object} is a tree-sitter parser. Return non-nil if it -is, return nil otherwise. +This function checks if @var{object} is a tree-sitter parser, and +returns non-@code{nil} if it is, @code{nil} otherwise. @end defun There is no need to explicitly parse a buffer, because parsing is done -automatically and lazily. A parser only parses when we query for a -node in its syntax tree. Therefore, when a parser is first created, -it doesn't parse the buffer; it waits until we query for a node for -the first time. Similarly, when some change is made in the buffer, a -parser doesn't re-parse immediately. +automatically and lazily. A parser only parses when the mode queries +for a node in its syntax tree. Therefore, when a parser is first +created, it doesn't parse the buffer; it waits until the mode queries +for a node for the first time. Similarly, when some change is made in +the buffer, a parser doesn't re-parse immediately. @vindex treesit-buffer-too-large -When a parser do parse, it checks for the size of the buffer. +When a parser does parse, it checks for the size of the buffer. Tree-sitter can only handle buffer no larger than about 4GB. 
If the -size exceeds that, Emacs signals @code{treesit-buffer-too-large} -with signal data being the buffer size. +size exceeds that, Emacs signals the @code{treesit-buffer-too-large} +error with signal data being the buffer size. Once a parser is created, Emacs automatically adds it to the internal parser list. Every time a change is made to the buffer, @@ -394,8 +423,9 @@ Emacs updates parsers in this list so they can update their syntax tree incrementally. @defun treesit-parser-list &optional buffer -This function returns the parser list of @var{buffer}. And -@var{buffer} defaults to the current buffer. +This function returns the parser list of @var{buffer}. If +@var{buffer} is @code{nil} or omitted, it defaults to the current +buffer. @end defun @defun treesit-parser-delete parser @@ -403,100 +433,108 @@ This function deletes @var{parser}. @end defun @cindex tree-sitter narrowing -@anchor{tree-sitter narrowing} Normally, a parser ``sees'' the whole -buffer, but when the buffer is narrowed (@pxref{Narrowing}), the -parser will only see the visible region. As far as the parser can -tell, the hidden region is deleted. And when the buffer is later -widened, the parser thinks text is inserted in the beginning and in -the end. Although parsers respect narrowing, narrowing shouldn't be -the mean to handle a multi-language buffer; instead, set the ranges in -which a parser should operate in. @xref{Multiple Languages}. - -Because a parser parses lazily, when we narrow the buffer, the parser -is not affected immediately; as long as we don't query for a node -while the buffer is narrowed, the parser is oblivious of the -narrowing. +@anchor{tree-sitter narrowing} +Normally, a parser ``sees'' the whole buffer, but when the buffer is +narrowed (@pxref{Narrowing}), the parser will only see the accessible +portion of the buffer. As far as the parser can tell, the hidden +region was deleted. 
When the buffer is later widened, the parser +thinks text is inserted at the beginning and at the end. Although +parsers respect narrowing, modes should not use narrowing as a means +to handle a multi-language buffer; instead, set the ranges in which the +parser should operate. @xref{Multiple Languages}. + +Because a parser parses lazily, when the user or a Lisp program +narrows the buffer, the parser is not affected immediately; as long as +the mode doesn't query for a node while the buffer is narrowed, the +parser is oblivious of the narrowing. @cindex tree-sitter parse string -@defun treesit-parse-string string language -Besides creating a parser for a buffer, we can also just parse a -string. Unlike a buffer, parsing a string is a one-time deal, and +@cindex parse string, tree-sitter +Besides creating a parser for a buffer, a Lisp program can also parse a +string. Unlike a buffer, parsing a string is a one-off operation, and there is no way to update the result. -This function parses @var{string} with @var{language}, and returns the -root node of the generated syntax tree. +@defun treesit-parse-string string language +This function parses @var{string} using @var{language}, and returns +the root node of the generated syntax tree. @end defun @node Retrieving Node @section Retrieving Node +@cindex retrieve node, tree-sitter +@cindex tree-sitter, find node +@cindex get node, tree-sitter -@cindex tree-sitter find node -@cindex tree-sitter get node -Before we continue, lets go over some conventions of tree-sitter -functions. +@cindex terminology, for tree-sitter functions +Here's some terminology and conventions we use when documenting +tree-sitter functions. We talk about a node being ``smaller'' or ``larger'', and ``lower'' or ``higher''. 
A smaller and lower node is lower in the syntax tree and -therefore spans a smaller piece of text; a larger and higher node is -higher up in the syntax tree, containing many smaller nodes as its -children, and therefore spans a larger piece of text. +therefore spans a smaller portion of buffer text; a larger and higher +node is higher up in the syntax tree, it contains many smaller nodes +as its children, and therefore spans a larger portion of text. -When a function cannot find a node, it returns nil. And for the -convenience for function chaining, all the functions that take a node -as argument and returns a node accept the node to be nil; in that -case, the function just returns nil. +When a function cannot find a node, it returns @code{nil}. For +convenience, all the functions that take a node as argument and return +a node, also accept the node argument of @code{nil} and in that case +just return @code{nil}. @vindex treesit-node-outdated Nodes are not automatically updated when the associated buffer is -modified. And there is no way to update a node once it is retrieved. -Using an outdated node throws @code{treesit-node-outdated} error. +modified, and there is no way to update a node once it is retrieved. +Using an outdated node signals the @code{treesit-node-outdated} error. @heading Retrieving node from syntax tree +@cindex retrieving tree-sitter nodes +@cindex syntax tree, retrieving nodes -@defun treesit-node-at beg end &optional parser-or-lang named +@defun treesit-node-at pos &optional parser-or-lang named This function returns the @emph{smallest} node that starts at or after -the @var{point}. In other words, the start of the node is equal or -greater than @var{point}. +the buffer position @var{pos}. In other words, the start of the node +is greater or equal to @var{pos}. -When @var{parser-or-lang} is nil, this function uses the first parser -in @code{(treesit-parser-list)} in the current buffer. 
If -@var{parser-or-lang} is a parser object, it use that parser; if -@var{parser-or-lang} is a language, it finds the first parser using -that language in @code{(treesit-parser-list)} and use that. +When @var{parser-or-lang} is @code{nil} or omitted, this function uses +the first parser in @code{(treesit-parser-list)} of the current +buffer. If @var{parser-or-lang} is a parser object, it uses that +parser; if @var{parser-or-lang} is a language, it finds the first +parser using that language in @code{(treesit-parser-list)}, and uses +that. -If @var{named} is non-nil, this function looks for a named node +If @var{named} is non-@code{nil}, this function looks for a named node only (@pxref{tree-sitter named node, named node}). Example: + @example @group ;; Find the node at point in a C parser's syntax tree. (treesit-node-at (point) 'c) - @c @result{} # + @result{} # @end group @end example @end defun @defun treesit-node-on beg end &optional parser-or-lang named -This function returns the @emph{smallest} node that covers the span -from @var{beg} to @var{end}. In other words, the start of the node is -less or equal to @var{beg}, and the end of the node is greater or -equal to @var{end}. +This function returns the @emph{smallest} node that covers the region +of buffer text between @var{beg} and @var{end}. In other words, the +start of the node is before or at @var{beg}, and the end of the node +is at or after @var{end}. -@emph{Beware} that calling this function on an empty line that is not -inside any top-level construct (function definition, etc) most +@emph{Beware:} calling this function on an empty line that is not +inside any top-level construct (function definition, etc.) most probably will give you the root node, because the root node is the smallest node that covers that empty line. Most of the time, you want -to use @code{treesit-node-at}. +to use @code{treesit-node-at}, described above, instead. 
-When @var{parser-or-lang} is nil, this function uses the first parser -in @code{(treesit-parser-list)} in the current buffer. If -@var{parser-or-lang} is a parser object, it use that parser; if +When @var{parser-or-lang} is @code{nil}, this function uses the first +parser in @code{(treesit-parser-list)} of the current buffer. If +@var{parser-or-lang} is a parser object, it uses that parser; if @var{parser-or-lang} is a language, it finds the first parser using -that language in @code{(treesit-parser-list)} and use that. +that language in @code{(treesit-parser-list)}, and uses that. -If @var{named} is non-nil, this function looks for a named node only -(@pxref{tree-sitter named node, named node}). +If @var{named} is non-@code{nil}, this function looks for a named node +only (@pxref{tree-sitter named node, named node}). @end defun @defun treesit-parser-root-node parser @@ -506,17 +544,21 @@ This function returns the root node of the syntax tree generated by @defun treesit-buffer-root-node &optional language This function finds the first parser that uses @var{language} in -@code{(treesit-parser-list)} in the current buffer, and returns the -root node of that buffer. If it cannot find an appropriate parser, -nil is returned. +@code{(treesit-parser-list)} of the current buffer, and returns the +root node generated by that parser. If it cannot find an appropriate +parser, it returns @code{nil}. @end defun -Once we have a node, we can retrieve other nodes from it, or query for -information about this node. +Given a node, a Lisp program can retrieve other nodes starting from +it, or query for information about this node. @heading Retrieving node from other nodes +@cindex syntax tree nodes, retrieving from other nodes @subheading By kinship +@cindex kinship, syntax tree nodes +@cindex nodes, by kinship +@cindex syntax tree nodes, by kinship @defun treesit-node-parent node This function returns the immediate parent of @var{node}. 
@@ -524,87 +566,95 @@ This function returns the immediate parent of @var{node}. @defun treesit-node-child node n &optional named This function returns the @var{n}'th child of @var{node}. If -@var{named} is non-nil, then it only counts named nodes +@var{named} is non-@code{nil}, it counts only named nodes (@pxref{tree-sitter named node, named node}). For example, in a node -that represents a string: @code{"text"}, there are three children -nodes: the opening quote @code{"}, the string content @code{text}, and -the enclosing quote @code{"}. Among these nodes, the first child is -the opening quote @code{"}, the first named child is the string -content @code{text}. +that represents a string @code{"text"}, there are three children +nodes: the opening quote @code{"}, the string text @code{text}, and +the closing quote @code{"}. Among these nodes, the first child is the +opening quote @code{"}, and the first named child is the string text. @end defun @defun treesit-node-children node &optional named -This function returns all of @var{node}'s children in a list. If -@var{named} is non-nil, then it only retrieves named nodes. +This function returns all of @var{node}'s children as a list. If +@var{named} is non-@code{nil}, then it retrieves only named nodes. @end defun @defun treesit-next-sibling node &optional named This function finds the next sibling of @var{node}. If @var{named} is -non-nil, it finds the next named sibling. +non-@code{nil}, it finds the next named sibling. @end defun @defun treesit-prev-sibling node &optional named This function finds the previous sibling of @var{node}. If -@var{named} is non-nil, it finds the previous named sibling. +@var{named} is non-@code{nil}, it finds the previous named sibling. 
@end defun @subheading By field name +@cindex nodes, by field name +@cindex syntax tree nodes, by field name To make the syntax tree easier to analyze, many language definitions assign @dfn{field names} to child nodes (@pxref{tree-sitter node field name, field name}). For example, a @code{function_definition} node -could have a @code{declarator} and a @code{body}. +could have a @code{declarator} node and a @code{body} node. @defun treesit-child-by-field-name node field-name -This function finds the child of @var{node} that has @var{field-name} -as its field name. +This function finds the child of @var{node} whose field name is +@var{field-name}, a string. @example @group ;; Get the child that has "body" as its field name. (treesit-child-by-field-name node "body") - @c @result{} # + @result{} # @end group @end example @end defun @subheading By position +@cindex nodes, by position +@cindex syntax tree nodes, by position @defun treesit-first-child-for-pos node pos &optional named This function finds the first child of @var{node} that extends beyond -@var{pos}. ``Extend beyond'' means the end of the child node >= -@var{pos}. This function only looks for immediate children of -@var{node}, and doesn't look in its grand children. If @var{named} is -non-nil, it only looks for named child (@pxref{tree-sitter named node, -named node}). +buffer position @var{pos}. ``Extends beyond'' means the end of the +child node is greater or equal to @var{pos}. This function only looks +for immediate children of @var{node}, and doesn't look in its +grandchildren. If @var{named} is non-@code{nil}, it looks for the +first named child (@pxref{tree-sitter named node, named node}). @end defun @defun treesit-node-descendant-for-range node beg end &optional named -This function finds the @emph{smallest} child/grandchild... of -@var{node} that spans the range from @var{beg} to @var{end}. It is -similar to @code{treesit-node-at}. If @var{named} is non-nil, it only -looks for named child. 
+This function finds the @emph{smallest} descendant node of @var{node} +that spans the region of text between positions @var{beg} and +@var{end}. It is similar to @code{treesit-node-at}. If @var{named} +is non-@code{nil}, it looks for the smallest named descendant. @end defun @heading Searching for node @defun treesit-search-subtree node predicate &optional all backward limit This function traverses the subtree of @var{node} (including -@var{node}), and match @var{predicate} with each node along the way. -And @var{predicate} is a regexp that matches (case-insensitively) -against each node's type, or a function that takes a node and returns -nil/non-nil. If a node matches, that node is returned, if no node -ever matches, nil is returned. - -By default, this function only traverses named nodes, if @var{all} is -non-nil, it traverses all nodes. If @var{backward} is non-nil, it -traverses backwards. If @var{limit} is non-nil, it only traverses -that number of levels down in the tree. +@var{node} itself), looking for a node for which @var{predicate} +returns non-@code{nil}. @var{predicate} is a regexp that is matched +(case-insensitively) against each node's type, or a predicate function +that takes a node and returns non-@code{nil} if the node matches. The +function returns the first node that matches, or @code{nil} if none +does. + +By default, this function only traverses named nodes, but if @var{all} +is non-@code{nil}, it traverses all the nodes. If @var{backward} is +@c FIXME: What does it mean to ``traverse backward''? +non-@code{nil}, it traverses backwards. If @var{limit} is non-@code{nil}, it +must be a number that limits the tree traversal to that many levels +down the tree. @end defun @defun treesit-search-forward start predicate &optional all backward up +@c FIXME: Explain better what is the difference between this function +@c and the previous one. This function is somewhat similar to @code{treesit-search-subtree}. 
-It also traverse the parse tree and match each node with +It also traverses the parse tree and matches each node with @var{predicate} (except for @var{start}), where @var{predicate} can be a (case-insensitive) regexp or a function. For a tree like the below where @var{start} is marked 1, this function traverses as numbered: @@ -623,30 +673,35 @@ o o 2 7 +-+-+ +--+--+ @end group @end example -Same as in @code{treesit-search-subtree}, this function only searches -for named nodes by default. But if @var{all} is non-nil, it searches -for all nodes. If @var{backward} is non-nil, it searches backwards. +Like @code{treesit-search-subtree}, this function only searches for +named nodes by default, but if @var{all} is non-@code{nil}, it +searches for all nodes. If @var{backward} is non-@code{nil}, it +searches backwards. -If @var{up} is non-nil, this function will only traverse to siblings -and parents. In that case, only 1 3 4 8 would be traversed. +If @var{up} is non-@code{nil}, this function will only traverse to +siblings and parents. In that case, only the nodes marked by 1, 3, 4, +and 8 above would be traversed. @end defun @defun treesit-search-forward-goto predicate side &optional all backward up -This function jumps to the start or end of the next node in buffer -that matches @var{predicate}. Parameters @var{predicate}, @var{all}, -@var{backward}, and @var{up} are the same as in -@code{treesit-search-forward}. And @var{side} controls which side of -the matched no do we stop at, it can be @code{start} or @code{end}. +This function moves point to the beginning or end of the next node in +the buffer that matches @var{predicate}. Arguments @var{predicate}, +@var{all}, @var{backward}, and @var{up} are the same as in +@code{treesit-search-forward}. @var{side} controls on which side of +the matched node we stop: it can be @code{start} or @code{end}. 
+@c FIXME: Wouldn't it be convenient to make SIDE an optional argument, +@c and by default stop at the beginning (or end), whichever happens +@c most frequently? @end defun @defun treesit-induce-sparse-tree root predicate &optional process-fn limit This function creates a sparse tree from @var{root}'s subtree. -Basically, it takes the subtree under @var{root}, and combs it so only -the nodes that match @var{predicate} are left, like picking out grapes -on the vine. Like previous functions, @var{predicate} can be a regexp -string that matches against each node's type case-insensitively, or a -function that takes a node and return nil/non-nil. +It takes the subtree under @var{root}, and combs it so only the nodes +that match @var{predicate} are left. Like previous functions, the +@var{predicate} can be a regexp string that matches against each +node's type case-insensitively, or a function that takes a node and +returns non-@code{nil} if it matches. For example, for a subtree on the left that consist of both numbers and letters, if @var{predicate} is ``letter only'', the returned tree @@ -670,50 +725,51 @@ b 1 2 b | | b c d If @var{process-fn} is non-nil, instead of returning the matched nodes, this function passes each node to @var{process-fn} and uses the -returned value instead. If non-nil, @var{limit} is the number of +returned value instead. If non-@code{nil}, @var{limit} is the number of levels to go down from @var{root}. -Each node in the returned tree looks like @code{(@var{tree-sitter -node} . (@var{child} ...))}. The @var{tree-sitter node} of the root -of this tree will be nil if @var{ROOT} doesn't match @var{pred}. If -no node matches @var{predicate}, return nil. +Each node in the returned tree looks like +@w{@code{(@var{tree-sitter-node} . (@var{child} @dots{}))}}. The +@var{tree-sitter-node} of the root of this tree will be @code{nil} if +@var{root} doesn't match @var{predicate}. If no node matches +@var{predicate}, the function returns @code{nil}. 
@end defun -@heading More convenient functions +@heading More convenience functions -@defun treesit-filter-child node pred &optional named -This function finds immediate children of @var{node} that satisfies -@var{pred}. +@defun treesit-filter-child node predicate &optional named +This function finds immediate children of @var{node} that satisfy +@var{predicate}. -Function @var{pred} takes the child node as the argument and should -return non-nil to indicated keeping the child. If @var{named} -non-nil, this function only searches for named nodes. +The @var{predicate} function takes a node as the argument and should +return non-@code{nil} to indicate that the node should be kept. If +@var{named} is non-@code{nil}, this function only examines the named +nodes. @end defun -@defun treesit-parent-until node pred -This function repeatedly finds the parent of @var{node}, and returns -the parent if it satisfies @var{pred} (which takes the parent as the -argument). If no parent satisfies @var{pred}, this function returns -nil. +@defun treesit-parent-until node predicate +This function repeatedly finds the parents of @var{node}, and returns +the parent that satisfies @var{predicate}, a function that takes a +node as the argument. If no parent satisfies @var{predicate}, this +function returns @code{nil}. @end defun -@defun treesit-parent-while +@defun treesit-parent-while node predicate This function repeatedly finds the parent of @var{node}, and keeps -doing so as long as the parent satisfies @var{pred} (which takes the -parent as the single argument). I.e., this function returns the -farthest parent that still satisfies @var{pred}. +doing so as long as the nodes satisfy @var{predicate}, a function that +takes a node as the argument. That is, this function returns the +farthest parent that still satisfies @var{predicate}. 
@end defun -@node Accessing Node +@node Accessing Node Information @section Accessing Node Information +@cindex information of node, syntax trees +@cindex syntax trees, node information -Before going further, make sure you have read the basic conventions -about tree-sitter nodes in the previous node. - -@heading Basic information +@heading Basic information of Node Every node is associated with a parser, and that parser is associated -with a buffer. The following functions let you retrieve them. +with a buffer. The following functions retrieve them. @defun treesit-node-parser node This function returns @var{node}'s associated parser. @@ -727,8 +783,8 @@ This function returns @var{node}'s parser's associated buffer. This function returns @var{node}'s parser's associated language. @end defun -Each node represents a piece of text in the buffer. Functions below -finds relevant information about that text. +Each node represents a portion of text in the buffer. Functions below +find relevant information about that text. @defun treesit-node-start node Return the start position of @var{node}. @@ -739,11 +795,13 @@ Return the end position of @var{node}. @end defun @defun treesit-node-text node &optional object -Returns the buffer text that @var{node} represents. (If @var{node} is -retrieved from parsing a string, it will be text from that string.) +Return the buffer text that @var{node} represents, as a string. (If +@var{node} is retrieved from parsing a string, it will be the text +from that string.) @end defun -Here are some basic checks on tree-sitter nodes. +@cindex predicates for syntax tree nodes +Here are some predicates on tree-sitter nodes: @defun treesit-node-p object Checks if @var{object} is a tree-sitter syntax node. @@ -762,31 +820,36 @@ or anonymous is determined by the language definition (@pxref{tree-sitter named node, named node}). @cindex tree-sitter missing node -Apart from being named/anonymous, a node can have other properties. 
A -node can be ``missing'': missing nodes are inserted by the parser in +@cindex missing node, tree-sitter +Apart from being named or anonymous, a node can have other properties. +A node can be ``missing'': such nodes are inserted by the parser in order to recover from certain kinds of syntax errors, i.e., something -should probably be there according to the grammar, but not there. +should probably be there according to the grammar, but is not there. +This can happen during editing of the program source, when the source +is not yet in its final form. @cindex tree-sitter extra node +@cindex extra node, tree-sitter A node can be ``extra'': extra nodes represent things like comments, which can appear anywhere in the text. @cindex tree-sitter node that has changes -A node ``has changes'' if the buffer changed since when the node is -retrieved, i.e., outdated. +@cindex has changes, tree-sitter node +A node ``has changes'' if the buffer changed since the last time the +node was retrieved, i.e., the node is outdated. @cindex tree-sitter node that has error +@cindex has error, tree-sitter node A node ``has error'' if the text it spans contains a syntax error. It -can be the node itself has an error, or one of its -children/grandchildren... has an error. +can be that the node itself has an error, or one of its descendants +has an error. @defun treesit-node-check node property -This function checks if @var{node} has @var{property}. @var{property} -can be @code{'named}, @code{'missing}, @code{'extra}, -@code{'has-changes}, or @code{'has-error}. +This function checks if @var{node} has the specified @var{property}. +@var{property} can be @code{named}, @code{missing}, @code{extra}, +@code{has-changes}, or @code{has-error}. @end defun - @defun treesit-node-type node Named nodes have ``types'' (@pxref{tree-sitter node type, node type}). For example, a named node can be a @code{string_literal} node, where @@ -799,7 +862,7 @@ This function returns @var{node}'s type as a string. 
@defun treesit-node-index node &optional named This function returns the index of @var{node} as a child node of its -parent. If @var{named} is non-nil, it only count named nodes +parent. If @var{named} is non-@code{nil}, it only counts named nodes (@pxref{tree-sitter named node, named node}). @end defun @@ -816,35 +879,34 @@ This function returns the field name of the @var{n}'th child of @defun treesit-child-count node &optional named This function finds the number of children of @var{node}. If -@var{named} is non-nil, it only counts named child (@pxref{tree-sitter -named node, named node}). +@var{named} is non-@code{nil}, it only counts named children +(@pxref{tree-sitter named node, named node}). @end defun @node Pattern Matching @section Pattern Matching Tree-sitter Nodes +@cindex pattern matching with tree-sitter nodes -Tree-sitter let us pattern match with a small declarative language. -Pattern matching consists of two steps: first tree-sitter matches a -@dfn{pattern} against nodes in the syntax tree, then it @dfn{captures} -specific nodes in that pattern and returns the captured nodes. +@cindex capturing, tree-sitter node +Tree-sitter lets Lisp programs match patterns using a small +declarative language. This pattern matching consists of two steps: +first tree-sitter matches a @dfn{pattern} against nodes in the syntax +tree, then it @dfn{captures} specific nodes that matched the pattern +and returns the captured nodes. We describe first how to write the most basic query pattern and how to -capture nodes in a pattern, then the pattern-match function, finally -more advanced pattern syntax. +capture nodes in a pattern, then the pattern-matching function, and +finally the more advanced pattern syntax. @heading Basic query syntax -@cindex Tree-sitter query syntax -@cindex Tree-sitter query pattern +@cindex tree-sitter query pattern syntax +@cindex pattern syntax, tree-sitter query +@cindex query, tree-sitter A @dfn{query} consists of multiple @dfn{patterns}. 
Each pattern is an s-expression that matches a certain node in the syntax node. A -pattern has the following shape: +pattern has the form @w{@code{(@var{type} (@var{child}@dots{}))}} -@example -(@var{type} @var{child}...) -@end example - -@noindent For example, a pattern that matches a @code{binary_expression} node that contains @code{number_literal} child nodes would look like @@ -852,9 +914,9 @@ contains @code{number_literal} child nodes would look like (binary_expression (number_literal)) @end example -To @dfn{capture} a node in the query pattern above, append -@code{@@capture-name} after the node pattern you want to capture. For -example, +To @dfn{capture} a node using the query pattern above, append +@code{@@@var{capture-name}} after the node pattern you want to +capture. For example, @example (binary_expression (number_literal) @@number-in-exp) @@ -862,10 +924,11 @@ example, @noindent captures @code{number_literal} nodes that are inside a -@code{binary_expression} node with capture name @code{number-in-exp}. +@code{binary_expression} node with the capture name +@code{number-in-exp}. -We can capture the @code{binary_expression} node too, with capture -name @code{biexp}: +We can capture the @code{binary_expression} node as well, with, for +example, the capture name @code{biexp}: @example (binary_expression @@ -874,33 +937,37 @@ name @code{biexp}: @heading Query function -Now we can introduce the query functions. +@cindex query functions, tree-sitter +Now we can introduce the @dfn{query functions}. @defun treesit-query-capture node query &optional beg end node-only -This function matches patterns in @var{query} in @var{node}. -Parameter @var{query} can be either a string, a s-expression, or a +This function matches patterns in @var{query} within @var{node}. +The argument @var{query} can be either a string, a s-expression, or a compiled query object. 
For now, we focus on the string syntax; s-expression syntax and compiled query are described at the end of the section. -Parameter @var{node} can also be a parser or a language symbol. A +The argument @var{node} can also be a parser or a language symbol. A parser means using its root node, a language symbol means find or create a parser for that language in the current buffer, and use the root node. -The function returns all captured nodes in a list of -@code{(@var{capture_name} . @var{node})}. If @var{node-only} is -non-nil, a list of node is returned instead. If @var{beg} and -@var{end} are both non-nil, this function only pattern matches nodes -in that range. +The function returns all the captured nodes in a list of the form +@w{@code{(@var{capture_name} . @var{node})}}. If @var{node-only} is +non-@code{nil}, it returns the list of nodes instead. By default the +entire text of @var{node} is searched, but if @var{beg} and @var{end} +are both non-@code{nil}, they specify the region of buffer text where +this function should match nodes. @vindex treesit-query-error -This function raise a @var{treesit-query-error} if @var{query} is -malformed. The signal data contains a description of the specific -error. You can use @code{treesit-query-validate} to debug the query. +@findex treesit-query-validate +This function raises the @code{treesit-query-error} error if +@var{query} is malformed. The signal data contains a description of +the specific error. You can use @code{treesit-query-validate} to +validate and debug the query. 
@end defun -For example, suppose @var{node}'s content is @code{1 + 2}, and +For example, suppose @var{node}'s text is @code{1 + 2}, and @var{query} is @example @@ -911,7 +978,7 @@ For example, suppose @var{node}'s content is @code{1 + 2}, and @end group @end example -Querying that query would return +Matching that query would return @example @group @@ -922,8 +989,8 @@ Querying that query would return @end group @end example -As we mentioned earlier, a @var{query} could contain multiple -patterns. For example, it could have two top-level patterns: +As mentioned earlier, @var{query} could contain multiple patterns. For +example, it could have two top-level patterns: @example @group @@ -934,8 +1001,8 @@ patterns. For example, it could have two top-level patterns: @end example @defun treesit-query-string string query language -This function parses @var{string} with @var{language}, pattern matches -its root node with @var{query}, and returns the result. +This function parses @var{string} with @var{language}, matches its +root node with @var{query}, and returns the result. @end defun @heading More query syntax @@ -965,8 +1032,10 @@ named child of a @code{binary_expression} node, the pattern would be @subheading Field name -We can capture child nodes that has specific field names: +It is possible to capture child nodes that have specific field names: +@c FIXME: The significance of ``:'' should be explained, and also what +@c are ``declarator'' and ``body''. @example @group (function_definition @@ -975,8 +1044,8 @@ We can capture child nodes that has specific field names: @end group @end example -We can also capture a node that doesn't have certain field, say, a -@code{function_definition} without a @code{body} field. +It is also possible to capture a node that doesn't have a certain +field, say, a @code{function_definition} without a @code{body} field. 
@example (function_definition !body) @@func-no-body @@ -984,19 +1053,21 @@ We can also capture a node that doesn't have certain field, say, a @subheading Quantify node +@cindex quantify node, tree-sitter Tree-sitter recognizes quantification operators @samp{*}, @samp{+} and @samp{?}. Their meanings are the same as in regular expressions: @samp{*} matches the preceding pattern zero or more times, @samp{+} matches one or more times, and @samp{?} matches zero or one time. -For example, this pattern matches @code{type_declaration} nodes -that has @emph{zero or more} @code{long} keyword. +@c FIXME: ``pattern'' or ``query''? Or maybe ``query pattern''? +For example, the following pattern matches @code{type_declaration} +nodes that have @emph{zero or more} @code{long} keywords. @example (type_declaration "long"*) @@long-type @end example -And this pattern matches a type declaration that has zero or one +The following pattern matches a type declaration that has zero or one @code{long} keyword: @example @@ -1005,8 +1076,8 @@ And this pattern matches a type declaration that has zero or one @subheading Grouping -Similar to groups in regular expression, we can bundle patterns into a -group and apply quantification operators to it. For example, to +Similar to groups in regular expressions, we can bundle patterns into +groups and apply quantification operators to them. For example, to express a comma separated list of identifiers, one could write @example @@ -1043,10 +1114,14 @@ adjacent children: @group ;; Anchor the child with the end of its parent. (compound_expression (_) @@last-child .) +@end group +@group ;; Anchor the child with the beginning of its parent. (compound_expression . (_) @@first-child) +@end group +@group ;; Anchor two adjacent children. (compound_expression (_) @@prev-child @@ -1060,8 +1135,8 @@ nodes. @subheading Predicate -We can add predicate constraints to a pattern. 
For example, if we use -the following query pattern +It is possible to add predicate constraints to a pattern. For +example, with the following query pattern: @example @group @@ -1072,30 +1147,33 @@ the following query pattern @end group @end example -Then tree-sitter only matches arrays where the first element equals to +@noindent +tree-sitter only matches arrays where the first element equals the last element. To attach a predicate to a pattern, we need to group then together. A predicate always starts with a @samp{#}. Currently there are two predicates, @code{#equal} and @code{#match}. @deffn Predicate equal arg1 arg2 -Matches if @var{arg1} equals to @var{arg2}. Arguments can be either a -string or a capture name. Capture names represent the text that the +Matches if @var{arg1} equals @var{arg2}. Arguments can be either +strings or capture names. Capture names represent the text that the captured node spans in the buffer. @end deffn @deffn Predicate match regexp capture-name -Matches if the text that @var{capture-name}’s node spans in the buffer +Matches if the text that @var{capture-name}'s node spans in the buffer matches regular expression @var{regexp}. Matching is case-sensitive. @end deffn -Note that a predicate can only refer to capture names appeared in the -same pattern. Indeed, it makes little sense to refer to capture names -in other patterns anyway. +Note that a predicate can only refer to capture names that appear in +the same pattern. Indeed, it makes little sense to refer to capture +names in other patterns. @heading S-expression patterns +@cindex query patterns as sexps +@cindex patterns, tree-sitter, in sexp form Besides strings, Emacs provides a s-expression based syntax for query -patterns. It largely resembles the string-based syntax. For example, +patterns. It largely resembles the string-based syntax. 
For example, the following pattern @example @@ -1125,9 +1203,8 @@ is equivalent to @end group @end example -Most pattern syntax can be written directly as strange but -never-the-less valid s-expressions. Only a few of them needs -modification: +Most patterns can be written directly as strange but nevertheless +valid s-expressions. Only a few of them need modification: @itemize @item @@ -1154,6 +1231,7 @@ For example, @end group @end example +@noindent is written in s-expression as @example @@ -1167,18 +1245,21 @@ is written in s-expression as @heading Compiling queries -If a query will be used repeatedly, especially in tight loops, it is -important to compile that query, because a compiled query is much -faster than an uncompiled one. A compiled query can be used anywhere -a query is accepted. +@cindex compiling tree-sitter queries +@cindex queries, compiling +If a query is intended to be used repeatedly, especially in tight +loops, it is important to compile that query, because a compiled query +is much faster than an uncompiled one. A compiled query can be used +anywhere a query is accepted. @defun treesit-query-compile language query This function compiles @var{query} for @var{language} into a compiled query object and returns it. -This function raise a @var{treesit-query-error} if @var{query} is -malformed. The signal data contains a description of the specific -error. You can use @code{treesit-query-validate} to debug the query. +This function raises the @code{treesit-query-error} error if +@var{query} is malformed. The signal data contains a description of +the specific error. You can use @code{treesit-query-validate} to +validate and debug the query. @end defun @defun treesit-query-language query @@ -1186,38 +1267,40 @@ This function return the language of @var{query}. @end defun @defun treesit-query-expand query -This function expands the s-expression @var{query} into a string -query. 
+This function converts the s-expression @var{query} into the string +format. @end defun @defun treesit-pattern-expand pattern -This function expands the s-expression @var{pattern} into a string -pattern. +This function converts the s-expression @var{pattern} into the string +format. @end defun -Finally, tree-sitter project's documentation about -pattern-matching can be found at +For more details, read the tree-sitter project's documentation about +pattern-matching, which can be found at @uref{https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries}. @node Multiple Languages @section Parsing Text in Multiple Languages -Sometimes, the source of a programming language could contain sources -of other languages, HTML + CSS + JavaScript is one example. In that -case, we need to assign individual parsers to text segments written in -different languages. Traditionally this is achieved by using +@cindex multiple languages, parsing with tree-sitter +@cindex parsing multiple languages with tree-sitter +Sometimes, the source of a programming language could contain snippets +of other languages; HTML + CSS + JavaScript is one example. In that +case, text segments written in different languages need to be assigned +different parsers. Traditionally, this is achieved by using narrowing. While tree-sitter works with narrowing (@pxref{tree-sitter -narrowing, narrowing}), the recommended way is to set ranges in which -a parser will operate. +narrowing, narrowing}), the recommended way is instead to set regions +of buffer text in which a parser will operate. @defun treesit-parser-set-included-ranges parser ranges -This function sets the range of @var{parser} to @var{ranges}. Then -@var{parser} will only read the text covered in each range. Each -range in @var{ranges} is a list of cons @code{(@var{beg} -. @var{end})}. +This function sets up @var{parser} to operate on @var{ranges}. The +@var{parser} will only read the text of the specified ranges. 
Each +range in @var{ranges} is a cons cell of the form @w{@code{(@var{beg} +. @var{end})}}. -Each range in @var{ranges} must come in order and not overlap. That -is, in pseudo code: +The ranges in @var{ranges} must come in order and must not overlap. +That is, in pseudo code: @example @group @@ -1231,12 +1314,12 @@ is, in pseudo code: @vindex treesit-range-invalid If @var{ranges} violates this constraint, or something else went -wrong, this function signals a @code{treesit-range-invalid}. The -signal data contains a specific error message and the ranges we are -trying to set. +wrong, this function signals the @code{treesit-range-invalid} error. +The signal data contains a specific error message and the ranges we +are trying to set. This function can also be used for disabling ranges. If @var{ranges} -is nil, the parser is set to parse the whole buffer. +is @code{nil}, the parser is set to parse the whole buffer. Example: @@ -1251,9 +1334,9 @@ Example: @defun treesit-parser-included-ranges parser This function returns the ranges set for @var{parser}. The return value is the same as the @var{ranges} argument of -@code{treesit-parser-included-ranges}: a list of cons -@code{(@var{beg} . @var{end})}. And if @var{parser} doesn't have any -ranges, the return value is nil. +@code{treesit-parser-set-included-ranges}: a list of cons cells of the form +@w{@code{(@var{beg} . @var{end})}}. If @var{parser} doesn't have any +ranges, the return value is @code{nil}. @example @group @@ -1269,7 +1352,7 @@ the ranges of @var{parser-or-lang} to @var{ranges}. Conveniently, @var{parser-or-lang} could be either a parser or a language. If it is a language, this function looks for the first parser in @code{(treesit-parser-list)} for that language in the current buffer, -and set range for it. +and sets the ranges for it. @end defun @defun treesit-get-ranges parser-or-lang @@ -1281,52 +1364,54 @@ a language symbol. 
@defun treesit-query-range source query &optional beg end This function matches @var{source} with @var{query} and returns the -ranges of captured nodes. The return value has the same shape of -other functions: a list of @code{(@var{beg} . @var{end})}. +ranges of captured nodes. The return value is a list of cons cells of +the form @w{@code{(@var{beg} . @var{end})}}, where @var{beg} and +@var{end} specify the beginning and the end of a region of text. For convenience, @var{source} can be a language symbol, a parser, or a -node. If a language symbol, this function matches in the root node of -the first parser using that language; if a parser, this function -matches in the root node of that parser; if a node, this function -matches in that node. - -Parameter @var{query} is the query used to capture nodes -(@pxref{Pattern Matching}). The capture names don't matter. Parameter -@var{beg} and @var{end}, if both non-nil, limits the range in which -this function queries. - -Like other query functions, this function raises an -@var{treesit-query-error} if @var{query} is malformed. +node. If it's a language symbol, this function matches in the root +node of the first parser using that language; if a parser, this +function matches in the root node of that parser; if a node, this +function matches in that node. + +The argument @var{query} is the query used to capture nodes +(@pxref{Pattern Matching}). The capture names don't matter. The +arguments @var{beg} and @var{end}, if both non-@code{nil}, limit the +range in which this function queries. + +Like other query functions, this function raises the +@code{treesit-query-error} error if @var{query} is malformed. @end defun -@defun treesit-language-at point +@defun treesit-language-at pos This function tries to figure out which language is responsible for -the text at @var{point}. It goes over each parser in -@code{(treesit-parser-list)} and see if that parser's range covers -@var{point}. +the text at buffer position @var{pos}. 
It goes over each parser in +@code{(treesit-parser-list)} to find a parser whose ranges cover +@var{pos}. @end defun @defvar treesit-range-functions -A list of range functions. Font-locking and indenting code uses -functions in this alist to set correct ranges for a language parser -before using it. +This variable holds the list of range functions. Font-locking and +indenting code use functions in this list to set correct ranges for +a language parser before using it. -The signature of each function should be +The signature of each function in the list should be: @example (@var{start} @var{end} &rest @var{_}) @end example -where @var{start} and @var{end} marks the region that is about to be -used. A range function only need to (but not limited to) update +@noindent +where @var{start} and @var{end} specify the region that is about to be +used. A range function only needs to (but is not limited to) update ranges in that region. -Each function in the list is called in-order. +The functions in the list are called in order. @end defvar @defun treesit-update-ranges &optional start end -This function is used by font-lock and indent to update ranges before -using any parser. Each range function in +This function is used by font-lock and indentation to update ranges +before using any parser. Each range function in @var{treesit-range-functions} is called in-order. Arguments @var{start} and @var{end} are passed to each range function. @end defun @@ -1334,11 +1419,12 @@ using any parser. Each range function in @heading An example Normally, in a set of languages that can be mixed together, there is a -major language and several embedded languages. We first parse the -whole document with the major language’s parser, set ranges for the -embedded languages, then parse the embedded languages. +major language and several embedded languages. 
A Lisp program usually +first parses the whole document with the major language's parser, sets +ranges for the embedded languages, and then parses the embedded +languages. -Suppose we want to parse a very simple document that mixes HTML, CSS +Suppose we need to parse a very simple document that mixes HTML, CSS and JavaScript: @example @@ -1358,14 +1444,18 @@ We first parse with HTML, then set ranges for CSS and JavaScript: (setq html (treesit-get-parser-create 'html)) (setq css (treesit-get-parser-create 'css)) (setq js (treesit-get-parser-create 'javascript)) +@end group +@group ;; Set CSS ranges. (setq css-range (treesit-query-range 'html "(style_element (raw_text) @@capture)")) (treesit-parser-set-included-ranges css css-range) +@end group +@group ;; Set JavaScript ranges. (setq js-range (treesit-query-range @@ -1375,21 +1465,24 @@ We first parse with HTML, then set ranges for CSS and JavaScript: @end group @end example -We use a query pattern @code{(style_element (raw_text) @@capture)} to -find CSS nodes in the HTML parse tree. For how to write query +We use a query pattern @w{@code{(style_element (raw_text) @@capture)}} to +find CSS nodes in the HTML parse tree. For how to write query patterns, @pxref{Pattern Matching}. @node Tree-sitter major modes @section Developing major modes with tree-sitter +@cindex major mode, developing with tree-sitter This section covers some general guidelines on developing tree-sitter integration for a major mode. For tree-sitter integration with -specific Emacs features, @pxref{Parser-based Font Lock}, +specific Emacs features, see @ref{Parser-based Font Lock}, and see @ref{Parser-based Indentation}. -Emacs provides @code{treesit-mode} and @code{global-treesit-mode}, -when these two modes are on, major modes should turn on their -tree-sitter support, should they have one. Major modes works with +@findex treesit-mode +@findex global-treesit-mode +Emacs provides @code{treesit-mode} and @code{global-treesit-mode}. 
+When one of these two modes is turned on, major modes should turn on +their tree-sitter support, if they have one. Major modes work with @code{treesit-mode} by setting @code{major-mode-backend-function}. @defvar major-mode-backend-function @@ -1397,24 +1490,25 @@ This is a buffer-local variable that holds a function. @code{treesit-mode} uses this function to turn on/off tree-sitter support. -This function is passed two argument @var{backend} and @var{warn}. +This function is passed two arguments: @var{backend} and @var{warn}. @var{backend} is a symbol representing the backend we want to activate. Currently it can be @code{treesit} or @code{elisp}. -If @var{warn} is non-nil, display a warning if a @code{backend} can't -activate, if @var{warn} is nil, just print an message and don't -display any warning. +If @var{warn} is non-@code{nil}, display a warning if a @code{backend} +can't be activated; if @var{warn} is @code{nil}, just print a message +and don't display any warning. @end defvar @defun treesit-ready-p warn &rest languages -This is a convenient function that checks for conditions for +This is a convenience function that checks for conditions for activating tree-sitter. It checks for whether tree-sitter is built with Emacs, the buffer's size, and whether each @var{language} is available. -If all conditions are met, it returns non-nil. If not, it signals a -warning or displays a message depending on the value of @var{warn}. -If @var{warn} is non-nil, signal warning, if nil, display message. +If all conditions are met, it returns non-@code{nil}. If not, it +shows a warning or displays a message depending on the value of +@var{warn}. If @var{warn} is non-@code{nil}, show a warning, +otherwise display an echo-area message. @end defun @@ -1422,7 +1516,7 @@ If @var{warn} is non-nil, signal warning, if nil, display message. @section Tree-sitter C API Correspondence Emacs' tree-sitter integration doesn't expose every feature -tree-sitter's C API provides. 
+provided by tree-sitter's C API. 
Missing features include: @itemize @item diff --git a/lisp/treesit.el b/lisp/treesit.el index 3096a21f6fe..f0a46e17c6a 100644 --- a/lisp/treesit.el +++ b/lisp/treesit.el @@ -60,10 +60,10 @@ Return the root node of the syntax tree." (treesit-parser-root-node (treesit-parser-create language)))) -(defun treesit-language-at (point) - "Return the language used at POINT." +(defun treesit-language-at (pos) + "Return the language used at position POS." (cl-loop for parser in (treesit-parser-list) - if (treesit-node-on point point parser) + if (treesit-node-on pos pos parser) return (treesit-parser-language parser))) (defun treesit-set-ranges (parser-or-lang ranges) @@ -101,12 +101,13 @@ Return the root node of the syntax tree." (treesit-parser-language (treesit-node-parser node))) -(defun treesit-node-at (point &optional parser-or-lang named) - "Return the smallest node that starts at or after POINT. +(defun treesit-node-at (pos &optional parser-or-lang named) + "Return the smallest node that starts at or after buffer position POS. -\"Starts at or after POINT\" means the start of the node is -greater or larger than POINT. Return nil if none find. If NAMED -non-nil, only look for named node. +\"Starts at or after POS\" means the start of the node is greater than +or equal to POS. + +Return nil if none is found. If NAMED is non-nil, only look for named nodes. If PARSER-OR-LANG is nil, use the first parser in \(`treesit-parser-list'); if PARSER-OR-LANG is a parser, use @@ -118,7 +119,7 @@ that language in the current buffer, and use that." next) ;; This is very fast so no need for C implementation. (while (setq next (treesit-node-first-child-for-pos - node point named)) + node pos named)) (setq node next)) node))