Besides simple syntactic font lock and regexp-based font lock, Emacs
also provides complete syntactic font lock with the help of a parser,
currently provided by the tree-sitter library (@pxref{Parsing Program
-Source}). Because it is an optional feature, parser-based font lock
-is less integrated with Emacs. Most variables introduced in previous
-sections only apply to regexp-based font lock, except for
-@var{font-lock-maximum-decoration}.
+Source}).
@defun treesit-font-lock-enable
This function enables parser-based font lock in the current buffer.
This function takes a list of text or s-exp queries. Before each
query, there are @var{:keyword} and @var{value} pairs that configures
-that query. The @var{:lang} keyword sets the query’s language, and is
+that query. The @code{:lang} keyword sets the query’s language, and is
currently the only recognized keyword.
Capture names in @var{query} should be face names like
with that face. Capture names can also be function names, in which
case the function is called with (@var{start} @var{end} @var{node}),
where @var{start} and @var{end} are the start and end position of the
-node in buffer, and @var{node} is the tree-sitter node object. If a
-capture name is both a face and a function, the face takes priority.
+node in buffer, and @var{node} is the node itself. If a capture name
+is both a face and a function, the face takes priority.
@end defun
@defvar treesit-font-lock-settings
Each @var{setting} controls one parser (often of different language).
And @var{language} is the language symbol (@pxref{Language
-Definitions}); @var{query} is either a string query or a sexp query
-(@pxref{Pattern Matching}).
+Definitions}); @var{query} is the query (@pxref{Pattern Matching}).
@end defvar
Multi-language major modes should provide range functions in
-@var{treesit-range-functions}, and Emacs will set the ranges
+@code{treesit-range-functions}, and Emacs will set the ranges
accordingly before fontifing a region (@pxref{Multiple Languages}).
@node Auto-Indentation
This variable stores the actual function called by
@code{treesit-indent}. By default, its value is
@code{treesit-simple-indent}. In the future we might add other
-more complex indentation engines, if @code{treesit-simple-indent}
-proves to be insufficient.
+more complex indentation engines.
@end defvar
@heading Writing indentation rules
(@var{language} . @var{rules})
@end example
-where @var{language} is a language symbol, @var{rules} is a list of
+where @var{language} is a language symbol, and @var{rules} is a list
+of
@example
(@var{matcher} @var{anchor} @var{offset})
@end example
-The @var{matcher} determines whether this rule applies, @var{anchor}
-and @var{offset} together determines which column to indent to.
-
-A @var{matcher} is a function that takes three arguments (@var{node}
-@var{parent} @var{bol}). Argument @var{bol} is the point at where we
-are indenting: the position of the first non-whitespace character from
-the beginning of line; @var{node} is the largest (highest-in-tree)
-node that starts at that point; @var{parent} is the parent of
-@var{node};
-
-If @var{matcher} returns non-nil, meaning the rule matches, Emacs then
-uses @var{anchor} to find an anchor, it should be a function that
-takes the same argument (@var{node} @var{parent} @var{bol}) and
-returns a point.
-
-Finally Emacs computes the column of that point returned by
-@var{anchor} and adds @var{offset} to it, and indents to that column.
-
-For @var{matcher} and @var{anchor}, Emacs provides some convenient
-presets to spare us from writing these functions ourselves. They are
-stored in @var{treesit-simple-indent-presets}, see below.
+First Emacs passes the node at point to @var{matcher}; if it returns
+non-nil, this rule applies. Then Emacs passes the node to
+@var{anchor}, which returns a point. Emacs takes the column number of
+that point, adds @var{offset} to it, and the result is the indent for
+the current line.
+
+The @var{matcher} and @var{anchor} are functions, and Emacs provides
+convenient presets for them. You can skip over to
+@code{treesit-simple-indent-presets} below; those presets should be
+more than enough.
+
+A @var{matcher} or an @var{anchor} is a function that takes three
+arguments (@var{node} @var{parent} @var{bol}). Argument @var{bol} is
+the point where we are indenting: the position of the first
+non-whitespace character from the beginning of line; @var{node} is the
+largest (highest-in-tree) node that starts at that point; @var{parent}
+is the parent of @var{node}. A @var{matcher} returns nil/non-nil, and
+@var{anchor} returns a point.
@end defvar
@defvar treesit-simple-indent-presets
This is a list of presets for @var{matcher}s and @var{anchor}s in
-@var{treesit-simple-indent-rules}. Each of them represent a
-function that takes @var{node}, @var{parent} and @var{bol} as
-arguments.
-
-@example
-(match @var{node-type} @var{parent-type}
- @var{node-field} @var{node-index-min} @var{node-index-max})
-@end example
-
-This matcher checks if @var{node}'s type is @var{node-type},
-@var{parent}'s type is @var{parent-type}, @var{node}'s field name in
-@var{parent} is @var{node-field}, and @var{node}'s index among its
-siblings is between @var{node-index-min} and @var{node-index-max}. If
-the value of a constraint is nil, this matcher doesn't check for that
-constraint. For example, to match the first child where parent is
-@code{argument_list}, use
-
-@example
-(match nil "argument_list" nil nil 0 0)
-@end example
+@code{treesit-simple-indent-rules}. Each of them represents a function
+that takes @var{node}, @var{parent} and @var{bol} as arguments.
@example
no-node
This matcher matches if querying @var{parent} with @var{query}
captures @var{node}. The capture name does not matter.
+@example
+(match @var{node-type} @var{parent-type}
+ @var{node-field} @var{node-index-min} @var{node-index-max})
+@end example
+
+This matcher checks if @var{node}'s type is @var{node-type},
+@var{parent}'s type is @var{parent-type}, @var{node}'s field name in
+@var{parent} is @var{node-field}, and @var{node}'s index among its
+siblings is between @var{node-index-min} and @var{node-index-max}. If
+the value of a constraint is nil, this matcher doesn't check for that
+constraint. For example, to match the first child where parent is
+@code{argument_list}, use
+
+@example
+(match nil "argument_list" nil 0 0)
+@end example
+
@example
first-sibling
@end example
no-indent
@end example
-This anchor returns the start of @var{node}, i.e., do not indent.
+This anchor returns the start of @var{node}, i.e., no indent.
@example
prev-line
rules.
@defun treesit-check-indent mode
-This function check current buffer's indentation against major mode
-@var{mode}. It indents the current line in @var{mode} and compares
+This function checks the current buffer's indentation against major mode
+@var{mode}. It indents the current buffer in @var{mode} and compares
the indentation with the current indentation. Then it pops up a diff
buffer showing the difference. Correct indentation (target) is in
green, current indentation is in red.
In order to use a language definition in Emacs, you need to make sure
that the dynamic library is installed on the system. Emacs looks for
language definitions under load paths in
-@var{treesit-extra-load-path}, @var{user-emacs-directory}/tree-sitter,
+@code{treesit-extra-load-path}, @code{user-emacs-directory}/tree-sitter,
and system default locations for dynamic libraries, in that order.
-Emacs tries each extensions in @var{treesit-load-suffixes}. If Emacs
+Emacs tries each extension in @code{treesit-load-suffixes}. If Emacs
cannot find the library or has problem loading it, Emacs signals
-@var{treesit-load-language-error}. The signal data is a list of
+@code{treesit-load-language-error}. The signal data is a list of
specific error messages.
@defun treesit-language-available-p language
@code{libtree-sitter-@var{language}.@var{ext}}, where @var{ext} is the
system-specific extension for dynamic libraries. Also by convention,
the function provided by that library is named
-@code{tree_sitter_<language>}. If a language definition doesn't
+@code{tree_sitter_@var{language}}. If a language definition doesn't
follow this convention, you should add an entry
@example
(@var{language} @var{library-base-name} @var{function-name})
@end example
-to @var{treesit-load-name-override-list}, where
+to @code{treesit-load-name-override-list}, where
@var{library-base-name} is the base filename for the dynamic library
(conventionally @code{libtree-sitter-@var{language}}), and
@var{function-name} is the function provided by the library
(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")
@end example
-for a language too cool to abide by the rules.
+for a language too cool to abide by conventions.
@heading Concrete syntax tree
-A syntax tree is what a language definition defines (more or less) and
-what a parser generates. In a syntax tree, each node represents a
-piece of text, and is connected to each other by a parent-child
-relationship. For example, if the source text is
+A syntax tree is what a parser generates. In a syntax tree, each node
+represents a piece of text, and nodes are connected to each other by
+parent-child relationships. For example, if the source text is
@example
1 + 2
Authors of language definitions define the @dfn{grammar} of a
language, and this grammar determines how does a parser construct a
-concrete syntax tree out of the text. In order to used the syntax
+concrete syntax tree out of the text. In order to use the syntax
tree effectively, we need to read the @dfn{grammar file}.
The grammar file is usually @code{grammar.js} in a language
This section described how to create and configure a tree-sitter
parser. In Emacs, each tree-sitter parser is associated with a
-buffer. As we edit the buffer, the associated parser is automatically
-kept up-to-date.
+buffer. As we edit the buffer, the associated parser and the syntax
+tree are automatically kept up-to-date.
@defvar treesit-max-buffer-size
This variable contains the maximum size of buffers in which
@defun treesit-can-enable-p
This function checks whether the current buffer is suitable for
activating tree-sitter features. It basically checks
-@code{treesit-available-p} and @var{treesit-max-buffer-size}.
+@code{treesit-available-p} and @code{treesit-max-buffer-size}.
@end defun
@cindex Creating tree-sitter parsers
@defun treesit-parser-create language &optional buffer no-reuse
-To create a parser, we provide a @var{buffer} to keep track of and the
-@var{language} to use (@pxref{Language Definitions}). If @var{buffer}
-is nil, the current buffer is used.
+To create a parser, we provide a @var{buffer} and the @var{language}
+to use (@pxref{Language Definitions}). If @var{buffer} is nil, the
+current buffer is used.
By default, this function reuses a parser if one already exists for
@var{language} in @var{buffer}, if @var{no-reuse} is non-nil, this
There is no need to explicitly parse a buffer, because parsing is done
automatically and lazily. A parser only parses when we query for a
node in its syntax tree. Therefore, when a parser is first created,
-it doesn't parse the buffer; instead, it waits until we query for a
-node for the first time. Similarly, when some change is made in the
-buffer, a parser doesn't re-parse immediately and only records some
-necessary information to later re-parse when necessary.
+it doesn't parse the buffer; it waits until we query for a node for
+the first time. Similarly, when some change is made in the buffer, a
+parser doesn't re-parse immediately.
@vindex treesit-buffer-too-large
When a parser do parse, it checks for the size of the buffer.
Tree-sitter can only handle buffer no larger than about 4GB. If the
-size exceeds that, Emacs signals @var{treesit-buffer-too-large}
+size exceeds that, Emacs signals @code{treesit-buffer-too-large}
with signal data being the buffer size.
Once a parser is created, Emacs automatically adds it to the
@cindex tree-sitter find node
@cindex tree-sitter get node
-There are two ways to retrieve a node: directly from the syntax tree,
-or by traveling from other nodes. But before we continue, lets go
-over some conventions of tree-sitter functions.
+Before we continue, let's go over some conventions of tree-sitter
+functions.
We talk about a node being ``smaller'' or ``larger'', and ``lower'' or
``higher''. A smaller and lower node is lower in the syntax tree and
@vindex treesit-node-outdated
Nodes are not automatically updated when the associated buffer is
-modified. In fact, there is no way to update a node once it is
-retrieved. It is best to use a node and throw it away and not save
-it. A node is @dfn{outdated} if the buffer has changed since the node
-is retrieved. Using an outdated node throws
-@var{treesit-node-outdated} error.
+modified. And there is no way to update a node once it is retrieved.
+Using an outdated node signals a @code{treesit-node-outdated} error.
@heading Retrieving node from syntax tree
that language in @code{(treesit-parser-list)} and use that.
If @var{named} is non-nil, this function looks for a named node
-instead (@pxref{tree-sitter named node, named node}).
+only (@pxref{tree-sitter named node, named node}).
+Example:
@example
@group
;; Find the node at point in a C parser's syntax tree.
-(treesit-node-on (point) 'c)
+(treesit-node-at (point) 'c)
@c @result{} #<treesit-node from 1 to 4 in *scratch*>
@end group
@end example
less or equal to @var{beg}, and the end of the node is greater or
equal to @var{end}.
-@emph{Beware}, Calling this function on an empty line that is not
+@emph{Beware} that calling this function on an empty line that is not
inside any top-level construct (function definition, etc) most
probably will give you the root node, because the root node is the
-smallest node that covers that empty line. You probably want to use
-@code{treesit-node-at} instead.
+smallest node that covers that empty line. Most of the time, you want
+to use @code{treesit-node-at}.
When @var{parser-or-lang} is nil, this function uses the first parser
in @code{(treesit-parser-list)} in the current buffer. If
@var{parser-or-lang} is a language, it finds the first parser using
that language in @code{(treesit-parser-list)} and use that.
-If @var{named} is non-nil, this function looks for a named node
-instead (@pxref{tree-sitter named node, named node}).
+If @var{named} is non-nil, this function looks for a named node only
+(@pxref{tree-sitter named node, named node}).
@end defun
@defun treesit-parser-root-node parser
@defun treesit-buffer-root-node &optional language
This function finds the first parser that uses @var{language} in
@code{(treesit-parser-list)} in the current buffer, and returns the
-root node of that buffer. If it cannot find an appropriate parser, it
-returns nil.
+root node of that buffer. If it cannot find an appropriate parser,
+nil is returned.
@end defun
Once we have a node, we can retrieve other nodes from it, or query for
@defun treesit-node-children node &optional named
This function returns all of @var{node}'s children in a list. If
-@var{named} is non-nil, then it only retrieves named nodes
-(@pxref{tree-sitter named node, named node}).
+@var{named} is non-nil, then it only retrieves named nodes.
@end defun
@defun treesit-next-sibling node &optional named
This function finds the next sibling of @var{node}. If @var{named} is
-non-nil, it finds the next named sibling (@pxref{tree-sitter named
-node, named node}).
+non-nil, it finds the next named sibling.
@end defun
@defun treesit-prev-sibling node &optional named
This function finds the previous sibling of @var{node}. If
-@var{named} is non-nil, it finds the previous named sibling
-(@pxref{tree-sitter named node, named node}).
+@var{named} is non-nil, it finds the previous named sibling.
@end defun
@subheading By field name
@defun treesit-first-child-for-pos node pos &optional named
This function finds the first child of @var{node} that extends beyond
-@var{pos}. ``Extend beyond'' means the end of the child node
-@code{>=} @var{pos}. This function only looks for immediate children of
+@var{pos}. ``Extend beyond'' means the end of the child node is
+greater than or equal to @var{pos}. This function only looks for immediate children of
@var{node}, and doesn't look in its grand children. If @var{named} is
non-nil, it only looks for named child (@pxref{tree-sitter named node,
named node}).
@end defun
@defun treesit-node-descendant-for-range node beg end &optional named
-This function finds the @emph{smallest} (grand)child of @var{node}
-that spans the range from @var{beg} to @var{end}. It is similar to
-@code{treesit-node-at}. If @var{named} is non-nil, it only looks
-for named child (@pxref{tree-sitter named node, named node}).
+This function finds the @emph{smallest} descendant of @var{node}
+that spans the range from @var{beg} to @var{end}. It is similar to
+@code{treesit-node-at}. If @var{named} is non-nil, it only looks
+for named nodes.
@end defun
@heading Searching for node
@defun treesit-search-subtree node predicate &optional all backward limit
-This function traverses the subtree of @var{node}, and match
-@var{predicate} with each node along the way. And @var{predicate} is
-a regexp that matches against each node's type, or a function that
-takes a node and returns nil/non-nil. If a node matches, that node is
-returned, if no node ever matches, nil is returned.
+This function traverses the subtree of @var{node} (including
+@var{node}), and matches @var{predicate} with each node along the way.
+Here @var{predicate} is either a regexp that matches against each
+node's type, or a function that takes a node and returns nil/non-nil.
+If a node matches, that node is returned; if no node ever matches, nil
+is returned.
By default, this function only traverses named nodes, if @var{all} is
non-nil, it traverses all nodes. If @var{backward} is non-nil, it
-traverse backwards. If @var{limit} is non-nil, it only traverses that
-number of levels down in the tree.
+traverses backwards. If @var{limit} is non-nil, it only traverses
+that number of levels down in the tree.
@end defun
@defun treesit-search-forward start predicate &optional all backward up
It also traverse the parse tree and match each node with
@var{predicate} (except for @var{start}), where @var{predicate} can be
a regexp or a function. For a tree like the below where @var{start}
-is marked 1, this function will traverse as numbered:
+is marked 1, this function traverses as numbered:
@example
@group
Same as in @code{treesit-search-subtree}, this function only searches
for named nodes by default. But if @var{all} is non-nil, it searches
-for all nodes. And If @var{backward} is non-nil, it searches
-backwards.
+for all nodes. If @var{backward} is non-nil, it searches backwards.
If @var{up} is non-nil, this function will only traverse to siblings
and parents. In that case, only 1 3 4 8 would be traversed.
that matches @var{predicate}. Parameters @var{predicate}, @var{all},
@var{backward}, and @var{up} are the same as in
@code{treesit-search-forward}. And @var{side} controls which side of
-the matched no do we stop at, it can be @code{'start} or @code{'end}.
+the matched node we stop at; it can be @code{start} or @code{end}.
@end defun
@defun treesit-induce-sparse-tree root predicate &optional process-fn limit
-This function creates a sparse tree of @var{root}'s subtree.
+This function creates a sparse tree from @var{root}'s subtree.
Basically, it takes the subtree under @var{root}, and combs it so only
the nodes that match @var{predicate} are left, like picking out grapes
a node and return nil/non-nil.
For example, for a subtree on the left that consist of both numbers
-and letters, if @var{predicate} is ``is letter'', the returned tree is
-the one on the right.
+and letters, if @var{predicate} is ``letter only'', the returned tree
+is the one on the right.
@example
@group
@end example
If @var{process-fn} is non-nil, instead of returning the matched
-nodes, pass each node to @var{process-fn} use the return value
-instead. If non-nil, @var{limit} is the number of levels to go down
-from @var{root}.
-
-Each node in the returned tree looks like @code{(@var{node}
-. (@var{child} ...))}. The root of this tree might be nil, if
-@var{root} doesn't match @var{pred}. If no node matches
-@var{predicate}, return nil.
+nodes, this function passes each node to @var{process-fn} and uses the
+returned value instead. If non-nil, @var{limit} is the number of
+levels to go down from @var{root}.
+
+Each node in the returned tree looks like @code{(@var{tree-sitter-node}
+. (@var{child} ...))}. The @var{tree-sitter-node} of the root of this
+tree will be nil if @var{root} doesn't match @var{predicate}. If no
+node matches @var{predicate}, this function returns nil.
@end defun
@heading More convenient functions
@defun treesit-filter-child node pred &optional named
-This function finds children of @var{node} that satisfies @var{pred}.
+This function finds immediate children of @var{node} that satisfy
+@var{pred}.
Function @var{pred} takes the child node as the argument and should
return non-nil to indicated keeping the child. If @var{named}
@defun treesit-node-text node &optional object
Returns the buffer text that @var{node} represents. (If @var{node} is
-retrieved from parsing a string, it will be the text from that
-string.)
+retrieved from parsing a string, it will be text from that string.)
@end defun
Here are some basic checks on tree-sitter nodes.
@cindex tree-sitter node that has changes
A node ``has changes'' if the buffer changed since when the node is
-retrieved. In this case, the node's start and end position would be
-off and we better throw it away and retrieve a new one.
+retrieved, i.e., the node is outdated.
@cindex tree-sitter node that has error
A node ``has error'' if the text it spans contains a syntax error. It
-can be the node itself has an error, or one of its (grand)children has
-an error.
+can be that the node itself has an error, or that one of its
+descendants has an error.
@defun treesit-node-check node property
This function checks if @var{node} has @var{property}. @var{property}
@code{'has-changes}, or @code{'has-error}.
@end defun
+
+@defun treesit-node-type node
Named nodes have ``types'' (@pxref{tree-sitter node type, node type}).
For example, a named node can be a @code{string_literal} node, where
@code{string_literal} is its type.
-@defun treesit-node-type node
-Return @var{node}'s type as a string.
+This function returns @var{node}'s type as a string.
@end defun
@heading Information as a child or parent
@end defun
@defun treesit-node-field-name-for-child node n
-This is a more primitive function that returns the field name of the
-@var{n}'th child of @var{node}.
+This function returns the field name of the @var{n}'th child of
+@var{node}.
@end defun
@defun treesit-child-count node &optional named
Now we can introduce the query functions.
@defun treesit-query-capture node query &optional beg end node-only
-This function matches patterns in @var{query} in @var{node}. Argument
-@var{query} can be either a string, a s-expression, or a compiled
-query object. For now, we focus on the string syntax; s-expression
-syntax and compiled query are described at the end of the section.
+This function matches patterns in @var{query} in @var{node}.
+Parameter @var{query} can be a string, an s-expression, or a
+compiled query object. For now, we focus on the string syntax;
+s-expression syntax and compiled queries are described at the end of
+the section.
The function returns all captured nodes in a list of
@code{(@var{capture_name} . @var{node})}. If @var{node-only} is
@end group
@end example
-@noindent
Querying that query would return
@example
that has @emph{zero or more} @code{long} keyword.
@example
-(type_declaration "long"* @@long-in-type)
+(type_declaration "long"*) @@long-type
@end example
-@noindent
And this pattern matches a type declaration that has zero or one
@code{long} keyword:
@example
-(type_declaration "long"?) @@type-decl
+(type_declaration "long"?) @@long-type
@end example
@subheading Grouping
error. You can use @code{treesit-query-validate} to debug the query.
@end defun
-@defun treesit-expand-query query
+@defun treesit-query-expand query
This function expands the s-expression @var{query} into a string
query.
@end defun
-@defun treesit-expand-pattern pattern
+@defun treesit-pattern-expand pattern
This function expands the s-expression @var{pattern} into a string
pattern.
@end defun
@vindex treesit-range-invalid
If @var{ranges} violates this constraint, or something else went
-wrong, this function signals a @var{treesit-range-invalid}. The
+wrong, this function signals a @code{treesit-range-invalid} error. The
signal data contains a specific error message and the ranges we are
trying to set.
a language symbol.
@end defun
-@defun treesit-query-range source pattern &optional beg end
-This function matches @var{source} with @var{pattern} and returns the
+@defun treesit-query-range source query &optional beg end
+This function matches @var{source} with @var{query} and returns the
ranges of captured nodes. The return value has the same shape of
other functions: a list of @code{(@var{beg} . @var{end})}.
matches in the root node of that parser; if a node, this function
matches in that node.
-Parameter @var{pattern} is the query pattern used to capture nodes
+Parameter @var{query} is the query used to capture nodes
(@pxref{Pattern Matching}). The capture names don't matter. Parameter
@var{beg} and @var{end}, if both non-nil, limits the range in which
this function queries.
Like other query functions, this function raises an
-@var{treesit-query-error} if @var{pattern} is malformed.
+@code{treesit-query-error} if @var{query} is malformed.
@end defun
@defun treesit-language-at point
@heading An example
Normally, in a set of languages that can be mixed together, there is a
-major language and several embedded languages. The major language
-parses the whole document, and skips the embedded languages. Then the
-parser for the major language knows the ranges of the embedded
-languages. So we first parse the whole document with the major
-language’s parser, set ranges for the embedded languages, then parse
-the embedded languages.
+major language and several embedded languages. We first parse the
+whole document with the major language’s parser, set ranges for the
+embedded languages, then parse the embedded languages.
Suppose we want to parse a very simple document that mixes HTML, CSS
and JavaScript: