From eecc2d45b94513ba95789dfe0ef58aeb8b029049 Mon Sep 17 00:00:00 2001
From: Yuan Fu
-parent field: (child (grandchild (…)))
+parent field: (node (child (…)))
-child, grand, grand-grandchild, etc., are nodes that
-begin at point. parent is the parent node of child.
+where node, child, etc., are nodes which begin at point.
+parent is the parent of node. node is displayed in
+bold typeface. field-names are field names of node and
+child, etc.
-If there is no node that starts at point, i.e., point is in the middle
-of a node, then the mode-line only displays the smallest node that
-spans the position of point, and its immediate parent.
+If no node starts at point, i.e., point is in the middle of a node,
+then the mode line displays the earliest node that spans point, and
+its immediate parent.
-This minor mode doesn’t create parsers on its own. It simply uses the
-first parser in (treesit-parser-list) (see Using Tree-sitter Parser).
+
+This minor mode doesn’t create parsers on its own. It uses the first
+parser in (treesit-parser-list) (see Using Tree-sitter Parser).
Sometimes, the source of a programming language could contain snippets
@@ -76,8 +75,22 @@ example.
In that case, text segments written in different languages need to be
assigned different parsers. Traditionally, this is achieved by using
narrowing. While tree-sitter works with narrowing (see narrowing), the
recommended way is
-instead to set regions of buffer text in which a parser will operate.
+instead to set regions of buffer text (i.e., ranges) in which a parser
+will operate. This section describes functions for setting and
+getting ranges for a parser.
+
+Lisp programs should call treesit-update-ranges to make sure
+the ranges for each parser are correct before using parsers in a
+buffer, and call treesit-language-at to figure out the language
+responsible for the text at some position. These two functions don’t
+work by themselves; they need major modes to set
+treesit-range-settings and
+treesit-language-at-point-function, which do the actual work.
+These functions and variables are explained in more detail towards the
+end of the section.
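The calling pattern described in the paragraph above can be sketched as follows. This is a minimal illustration, not code from Emacs: the function name my-capture-identifiers and the query pattern ((identifier) @id) are hypothetical, and the sketch assumes the major mode has already set treesit-range-settings and treesit-language-at-point-function.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes the major mode has
;; set `treesit-range-settings' and `treesit-language-at-point-function'.
(defun my-capture-identifiers (beg end)
  "Return the (CAPTURE-NAME . NODE) pairs for identifiers in BEG..END."
  ;; First bring every parser's ranges up to date for this region.
  (treesit-update-ranges beg end)
  ;; Then ask which language is responsible for the text at BEG and
  ;; query that language's first parser.  The `identifier' node type
  ;; is an assumption; a grammar without it signals
  ;; `treesit-query-error'.
  (let ((lang (treesit-language-at beg)))
    (when lang
      (treesit-query-capture lang '((identifier) @id) beg end))))
```

For instance, (my-capture-identifiers (point-min) (point-max)) would refresh the ranges for the whole buffer before querying.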
This function sets up parser to operate on ranges. The
@@ -126,24 +139,6 @@ ranges, the return value is nil.
-Like treesit-parser-set-included-ranges, this function sets
-the ranges of parser-or-lang to ranges. Conveniently,
-parser-or-lang could be either a parser or a language. If it is
-a language, this function looks for the first parser in
-(treesit-parser-list) for that language in the current buffer,
-and sets the ranges for it.
-
-This function returns the ranges of parser-or-lang, like
-treesit-parser-included-ranges. And like
-treesit-set-ranges, parser-or-lang can be a parser or
-a language symbol.
-
This function matches source with query and returns the
@@ -166,57 +161,56 @@ range in which this function queries.
treesit-query-error error if query is malformed.
-This variable holds the list of range functions. Font-locking and
-indenting code use functions in this list to set correct ranges for
-a language parser before using it.
-
-The signature of each function in the list should be:
-
-(start end &rest _)
-
-where start and end specify the region that is about to be
-used. A range function only needs to (but is not limited to) update
-ranges in that region.
+It should suffice for general Lisp programs to call the following two
+functions in order to support program sources that mix multiple
+languages.
-The functions in the list are called in order.
-
-This function is used by font-lock and indentation to update ranges
-before using any parser. Each range function in
-treesit-range-functions is called in-order. Arguments
-start and end are passed to each range function.
+This function updates ranges for parsers in the buffer. It makes sure
+the parsers’ ranges are set correctly between beg and end,
+according to treesit-range-settings. If omitted, beg
+defaults to the beginning of the buffer, and end defaults to the
+end of the buffer.
+
+For example, fontification functions use this function before querying
+for nodes in a region.
-This function tries to figure out which language is responsible for
-the text at buffer position pos. Under the hood it just calls
-treesit-language-at-point-function.
-
-Various Lisp programs use this function. For example, the indentation
-program uses this function to determine which language’s rule to use
-in a multi-language buffer. So it is important to provide
-treesit-language-at-point-function for a multi-language major
-mode.
+
+This function returns the language of the text at buffer position
+pos. Under the hood it calls
+treesit-language-at-point-function and returns its return
+value. If treesit-language-at-point-function is nil,
+this function returns the language of the first parser in the returned
+value of treesit-parser-list. If there is no parser in the
+buffer, it returns nil.
Normally, in a set of languages that can be mixed together, there is a
-major language and several embedded languages. A Lisp program usually
-first parses the whole document with the major language’s parser, sets
-ranges for the embedded languages, and then parses the embedded
+host language and one or more embedded languages. A Lisp
+program usually first parses the whole document with the host
+language’s parser, retrieves some information, sets ranges for the
+embedded languages with that information, and then parses the embedded
languages.
-Suppose we need to parse a very simple document that mixes
-HTML, CSS and JavaScript:
+Take a buffer containing HTML, CSS and JavaScript
+as an example. A Lisp program will first parse the whole buffer with
+an HTML parser, then query the parser for
+style_element and script_element nodes, which
+correspond to CSS and JavaScript text, respectively. Then
+it sets the ranges of the CSS and JavaScript parsers to the
+ranges that their corresponding nodes span.
+
+Given a simple HTML document:
<html>
@@ -225,8 +219,8 @@ languages.
</html>
-We first parse with HTML, then set ranges for CSS
-and JavaScript:
+a Lisp program will first parse with an HTML parser, then set
+ranges for CSS and JavaScript parsers:
;; Create parsers.
@@ -251,10 +245,76 @@ and JavaScript:
(treesit-parser-set-included-ranges js js-range)
We use a query pattern (style_element (raw_text) @capture)
-to find CSS nodes in the HTML parse tree. For how
-to write query patterns, see Pattern Matching Tree-sitter Nodes.
+
+Emacs automates this process in treesit-update-ranges. A
+multi-language major mode should set treesit-range-settings so
+that treesit-update-ranges knows how to perform this process
+automatically. Major modes should use the helper function
+treesit-range-rules to generate a value that can be assigned to
+treesit-range-settings. The settings in the following example
+directly translate into the operations shown above.
+(setq-local treesit-range-settings
+            (treesit-range-rules
+             :embed 'javascript
+             :host 'html
+             '((script_element (raw_text) @capture))
+
+             :embed 'css
+             :host 'html
+             '((style_element (raw_text) @capture))))
+This function is used to set treesit-range-settings. It
+takes care of compiling queries and other post-processing, and outputs
+a value that treesit-range-settings can have.
+
+It takes a series of query-specs, where each query-spec is
+a query preceded by zero or more pairs of keyword and
+value. Each query is a tree-sitter query in either the
+string, s-expression or compiled form, or a function.
+
+If query is a tree-sitter query, it should be preceded by two
+:keyword value pairs, where the :embed keyword
+specifies the embedded language, and the :host keyword
+specifies the host language.
+
+treesit-update-ranges uses query to figure out how to set
+the ranges for parsers for the embedded language. It queries
+query in a host language parser, computes the ranges that
+the captured nodes span, and applies these ranges to embedded
+language parsers.
+
+If query is a function, it doesn’t need any :keyword and
+value pair. It should be a function that takes 2 arguments,
+start and end, and sets the ranges for parsers in the
+current buffer in the region between start and end. It is
+fine for this function to set ranges in a larger region that
+encompasses the region between start and end.
+
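A minimal sketch of the function form of query, assuming an HTML buffer with the html and javascript grammars installed. The function name my-html-set-js-ranges is hypothetical. Because a nil range list tells treesit-parser-set-included-ranges to parse the whole buffer, the sketch assumes a zero-length range at point-min is an acceptable way to leave the JavaScript parser with no text when the buffer has no script element.

```elisp
;; Hypothetical range function, not part of Emacs.  START and END are
;; ignored, which the text above permits: the ranges are recomputed for
;; the whole buffer rather than only for START..END.
(defun my-html-set-js-ranges (_start _end)
  "Give the JavaScript parser the text of every script element."
  (let ((js (treesit-parser-create 'javascript)) ; reuses an existing parser
        (ranges (treesit-query-range
                 'html '((script_element (raw_text) @capture)))))
    ;; A nil range list would make the parser cover the whole buffer,
    ;; so fall back to a zero-length range when there is no script.
    (treesit-parser-set-included-ranges
     js (or ranges `((,(point-min) . ,(point-min)))))))

;; A function query-spec needs no :embed/:host keywords.
(setq-local treesit-range-settings
            (treesit-range-rules #'my-html-set-js-ranges))
```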
+This variable helps treesit-update-ranges in updating the
+ranges for parsers in the buffer. It is a list of settings
+where the exact format of a setting is considered internal. You
+should use treesit-range-rules to generate a value that this
+variable can have.
+
+This variable’s value should be a function that takes a single
+argument, pos, which is a buffer position, and returns the
+language of the buffer text at pos. This variable is used by
+treesit-language-at.
+
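For the HTML example earlier in this section, a treesit-language-at-point-function might look like the sketch below. It is a hypothetical illustration (my-html-language-at is not part of Emacs): it walks up from the HTML node at pos and reports CSS or JavaScript when it meets an enclosing style_element or script_element. As a simplification, positions on the style and script tags themselves are also attributed to the embedded language.

```elisp
;; Hypothetical `treesit-language-at-point-function' for a buffer that
;; mixes HTML, CSS and JavaScript; not part of Emacs.
(defun my-html-language-at (pos)
  "Return the language responsible for the text at POS."
  (let ((node (treesit-node-at pos 'html))
        (lang 'html))
    ;; Walk up from the HTML node at POS until an enclosing
    ;; style/script element is found or there are no more ancestors.
    (while (and node (eq lang 'html))
      (pcase (treesit-node-type node)
        ("style_element" (setq lang 'css))
        ("script_element" (setq lang 'javascript)))
      (setq node (treesit-node-parent node)))
    lang))

;; In the major mode body:
(setq-local treesit-language-at-point-function #'my-html-language-at)
```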
font-lock-keyword face.
treesit-major-mode-setup.
This function is used to set treesit-font-lock-settings. It takes care
of compiling queries and other post-processing, and outputs a value
that treesit-font-lock-settings accepts. Here’s an
@@ -129,13 +129,18 @@ example:
"(script_element) @font-lock-builtin-face")
-This function takes a list of text or s-exp queries. Before each
-query, there are :keyword-value pairs that configure
-that query. The :lang keyword sets the query’s language and
-every query must specify the language. The :feature keyword
-sets the feature name of the query. Users can control which features
-are enabled with font-lock-maximum-decoration and
-treesit-font-lock-feature-list (see below).
+This function takes a series of query-specs, where each
+query-spec is a query preceded by multiple pairs of
+:keyword and value. Each query is a tree-sitter
+query in either the string, s-expression or compiled form.
+
+For each query, the :keyword and value pairs add
+meta information to it. The :lang keyword declares
+query’s language. The :feature keyword sets the feature
+name of query. Users can control which features are enabled
+with font-lock-maximum-decoration and
+treesit-font-lock-feature-list (described below). These two
+keywords are mandatory.
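As a rough example of the query-spec layout, the sketch below defines two features for a hypothetical JavaScript mode. It spells the language keyword :language, which is how the tree-sitter major modes shipped with Emacs write it; the node types string, "function" and "return" are assumed to exist in the JavaScript grammar.

```elisp
;; Two query-specs for a hypothetical JavaScript major mode.  Each query
;; is preceded by its language and feature keywords.
(setq-local treesit-font-lock-settings
            (treesit-font-lock-rules
             :language 'javascript
             :feature 'keyword
             '(["function" "return"] @font-lock-keyword-face)

             :language 'javascript
             :feature 'string
             '((string) @font-lock-string-face)))

;; Put both features on the first decoration level.
(setq-local treesit-font-lock-feature-list '((keyword string)))
```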
Other keywords are optional:
@@ -148,7 +153,7 @@ are enabled with font-lock-maximum-decoration and
keep
-Lisp programs mark patterns in the query with capture names (names
+Lisp programs mark patterns in query with capture names (names
that start with @), and tree-sitter will return matched nodes
tagged with those same capture names. For the purpose of
fontification, capture names in query should be face names like
@@ -230,9 +235,10 @@ these common features.
A list of settings for tree-sitter based font lock. The exact format
-of this variable is considered internal. One should always use
+of each setting is considered internal. One should always use
treesit-font-lock-rules to set this variable.
-
Multi-language major modes should provide range functions in
treesit-range-functions, and Emacs will set the ranges
diff --git a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
index 2fdb50df7c1..5ea1f9bc332 100644
--- a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
+++ b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
@@ -106,7 +106,8 @@ the current line to matcher; if it returns non-nil, this
rule is applicable. Then Emacs passes the node to anchor, which
returns a buffer position. Emacs takes the column number of that
position, adds offset to it, and the result is the indentation
-column for the current line.
+column for the current line. offset can be an integer or a
+variable whose value is an integer.
The matcher and anchor are functions, and Emacs provides
convenient defaults for them.
@@ -117,8 +118,8 @@ arguments: node, parent, and bol. The argument
position of the first non-whitespace character after the beginning of
the line. The argument node is the largest (highest-in-tree)
node that starts at that position; and parent is the parent of
-node. However, when that position is on a whitespace or inside
-a multi-line string, no node that starts at that position, so
+node. However, when that position is in whitespace or inside
+a multi-line string, no node can start at that position, so
node is nil. In that case, parent would be the
smallest node that spans that position.
This anchor is a function that is called with 3 arguments: node,
parent, and bol, and returns the first non-whitespace character on
the previous line.
+
+point-min
+This anchor is a function that is called with 3 arguments: node,
+parent, and bol, and returns the beginning of the buffer.
+This is useful as the beginning of the buffer is always at column 0.
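A sketch of simple indentation rules that use the point-min anchor and a variable offset, assuming a JavaScript grammar whose node types include program and statement_block. The variable my-js-indent-offset is hypothetical; because the rule names the variable rather than its value, changing the variable should affect indentation without rebuilding the rules. A major mode would set this before calling treesit-major-mode-setup.

```elisp
;; Hypothetical user option; the rule below refers to it by name.
(defvar my-js-indent-offset 2
  "Indentation step for the hypothetical rules below.")

(setq-local treesit-simple-indent-rules
            '((javascript
               ;; Top-level statements: anchor at the buffer start,
               ;; which is always column 0.
               ((parent-is "program") point-min 0)
               ;; A closing brace lines up with the first non-whitespace
               ;; character of the line where its parent starts.
               ((node-is "}") parent-bol 0)
               ;; Statements in a block: one step deeper than that line.
               ((parent-is "statement_block") parent-bol my-js-indent-offset))))
```

Rules are tried in order, so the closing-brace rule is listed before the more general statement_block rule.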
nil, it looks for smallest named child.
This function traverses the subtree of node (including
node itself), looking for a node for which predicate
returns non-nil. predicate is a regexp that is matched
-(case-insensitively) against each node’s type, or a predicate function
-that takes a node and returns non-nil if the node matches. The
-function returns the first node that matches, or nil if none
-does.
+against each node’s type, or a predicate function that takes a node
+and returns non-nil if the node matches. The function returns
+the first node that matches, or nil if none does.
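The two forms of predicate can be compared in a small sketch. It assumes the buffer's first parser produces nodes of type string, which is an assumption about the grammar. Because a regexp can match anywhere in a type name, the sketch anchors it so it does not also match longer types such as template_string.

```elisp
;; Two equivalent searches from the root of the buffer's first parser.
;; The node type "string" is an assumption about the grammar.
(let ((root (treesit-buffer-root-node)))
  (list
   ;; PREDICATE as a regexp, anchored with \` and \' so that it only
   ;; matches the whole type name.
   (treesit-search-subtree root "\\`string\\'")
   ;; PREDICATE as a function that receives each node.
   (treesit-search-subtree
    root (lambda (node) (equal (treesit-node-type node) "string")))))
```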
By default, this function only traverses named nodes, but if all
is non-nil, it traverses all the nodes. If backward is
@@ -279,9 +278,9 @@ down the tree.
Like treesit-search-subtree, this function also traverses the
parse tree and matches each node with predicate (except for
-start), where predicate can be a (case-insensitive) regexp
-or a function. For a tree like the below where start is marked
-S, this function traverses as numbered from 1 to 12:
+start), where predicate can be a regexp or a function.
+For a tree like the below where start is marked S, this function
+traverses as numbered from 1 to 12:
12
@@ -336,8 +335,8 @@ as in treesit-search-forward.
It takes the subtree under root, and combs it so only the nodes
that match predicate are left. Like previous functions, the
predicate can be a regexp string that matches against each
-node’s type case-insensitively, or a function that takes a node and
-return non-nil if it matches.
+node’s type, or a function that takes a node and returns non-nil
+if it matches.
For example, for a subtree on the left that consists of both numbers
and letters, if predicate is “letter only”, the returned tree
-- 
2.39.5