From eecc2d45b94513ba95789dfe0ef58aeb8b029049 Mon Sep 17 00:00:00 2001 From: Yuan Fu Date: Wed, 9 Nov 2022 14:50:39 -0800 Subject: [PATCH] ; Update tree-sitter HTML manuals in admin/notes * admin/notes/tree-sitter/html-manual/Language-Definitions.html * admin/notes/tree-sitter/html-manual/Multiple-Languages.html * admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html * admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html * admin/notes/tree-sitter/html-manual/Retrieving-Node.html: Update. --- .../html-manual/Language-Definitions.html | 18 +- .../html-manual/Multiple-Languages.html | 186 ++++++++++++------ .../Parser_002dbased-Font-Lock.html | 28 +-- .../Parser_002dbased-Indentation.html | 13 +- .../html-manual/Retrieving-Node.html | 17 +- 5 files changed, 168 insertions(+), 94 deletions(-) diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html index 6df676b1680..4fd7eb5687f 100644 --- a/admin/notes/tree-sitter/html-manual/Language-Definitions.html +++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html @@ -236,18 +236,20 @@ assign field names to child nodes. For example, a at point. The mode-line will display

-
parent field: (child (grandchild (…)))
+
parent field: (node (child (…)))
 
-

child, grand, grand-grandchild, etc., are nodes that -begin at point. parent is the parent node of child. +

where node, child, etc, are nodes which begin at point. +parent is the parent of node. node is displayed in +bold typeface. field-names are field names of node and +child, etc.

-

If there is no node that starts at point, i.e., point is in the middle -of a node, then the mode-line only displays the smallest node that -spans the position of point, and its immediate parent. +

If no node starts at point, i.e., point is in the middle of a node, +then the mode line displays the earliest node that spans point, and +its immediate parent.

-

This minor mode doesn’t create parsers on its own. It simply uses the -first parser in (treesit-parser-list) (see Using Tree-sitter Parser). +

This minor mode doesn’t create parsers on its own. It uses the first +parser in (treesit-parser-list) (see Using Tree-sitter Parser).

Reading the grammar definition

diff --git a/admin/notes/tree-sitter/html-manual/Multiple-Languages.html b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html index eac142921f1..6d1800fad72 100644 --- a/admin/notes/tree-sitter/html-manual/Multiple-Languages.html +++ b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html @@ -67,7 +67,6 @@ Next:

narrowing), the recommended way is -instead to set regions of buffer text in which a parser will operate. +instead to set regions of buffer text (i.e., ranges) in which a parser +will operate. This section describes functions for setting and +getting ranges for a parser. +

+

Lisp programs should call treesit-update-ranges to make sure +the ranges for each parser are correct before using parsers in a +buffer, and call treesit-language-at to figure out the language +responsible for the text at some position. These two functions don’t +work by themselves; they need major modes to set +treesit-range-settings and +treesit-language-at-point-function, which do the actual work. +These functions and variables are explained in more detail towards the +end of the section.

+

Getting and setting ranges

+
Function: treesit-parser-set-included-ranges parser ranges

This function sets up parser to operate on ranges. The @@ -126,24 +139,6 @@ ranges, the return value is nil.

-
-
Function: treesit-set-ranges parser-or-lang ranges
-

Like treesit-parser-set-included-ranges, this function sets -the ranges of parser-or-lang to ranges. Conveniently, -parser-or-lang could be either a parser or a language. If it is -a language, this function looks for the first parser in -(treesit-parser-list) for that language in the current buffer, -and sets the ranges for it. -

- -
-
Function: treesit-get-ranges parser-or-lang
-

This function returns the ranges of parser-or-lang, like -treesit-parser-included-ranges. And like -treesit-set-ranges, parser-or-lang can be a parser or -a language symbol. -

-
Function: treesit-query-range source query &optional beg end

This function matches source with query and returns the @@ -166,57 +161,56 @@ range in which this function queries. treesit-query-error error if query is malformed.

-
-
Variable: treesit-range-functions
-

This variable holds the list of range functions. Font-locking and -indenting code use functions in this list to set correct ranges for -a language parser before using it. -

-

The signature of each function in the list should be: -

-
-
(start end &rest _)
-
+

Supporting multiple languages in Lisp programs

-

where start and end specify the region that is about to be -used. A range function only needs to (but is not limited to) update -ranges in that region. +

It should suffice for general Lisp programs to call the following two +functions in order to support program sources that mix multiple +languages.

-

The functions in the list are called in order. -

-
-
Function: treesit-update-ranges &optional start end
-

This function is used by font-lock and indentation to update ranges -before using any parser. Each range function in -treesit-range-functions is called in-order. Arguments -start and end are passed to each range function. +

Function: treesit-update-ranges &optional beg end
+

This function updates ranges for parsers in the buffer. It makes sure +the parsers’ ranges are set correctly between beg and end, +according to treesit-range-settings. If omitted, beg +defaults to the beginning of the buffer, and end defaults to the +end of the buffer. +

+

For example, fontification functions use this function before querying +for nodes in a region.

-
Function: treesit-language-at pos
-

This function tries to figure out which language is responsible for -the text at buffer position pos. Under the hood it just calls -treesit-language-at-point-function. -

-

Various Lisp programs use this function. For example, the indentation -program uses this function to determine which language’s rule to use -in a multi-language buffer. So it is important to provide -treesit-language-at-point-function for a multi-language major -mode. +

This function returns the language of the text at buffer position +pos. Under the hood it calls +treesit-language-at-point-function and returns its return +value. If treesit-language-at-point-function is nil, +this function returns the language of the first parser in the returned +value of treesit-parser-list. If there is no parser in the +buffer, it returns nil.

-

An example

+

Supporting multiple languages in major modes

+ + +

Normally, in a set of languages that can be mixed together, there is a -major language and several embedded languages. A Lisp program usually -first parses the whole document with the major language’s parser, sets -ranges for the embedded languages, and then parses the embedded +host language and one or more embedded languages. A Lisp +program usually first parses the whole document with the host +language’s parser, retrieves some information, sets ranges for the +embedded languages with that information, and then parses the embedded languages.

-

Suppose we need to parse a very simple document that mixes -HTML, CSS and JavaScript: +

Take a buffer containing HTML, CSS and JavaScript +as an example. A Lisp program will first parse the whole buffer with +an HTML parser, then query the parser for +style_element and script_element nodes, which +correspond to CSS and JavaScript text, respectively. Then +it sets the ranges of the CSS and JavaScript parsers to the +ranges that their corresponding nodes span. +

+

Given a simple HTML document:

<html>
@@ -225,8 +219,8 @@ languages.
 </html>
 
-

We first parse with HTML, then set ranges for CSS -and JavaScript: +

a Lisp program will first parse with an HTML parser, then set +ranges for CSS and JavaScript parsers:

;; Create parsers.
@@ -251,10 +245,76 @@ and JavaScript:
 (treesit-parser-set-included-ranges js js-range)
 
-

We use a query pattern (style_element (raw_text) @capture) -to find CSS nodes in the HTML parse tree. For how -to write query patterns, see Pattern Matching Tree-sitter Nodes. +

Emacs automates this process in treesit-update-ranges. A +multi-language major mode should set treesit-range-settings so +that treesit-update-ranges knows how to perform this process +automatically. Major modes should use the helper function +treesit-range-rules to generate a value that can be assigned to +treesit-range-settings. The settings in the following example +directly translate into operations shown above.

+
+
(setq-local treesit-range-settings
+            (treesit-range-rules
+             :embed 'javascript
+             :host 'html
+             '((script_element (raw_text) @capture))
+
+
+
             :embed 'css
+             :host 'html
+             '((style_element (raw_text) @capture))))
+
+ +
+
Function: treesit-range-rules &rest query-specs
+

This function is used to set treesit-range-settings. It +takes care of compiling queries and other post-processing, and outputs +a value that treesit-range-settings can have. +

+

It takes a series of query-specs, where each query-spec is +a query preceded by zero or more pairs of keyword and +value. Each query is a tree-sitter query in either the +string, s-expression or compiled form, or a function. +

+

If query is a tree-sitter query, it should be preceded by two +:keyword value pairs, where the :embed keyword +specifies the embedded language, and the :host keyword +specifies the host language. +

+

treesit-update-ranges uses query to figure out how to set +the ranges for parsers for the embedded language. It queries +query in a host language parser, computes the ranges that +the captured nodes span, and applies these ranges to embedded +language parsers. +

+

If query is a function, it doesn’t need any :keyword and +value pair. It should be a function that takes 2 arguments, +start and end, and sets the ranges for parsers in the +current buffer in the region between start and end. It is +fine for this function to set ranges in a larger region that +encompasses the region between start and end. +

+ +
+
Variable: treesit-range-settings
+

This variable helps treesit-update-ranges in updating the +ranges for parsers in the buffer. It is a list of settings +where the exact format of a setting is considered internal. You +should use treesit-range-rules to generate a value that this +variable can have. +

+
+ + +
+
Variable: treesit-language-at-point-function
+

This variable’s value should be a function that takes a single +argument, pos, which is a buffer position, and returns the +language of the buffer text at pos. This variable is used by +treesit-language-at. +

+
diff --git a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html index 4f2933c985d..72d82e6ee6d 100644 --- a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html +++ b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html @@ -111,7 +111,7 @@ would be highlighted in font-lock-keyword face. treesit-major-mode-setup.

-
Function: treesit-font-lock-rules :keyword value query...
+
Function: treesit-font-lock-rules &rest query-specs

This function is used to set treesit-font-lock-settings. It takes care of compiling queries and other post-processing, and outputs a value that treesit-font-lock-settings accepts. Here’s an @@ -129,13 +129,18 @@ example: "(script_element) @font-lock-builtin-face")

-

This function takes a list of text or s-exp queries. Before each -query, there are :keyword-value pairs that configure -that query. The :lang keyword sets the query’s language and -every query must specify the language. The :feature keyword -sets the feature name of the query. Users can control which features -are enabled with font-lock-maximum-decoration and -treesit-font-lock-feature-list (see below). +

This function takes a series of query-specs, where each +query-spec is a query preceded by multiple pairs of +:keyword and value. Each query is a tree-sitter +query in either the string, s-expression or compiled form. +

+

For each query, the :keyword and value pairs add +meta information to it. The :lang keyword declares +query’s language. The :feature keyword sets the feature +name of query. Users can control which features are enabled +with font-lock-maximum-decoration and +treesit-font-lock-feature-list (described below). These two +keywords are mandatory.

Other keywords are optional:

@@ -148,7 +153,7 @@ are enabled with font-lock-maximum-decoration and keepFill-in regions without an existing face -

Lisp programs mark patterns in the query with capture names (names +

Lisp programs mark patterns in query with capture names (names that starts with @), and tree-sitter will return matched nodes tagged with those same capture names. For the purpose of fontification, capture names in query should be face names like @@ -230,9 +235,10 @@ these common features.

Variable: treesit-font-lock-settings

A list of settings for tree-sitter based font lock. The exact format -of this variable is considered internal. One should always use +of each setting is considered internal. One should always use treesit-font-lock-rules to set this variable. -

+

+

Multi-language major modes should provide range functions in treesit-range-functions, and Emacs will set the ranges diff --git a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html index 2fdb50df7c1..5ea1f9bc332 100644 --- a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html +++ b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html @@ -106,7 +106,8 @@ the current line to matcher; if it returns non-nil, this rule is applicable. Then Emacs passes the node to anchor, which returns a buffer position. Emacs takes the column number of that position, adds offset to it, and the result is the indentation -column for the current line. +column for the current line. offset can be an integer or a +variable whose value is an integer.

The matcher and anchor are functions, and Emacs provides convenient defaults for them. @@ -117,8 +118,8 @@ arguments: node, parent, and bol. The argument position of the first non-whitespace character after the beginning of the line. The argument node is the largest (highest-in-tree) node that starts at that position; and parent is the parent of -node. However, when that position is on a whitespace or inside -a multi-line string, no node that starts at that position, so +node. However, when that position is in a whitespace or inside +a multi-line string, no node can start at that position, so node is nil. In that case, parent would be the smallest node that spans that position.

@@ -215,6 +216,12 @@ sibling of node.

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the first non-whitespace charater on the previous line. +

+
+
point-min
+

This anchor is a function that is called with 3 arguments: node, +parent, and bol, and returns the beginning of the buffer. +This is useful as the beginning of the buffer is always at column 0.

diff --git a/admin/notes/tree-sitter/html-manual/Retrieving-Node.html b/admin/notes/tree-sitter/html-manual/Retrieving-Node.html index e1de2007077..0c086dab91d 100644 --- a/admin/notes/tree-sitter/html-manual/Retrieving-Node.html +++ b/admin/notes/tree-sitter/html-manual/Retrieving-Node.html @@ -262,10 +262,9 @@ is non-nil, it looks for smallest named child.

This function traverses the subtree of node (including node itself), looking for a node for which predicate returns non-nil. predicate is a regexp that is matched -(case-insensitively) against each node’s type, or a predicate function -that takes a node and returns non-nil if the node matches. The -function returns the first node that matches, or nil if none -does. +against each node’s type, or a predicate function that takes a node +and returns non-nil if the node matches. The function returns +the first node that matches, or nil if none does.

By default, this function only traverses named nodes, but if all is non-nil, it traverses all the nodes. If backward is @@ -279,9 +278,9 @@ down the tree.

Function: treesit-search-forward start predicate &optional backward all

Like treesit-search-subtree, this function also traverses the parse tree and matches each node with predicate (except for -start), where predicate can be a (case-insensitive) regexp -or a function. For a tree like the below where start is marked -S, this function traverses as numbered from 1 to 12: +start), where predicate can be a regexp or a function. +For a tree like the below where start is marked S, this function +traverses as numbered from 1 to 12:

              12
@@ -336,8 +335,8 @@ as in treesit-search-forward.
 

It takes the subtree under root, and combs it so only the nodes that match predicate are left. Like previous functions, the predicate can be a regexp string that matches against each -node’s type case-insensitively, or a function that takes a node and -return non-nil if it matches. +node’s type, or a function that takes a node and returns non-nil +if it matches.

For example, for a subtree on the left that consist of both numbers and letters, if predicate is “letter only”, the returned tree -- 2.39.5