From eb1a35adc1c5a1a9d14ec8594580c5eb0e3d28fe Mon Sep 17 00:00:00 2001 From: Yuan Fu Date: Mon, 21 Nov 2022 13:33:03 -0800 Subject: [PATCH] ; Update tree-sitter starter guide * admin/notes/tree-sitter/starter-guide: Reflect recent changes. * admin/notes/tree-sitter/html-manual/Using-Parser.html: * admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html: * admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html: * admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html: * admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html: * admin/notes/tree-sitter/html-manual/Multiple-Languages.html: * admin/notes/tree-sitter/html-manual/Language-Definitions.html: Update. --- .../html-manual/Language-Definitions.html | 29 +++++++-- .../html-manual/Multiple-Languages.html | 6 +- .../Parser_002dbased-Font-Lock.html | 57 ++++++++---------- .../Parser_002dbased-Indentation.html | 28 ++++++++- .../html-manual/Parsing-Program-Source.html | 2 +- .../html-manual/Tree_002dsitter-C-API.html | 2 +- .../tree-sitter/html-manual/Using-Parser.html | 48 ++++++++++++++- admin/notes/tree-sitter/starter-guide | 59 ++++++++----------- 8 files changed, 149 insertions(+), 82 deletions(-) diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html index 4fd7eb5687f..6dd589f8259 100644 --- a/admin/notes/tree-sitter/html-manual/Language-Definitions.html +++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html @@ -230,19 +230,38 @@ assign field names to child nodes. For example, a body: (compound_statement)) +

Exploring the syntax tree

+ + + +

To aid in understanding the syntax of a language and in debugging +Lisp programs that use the syntax tree, Emacs provides an “explore” +mode, which displays the syntax tree of the source in the current +buffer in real time. Emacs also comes with an “inspect mode”, which +displays information about the nodes at point in the mode-line. +

+
+
Command: treesit-explore-mode
+

This mode pops up a window displaying the syntax tree of the source in +the current buffer. Selecting text in the source buffer highlights +the corresponding nodes in the syntax tree display. Clicking +on nodes in the syntax tree highlights the corresponding text in the +source buffer. +

+
Command: treesit-inspect-mode

This minor mode displays on the mode-line the node that starts -at point. The mode-line will display +at point. For example, the mode-line can display

parent field: (node (child (…)))
 
-

where node, child, etc, are nodes which begin at point. +

where node, child, etc., are nodes which begin at point. parent is the parent of node. node is displayed in -bold typeface. field-names are field names of node and -child, etc. +a bold typeface. field-names are field names of node and +of child, etc.

If no node starts at point, i.e., point is in the middle of a node, then the mode line displays the earliest node that spans point, and @@ -343,7 +362,7 @@ language definition.

token(rule)

marks rule to produce a single leaf node. That is, instead of generating a parent node with individual child nodes under it, -everything is combined into a single leaf node. +everything is combined into a single leaf node. See Retrieving Nodes.

token.immediate(rule)

Normally, grammar rules ignore preceding whitespace; this diff --git a/admin/notes/tree-sitter/html-manual/Multiple-Languages.html b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html index 6d1800fad72..0ae0b1897e1 100644 --- a/admin/notes/tree-sitter/html-manual/Multiple-Languages.html +++ b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html @@ -273,12 +273,12 @@ takes care of compiling queries and other post-processing, and outputs a value that treesit-range-settings can have.

It takes a series of query-specs, where each query-spec is -a query preceded by zero or more pairs of keyword and -value. Each query is a tree-sitter query in either the +a query preceded by zero or more keyword/value +pairs. Each query is a tree-sitter query in either the string, s-expression or compiled form, or a function.

If query is a tree-sitter query, it should be preceded by two +:keyword/value pairs, where the :embed keyword +specifies the embedded language, and the :host keyword +specifies the host language.
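For illustration, here is a sketch of how an HTML major mode might hand the contents of script elements to a JavaScript parser. It assumes that the html and javascript grammars are installed and that the HTML grammar names the relevant nodes script_element and raw_text; check the grammar you actually use before relying on those node names.

```elisp
;; Sketch: parse the text inside HTML script elements as JavaScript.
;; Assumes the `html' and `javascript' grammars are available, and that
;; the HTML grammar calls these nodes `script_element' and `raw_text'.
(setq-local treesit-range-settings
            (treesit-range-rules
             :embed 'javascript   ; language of the embedded code
             :host 'html          ; language whose parser locates the ranges
             '((script_element (raw_text) @capture))))
```

Each further embedded language would add another :embed/:host/query triple to the same call.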

diff --git a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html index 72d82e6ee6d..e04a730b05c 100644 --- a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html +++ b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html @@ -130,17 +130,17 @@ example:

This function takes a series of query-specs, where each -query-spec is a query preceded by multiple pairs of -:keyword and value. Each query is a tree-sitter -query in either the string, s-expression or compiled form. -

-

For each query, the :keyword and value pairs add -meta information to it. The :lang keyword declares -query’s language. The :feature keyword sets the feature -name of query. Users can control which features are enabled -with font-lock-maximum-decoration and +query-spec is a query preceded by one or more +:keyword/value pairs. Each query is a +tree-sitter query in either the string, s-expression or compiled form. +

+

For each query, the :keyword/value pairs that +precede it add meta information to it. The :language keyword +declares query’s language. The :feature keyword sets the +feature name of query. Users can control which features are +enabled with font-lock-maximum-decoration and +treesit-font-lock-feature-list (described below). These two +keywords are mandatory.

Other keywords are optional:

@@ -177,24 +177,6 @@ priority. If a capture name is neither a face nor a function, it is ignored.

-

Contextual entities, like multi-line strings, or /* */ style -comments, need special care, because change in these entities might -cause change in a large portion of the buffer. For example, inserting -the closing comment delimiter */ will change all the text -between it and the opening delimiter to comment face. Such entities -should be captured in a special name contextual, so Emacs can -correctly update their fontification. Here is an example for -comments: -

-
-
(treesit-font-lock-rules
- :language 'javascript
- :feature 'comment
- :override t
- '((comment) @font-lock-comment-face)
-   (comment) @contextual))
-
-
Variable: treesit-font-lock-feature-list

This is a list of lists of feature symbols. Each element of the list @@ -208,11 +190,20 @@ activated. list disables the corresponding query during font-lock.

Common feature names, for many programming languages, include -function-name, type, variable-name (left-hand-side or LHS of -assignments), builtin, constant, keyword, string-interpolation, -comment, doc, string, operator, preprocessor, escape-sequence, and key -(in key-value pairs). Major modes are free to subdivide or extend -these common features. +definition, type, assignment, builtin, +constant, keyword, string-interpolation, +comment, doc, string, operator, +preprocessor, escape-sequence, and key. Major +modes are free to subdivide or extend these common features. +

+

Some of these features warrant some explanation: definition +highlights whatever is being defined, e.g., the function name in a +function definition, the struct name in a struct definition, the +variable name in a variable definition; assignment highlights +whatever is being assigned to, e.g., the variable or field in an +assignment statement; key highlights keys in key-value pairs, +e.g., keys in a JSON object or a Python dictionary; doc +highlights docstrings or doc-comments.

For example, the value of this variable could be:

diff --git a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html index 5ea1f9bc332..3027bbaae95 100644 --- a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html +++ b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html @@ -183,6 +183,14 @@ first child where parent is argument_list, use
(match nil "argument_list" nil nil 0 0)
 
+
+
comment-end
+

This matcher is a function that is called with 3 arguments: +node, parent, and bol, and returns non-nil if +point is before a comment ending token. Comment ending tokens are +defined by the regular expression treesit-comment-end +(see treesit-comment-end). +

first-sibling

This anchor is a function that is called with 3 arguments: node, @@ -219,12 +227,28 @@ charater on the previous line.

point-min
-

This anchor is a function is called with 3 arguments: node, +

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the beginning of the buffer. This is useful as the beginning of the buffer is always at column 0. +

+
+
comment-start
+

This anchor is a function that is called with 3 arguments: node, +parent, and bol, and returns the position right after the +comment-start token. Comment-start tokens are defined by the regular +expression treesit-comment-start (see treesit-comment-start). This function assumes parent is +the comment node. +

+
+
comment-start-skip
+

This anchor is a function that is called with 3 arguments: node, +parent, and bol, and returns the position after the +comment-start token and any whitespace characters following that +token. Comment-start tokens are defined by the regular expression +treesit-comment-start. This function assumes parent is +the comment node.
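Taken together, the comment-end matcher and the comment-start and comment-start-skip anchors can indent the continuation lines of a C block comment. The sketch below is one possible pair of rules, not the rules of any shipped major mode. The regular expressions assigned to treesit-comment-start and treesit-comment-end are illustrative assumptions, and the use of the and matcher preset should be checked against the treesit-simple-indent-presets of your Emacs version.

```elisp
;; Sketch: indentation rules for continuation lines of C block comments.
;; The two regexps are illustrative assumptions, not Emacs defaults.
(setq-local treesit-comment-start (rx "/" (+ "*")))  ; matches "/*", "/**"
(setq-local treesit-comment-end (rx (+ "*") "/"))    ; matches "*/", "**/"

(setq-local treesit-simple-indent-rules
            '((c
               ;; Rules for ordinary code would follow these two.
               ;; A line starting with "*/": indent to one column before
               ;; the end of the opening token, so a "/*" in column 0
               ;; puts the closing "*/" in column 1.
               ((and (parent-is "comment") comment-end) comment-start -1)
               ;; Any other line inside the comment: line up with the
               ;; first non-whitespace text after the opening token.
               ((parent-is "comment") comment-start-skip 0))))
```

Rules are tried in order and the first match wins, which is why the narrower comment-end rule comes before the general (parent-is "comment") rule.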

-

Indentation utilities

diff --git a/admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html b/admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html index ea22421ac4c..a0b5775f11f 100644 --- a/admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html +++ b/admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html @@ -106,7 +106,7 @@ source files that mix multiple programming languages.