From: Yuan Fu
Date: Wed, 5 Oct 2022 21:11:33 +0000 (-0700)
Subject: Add tree-sitter admin notes
X-Git-Tag: emacs-29.0.90~1857
X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=cb183f6467401fb5ed2b7fc98ca75be9d943cbe3;p=emacs.git

Add tree-sitter admin notes

starter-guide: Guide on writing major mode features.
build-module: Script for building official language definitions.
html-manual: HTML version of the manual for easy access.

* admin/notes/tree-sitter/build-module/README: New file.
* admin/notes/tree-sitter/build-module/batch.sh: New file.
* admin/notes/tree-sitter/build-module/build.sh: New file.
* admin/notes/tree-sitter/starter-guide: New file.
* admin/notes/tree-sitter/html-manual/Accessing-Node.html: New file.
* admin/notes/tree-sitter/html-manual/Language-Definitions.html: New file.
* admin/notes/tree-sitter/html-manual/Multiple-Languages.html: New file.
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html: New file.
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html: New file.
* admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html: New file.
* admin/notes/tree-sitter/html-manual/Pattern-Matching.html: New file.
* admin/notes/tree-sitter/html-manual/Retrieving-Node.html: New file.
* admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html: New file.
* admin/notes/tree-sitter/html-manual/Using-Parser.html: New file.
* admin/notes/tree-sitter/html-manual/build-manual.sh: New file.
* admin/notes/tree-sitter/html-manual/manual.css: New file.
---

diff --git a/admin/notes/tree-sitter/build-module/README b/admin/notes/tree-sitter/build-module/README
new file mode 100644
index 00000000000..ee6076c119c
--- /dev/null
+++ b/admin/notes/tree-sitter/build-module/README
@@ -0,0 +1,17 @@
+To build the language definition for a particular language, run
+
+    ./build.sh <language>
+
+e.g.,
+
+    ./build.sh html
+
+The dynamic module will be in the dist directory.
+
+To build all modules at once, run
+
+    ./batch.sh
+
+This gives you C, JSON, Go, HTML, JavaScript, CSS, Python, TypeScript,
+C#, C++, Rust.  More can be added to batch.sh unless its directory
+structure is not standard.
\ No newline at end of file
diff --git a/admin/notes/tree-sitter/build-module/batch.sh b/admin/notes/tree-sitter/build-module/batch.sh
new file mode 100755
index 00000000000..deed18978a1
--- /dev/null
+++ b/admin/notes/tree-sitter/build-module/batch.sh
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+languages=(
+    'c'
+    'cpp'
+    'css'
+    'c-sharp'
+    'go'
+    'html'
+    'javascript'
+    'json'
+    'python'
+    'rust'
+    'typescript'
+)
+
+for language in "${languages[@]}"
+do
+    ./build.sh "$language"
+done
diff --git a/admin/notes/tree-sitter/build-module/build.sh b/admin/notes/tree-sitter/build-module/build.sh
new file mode 100755
index 00000000000..16792d05cbb
--- /dev/null
+++ b/admin/notes/tree-sitter/build-module/build.sh
@@ -0,0 +1,62 @@
+#!/bin/bash
+
+lang=$1
+
+if [ "$(uname)" == "Darwin" ]
+then
+    soext="dylib"
+else
+    soext="so"
+fi
+
+echo "Building ${lang}"
+
+# Retrieve sources.
+git clone "https://github.com/tree-sitter/tree-sitter-${lang}.git" \
+    --depth 1 --quiet
+if [ "${lang}" == "typescript" ]
+then
+    lang="typescript/tsx"
+fi
+cp tree-sitter-lang.in "tree-sitter-${lang}/src"
+cp emacs-module.h "tree-sitter-${lang}/src"
+cp "tree-sitter-${lang}/grammar.js" "tree-sitter-${lang}/src"
+cd "tree-sitter-${lang}/src"
+
+if [ "${lang}" == "typescript/tsx" ]
+then
+    lang="typescript"
+fi
+
+# Build.
+cc -fPIC -c -I. parser.c
+# Compile scanner.c.
+if test -f scanner.c
+then
+    cc -fPIC -c -I. scanner.c
+fi
+# Compile scanner.cc.
+if test -f scanner.cc
+then
+    c++ -fPIC -I. -c scanner.cc
+fi
+# Link.
+if test -f scanner.cc
+then
+    c++ -fPIC -shared *.o -o "libtree-sitter-${lang}.${soext}"
+else
+    cc -fPIC -shared *.o -o "libtree-sitter-${lang}.${soext}"
+fi
+
+# Copy out.
+
+if [ "${lang}" == "typescript" ]
+then
+    cp "libtree-sitter-${lang}.${soext}" ..
+    cd ..
+fi
+
+mkdir -p ../../dist
+cp "libtree-sitter-${lang}.${soext}" ../../dist
+cd ../../
+rm -rf "tree-sitter-${lang}"
diff --git a/admin/notes/tree-sitter/html-manual/Accessing-Node.html b/admin/notes/tree-sitter/html-manual/Accessing-Node.html new file mode 100644 index 00000000000..00ac63b8339 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Accessing-Node.html @@ -0,0 +1,206 @@ + + + + + + +Accessing Node (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + + +
+ +
+

37.4 Accessing Node Information

+ +

Before going further, make sure you have read the basic conventions +about tree-sitter nodes in the previous node. +

+

Basic information

+ +

Every node is associated with a parser, and that parser is associated +with a buffer. The following functions let you retrieve them. +

+
+
Function: treesit-node-parser node
+

This function returns node’s associated parser. +

+ +
+
Function: treesit-node-buffer node
+

This function returns node’s parser’s associated buffer. +

+ +
+
Function: treesit-node-language node
+

This function returns node’s parser’s associated language. +

+ +

Each node represents a piece of text in the buffer. The functions below +find relevant information about that text. +

+
+
Function: treesit-node-start node
+

Return the start position of node. +

+ +
+
Function: treesit-node-end node
+

Return the end position of node. +

+ +
+
Function: treesit-node-text node &optional object
+

Returns the buffer text that node represents. (If node is +retrieved from parsing a string, it will be text from that string.) +

+ +

Here are some basic checks on tree-sitter nodes. +

+
+
Function: treesit-node-p object
+

Checks if object is a tree-sitter syntax node. +

+ +
+
Function: treesit-node-eq node1 node2
+

Checks if node1 and node2 are the same node in a syntax +tree. +

+ +
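As a rough illustration of the accessors above, the sketch below collects the basic facts about a node into a plist. The helper name `my-treesit-node-summary` is hypothetical and not part of Emacs, and the sketch assumes that the node was obtained elsewhere and that Emacs was built with tree-sitter.

```elisp
;; Hypothetical helper (not part of Emacs): collect the basic
;; information about NODE with the accessors described above.
(defun my-treesit-node-summary (node)
  "Return a plist describing NODE, or nil if NODE is not a node."
  (when (treesit-node-p node)
    (list :language (treesit-node-language node)
          :buffer   (treesit-node-buffer node)
          :start    (treesit-node-start node)
          :end      (treesit-node-end node)
          :text     (treesit-node-text node))))
```

Calling the helper on something that is not a node returns nil, because `treesit-node-p` is checked first.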

Property information

+ +

In general, nodes in a concrete syntax tree fall into two categories: +named nodes and anonymous nodes. Whether a node is named +or anonymous is determined by the language definition +(see named node). +

+ +

Apart from being named/anonymous, a node can have other properties. A +node can be “missing”: missing nodes are inserted by the parser in +order to recover from certain kinds of syntax errors, i.e., something +should probably be there according to the grammar, but is not there. +

+ +

A node can be “extra”: extra nodes represent things like comments, +which can appear anywhere in the text. +

+ +

A node “has changes” if the buffer has changed since the node was +retrieved, i.e., the node is outdated. +

+ +

A node “has error” if the text it spans contains a syntax error. Either +the node itself has an error, or one of its +descendants (children, grandchildren, and so on) has an error. +

+
+
Function: treesit-node-check node property
+

This function checks if node has property. property +can be 'named, 'missing, 'extra, +'has-changes, or 'has-error. +

+ + +
+
Function: treesit-node-type node
+

Named nodes have “types” (see node type). +For example, a named node can be a string_literal node, where +string_literal is its type. +

+

This function returns node’s type as a string. +

+ +
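A minimal sketch of how the property check might be used, assuming the property symbols exactly as listed above (other Emacs versions may name them differently). The helper name `my-treesit-node-properties` is hypothetical.

```elisp
;; Hypothetical helper: test NODE against each property symbol
;; documented above and keep the ones that hold.
(defun my-treesit-node-properties (node)
  "Return the properties from the list above that hold for NODE."
  (delq nil
        (mapcar (lambda (property)
                  (and (treesit-node-check node property) property))
                '(named missing extra has-changes has-error))))
```

For a named node without errors, one would expect the result to contain at least `named`.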

Information as a child or parent

+ +
+
Function: treesit-node-index node &optional named
+

This function returns the index of node as a child node of its +parent. If named is non-nil, it only counts named nodes +(see named node). +

+ +
+
Function: treesit-node-field-name node
+

A child of a parent node could have a field name (see field name). This function returns the field name +of node as a child of its parent. +

+ +
+
Function: treesit-node-field-name-for-child node n
+

This function returns the field name of the n’th child of +node. +

+ +
+
Function: treesit-child-count node &optional named
+

This function finds the number of children of node. If +named is non-nil, it only counts named children (see named node). +

+ +
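The functions in this subsection can be combined to describe where a node sits under its parent. The sketch below uses the function names exactly as given in these notes (they may differ in other Emacs versions), and the helper name `my-treesit-node-position` is hypothetical.

```elisp
;; Hypothetical helper: report NODE's place among its siblings and
;; how many children it has, using the functions described above.
(defun my-treesit-node-position (node)
  "Return a plist describing NODE as a child and as a parent."
  (list :index          (treesit-node-index node)
        :named-index    (treesit-node-index node t)
        :field          (treesit-node-field-name node)
        :children       (treesit-child-count node)
        :named-children (treesit-child-count node t)))
```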
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html new file mode 100644 index 00000000000..ba3eeb9eeb9 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html @@ -0,0 +1,326 @@ + + + + + + +Language Definitions (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + +
+ +
+

37.1 Tree-sitter Language Definitions

+ +

Loading a language definition

+ +

Tree-sitter relies on a language definition to parse text in that +language. In Emacs, a language definition is represented by a symbol. +For example, the C language definition is represented as c, and +c can be passed to tree-sitter functions as the language +argument. +

+ + + +

Tree-sitter language definitions are distributed as dynamic libraries. +In order to use a language definition in Emacs, you need to make sure +that the dynamic library is installed on the system. Emacs looks for +language definitions under load paths in +treesit-extra-load-path, user-emacs-directory/tree-sitter, +and system default locations for dynamic libraries, in that order. +Emacs tries each extension in treesit-load-suffixes. If Emacs +cannot find the library or has a problem loading it, Emacs signals +treesit-load-language-error. The signal data is a list of +specific error messages. +

+
+
Function: treesit-language-available-p language
+

This function checks whether the dynamic library for language is +present on the system, and returns non-nil if it is. +

+ + +
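A minimal sketch of pointing Emacs at a directory of language definitions and checking that one of them can be found. The directory is only an example path (an assumption, not something Emacs creates), and the sketch assumes that `treesit-extra-load-path` holds a list of directory names.

```elisp
(require 'treesit)

;; Assumption: the dynamic libraries were copied to this directory;
;; the path itself is only an example.
(add-to-list 'treesit-extra-load-path
             (expand-file-name "~/src/tree-sitter-modules/dist"))

;; Only ask about a language when tree-sitter support is compiled in.
(when (treesit-available-p)
  (message "C language definition %s"
           (if (treesit-language-available-p 'c) "found" "not found")))
```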

By convention, the dynamic library for language is +libtree-sitter-language.ext, where ext is the +system-specific extension for dynamic libraries. Also by convention, +the function provided by that library is named +tree_sitter_language. If a language definition doesn’t +follow this convention, you should add an entry +

+
+
(language library-base-name function-name)
+
+ +

to treesit-load-name-override-list, where +library-base-name is the base filename for the dynamic library +(conventionally libtree-sitter-language), and +function-name is the function provided by the library +(conventionally tree_sitter_language). For example, +

+
+
(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")
+
+ +

for a language too cool to abide by conventions. +
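Registering such an entry could look like the sketch below. It reuses the made-up `cool-lang` names from the text and assumes an Emacs built with tree-sitter, so that `treesit-load-name-override-list` is defined.

```elisp
(require 'treesit)

;; `cool-lang' and its library and function names are the invented
;; example from the text, not a real language definition.
(add-to-list 'treesit-load-name-override-list
             '(cool-lang "libtree-sitter-coool" "tree_sitter_cooool"))
```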

+
+
Function: treesit-language-version &optional min-compatible
+

The tree-sitter library has a language version; a language +definition’s version needs to match this version to be compatible. +

+

This function returns the tree-sitter library’s language version. If +min-compatible is non-nil, it returns the minimum compatible +version. +

+ +

Concrete syntax tree

+ +

A syntax tree is what a parser generates. In a syntax tree, each node +represents a piece of text, and nodes are connected to each other by +parent-child relationships. For example, if the source text is +

+
+
1 + 2
+
+ +

its syntax tree could be +

+
+
                  +--------------+
+                  | root "1 + 2" |
+                  +--------------+
+                         |
+        +--------------------------------+
+        |       expression "1 + 2"       |
+        +--------------------------------+
+           |             |            |
++------------+   +--------------+   +------------+
+| number "1" |   | operator "+" |   | number "2" |
++------------+   +--------------+   +------------+
+
+ +

We can also represent it as an s-expression: +

+
+
(root (expression (number) (operator) (number)))
+
+ +

Node types

+ + + + +

Names like root, expression, number, and +operator are the nodes’ types. However, not all nodes in a +syntax tree have a type. Nodes that don’t are anonymous nodes, +and nodes with a type are named nodes. Anonymous nodes are +tokens with fixed spellings, including punctuation characters like +the bracket ‘]’, and keywords like return. +

+

Field names

+ + +

To make the syntax tree easier to +analyze, many language definitions assign field names to child +nodes. For example, a function_definition node could have a +declarator and a body: +

+
+
(function_definition
+ declarator: (declaration)
+ body: (compound_statement))
+
+ +
+
Command: treesit-inspect-mode
+

This minor mode displays the node that starts at point in the +mode-line. The mode-line will display +

+
+
parent field-name: (child (grand-child (...)))
+
+ +

child, grand-child, grand-grand-child, etc., are +nodes that begin at point, and parent is the +parent of child. +

+

If there is no node that starts at point, i.e., point is in the middle +of a node, then the mode-line only displays the smallest node that +spans point, and its immediate parent. +

+

This minor mode doesn’t create parsers on its own. It simply uses the +first parser in (treesit-parser-list) (see Using Tree-sitter Parser). +

+ +

Reading the grammar definition

+ +

Authors of language definitions define the grammar of a +language, and this grammar determines how a parser constructs a +concrete syntax tree out of the text. In order to use the syntax +tree effectively, we need to read the grammar file. +

+

The grammar file is usually grammar.js in a language +definition’s project repository. The link to a language definition’s +home page can be found in tree-sitter’s homepage +(https://tree-sitter.github.io/tree-sitter). +

+

The grammar is written in JavaScript syntax. For example, the rule +matching a function_definition node looks like +

+
+
function_definition: $ => seq(
+  $.declaration_specifiers,
+  field('declarator', $.declaration),
+  field('body', $.compound_statement)
+)
+
+ +

The rule is represented by a function that takes a single argument +$, representing the whole grammar. The function itself is +constructed by other functions: the seq function puts together a +sequence of children; the field function annotates a child with +a field name. If we write the above definition in BNF syntax, it +would look like +

+
+
function_definition :=
+  <declaration_specifiers> <declaration> <compound_statement>
+
+ +

and the node returned by the parser would look like +

+
+
(function_definition
+  (declaration_specifiers)
+  declarator: (declaration)
+  body: (compound_statement))
+
+ +

Below is a list of functions that one will see in a grammar +definition. Each function takes other rules as arguments and returns +a new rule. +

+
    +
  • seq(rule1, rule2, ...) matches each rule one after another. + +
  • choice(rule1, rule2, ...) matches one of the rules in its +arguments. + +
  • repeat(rule) matches rule zero or more times. +This is like the ‘*’ operator in regular expressions. + +
  • repeat1(rule) matches rule one or more times. +This is like the ‘+’ operator in regular expressions. + +
  • optional(rule) matches rule zero times or once. +This is like the ‘?’ operator in regular expressions. + +
  • field(name, rule) assigns field name name to the child +node matched by rule. + +
  • alias(rule, alias) makes nodes matched by rule appear as +alias in the syntax tree generated by the parser. For example, + +
    +
    alias(preprocessor_call_exp, call_expression)
    +
    + +

    makes any node matched by preprocessor_call_exp appear as +call_expression. +

+ +

Below are grammar functions less interesting for a reader of a +language definition. +

+
    +
  • token(rule) marks rule to produce a single leaf node. +That is, instead of generating a parent node with individual child +nodes under it, everything is combined into a single leaf node. + +
  • Normally, grammar rules ignore preceding whitespace; +token.immediate(rule) changes rule to match only when +there is no preceding whitespace. + +
  • prec(n, rule) gives rule a level n precedence. + +
  • prec.left([n,] rule) marks rule as left-associative, +optionally with level n. + +
  • prec.right([n,] rule) marks rule as right-associative, +optionally with level n. + +
  • prec.dynamic(n, rule) is like prec, but the precedence +is applied at runtime instead. +
+ +

The tree-sitter project talks about writing a grammar in more detail: +https://tree-sitter.github.io/tree-sitter/creating-parsers. +Read especially “The Grammar DSL” section. +

+
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/Multiple-Languages.html b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html new file mode 100644 index 00000000000..1ee2df7f442 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html @@ -0,0 +1,255 @@ + + + + + + +Multiple Languages (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + + +
+ +
+

37.6 Parsing Text in Multiple Languages

+ +

Sometimes, source text in one programming language contains text +in other languages; HTML + CSS + JavaScript is one example. In that +case, we need to assign individual parsers to text segments written in +different languages. Traditionally this is achieved by using +narrowing. While tree-sitter works with narrowing (see narrowing), the recommended way is to set ranges in which +a parser will operate. +

+
+
Function: treesit-parser-set-included-ranges parser ranges
+

This function sets the ranges of parser to ranges. Then +parser will only read the text covered by each range. The +argument ranges is a list of cons cells (beg +. end). +

+

The ranges in ranges must come in order and must not overlap. That +is, in pseudo code: +

+
+
(cl-loop for idx from 1 to (1- (length ranges))
+         for prev = (nth (1- idx) ranges)
+         for next = (nth idx ranges)
+         always (<= (car prev) (cdr prev)
+                    (car next) (cdr next)))
+
+ + +

If ranges violates this constraint, or something else goes +wrong, this function signals a treesit-range-invalid error. The +signal data contains a specific error message and the ranges we are +trying to set. +

+

This function can also be used for disabling ranges. If ranges +is nil, the parser is set to parse the whole buffer. +

+

Example: +

+
+
(treesit-parser-set-included-ranges
+ parser '((1 . 9) (16 . 24) (24 . 25)))
+
+
+ +
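Because invalid ranges are reported with a signal, callers may want to trap it. The sketch below is one possible wrapper; the name `my-treesit-set-ranges-safely` is hypothetical, and the handler relies on `treesit-range-invalid` being the error symbol named above.

```elisp
;; Hypothetical wrapper: try to set RANGES on PARSER and report an
;; invalid-range error instead of propagating it.
(defun my-treesit-set-ranges-safely (parser ranges)
  "Set RANGES on PARSER.  Return t on success, nil if rejected."
  (condition-case err
      (progn
        (treesit-parser-set-included-ranges parser ranges)
        t)
    (treesit-range-invalid
     ;; The signal data carries the message and the offending ranges.
     (message "Ranges rejected: %S" (cdr err))
     nil)))
```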
+
Function: treesit-parser-included-ranges parser
+

This function returns the ranges set for parser. The return +value is the same as the ranges argument of +treesit-parser-set-included-ranges: a list of cons cells +(beg . end). If parser doesn’t have any +ranges, the return value is nil. +

+
+
(treesit-parser-included-ranges parser)
+    ⇒ ((1 . 9) (16 . 24) (24 . 25))
+
+
+ +
+
Function: treesit-set-ranges parser-or-lang ranges
+

Like treesit-parser-set-included-ranges, this function sets +the ranges of parser-or-lang to ranges. Conveniently, +parser-or-lang can be either a parser or a language. If it is +a language, this function looks for the first parser in +(treesit-parser-list) for that language in the current buffer, +and sets the ranges for it. +

+ +
+
Function: treesit-get-ranges parser-or-lang
+

This function returns the ranges of parser-or-lang, like +treesit-parser-included-ranges. And like +treesit-set-ranges, parser-or-lang can be a parser or +a language symbol. +

+ +
+
Function: treesit-query-range source query &optional beg end
+

This function matches source with query and returns the +ranges of captured nodes. The return value has the same shape as in the +other range functions: a list of (beg . end). +

+

For convenience, source can be a language symbol, a parser, or a +node. If a language symbol, this function matches in the root node of +the first parser using that language; if a parser, this function +matches in the root node of that parser; if a node, this function +matches in that node. +

+

Parameter query is the query used to capture nodes +(see Pattern Matching Tree-sitter Nodes). The capture names don’t matter. Parameters +beg and end, if both non-nil, limit the range in which +this function queries. +

+

Like other query functions, this function signals a +treesit-query-error if query is malformed. +

+ +
+
Function: treesit-language-at point
+

This function tries to figure out which language is responsible for +the text at point. It goes over each parser in +(treesit-parser-list) and checks whether that parser’s ranges cover +point. +

+ +
+
Variable: treesit-range-functions
+

A list of range functions. Font-locking and indenting code uses +functions in this list to set correct ranges for a language parser +before using it. +

+

The signature of each function should be +

+
+
(start end &rest _)
+
+ +

where start and end mark the region that is about to be +used. A range function only needs to update the ranges in that +region, although it may update more. +

+

Each function in the list is called in order. +

+ +
+
Function: treesit-update-ranges &optional start end
+

This function is used by font-lock and indent to update ranges before +using any parser. Each range function in +treesit-range-functions is called in order. Arguments +start and end are passed to each range function. +

+ +

An example

+ +

Normally, in a set of languages that can be mixed together, there is a +major language and several embedded languages. We first parse the +whole document with the major language’s parser, set ranges for the +embedded languages, then parse the embedded languages. +

+

Suppose we want to parse a very simple document that mixes HTML, CSS +and JavaScript: +

+
+
<html>
+  <script>1 + 2</script>
+  <style>body { color: "blue"; }</style>
+</html>
+
+ +

We first parse with HTML, then set ranges for CSS and JavaScript: +

+
+
;; Create parsers.
+(setq html (treesit-get-parser-create 'html))
+(setq css (treesit-get-parser-create 'css))
+(setq js (treesit-get-parser-create 'javascript))
+
+;; Set CSS ranges.
+(setq css-range
+      (treesit-query-range
+       'html
+       "(style_element (raw_text) @capture)"))
+(treesit-parser-set-included-ranges css css-range)
+
+;; Set JavaScript ranges.
+(setq js-range
+      (treesit-query-range
+       'html
+       "(script_element (raw_text) @capture)"))
+(treesit-parser-set-included-ranges js js-range)
+
+ +

We use a query pattern (style_element (raw_text) @capture) to +find CSS nodes in the HTML parse tree. For how to write query +patterns, see Pattern Matching Tree-sitter Nodes. +
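To keep these ranges current, the same queries can be wrapped in a range function and registered in `treesit-range-functions`. The following is only a sketch: the function names starting with `my-` are hypothetical, it assumes that parsers for all three languages already exist in the buffer, and for simplicity it recomputes the ranges for the whole buffer instead of only the region it is given.

```elisp
;; Hypothetical range function for the HTML example above.  It has
;; the (start end &rest _) signature but ignores the region.
(defun my-html-update-embedded-ranges (_start _end &rest _)
  "Recompute the CSS and JavaScript parser ranges for the whole buffer."
  (treesit-set-ranges
   'css
   (treesit-query-range
    'html "(style_element (raw_text) @capture)"))
  (treesit-set-ranges
   'javascript
   (treesit-query-range
    'html "(script_element (raw_text) @capture)")))

;; Hypothetical setup function, meant to be called from a major mode.
(defun my-html-setup-ranges ()
  "Register the range function in the current buffer."
  (setq-local treesit-range-functions
              (list #'my-html-update-embedded-ranges)))
```

Since a nil range list makes a parser read the whole buffer, a real mode would also need to handle documents that contain no style or script element.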

+
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html new file mode 100644 index 00000000000..ec89b7749c8 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html @@ -0,0 +1,160 @@ + + + + + + +Parser-based Font Lock (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + +
+ +
+

24.6.10 Parser-based Font Lock

+ + +

Besides simple syntactic font lock and regexp-based font lock, Emacs +also provides complete syntactic font lock with the help of a parser, +currently provided by the tree-sitter library (see Parsing Program Source). +

+
+
Function: treesit-font-lock-enable
+

This function enables parser-based font lock in the current buffer. +

+ +

Parser-based font lock and other font lock mechanisms are not mutually +exclusive. By default, if enabled, parser-based font lock runs first, +then the simple syntactic font lock (if enabled), then regexp-based +font lock. +

+

Although parser-based font lock doesn’t share the same customization +variables with regexp-based font lock, parser-based font lock uses +similar customization schemes. The tree-sitter counterpart of +font-lock-keywords is treesit-font-lock-settings. +

+
+
Function: treesit-font-lock-rules :keyword value query...
+

This function is used to set treesit-font-lock-settings. It +takes care of compiling queries and other post-processing and outputs +a value that treesit-font-lock-settings accepts. An example: +

+
+
(treesit-font-lock-rules
+ :language 'javascript
+ :override t
+ '((true) @font-lock-constant-face
+   (false) @font-lock-constant-face)
+ :language 'html
+ "(script_element) @font-lock-builtin-face")
+
+ +

This function takes a list of text or s-exp queries. Before each +query, there are :keyword and value pairs that configure +that query. The :language keyword sets the query’s language and +every query must specify the language. Other keywords are optional: +

+ + + + + + + +
Keyword    Value    Description
:override  nil      If the region already has a face, discard the new face
           t        Always apply the new face
           append   Append the new face to existing ones
           prepend  Prepend the new face to existing ones
           keep     Fill-in regions without an existing face
+ +

Capture names in query should be face names like +font-lock-keyword-face. The captured node will be fontified +with that face. Capture names can also be function names, in which +case the function is called with (start end node), +where start and end are the start and end positions of the +node in the buffer, and node is the node itself. If a capture name +is both a face and a function, the face takes priority. If a capture +name is neither a face name nor a function name, it is ignored. +

+ +
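A small sketch of a function used as a capture name, following the calling convention described above (start, end, node). Both `my-` names are hypothetical, and building the rules requires the HTML language definition to be installed, which is why the call is wrapped in a function instead of running at load time.

```elisp
;; Hypothetical fontification function, used below as a capture name.
(defun my-fontify-script-element (start end _node)
  "Put `font-lock-builtin-face' on the text between START and END."
  (put-text-property start end 'face 'font-lock-builtin-face))

;; Hypothetical helper: the capture name is the function defined above.
(defun my-html-font-lock-rules ()
  "Return font-lock settings that call `my-fontify-script-element'."
  (treesit-font-lock-rules
   :language 'html
   "(script_element) @my-fontify-script-element"))
```

The fontification function can be tried on its own in any buffer, since it only sets a text property.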
+
Variable: treesit-font-lock-settings
+

A list of settings for tree-sitter font lock. The exact format +of this variable is considered internal. One should always use +treesit-font-lock-rules to set this variable. +

+

Each setting is of form +

+
+
(language query)
+
+ +

Each setting controls one parser (often for a different language). +Here language is the language symbol (see Tree-sitter Language Definitions); query is the query (see Pattern Matching Tree-sitter Nodes). +

+ +
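Putting the pieces together, a major mode might set up parser-based font lock roughly as sketched below. This uses the function names as they appear in these notes (other Emacs versions may differ), the function name `my-js-font-lock-setup` is hypothetical, and it assumes the JavaScript language definition is installed.

```elisp
;; Hypothetical setup function, meant to be called from a major mode.
(defun my-js-font-lock-setup ()
  "Enable parser-based font lock for JavaScript in the current buffer."
  ;; Build the settings with `treesit-font-lock-rules' rather than by
  ;; hand, since the exact format is considered internal.
  (setq-local treesit-font-lock-settings
              (treesit-font-lock-rules
               :language 'javascript
               :override t
               '((true) @font-lock-constant-face
                 (false) @font-lock-constant-face)))
  (treesit-font-lock-enable))
```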

Multi-language major modes should provide range functions in +treesit-range-functions, and Emacs will set the ranges +accordingly before fontifying a region (see Parsing Text in Multiple Languages). +

+
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html new file mode 100644 index 00000000000..691c8fba8c7 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html @@ -0,0 +1,244 @@ + + + + + + +Parser-based Indentation (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + +
+ +
+

24.7.2 Parser-based Indentation

+ + +

When built with the tree-sitter library (see Parsing Program Source), Emacs can parse program source and produce a syntax tree, +and this syntax tree can be used for indentation. For maximum +flexibility, we could write a custom indent function that queries the +syntax tree and indents accordingly for each language, but that would +be a lot of work. It is more convenient to use the simple indentation +engine described below: we only need to write some indentation rules +and the engine takes care of the rest. +

+

To enable the indentation engine, set the value of +indent-line-function to treesit-indent. +

+
+
Variable: treesit-indent-function
+

This variable stores the actual function called by +treesit-indent. By default, its value is +treesit-simple-indent. In the future we might add other +more complex indentation engines. +

+ +

Writing indentation rules

+ +
+
Variable: treesit-simple-indent-rules
+

This local variable stores indentation rules for every language. It is +a list of +

+
+
(language . rules)
+
+ +

where language is a language symbol, and rules is a list +of +

+
+
(matcher anchor offset)
+
+ +

First Emacs passes the node at point to matcher; if it returns +non-nil, this rule applies. Then Emacs passes the node to +anchor, which returns a position. Emacs takes the column number of +that position and adds offset to it; the result is the indentation for +the current line. +

+

The matcher and anchor are functions, and Emacs provides +convenient presets for them. You can skip over to +treesit-simple-indent-presets below; those presets should be +more than enough. +

+

A matcher or an anchor is a function that takes three +arguments (node parent bol). Argument bol is +the position at which we are indenting: the position of the first +non-whitespace character from the beginning of the line; node is the +largest (highest-in-tree) node that starts at that position; parent +is the parent of node. A matcher returns nil or non-nil, and +an anchor returns a position. +

+ +
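For cases the presets do not cover, a matcher and an anchor can be written by hand. The sketch below follows the three-argument convention described above; all `my-` names are hypothetical, and `compound_statement` is simply the node type used in earlier examples.

```elisp
;; Hypothetical matcher: non-nil when NODE is a compound statement.
;; NODE can be nil, so guard before asking for its type.
(defun my-indent-compound-statement-p (node _parent _bol)
  "Return non-nil if NODE is a compound_statement node."
  (and node (equal (treesit-node-type node) "compound_statement")))

;; Hypothetical anchor: the start position of PARENT.
(defun my-indent-parent-start (_node parent _bol)
  "Return the start position of PARENT."
  (treesit-node-start parent))

;; One rule of the form (MATCHER ANCHOR OFFSET) built from the two.
(defvar my-indent-compound-statement-rule
  (list #'my-indent-compound-statement-p #'my-indent-parent-start 0)
  "Example rule: align a compound statement with its parent's start.")
```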
+
Variable: treesit-simple-indent-presets
+

This is a list of presets for matchers and anchors in +treesit-simple-indent-rules. Each of them represents a function +that takes node, parent and bol as arguments. +

+
+
no-node
+
+ +

This matcher matches the case where node is nil, i.e., there is +no node that starts at bol. This is the case when bol is +on an empty line or inside a multi-line string, etc. +

+
+
(parent-is type)
+
+ +

This matcher matches if parent’s type is type. +

+
+
(node-is type)
+
+ +

This matcher matches if node’s type is type. +

+
+
(query query)
+
+ +

This matcher matches if querying parent with query +captures node. The capture name does not matter. +

+
+
(match node-type parent-type
+       node-field node-index-min node-index-max)
+
+ +

This matcher checks if node’s type is node-type, +parent’s type is parent-type, node’s field name in +parent is node-field, and node’s index among its +siblings is between node-index-min and node-index-max. If +the value of a constraint is nil, this matcher doesn’t check for that +constraint. For example, to match the first child where parent is +argument_list, use +

+
+
(match nil "argument_list" nil 0 0)
+
+ +
+
first-sibling
+
+ +

This anchor returns the start of the first child of parent. +

+
+
parent
+
+ +

This anchor returns the start of parent. +

+
+
parent-bol
+
+ +

This anchor returns the position of the first non-space character on the line +where parent is. +

+
+
prev-sibling
+
+ +

This anchor returns the start of the previous sibling of node. +

+
+
no-indent
+
+ +

This anchor returns the start of node, i.e., no indent. +

+
+
prev-line
+
+ +

This anchor returns the position of the first non-whitespace character on the previous +line. +

+ +
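A short sketch of how the presets might be combined into rules for C and wired into a buffer. The function name `my-c-indent-setup`, the choice of rules, and the offsets are illustrative assumptions, and the sketch assumes that rules are tried in the order they are listed.

```elisp
;; Hypothetical setup function, meant to be called from a major mode.
(defun my-c-indent-setup ()
  "Install example tree-sitter indentation rules in the current buffer."
  (setq-local treesit-simple-indent-rules
              '((c
                 ;; A closing brace lines up with the line of its parent.
                 ((node-is "}") parent-bol 0)
                 ;; Anything else inside a block is indented two columns
                 ;; from the line of the block's opening.
                 ((parent-is "compound_statement") parent-bol 2)
                 ;; No node at BOL (e.g. an empty line): follow the
                 ;; previous line.
                 (no-node prev-line 0))))
  ;; Hand indentation over to the tree-sitter engine.
  (setq-local indent-line-function #'treesit-indent))
```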

Indentation utilities

+ +

Here are some utility functions that can help with writing indentation +rules. +

+
+
Function: treesit-check-indent mode
+

This function checks the current buffer’s indentation against the major mode +mode. It indents the current buffer in mode and compares +the indentation with the current indentation. Then it pops up a diff +buffer showing the difference. Correct indentation (target) is in +green, current indentation is in red. +

+ +

It is also helpful to use treesit-inspect-mode when writing +indentation rules. +

+
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html b/admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html new file mode 100644 index 00000000000..7b6e51468a6 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html @@ -0,0 +1,125 @@ + + + + + + +Parsing Program Source (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + + +
+ +
+

37 Parsing Program Source

+ +

Emacs provides various ways to parse program source text and produce a +syntax tree. In a syntax tree, text is no longer a +one-dimensional stream but a structured tree of nodes, where each node +represents a piece of text. Thus a syntax tree can enable +interesting features like precise fontification, indentation, +navigation, structured editing, etc. +

+

Emacs has a simple facility for parsing balanced expressions +(see Parsing Expressions). There is also the SMIE library for generic +navigation and indentation (see Simple Minded Indentation Engine). +

+

Emacs also provides integration with the tree-sitter library +(https://tree-sitter.github.io/tree-sitter) if compiled with +it. The tree-sitter library implements an incremental parser and +supports a wide range of programming languages. +

+
+
Function: treesit-available-p
+

This function returns non-nil if tree-sitter features are available +for this Emacs instance. +

+ +
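A major mode that wants to fall back to its traditional code can guard on this function. The sketch below is one way to do so; the helper name `my-mode-use-tree-sitter-p` is hypothetical, and the `fboundp` check is an added precaution for Emacs versions that predate tree-sitter support.

```elisp
;; Hypothetical guard for a C major mode: use tree-sitter only when
;; the feature and the C language definition are both available.
(defun my-mode-use-tree-sitter-p ()
  "Return non-nil if tree-sitter can be used for C in this Emacs."
  (and (fboundp 'treesit-available-p)
       (treesit-available-p)
       (treesit-language-available-p 'c)))
```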

For tree-sitter integration with existing Emacs features, +see Parser-based Font Lock, Parser-based Indentation, and +Moving over Balanced Expressions. +

+

To access the syntax tree of the text in a buffer, we need to first +load a language definition and create a parser with it. Next, we can +query the parser for specific nodes in the syntax tree. Then, we can +access various information about the node, and we can pattern-match a +node with a powerful syntax. Finally, we explain how to work with +source files that mix multiple languages. The following sections +explain how to do each of the tasks in detail. +

+ + +
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/Pattern-Matching.html b/admin/notes/tree-sitter/html-manual/Pattern-Matching.html new file mode 100644 index 00000000000..e14efe71629 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Pattern-Matching.html @@ -0,0 +1,430 @@ + + + + + + +Pattern Matching (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + + +
+ +
+

37.5 Pattern Matching Tree-sitter Nodes

+ +

Tree-sitter lets us pattern match with a small declarative language. +Pattern matching consists of two steps: first tree-sitter matches a +pattern against nodes in the syntax tree, then it captures +specific nodes in that pattern and returns the captured nodes. +

+

We first describe how to write the most basic query pattern and how to +capture nodes in a pattern, then the pattern-match function, and finally +more advanced pattern syntax. +

+

Basic query syntax

+ + + +

A query consists of one or more patterns. Each pattern is an +s-expression that matches a certain node in the syntax tree. A +pattern has the following shape: +

+
+
(type child...)
+
+ +

For example, a pattern that matches a binary_expression node that +contains number_literal child nodes would look like +

+
+
(binary_expression (number_literal))
+
+ +

To capture a node in the query pattern above, append +@capture-name after the node pattern you want to capture. For +example, +

+
+
(binary_expression (number_literal) @number-in-exp)
+
+ +

captures number_literal nodes that are inside a +binary_expression node with capture name number-in-exp. +

+

We can capture the binary_expression node too, with capture +name biexp: +

+
+
(binary_expression
+ (number_literal) @number-in-exp) @biexp
+
+ +

Query function

+ +

Now we can introduce the query functions. +

+
+
Function: treesit-query-capture node query &optional beg end node-only
+

This function matches patterns in query in node. +Parameter query can be either a string, an s-expression, or a +compiled query object. For now, we focus on the string syntax; +s-expression syntax and compiled query are described at the end of the +section. +

+

Parameter node can also be a parser or a language symbol. A +parser means using its root node, a language symbol means find or +create a parser for that language in the current buffer, and use the +root node. +

+

The function returns all captured nodes in a list of +(capture_name . node). If node-only is +non-nil, a list of nodes is returned instead. If beg and +end are both non-nil, this function only pattern matches nodes +in that range. +

+ +

This function raises a treesit-query-error if query is +malformed. The signal data contains a description of the specific +error. You can use treesit-query-validate to debug the query. +

+ +

For example, suppose node’s content is 1 + 2, and +query is +

+
+
(setq query
+      "(binary_expression
+        (number_literal) @number-in-exp) @biexp")
+
+ +

Running that query on node would return +

+
+
(treesit-query-capture node query)
+    ⇒ ((biexp . <node for "1 + 2">)
+       (number-in-exp . <node for "1">)
+       (number-in-exp . <node for "2">))
+
+ +
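Since the result is an ordinary alist-like list of (capture-name . node) pairs, it can be processed with the usual list functions. A small sketch, assuming the hypothetical helper name my-describe-captures:

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-describe-captures (captures)
  "Return a list of (NAME START END) for each entry in CAPTURES.
CAPTURES is a list of (CAPTURE-NAME . NODE) pairs, as returned by
`treesit-query-capture'."
  (mapcar (lambda (capture)
            (let ((name (car capture))
                  (node (cdr capture)))
              ;; START and END are buffer positions of the node.
              (list name
                    (treesit-node-start node)
                    (treesit-node-end node))))
          captures))

;; Example call, given NODE and QUERY as above:
;; (my-describe-captures (treesit-query-capture node query))
```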

As we mentioned earlier, a query could contain multiple +patterns. For example, it could have two top-level patterns: +

+
+
(setq query
+      "(binary_expression) @biexp
+       (number_literal)  @number @biexp")
+
+ +
+
Function: treesit-query-string string query language
+

This function parses string with language, pattern matches +its root node with query, and returns the result. +
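This is convenient for trying out a query without a buffer. The sketch below assumes a C language definition is installed; the helper name my-numbers-in-c-snippet is hypothetical.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes a C language
;; definition is installed.
(defun my-numbers-in-c-snippet (code)
  "Return the captures for every number literal in the C source CODE.
CODE is a string; the argument order follows the signature above."
  (treesit-query-string code "(number_literal) @number" 'c))

;; Example call:
;; (my-numbers-in-c-snippet "int x = 1 + 2;")
```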

+ +

More query syntax

+ +

Besides node type and capture, tree-sitter’s query syntax can express +anonymous node, field name, wildcard, quantification, grouping, +alternation, anchor, and predicate. +

+

Anonymous node

+ +

An anonymous node is written verbatim, surrounded by quotes. A +pattern matching (and capturing) keyword return would be +

+
+
"return" @keyword
+
+ +

Wild card

+ +

In a query pattern, ‘(_)’ matches any named node, and ‘_’ +matches any node, named or anonymous. For example, to capture any +named child of a binary_expression node, the pattern would be +

+
+
(binary_expression (_) @in_biexp)
+
+ +

Field name

+ +

We can capture child nodes that have specific field names: +

+
+
(function_definition
+  declarator: (_) @func-declarator
+  body: (_) @func-body)
+
+ +

We can also capture a node that doesn’t have a certain field, say, a +function_definition without a body field. +

+
+
(function_definition !body) @func-no-body
+
+ +

Quantify node

+ +

Tree-sitter recognizes quantification operators ‘*’, ‘+’ and +‘?’. Their meanings are the same as in regular expressions: +‘*’ matches the preceding pattern zero or more times, ‘+’ +matches one or more times, and ‘?’ matches zero or one time. +

+

For example, this pattern matches type_declaration nodes +that have zero or more long keywords. +

+
+
(type_declaration "long"*) @long-type
+
+ +

And this pattern matches a type declaration that has zero or one +long keyword: +

+
+
(type_declaration "long"?) @long-type
+
+ +

Grouping

+ +

Similar to groups in regular expressions, we can bundle patterns into a +group and apply quantification operators to it. For example, to +express a comma separated list of identifiers, one could write +

+
+
(identifier) ("," (identifier))*
+
+ +

Alternation

+ +

Again, similar to regular expressions, we can express “match any one +of this group of patterns” in the query pattern. The syntax is a +list of patterns enclosed in square brackets. For example, to capture +some keywords in C, the query pattern would be +

+
+
[
+  "return"
+  "break"
+  "if"
+  "else"
+] @keyword
+
+ +

Anchor

+ +

The anchor operator ‘.’ can be used to enforce juxtaposition, +i.e., to enforce two things to be directly next to each other. The +two “things” can be two nodes, or a child and the end of its parent. +For example, to capture the first child, the last child, or two +adjacent children: +

+
+
;; Anchor the child with the end of its parent.
+(compound_expression (_) @last-child .)
+
+;; Anchor the child with the beginning of its parent.
+(compound_expression . (_) @first-child)
+
+;; Anchor two adjacent children.
+(compound_expression
+ (_) @prev-child
+ .
+ (_) @next-child)
+
+ +

Note that the enforcement of juxtaposition ignores any anonymous +nodes. +

+

Predicate

+ +

We can add predicate constraints to a pattern. For example, if we use +the following query pattern +

+
+
(
+ (array . (_) @first (_) @last .)
+ (#equal @first @last)
+)
+
+ +

Then tree-sitter only matches arrays where the first element equals +the last element. To attach a predicate to a pattern, we need to +group them together. A predicate always starts with a ‘#’. +Currently there are two predicates, #equal and #match. +

+
+
Predicate: equal arg1 arg2
+

Matches if arg1 equals arg2. Arguments can be either a +string or a capture name. Capture names represent the text that the +captured node spans in the buffer. +

+ +
+
Predicate: match regexp capture-name
+

Matches if the text that capture-name’s node spans in the buffer +matches regular expression regexp. Matching is case-sensitive. +

+ +

Note that a predicate can only refer to capture names that appear in the +same pattern. Indeed, it makes little sense to refer to capture names +in other patterns anyway. +
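A sketch of a #match predicate used from Lisp follows. It assumes the language in use has an identifier node type; the helper name my-temp-identifiers and the regexp are illustrative only.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes the grammar of
;; NODE's language defines an `identifier' node type.
(defun my-temp-identifiers (node)
  "Return captures for identifiers under NODE whose text contains \"tmp\"."
  (treesit-query-capture
   node
   ;; The predicate is grouped with the pattern it constrains, and it
   ;; refers only to a capture name from that same pattern.
   "((identifier) @temp
     (#match \"tmp\" @temp))"))
```

Note that the double quotes around the regexp must be escaped because the whole query is itself a Lisp string.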

+

S-expression patterns

+ +

Besides strings, Emacs provides an s-expression based syntax for query +patterns. It largely resembles the string-based syntax. For example, +the following pattern +

+
+
(treesit-query-capture
+ node "(addition_expression
+        left: (_) @left
+        \"+\" @plus-sign
+        right: (_) @right) @addition
+
+        [\"return\" \"break\"] @keyword")
+
+ +

is equivalent to +

+
+
(treesit-query-capture
+ node '((addition_expression
+         left: (_) @left
+         "+" @plus-sign
+         right: (_) @right) @addition
+
+         ["return" "break"] @keyword))
+
+ +

Most pattern syntax can be written directly as strange but +nevertheless valid s-expressions. Only a few of them need +modification: +

+
    +
  • Anchor ‘.’ is written as :anchor. +
  • ‘?’ is written as ‘:?’. +
  • ‘*’ is written as ‘:*’. +
  • ‘+’ is written as ‘:+’. +
  • #equal is written as :equal. In general, predicates +change their ‘#’ to ‘:’. +
+ +

For example, +

+
+
"(
+  (compound_expression . (_) @first (_)* @rest)
+  (#match \"love\" @first)
+  )"
+
+ +

is written in s-expression as +

+
+
'((
+   (compound_expression :anchor (_) @first (_) :* @rest)
+   (:match "love" @first)
+   ))
+
+ +

Compiling queries

+ +

If a query will be used repeatedly, especially in tight loops, it is +important to compile that query, because a compiled query is much +faster than an uncompiled one. A compiled query can be used anywhere +a query is accepted. +

+
+
Function: treesit-query-compile language query
+

This function compiles query for language into a compiled +query object and returns it. +

+

This function raises a treesit-query-error if query is +malformed. The signal data contains a description of the specific +error. You can use treesit-query-validate to debug the query. +
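One possible way to compile a query only once and reuse it is sketched below. The variable and function names are hypothetical, a C language definition is assumed to be installed, and the error handling is just one option.

```elisp
;; Hypothetical names, not part of Emacs.
(defvar my-c-number-query--cache nil
  "Compiled query for C number literals, or nil if not compiled yet.")

(defun my-c-number-query ()
  "Return the compiled number-literal query, compiling it on first use.
Return nil if the query turns out to be malformed."
  (or my-c-number-query--cache
      (setq my-c-number-query--cache
            (condition-case err
                (treesit-query-compile 'c "(number_literal) @number")
              ;; The signal data describes what is wrong with the query.
              (treesit-query-error
               (message "Malformed query: %S" (cdr err))
               nil)))))

;; The compiled query is then used wherever a query is accepted:
;; (treesit-query-capture node (my-c-number-query))
```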

+ +
+
Function: treesit-query-expand query
+

This function expands the s-expression query into a string +query. +

+ +
+
Function: treesit-pattern-expand pattern
+

This function expands the s-expression pattern into a string +pattern. +

+ +

Finally, the tree-sitter project’s documentation about +pattern-matching can be found at +https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries. +

+
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/Retrieving-Node.html b/admin/notes/tree-sitter/html-manual/Retrieving-Node.html new file mode 100644 index 00000000000..1bea0dde76b --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Retrieving-Node.html @@ -0,0 +1,362 @@ + + + + + + +Retrieving Node (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + + +
+ +
+

37.3 Retrieving Node

+ + + +

Before we continue, let’s go over some conventions of tree-sitter +functions. +

+

We talk about a node being “smaller” or “larger”, and “lower” or +“higher”. A smaller and lower node is lower in the syntax tree and +therefore spans a smaller piece of text; a larger and higher node is +higher up in the syntax tree, containing many smaller nodes as its +children, and therefore spans a larger piece of text. +

+

When a function cannot find a node, it returns nil. For the +convenience of function chaining, all the functions that take a node +as argument and return a node accept nil as the node; in that +case, the function just returns nil. +

+ +

Nodes are not automatically updated when the associated buffer is +modified, and there is no way to update a node once it is retrieved. +Using an outdated node signals a treesit-node-outdated error. +

+

Retrieving node from syntax tree

+ +
+
Function: treesit-node-at point &optional parser-or-lang named
+

This function returns the smallest node that starts at or after +the point. In other words, the start of the node is equal to or +greater than point. +

+

When parser-or-lang is nil, this function uses the first parser +in (treesit-parser-list) in the current buffer. If +parser-or-lang is a parser object, it uses that parser; if +parser-or-lang is a language, it finds the first parser using +that language in (treesit-parser-list) and uses that. +

+

If named is non-nil, this function looks for a named node +only (see named node). +

+

Example: +

+
;; Find the node at point in a C parser's syntax tree.
+(treesit-node-at (point) 'c)
+    
+
+ +
+
Function: treesit-node-on beg end &optional parser-or-lang named
+

This function returns the smallest node that covers the span +from beg to end. In other words, the start of the node is +less than or equal to beg, and the end of the node is greater than or +equal to end. +

+

Beware that calling this function on an empty line that is not +inside any top-level construct (function definition, etc) most +probably will give you the root node, because the root node is the +smallest node that covers that empty line. Most of the time, you want +to use treesit-node-at. +

+

When parser-or-lang is nil, this function uses the first parser +in (treesit-parser-list) in the current buffer. If +parser-or-lang is a parser object, it uses that parser; if +parser-or-lang is a language, it finds the first parser using +that language in (treesit-parser-list) and uses that. +

+

If named is non-nil, this function looks for a named node only +(see named node). +
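A short sketch of using this function on a region follows. The helper name my-node-type-on-region is hypothetical, and it relies on the first parser of the current buffer, as described above.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-node-type-on-region (beg end)
  "Return the type of the smallest node covering BEG to END, or nil.
Uses the first parser in the current buffer's parser list."
  (let ((node (treesit-node-on beg end)))
    ;; NODE is nil when no node could be found.
    (and node (treesit-node-type node))))

;; Example call on the active region:
;; (my-node-type-on-region (region-beginning) (region-end))
```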

+ +
+
Function: treesit-parser-root-node parser
+

This function returns the root node of the syntax tree generated by +parser. +

+ +
+
Function: treesit-buffer-root-node &optional language
+

This function finds the first parser that uses language in +(treesit-parser-list) in the current buffer, and returns the +root node of that buffer. If it cannot find an appropriate parser, +nil is returned. +

+ +

Once we have a node, we can retrieve other nodes from it, or query for +information about this node. +

+

Retrieving node from other nodes

+ +

By kinship

+ +
+
Function: treesit-node-parent node
+

This function returns the immediate parent of node. +

+ +
+
Function: treesit-node-child node n &optional named
+

This function returns the n’th child of node. If +named is non-nil, then it only counts named nodes +(see named node). For example, in a node +that represents a string: "text", there are three child +nodes: the opening quote ", the string content text, and +the closing quote ". Among these nodes, the first child is +the opening quote ", and the first named child is the string +content text. +

+ +
+
Function: treesit-node-children node &optional named
+

This function returns all of node’s children in a list. If +named is non-nil, then it only retrieves named nodes. +
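For instance, the types of a node's named children could be listed as follows. This is a sketch with a hypothetical helper name.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-named-child-types (node)
  "Return the types of NODE's named children as a list of strings."
  ;; Passing t as NAMED skips anonymous nodes such as punctuation.
  (mapcar #'treesit-node-type (treesit-node-children node t)))
```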

+ +
+
Function: treesit-next-sibling node &optional named
+

This function finds the next sibling of node. If named is +non-nil, it finds the next named sibling. +

+ +
+
Function: treesit-prev-sibling node &optional named
+

This function finds the previous sibling of node. If +named is non-nil, it finds the previous named sibling. +

+ +

By field name

+ +

To make the syntax tree easier to analyze, many language definitions +assign field names to child nodes (see field name). For example, a function_definition node +could have a declarator and a body. +

+
+
Function: treesit-child-by-field-name node field-name
+

This function finds the child of node that has field-name +as its field name. +

+
+
;; Get the child that has "body" as its field name.
+(treesit-child-by-field-name node "body")
+    
+
+ +

By position

+ +
+
Function: treesit-first-child-for-pos node pos &optional named
+

This function finds the first child of node that extends beyond +pos. “Extend beyond” means the end of the child node >= +pos. This function only looks for immediate children of +node, and doesn’t look in its grandchildren. If named is +non-nil, it only looks for named children (see named node). +

+ +
+
Function: treesit-node-descendant-for-range node beg end &optional named
+

This function finds the smallest descendant of +node that spans the range from beg to end. It is +similar to treesit-node-at. If named is non-nil, it only +looks for named nodes. +

+ +

Searching for node

+ +
+
Function: treesit-search-subtree node predicate &optional all backward limit
+

This function traverses the subtree of node (including +node), and matches predicate against each node along the way. +predicate is either a regexp that matches (case-insensitively) +against each node’s type, or a function that takes a node and returns +nil/non-nil. If a node matches, that node is returned; if no node +ever matches, nil is returned. +

+

By default, this function only traverses named nodes; if all is +non-nil, it traverses all nodes. If backward is non-nil, it +traverses backwards. If limit is non-nil, it only traverses +that number of levels down in the tree. +
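Both kinds of predicate can be sketched as follows. The helper names are hypothetical, and only the two required arguments are passed, so the defaults described above apply.

```elisp
;; Hypothetical helpers, not part of Emacs.
(defun my-first-node-of-type (root type-regexp)
  "Return the first node under ROOT whose type matches TYPE-REGEXP, or nil."
  ;; A string predicate is matched against each node's type.
  (treesit-search-subtree root type-regexp))

(defun my-first-long-node (root min-length)
  "Return the first node under ROOT that spans at least MIN-LENGTH characters."
  ;; A function predicate receives each node and returns nil/non-nil.
  (treesit-search-subtree
   root
   (lambda (node)
     (>= (- (treesit-node-end node) (treesit-node-start node))
         min-length))))
```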

+ +
+
Function: treesit-search-forward start predicate &optional all backward up
+

This function is somewhat similar to treesit-search-subtree. +It also traverses the parse tree and matches each node against +predicate (except for start), where predicate can be +a (case-insensitive) regexp or a function. For a tree like the below +where start is marked 1, this function traverses as numbered: +

+
+
              o
+              |
+     3--------4-----------8
+     |        |           |
+o--o-+--1  5--+--6    9---+-----12
+|  |    |        |    |         |
+o  o    2        7  +-+-+    +--+--+
+                    |   |    |  |  |
+                    10  11   13 14 15
+
+ +

Same as in treesit-search-subtree, this function only searches +for named nodes by default. But if all is non-nil, it searches +for all nodes. If backward is non-nil, it searches backwards. +

+

If up is non-nil, this function will only traverse to siblings +and parents. In that case, only 1 3 4 8 would be traversed. +
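As a sketch, the next function definition after point could be located like this. The helper name is hypothetical, and the function_definition node type is an assumption about the grammar in use.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes the grammar of the
;; buffer's first parser defines a `function_definition' node type.
(defun my-next-function-definition ()
  "Return the next function definition node after point, or nil."
  (let ((start (treesit-node-at (point))))
    ;; START itself is not tested against the predicate.
    (and start
         (treesit-search-forward start "function_definition"))))
```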

+ +
+
Function: treesit-search-forward-goto predicate side &optional all backward up
+

This function jumps to the start or end of the next node in buffer +that matches predicate. Parameters predicate, all, +backward, and up are the same as in +treesit-search-forward. side controls which side of +the matched node we stop at; it can be start or end. +

+ +
+
Function: treesit-induce-sparse-tree root predicate &optional process-fn limit
+

This function creates a sparse tree from root’s subtree. +

+

Basically, it takes the subtree under root, and combs it so only +the nodes that match predicate are left, like picking out grapes +on the vine. Like previous functions, predicate can be a regexp +string that matches against each node’s type case-insensitively, or a +function that takes a node and returns nil/non-nil. +

+

For example, for a subtree on the left that consists of both numbers +and letters, if predicate is “letter only”, the returned tree +is the one on the right. +

+
+
    a                 a              a
+    |                 |              |
++---+---+         +---+---+      +---+---+
+|   |   |         |   |   |      |   |   |
+b   1   2         b   |   |      b   c   d
+    |   |     =>      |   |  =>      |
+    c   +--+          c   +          e
+    |   |  |          |   |
+ +--+   d  4       +--+   d
+ |  |              |
+ e  5              e
+
+ +

If process-fn is non-nil, instead of returning the matched +nodes, this function passes each node to process-fn and uses the +returned value instead. If non-nil, limit is the number of +levels to go down from root. +

+

Each node in the returned tree looks like (tree-sitter +node . (child ...)). The tree-sitter node of the root +of this tree will be nil if root doesn’t match predicate. If +no node matches predicate, this function returns nil. +
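Because the result is a plain nested list, it can be walked with ordinary list functions. The sketch below flattens such a tree into (depth . item) pairs; the helper name is hypothetical, and the usage example substitutes symbols for real nodes so that it can be tried without a parser.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-sparse-tree-flatten (tree &optional depth)
  "Return a flat list of (DEPTH . ITEM) pairs for a sparse TREE.
TREE has the shape (ITEM . (CHILD ...)), where each CHILD has the
same shape.  A nil ITEM (an unmatched root) is skipped."
  (let ((depth (or depth 0))
        (item (car tree))
        (children (cdr tree)))
    (append (and item (list (cons depth item)))
            ;; Recurse into each child one level deeper.
            (mapcan (lambda (child)
                      (my-sparse-tree-flatten child (1+ depth)))
                    children))))

;; With symbols standing in for nodes:
;; (my-sparse-tree-flatten '(a (b) (c (e)) (d)))
;;     ⇒ ((0 . a) (1 . b) (1 . c) (2 . e) (1 . d))
```

A flat list like this is one convenient starting point for building an outline or an Imenu-style index.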

+ +

More convenient functions

+ +
+
Function: treesit-filter-child node pred &optional named
+

This function finds immediate children of node that satisfy +pred. +

+

Function pred takes the child node as the argument and should +return non-nil to indicate keeping the child. If named is +non-nil, this function only searches for named nodes. +

+ +
+
Function: treesit-parent-until node pred
+

This function repeatedly finds the parent of node, and returns +the parent if it satisfies pred (which takes the parent as the +argument). If no parent satisfies pred, this function returns +nil. +

+ +
+
Function: treesit-parent-while node pred
+

This function repeatedly finds the parent of node, and keeps +doing so as long as the parent satisfies pred (which takes the +parent as the single argument). I.e., this function returns the +farthest parent that still satisfies pred. +
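A typical use of treesit-parent-until is finding the enclosing construct of a node. The following is a sketch with a hypothetical helper name; the function_definition node type is an assumption about the grammar in use.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-enclosing-function (node)
  "Return the closest ancestor of NODE that is a function definition, or nil."
  (treesit-parent-until
   node
   ;; PRED receives each parent in turn, starting from the nearest one.
   (lambda (parent)
     (equal (treesit-node-type parent) "function_definition"))))
```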

+ +
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html b/admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html new file mode 100644 index 00000000000..77cea6b3f95 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html @@ -0,0 +1,212 @@ + + + + + + +Tree-sitter C API (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + +
+ +
+

37.7 Tree-sitter C API Correspondence

+ +

Emacs’ tree-sitter integration doesn’t expose every feature +tree-sitter’s C API provides. Missing features include: +

+
    +
  • Creating a tree cursor and navigating the syntax tree with it. +
  • Setting timeout and cancellation flag for a parser. +
  • Setting the logger for a parser. +
  • Printing a DOT graph of the syntax tree to a file. +
  • Copying and modifying a syntax tree. (Emacs doesn’t expose a tree +object.) +
  • Using (row, column) coordinates as position. +
  • Updating a node with changes. (In Emacs, retrieve a new node instead +of updating the existing one.) +
  • Querying statistics of a language definition. +
+ +

In addition, Emacs makes some changes to the C API to make the API more +convenient and idiomatic: +

+
    +
  • Instead of using byte positions, the ELisp API uses character +positions. +
  • Null nodes are converted to nil. +
+ +
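To illustrate why the distinction between character and byte positions matters, the sketch below compares the two in a temporary buffer. The helper name is hypothetical; position-bytes reports Emacs's own byte position and is used here only for illustration, not as a claim about the exact offsets tree-sitter sees.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-char-and-byte-position (string char-pos)
  "Return (CHAR-POS . BYTE-POS) for CHAR-POS in a buffer holding STRING."
  (with-temp-buffer
    (insert string)
    ;; `position-bytes' converts a character position to a byte position.
    (cons char-pos (position-bytes char-pos))))

;; In ASCII-only text the two agree:
;; (my-char-and-byte-position "ab" 2)  ⇒ (2 . 2)
;; After a multibyte character they diverge:
;; (my-char-and-byte-position "éa" 2)  ⇒ (2 . 3)
```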

Below is the correspondence between all C API functions and their +ELisp counterparts. Sometimes one ELisp function corresponds to +multiple C functions, and many C functions don’t have an ELisp +counterpart. +

+
+
ts_parser_new                           treesit-parser-create
+ts_parser_delete
+ts_parser_set_language
+ts_parser_language                      treesit-parser-language
+ts_parser_set_included_ranges           treesit-parser-set-included-ranges
+ts_parser_included_ranges               treesit-parser-included-ranges
+ts_parser_parse
+ts_parser_parse_string                  treesit-parse-string
+ts_parser_parse_string_encoding
+ts_parser_reset
+ts_parser_set_timeout_micros
+ts_parser_timeout_micros
+ts_parser_set_cancellation_flag
+ts_parser_cancellation_flag
+ts_parser_set_logger
+ts_parser_logger
+ts_parser_print_dot_graphs
+ts_tree_copy
+ts_tree_delete
+ts_tree_root_node
+ts_tree_language
+ts_tree_edit
+ts_tree_get_changed_ranges
+ts_tree_print_dot_graph
+ts_node_type                            treesit-node-type
+ts_node_symbol
+ts_node_start_byte                      treesit-node-start
+ts_node_start_point
+ts_node_end_byte                        treesit-node-end
+ts_node_end_point
+ts_node_string                          treesit-node-string
+ts_node_is_null
+ts_node_is_named                        treesit-node-check
+ts_node_is_missing                      treesit-node-check
+ts_node_is_extra                        treesit-node-check
+ts_node_has_changes                     treesit-node-check
+ts_node_has_error                       treesit-node-check
+ts_node_parent                          treesit-node-parent
+ts_node_child                           treesit-node-child
+ts_node_field_name_for_child            treesit-node-field-name-for-child
+ts_node_child_count                     treesit-node-child-count
+ts_node_named_child                     treesit-node-child
+ts_node_named_child_count               treesit-node-child-count
+ts_node_child_by_field_name             treesit-node-by-field-name
+ts_node_child_by_field_id
+ts_node_next_sibling                    treesit-next-sibling
+ts_node_prev_sibling                    treesit-prev-sibling
+ts_node_next_named_sibling              treesit-next-sibling
+ts_node_prev_named_sibling              treesit-prev-sibling
+ts_node_first_child_for_byte            treesit-first-child-for-pos
+ts_node_first_named_child_for_byte      treesit-first-child-for-pos
+ts_node_descendant_for_byte_range       treesit-descendant-for-range
+ts_node_descendant_for_point_range
+ts_node_named_descendant_for_byte_range treesit-descendant-for-range
+ts_node_named_descendant_for_point_range
+ts_node_edit
+ts_node_eq                              treesit-node-eq
+ts_tree_cursor_new
+ts_tree_cursor_delete
+ts_tree_cursor_reset
+ts_tree_cursor_current_node
+ts_tree_cursor_current_field_name
+ts_tree_cursor_current_field_id
+ts_tree_cursor_goto_parent
+ts_tree_cursor_goto_next_sibling
+ts_tree_cursor_goto_first_child
+ts_tree_cursor_goto_first_child_for_byte
+ts_tree_cursor_goto_first_child_for_point
+ts_tree_cursor_copy
+ts_query_new
+ts_query_delete
+ts_query_pattern_count
+ts_query_capture_count
+ts_query_string_count
+ts_query_start_byte_for_pattern
+ts_query_predicates_for_pattern
+ts_query_step_is_definite
+ts_query_capture_name_for_id
+ts_query_string_value_for_id
+ts_query_disable_capture
+ts_query_disable_pattern
+ts_query_cursor_new
+ts_query_cursor_delete
+ts_query_cursor_exec                    treesit-query-capture
+ts_query_cursor_did_exceed_match_limit
+ts_query_cursor_match_limit
+ts_query_cursor_set_match_limit
+ts_query_cursor_set_byte_range
+ts_query_cursor_set_point_range
+ts_query_cursor_next_match
+ts_query_cursor_remove_match
+ts_query_cursor_next_capture
+ts_language_symbol_count
+ts_language_symbol_name
+ts_language_symbol_for_name
+ts_language_field_count
+ts_language_field_name_for_id
+ts_language_field_id_for_name
+ts_language_symbol_type
+ts_language_version
+
+
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/Using-Parser.html b/admin/notes/tree-sitter/html-manual/Using-Parser.html new file mode 100644 index 00000000000..438e3858f1b --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Using-Parser.html @@ -0,0 +1,186 @@ + + + + + + +Using Parser (GNU Emacs Lisp Reference Manual) + + + + + + + + + + + + + + + + + + + + + +
+ +
+

37.2 Using Tree-sitter Parser

+ + +

This section describes how to create and configure a tree-sitter +parser. In Emacs, each tree-sitter parser is associated with a +buffer. As we edit the buffer, the associated parser and the syntax +tree are automatically kept up to date. +

+
+
Variable: treesit-max-buffer-size
+

This variable contains the maximum size of buffers in which +tree-sitter can be activated. Major modes should check this value +when deciding whether to enable tree-sitter features. +

+ +
+
Function: treesit-can-enable-p
+

This function checks whether the current buffer is suitable for +activating tree-sitter features. It basically checks +treesit-available-p and treesit-max-buffer-size. +

+ + +
+
Function: treesit-parser-create language &optional buffer no-reuse
+

To create a parser, we provide a buffer and the language +to use (see Tree-sitter Language Definitions). If buffer is nil, the +current buffer is used. +

+

By default, this function reuses a parser if one already exists for +language in buffer; if no-reuse is non-nil, this +function always creates a new parser. +
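A major mode might set up its parser along these lines. This is a sketch only: the helper name is hypothetical, a C language definition is assumed, and the availability check is a simplification of what a real mode would do.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes a C language
;; definition is installed.
(defun my-c-setup-parser ()
  "Create (or reuse) a C parser for the current buffer if possible.
Return the parser, or nil if tree-sitter cannot be used."
  (when (and (fboundp 'treesit-available-p)
             (treesit-available-p))
    ;; BUFFER is omitted, so the current buffer is used; an existing
    ;; C parser for this buffer is reused rather than duplicated.
    (treesit-parser-create 'c)))
```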

+ +

Given a parser, we can query information about it: +

+
+
Function: treesit-parser-buffer parser
+

Returns the buffer associated with parser. +

+ +
+
Function: treesit-parser-language parser
+

Returns the language that parser uses. +

+ +
+
Function: treesit-parser-p object
+

Checks if object is a tree-sitter parser. Return non-nil if it +is, return nil otherwise. +

+ +

There is no need to explicitly parse a buffer, because parsing is done +automatically and lazily. A parser only parses when we query for a +node in its syntax tree. Therefore, when a parser is first created, +it doesn’t parse the buffer; it waits until we query for a node for +the first time. Similarly, when some change is made in the buffer, a +parser doesn’t re-parse immediately. +

+ +

When a parser does parse, it checks the size of the buffer. +Tree-sitter can only handle buffers no larger than about 4GB. If the +size exceeds that, Emacs signals treesit-buffer-too-large +with signal data being the buffer size. +

+

Once a parser is created, Emacs automatically adds it to the +internal parser list. Every time a change is made to the buffer, +Emacs updates parsers in this list so they can update their syntax +tree incrementally. +

+
+
Function: treesit-parser-list &optional buffer
+

This function returns the parser list of buffer. If omitted, +buffer defaults to the current buffer. +

+ +
+
Function: treesit-parser-delete parser
+

This function deletes parser. +
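For example, a mode that is being turned off could remove every parser it attached to the buffer. The helper name below is hypothetical; copying the list first is a defensive assumption, in case deleting a parser modifies the list being iterated over.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-delete-all-parsers ()
  "Delete every tree-sitter parser attached to the current buffer."
  ;; Iterate over a copy so deletion cannot disturb the iteration.
  (dolist (parser (copy-sequence (treesit-parser-list)))
    (treesit-parser-delete parser)))
```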

+ + +

Normally, a parser “sees” the whole +buffer, but when the buffer is narrowed (see Narrowing), the +parser will only see the visible region. As far as the parser can +tell, the hidden region is deleted. When the buffer is later +widened, the parser thinks text is inserted at the beginning and at +the end. Although parsers respect narrowing, narrowing shouldn’t be +the means of handling a multi-language buffer; instead, set the ranges in +which a parser should operate. See Parsing Text in Multiple Languages. +

+

Because a parser parses lazily, when we narrow the buffer, the parser +is not affected immediately; as long as we don’t query for a node +while the buffer is narrowed, the parser is oblivious of the +narrowing. +

+ +
+
Function: treesit-parse-string string language
+

Besides creating a parser for a buffer, we can also just parse a +string. Unlike a buffer, parsing a string is a one-time deal, and +there is no way to update the result. +

+

This function parses string with language, and returns the +root node of the generated syntax tree. +
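A quick way to experiment with a language definition is to parse a short string and inspect the root node. The sketch below assumes a C language definition is installed; the helper name is hypothetical.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes a C language
;; definition is installed.
(defun my-c-snippet-root-type (code)
  "Parse the C source string CODE and return the type of its root node."
  (treesit-node-type (treesit-parse-string code 'c)))

;; Example call:
;; (my-c-snippet-root-type "int a = 1;")
```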

+ +
+
+ + + + + + diff --git a/admin/notes/tree-sitter/html-manual/build-manual.sh b/admin/notes/tree-sitter/html-manual/build-manual.sh new file mode 100755 index 00000000000..adde3f2a2af --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/build-manual.sh @@ -0,0 +1,23 @@ +#!/bin/bash + +MANUAL_DIR="../../../doc/lispref" +THIS_DIR=$(pwd) + +echo "Build manual" +cd "${MANUAL_DIR}" +make elisp.html HTML_OPTS="--html --css-ref=./manual.css" + +cd "${THIS_DIR}" + +echo "Copy manual" +cp -f "${MANUAL_DIR}/elisp.html/Parsing-Program-Source.html" . +cp -f "${MANUAL_DIR}/elisp.html/Language-Definitions.html" . +cp -f "${MANUAL_DIR}/elisp.html/Using-Parser.html" . +cp -f "${MANUAL_DIR}/elisp.html/Retrieving-Node.html" . +cp -f "${MANUAL_DIR}/elisp.html/Accessing-Node.html" . +cp -f "${MANUAL_DIR}/elisp.html/Pattern-Matching.html" . +cp -f "${MANUAL_DIR}/elisp.html/Multiple-Languages.html" . +cp -f "${MANUAL_DIR}/elisp.html/Tree_002dsitter-C-API.html" . + +cp -f "${MANUAL_DIR}/elisp.html/Parser_002dbased-Font-Lock.html" . +cp -f "${MANUAL_DIR}/elisp.html/Parser_002dbased-Indentation.html" . diff --git a/admin/notes/tree-sitter/html-manual/manual.css b/admin/notes/tree-sitter/html-manual/manual.css new file mode 100644 index 00000000000..5a6790a3458 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/manual.css @@ -0,0 +1,374 @@ +/* Style-sheet to use for Emacs manuals */ + +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. + +Copying and distribution of this file, with or without modification, +are permitted in any medium without royalty provided the copyright +notice and this notice are preserved. This file is offered as-is, +without any warranty. +*/ + +/* style.css begins here */ + +/* This stylesheet is used by manuals and a few older resources. */ + +/* reset.css begins here */ + +/* +Software License Agreement (BSD License) + +Copyright (c) 2006, Yahoo! Inc. +All rights reserved. 
+ +Redistribution and use of this software in source and +binary forms, with or without modification, arepermitted +provided that the following conditions are met: + +* Redistributions of source code must retain the above +copyright notice, this list of conditions and the +following disclaimer. + +* Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the +following disclaimer in the documentation and/or other +materials provided with the distribution. + +* Neither the name of Yahoo! Inc. nor the names of its +contributors may be used to endorse or promote products +derived from this software without specific prior +written permission of Yahoo! Inc. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND +CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, +INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT +NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER +IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +SUCH DAMAGE. 
+*/ + +html { + color: #000; + background: #FFF; +} + +body, div, dl, dt, dd, ul, ol, li, h1, h2, h3, h4, +h5, h6, pre, code, form, fieldset, legend, input, +button, textarea, p, blockquote, th, td { + margin: 0; + padding: 0; +} + +table { + border-collapse: collapse; + border-spacing: 0; +} + +fieldset, img { + border: 0; +} + +address, caption, cite, code, dfn, em, strong, +th, var, optgroup { + font-style: inherit; + font-weight: inherit; +} + +del, ins { + text-decoration: none; +} + +li { + list-style:none; +} + +caption, th { + text-align: left; +} + +h1, h2, h3, h4, h5, h6 { + font-size: 100%; + font-weight: normal; +} + +q:before, q:after { + content:''; +} + +abbr, acronym { + border: 0; + font-variant: normal; +} + +sup { + vertical-align: baseline; +} +sub { + vertical-align: baseline; +} + +legend { + color: #000; +} + +input, button, textarea, select, optgroup, option { + font-family: inherit; + font-size: inherit; + font-style: inherit; + font-weight: inherit; +} + +input, button, textarea, select { + *font-size: 100%; +} + + +/* reset.css ends here */ + +/*** PAGE LAYOUT ***/ + +html, body { + font-size: 1em; + text-align: left; + text-decoration: none; +} +html { background-color: #e7e7e7; } + +body { + max-width: 74.92em; + margin: 0 auto; + padding: .5em 1em 1em 1em; + background-color: white; + border: .1em solid #c0c0c0; +} + + +/*** BASIC ELEMENTS ***/ + +/* Size and positioning */ + +p, pre, li, dt, dd, table, code, address { line-height: 1.3em; } + +h1 { font-size: 2em; margin: 1em 0 } +h2 { font-size: 1.50em; margin: 1.0em 0 0.87em 0; } +h3 { font-size: 1.30em; margin: 1.0em 0 0.87em 0; } +h4 { font-size: 1.13em; margin: 1.0em 0 0.88em 0; } +h5 { font-size: 1.00em; margin: 1.0em 0 1.00em 0; } + +p, pre { margin: 1em 0; } +pre { overflow: auto; padding-bottom: .3em; } + +ul, ol, blockquote { margin-left: 1.5%; margin-right: 1.5%; } +hr { margin: 1em 0; } +/* Lists of underlined links are difficult to read. 
The top margin + gives a little more spacing between entries. */ +ul li { margin: .5em 1em; } +ol li { margin: 1em; } +ol ul li { margin: .5em 1em; } +ul li p, ul ul li { margin-top: .3em; margin-bottom: .3em; } +ul ul, ol ul { margin-top: 0; margin-bottom: 0; } + +/* Separate description lists from preceding text */ +dl { margin: 1em 0 0 0; } +/* separate the "term" from subsequent "description" */ +dt { margin: .5em 0; } +/* separate the "description" from subsequent list item + when the final
<dd> child is an anonymous box */ +dd { margin: .5em 3% 1em 3%; } +/* separate anonymous box (used to be the first element in <dd>) + from subsequent <p>
*/ +dd p { margin: .5em 0; } + +table { + display: block; overflow: auto; + margin-top: 1.5em; margin-bottom: 1.5em; +} +th { padding: .3em .5em; text-align: center; } +td { padding: .2em .5em; } + +address { margin-bottom: 1em; } +caption { margin-bottom: .5em; text-align: center; } +sup { vertical-align: super; } +sub { vertical-align: sub; } + +/* Style */ + +h1, h2, h3, h4, h5, h6, strong, dt, th { font-weight: bold; } + +/* The default color (black) is too dark for large text in + bold font. */ +h1, h2, h3, h4 { color: #333; } +h5, h6, dt { color: #222; } + +a[href] { color: #005090; } +a[href]:visited { color: #100070; } +a[href]:active, a[href]:hover { + color: #100070; + text-decoration: none; +} + +h1 a[href]:visited, h2 a[href]:visited, h3 a[href]:visited, +h4 a[href]:visited { color: #005090; } +h1 a[href]:hover, h2 a[href]:hover, h3 a[href]:hover, +h4 a[href]:hover { color: #100070; } + +ol { list-style: decimal outside;} +ul { list-style: square outside; } +ul ul, ol ul { list-style: circle; } +li { list-style: inherit; } + +hr { background-color: #ede6d5; } +table { border: 0; } + +abbr,acronym { + border-bottom:1px dotted #000; + text-decoration: none; + cursor:help; +} +del { text-decoration: line-through; } +em { font-style: italic; } +small { font-size: .9em; } + +img { max-width: 100%} + + +/*** SIMPLE CLASSES ***/ + +.center, .c { text-align: center; } +.nocenter{ text-align: left; } + +.underline { text-decoration: underline; } +.nounderline { text-decoration: none; } + +.no-bullet { list-style: none; } +.inline-list li { display: inline } + +.netscape4, .no-display { display: none; } + + +/*** MANUAL PAGES ***/ + +/* This makes the very long tables of contents in Gnulib and other + manuals easier to read. 
*/ +.contents ul, .shortcontents ul { font-weight: bold; } +.contents ul ul, .shortcontents ul ul { font-weight: normal; } +.contents ul { list-style: none; } + +/* For colored navigation bars (Emacs manual): make the bar extend + across the whole width of the page and give it a decent height. */ +.header, .node { margin: 0 -1em; padding: 0 1em; } +.header p, .node p { line-height: 2em; } + +/* For navigation links */ +.node a, .header a { display: inline-block; line-height: 2em; } +.node a:hover, .header a:hover { background: #f2efe4; } + +/* Inserts */ +table.cartouche td { padding: 1.5em; } + +div.display, div.lisp, div.smalldisplay, +div.smallexample, div.smalllisp { margin-left: 3%; } + +div.example { padding: .8em 1.2em .4em; } +pre.example { padding: .8em 1.2em; } +div.example, pre.example { + margin: 1em 0 1em 3% ; + -webkit-border-radius: .3em; + -moz-border-radius: .3em; + border-radius: .3em; + border: 1px solid #d4cbb6; + background-color: #f2efe4; +} +div.example > pre.example { + padding: 0 0 .4em; + margin: 0; + border: none; +} + +pre.menu-comment { padding-top: 1.3em; margin: 0; } + + +/*** FOR WIDE SCREENS ***/ + +@media (min-width: 40em) { + body { padding: .5em 3em 1em 3em; } + div.header, div.node { margin: 0 -3em; padding: 0 3em; } +} + +/* style.css ends here */ + +/* makeinfo convert @deffn and similar functions to something inside +

<blockquote>. style.css uses italic for blockquote. This looks poor + in the Emacs manuals, which make extensive use of @defun (etc). + In particular, references to function arguments appear as <var> + inside <blockquote>
. Since <var> is also italic, it makes it + impossible to distinguish variables. We could change <var> to + e.g. bold-italic, or normal, or a different color, but that does + not look as good IMO. So we just override blockquote to be non-italic. + */ +blockquote { font-style: normal; } + +var { font-style: italic; } + +div.header { + background-color: #DDDDFF; + padding-top: 0.2em; +} + + +/*** Customization ***/ + +body { + font-family: Charter, serif; + font-size: 14pt; + line-height: 1.4; + background-color: #fefefc; + color: #202010; +} + +pre.menu-comment { + font-family: Charter, serif; + font-size: 14pt; +} + +body > *, body > div.display, body > div.lisp, body > div.smalldisplay, +body > div.example, body > div.smallexample, body > div.smalllisp { + width: 700px; + margin-left: auto; + margin-right: auto; +} + +div.header { + width: 100%; + min-height: 3em; + font-size: 13pt; +} + +/* Documentation block for functions and variables. Make them + narrower */ +dd { + margin: .5em 6% 1em 6% +} + +code, pre, kbd, samp, tt { + font-size: 12pt; + font-family: monospace; +} + +/* In each node we have an index table of all sub-nodes. Make more space + for the first column, which is the name of each sub-node. */ +table.menu tbody tr td:nth-child(1) { + white-space: nowrap; +} + +div.header p { + text-align: center; + margin: 0.5em auto 0.5em auto; +} diff --git a/admin/notes/tree-sitter/starter-guide b/admin/notes/tree-sitter/starter-guide new file mode 100644 index 00000000000..6cf8cf8a236 --- /dev/null +++ b/admin/notes/tree-sitter/starter-guide @@ -0,0 +1,442 @@ +STARTER GUIDE ON WRITING A MAJOR MODE WITH TREE-SITTER -*- org -*- + +This document guides you on adding tree-sitter support to a major +mode. + +TOC: + +- Building Emacs with tree-sitter +- Install language definitions +- Setup +- Font-lock +- Indent +- Imenu +- Navigation +- Which-func +- More features?
+- Common tasks (code snippets) +- Manual + +* Building Emacs with tree-sitter + +You can install tree-sitter either with your package manager or from +source: + + git clone https://github.com/tree-sitter/tree-sitter.git + cd tree-sitter + make + make install + +Then pull the tree-sitter branch (or the master branch, if it has been +merged) and rebuild Emacs. + +* Install language definitions + +Tree-sitter by itself doesn’t know how to parse any particular +language. We need to install a language definition (or “grammar”) for +a language to be able to parse it. There are a couple of ways to get +them. + +You can use the scripts that I put together here: + + https://github.com/casouri/tree-sitter-module + +You can also find the scripts under this directory in /build-module. + +These scripts automatically pull and build language definitions for C, +C++, Rust, JSON, Go, HTML, JavaScript, CSS, Python, TypeScript, +and C#. Better yet, I pre-built these language definitions for +GNU/Linux and macOS; they can be downloaded here: + + https://github.com/casouri/tree-sitter-module/releases/tag/v2.1 + +To build them yourself, run + + git clone git@github.com:casouri/tree-sitter-module.git + cd tree-sitter-module + ./batch.sh + +and the language definitions will be in the /dist directory. You can +either copy them to a standard dynamic library location on your system, +eg, /usr/local/lib, or leave them in /dist and later tell Emacs where +to find language definitions by setting ‘treesit-extra-load-path’. + +Language definition sources can be found on GitHub under +tree-sitter/xxx, like tree-sitter/tree-sitter-python.
The tree-sitter +organization has all the "official" language definitions: + + https://github.com/tree-sitter + +* Setting up for adding major mode features + +Start Emacs, and load tree-sitter with + + (require 'treesit) + +Now check whether Emacs was built with the tree-sitter library + + (treesit-available-p) + +For your major mode, first create a tree-sitter switch: + +#+begin_src elisp +(defcustom python-use-tree-sitter nil + "If non-nil, `python-mode' tries to use tree-sitter. +Currently `python-mode' can utilize tree-sitter for font-locking, +imenu, and movement functions." + :type 'boolean) +#+end_src + +Then in other places, we decide whether to enable tree-sitter with + +#+begin_src elisp +(and python-use-tree-sitter + (treesit-can-enable-p)) +#+end_src + +* Font-lock + +Tree-sitter works like this: You provide a query made of patterns and +capture names, tree-sitter finds the nodes that match these patterns, +tags them with the corresponding capture names, and returns them to +you. The query function returns a list of (capture-name . node). For +font-lock, we use face names as capture names, and each captured node +is fontified with the face named by its capture name. The capture name +could also be a function, in which case (START END NODE) is passed to +the function for font-lock. START and END are the start and end of the +captured NODE. + +** Query syntax + +There are two types of nodes: named, like (identifier), +(function_definition), and anonymous, like "return", "def", "(", +"}".
The parent-child relationship is expressed as + + (parent (child) (child) (child (grand_child))) + +Eg, an argument list (1, "3", 1) could be: + + (argument_list "(" (number) (string) (number) ")") + +Children can have field names in their parent: + + (function_definition name: (identifier) type: (identifier)) + +Match any item in a list: + + ["true" "false" "none"] + +Capture names can come after any node in the pattern: + + (parent (child) @child) @parent + +The query above captures both parent and child. + + ["return" "continue" "break"] @keyword + +The query above captures all the keywords with capture name +"keyword". + +This is the most common syntax; see the rest of it in the manual +("Parsing Program Source" section). + +** Query references + +But how does one come up with the queries? Take Python as an +example: open any Python source file and evaluate + + (treesit-parser-create 'python) + +so there is a parser available, then enable ‘treesit-inspect-mode’. +Now you should see information about the node under point in the +mode-line. Move around and you should be able to get a good +picture. Besides this, you can consult the grammar of the language +definition. For example, Python’s grammar file is at + + https://github.com/tree-sitter/tree-sitter-python/blob/master/grammar.js + +Neovim also has a bunch of queries to reference: + + https://github.com/nvim-treesitter/nvim-treesitter/tree/master/queries + +The manual explains how to read grammar files at the bottom of the section +"Tree-sitter Language Definitions". + +** Debugging queries + +If your query has problems, it usually fails to compile. In that case +use ‘treesit-query-validate’ to debug the query. It will pop up a buffer +containing the query (in text format) and mark the offending part in +red. + +** Code + +To enable tree-sitter font-lock, set ‘treesit-font-lock-settings’ +buffer-locally and call ‘treesit-font-lock-enable’. For example, see +‘python--treesit-settings’ in python.el. Below I paste a snippet of +it.
+ +Note that, as with the current font-lock, if the to-be-fontified region +already has a face (ie, an earlier match fontified part/all of the +region), the new face is discarded rather than applied. If you want +later matches to always override earlier matches, use the :override +keyword. + +#+begin_src elisp +(defvar python--treesit-settings + (treesit-font-lock-rules + :language 'python + :override t + `(;; Queries for def and class. + (function_definition + name: (identifier) @font-lock-function-name-face) + + (class_definition + name: (identifier) @font-lock-type-face) + + ;; Comment and string. + (comment) @font-lock-comment-face + + ...))) +#+end_src + +Then in ‘python-mode’, enable tree-sitter font-lock: + +#+begin_src elisp +(treesit-parser-create 'python) +;; This turns off the syntax-based font-lock for comments and +;; strings, so that it doesn’t override tree-sitter’s fontification. +(setq-local font-lock-keywords-only t) +(setq-local treesit-font-lock-settings + python--treesit-settings) +(treesit-font-lock-enable) +#+end_src + +Concretely, something like this: + +#+begin_src elisp +(define-derived-mode python-mode prog-mode "Python" + ... + + (if (and python-use-tree-sitter + (treesit-can-enable-p)) + ;; Tree-sitter. Create the parser only after the check above, + ;; so that nothing tree-sitter-specific runs when it is unavailable. + (progn + (treesit-parser-create 'python) + (setq-local font-lock-keywords-only t) + (setq-local treesit-font-lock-settings + python--treesit-settings) + (treesit-font-lock-enable)) + ;; No tree-sitter + (setq-local font-lock-defaults ...)) + + ...) +#+end_src + +You’ll notice that tree-sitter’s font-lock doesn’t respect +‘font-lock-maximum-decoration’. Major modes are free to set +‘treesit-font-lock-settings’ based on the value of +‘font-lock-maximum-decoration’, or provide more fine-grained control +through other mode-specific means.
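
One way to act on that last point is sketched below. It is only an illustration under stated assumptions: the names ‘xxx--treesit-settings-minimal’ and ‘xxx--treesit-choose-settings’ are hypothetical, the "comments only" rule set and the level threshold are arbitrary choices, and the call to ‘treesit-font-lock-rules’ simply mirrors the call shape used in this guide (other Emacs versions may expect additional keywords). ‘font-lock-value-in-major-mode’ is the existing font-lock helper that resolves a per-mode alist value.

#+begin_src elisp
(require 'treesit)

;; Hypothetical reduced rule set: fontify comments only.
(defvar xxx--treesit-settings-minimal
  (treesit-font-lock-rules
   :language 'python
   '((comment) @font-lock-comment-face))
  "Hypothetical minimal tree-sitter font-lock settings.")

(defun xxx--treesit-choose-settings ()
  "Return font-lock settings based on `font-lock-maximum-decoration'."
  ;; `font-lock-maximum-decoration' may be nil, t, a number, or an
  ;; alist of (MODE . LEVEL); resolve it for the current major mode.
  (let ((level (font-lock-value-in-major-mode
                font-lock-maximum-decoration)))
    ;; Assumption: t or a level above 1 selects the full rule set,
    ;; anything else (nil, 0 or 1) selects the minimal one.
    (if (or (eq level t)
            (and (integerp level) (> level 1)))
        python--treesit-settings
      xxx--treesit-settings-minimal)))

;; In the major mode body, in place of the plain assignment:
(setq-local treesit-font-lock-settings
            (xxx--treesit-choose-settings))
(treesit-font-lock-enable)
#+end_src

The full rule set here is the ‘python--treesit-settings’ variable from the snippet above; a real mode might define more than two levels.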
+ +* Indent + +Indentation works like this: We have a list of rules that look like this: + + (MATCHER ANCHOR OFFSET) + +At the start, point is at the BOL of a line, and we want to know which +column to indent this line to. Let NODE be the node at point. We pass +this node to the MATCHER of each rule until one of them matches the node +("this node is a closing bracket!"). Then we pass the node to the +ANCHOR, which returns a point, eg, the BOL of the previous line. We +find the column number of that point (eg, 4), add OFFSET to it (eg, +0), and that is the column we want to indent the current line to +(4 + 0 = 4). + +For MATCHER we have + + (parent-is TYPE) + (node-is TYPE) + (query QUERY) => matches if querying PARENT with QUERY + captures NODE. + + (match NODE-TYPE PARENT-TYPE NODE-FIELD + NODE-INDEX-MIN NODE-INDEX-MAX) + + => checks everything. If an argument is nil, don’t match that. Eg, + (match nil TYPE) is the same as (parent-is TYPE) + +For ANCHOR we have + + first-sibling => start of the first sibling + parent => start of parent + parent-bol => BOL of the line parent is on. + prev-sibling => start of the previous sibling + no-indent => don’t indent + prev-line => same indent as previous line + +There is also a manual section for indent: "Parser-based Indentation". + +When writing indent rules, you can use ‘treesit-check-indent’ to +check if your indentation is correct. To debug what went wrong, set +‘treesit--indent-verbose’ to non-nil. Then when you indent, Emacs +tells you in the echo area which rule was applied. + +#+begin_src elisp +(defvar typescript-mode-indent-rules + (let ((offset typescript-indent-offset)) + `((typescript + ;; This rule matches if node at point is "}", ANCHOR is the + ;; parent node’s BOL, and offset is 0.
+ ((node-is "}") parent-bol 0) + ((node-is ")") parent-bol 0) + ((node-is "]") parent-bol 0) + ((node-is ">") parent-bol 0) + ((node-is ".") parent-bol ,offset) + ((parent-is "ternary_expression") parent-bol ,offset) + ((parent-is "named_imports") parent-bol ,offset) + ((parent-is "statement_block") parent-bol ,offset) + ((parent-is "type_arguments") parent-bol ,offset) + ((parent-is "variable_declarator") parent-bol ,offset) + ((parent-is "arguments") parent-bol ,offset) + ((parent-is "array") parent-bol ,offset) + ((parent-is "formal_parameters") parent-bol ,offset) + ((parent-is "template_substitution") parent-bol ,offset) + ((parent-is "object_pattern") parent-bol ,offset) + ((parent-is "object") parent-bol ,offset) + ((parent-is "object_type") parent-bol ,offset) + ((parent-is "enum_body") parent-bol ,offset) + ((parent-is "arrow_function") parent-bol ,offset) + ((parent-is "parenthesized_expression") parent-bol ,offset) + ...)))) +#+end_src + +Then you set ‘treesit-simple-indent-rules’ to your rules, and set +‘indent-line-function’: + +#+begin_src elisp +(setq-local treesit-simple-indent-rules typescript-mode-indent-rules) +(setq-local indent-line-function #'treesit-indent) +#+end_src + +* Imenu + +Not much to say here, except that ‘treesit-induce-sparse-tree’ is the +function to use. See ‘python--imenu-treesit-create-index-1’ in python.el +for an example. + +Once you have the index builder, set ‘imenu-create-index-function’. + +* Navigation + +Mainly ‘beginning-of-defun-function’ and ‘end-of-defun-function’. +You can find the end of a defun with something like + +(treesit-search-forward-goto "function_definition" 'end) + +where "function_definition" matches the node type of a function +definition node, and 'end means we want to go to the end of that +node. + +Something like this should suffice: + +#+begin_src elisp +(defun xxx-beginning-of-defun (&optional arg) + ;; ARG is optional, so default it to 1 before comparing it. + (setq arg (or arg 1)) + (if (> arg 0) + ;; Go backward.
+ (while (and (> arg 0) + (treesit-search-forward-goto + "function_definition" 'start nil t)) + (setq arg (1- arg))) + ;; Go forward. + (while (and (< arg 0) + (treesit-search-forward-goto + "function_definition" 'start)) + (setq arg (1+ arg))))) + +(setq-local beginning-of-defun-function #'xxx-beginning-of-defun) +#+end_src + +And the same for end-of-defun. + +* Which-func + +You can find the current function by going up the tree and looking for +the function_definition node. See ‘python-info-treesit-current-defun’ +in python.el for an example. Since Python allows nested function +definitions, that function keeps going until it reaches the root node, +and records all the function names along the way. + +#+begin_src elisp +(defun python-info-treesit-current-defun (&optional include-type) + "Identical to `python-info-current-defun' but use tree-sitter. +For INCLUDE-TYPE see `python-info-current-defun'." + (let ((node (treesit-node-at (point))) + (name-list ()) + (type nil)) + (cl-loop while node + if (pcase (treesit-node-type node) + ("function_definition" + (setq type 'def)) + ("class_definition" + (setq type 'class)) + (_ nil)) + do (push (treesit-node-text + (treesit-node-child-by-field-name node "name") + t) + name-list) + do (setq node (treesit-node-parent node)) + finally return (concat (if include-type + (format "%s " type) + "") + (string-join name-list "."))))) +#+end_src + +* More features? + +Obviously this list is just a starting point. If there are features in +the major mode that would benefit from a parse tree, adding tree-sitter +support for them would be great. But in the minimal case, just adding +font-lock is awesome. + +* Common tasks + +How to... + +** Get the buffer text corresponding to a node? + +(treesit-node-text node) + +BTW ‘treesit-node-string’ does something different. + +** Scan the whole tree for stuff? + +(treesit-search-subtree) +(treesit-search-forward) +(treesit-induce-sparse-tree) + +** Move to the next node that...?
+ +(treesit-search-forward-goto) + +** Get the root node? + +(treesit-buffer-root-node) + +** Get the node at point? + +(treesit-node-at (point)) + +* Manual + +I suggest you read the manual section for tree-sitter in Info. The +section is "Parsing Program Source". Typing + + C-h i d m elisp RET g Parsing Program Source RET + +will bring you to that section. You can also read the HTML version +under /html-manual in this directory. I find the HTML version easier +to read. You don’t need to read through every sentence; just read the +text paragraphs and glance over function names.