From: Yuan Fu Date: Thu, 3 Nov 2022 18:41:42 +0000 (-0700) Subject: ; Update guides in /admin/notes/tree-sitter X-Git-Tag: emacs-29.0.90~1726 X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=5416ae5990337f5fb2b3e0fbf9c4575508da808e;p=emacs.git ; Update guides in /admin/notes/tree-sitter * admin/notes/tree-sitter/html-manual/Language-Definitions.html * admin/notes/tree-sitter/html-manual/Multiple-Languages.html * admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html * admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html * admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html * admin/notes/tree-sitter/html-manual/Pattern-Matching.html * admin/notes/tree-sitter/html-manual/Retrieving-Node.html * admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html * admin/notes/tree-sitter/html-manual/Using-Parser.html * admin/notes/tree-sitter/starter-guide: Update to reflect changes made recently. --- diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html index ba3eeb9eeb9..6df676b1680 100644 --- a/admin/notes/tree-sitter/html-manual/Language-Definitions.html +++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html @@ -66,14 +66,17 @@ Next: + +

In each of these directories, Emacs looks for a file with file-name +extensions specified by the variable treesit-load-suffixes. +

+

If Emacs cannot find the library or has problems loading it, Emacs +signals the treesit-load-language-error error. The data of +that signal could be one of the following: +

+
+
(not-found error-msg …)
+

This means that Emacs could not find the language definition library. +

+
(symbol-error error-msg)
+

This means that Emacs could not find in the library the expected function +that every language definition library should export. +

+
(version-mismatch error-msg)
+

This means that the version of the language definition library is incompatible +with that of the tree-sitter library. +

+
+ +

In all of these cases, error-msg might provide additional +details about the failure.

-
Function: treesit-language-available-p language
-

This function checks whether the dynamic library for language is -present on the system, and return non-nil if it is. +

Function: treesit-language-available-p language &optional detail
+

This function returns non-nil if the language definitions for +language exist and can be loaded. +

+

If detail is non-nil, return (t . nil) when +language is available, and (nil . data) when it’s +unavailable. data is the signal data of +treesit-load-language-error.

-

By convention, the dynamic library for language is -libtree-sitter-language.ext, where ext is the -system-specific extension for dynamic libraries. Also by convention, +

By convention, the file name of the dynamic library for language is +libtree-sitter-language.ext, where ext is the +system-specific extension for dynamic libraries. Also by convention, the function provided by that library is named -tree_sitter_language. If a language definition doesn’t -follow this convention, you should add an entry +tree_sitter_language. If a language definition library +doesn’t follow this convention, you should add an entry

(language library-base-name function-name)
 
-

to treesit-load-name-override-list, where -library-base-name is the base filename for the dynamic library -(conventionally libtree-sitter-language), and +

to the list in the variable treesit-load-name-override-list, where +library-base-name is the basename of the dynamic library’s file name +(usually, libtree-sitter-language), and function-name is the function provided by the library -(conventionally tree_sitter_language). For example, +(usually, tree_sitter_language). For example,

(cool-lang "libtree-sitter-coool" "tree_sitter_cooool")
 
-

for a language too cool to abide by conventions. +

for a language that considers itself too “cool” to abide by +conventions.

+
Function: treesit-language-version &optional min-compatible
-

Tree-sitter library has a language version, a language -definition’s version needs to match this version to be compatible. -

-

This function returns tree-sitter library’s language version. If -min-compatible is non-nil, it returns the minimal compatible -version. +

This function returns the version of the language-definition +Application Binary Interface (ABI) supported by the +tree-sitter library. By default, it returns the latest ABI version +supported by the library, but if min-compatible is +non-nil, it returns the oldest ABI version that the library +can still support. Language definition libraries must be built for +ABI versions between the oldest and the latest versions supported by +the tree-sitter library; otherwise the library will be unable to load +them.

Concrete syntax tree

+

A syntax tree is what a parser generates. In a syntax tree, each node represents a piece of text, and is connected to each other by a @@ -155,31 +195,34 @@ parent-child relationship. For example, if the source text is +------------+ +--------------+ +------------+ -

We can also represent it in s-expression: +

We can also represent it as an s-expression:

(root (expression (number) (operator) (number)))
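
As a minimal sketch of how such an s-expression relates to the tree, the following JavaScript renders a nested node structure in that notation. The plain-object shape (`type`, `children`) and the name `toSexp` are assumptions made for this illustration; they are not part of Emacs or tree-sitter.

```javascript
// Hypothetical plain-object model of the syntax tree above;
// not a data structure used by Emacs or tree-sitter.
const tree = {
  type: "root",
  children: [
    {
      type: "expression",
      children: [
        { type: "number", children: [] },
        { type: "operator", children: [] },
        { type: "number", children: [] },
      ],
    },
  ],
};

// Render a node as "(type child child ...)", recursing into its children.
function toSexp(node) {
  const parts = [node.type, ...node.children.map(toSexp)];
  return "(" + parts.join(" ") + ")";
}

console.log(toSexp(tree));
// prints: (root (expression (number) (operator) (number)))
```

Each pair of parentheses corresponds to one node, and nesting mirrors the parent-child relationship.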
 

Node types

- - - - -

Names like root, expression, number, -operator are nodes’ type. However, not all nodes in a -syntax tree have a type. Nodes that don’t are anonymous nodes, -and nodes with a type are named nodes. Anonymous nodes are -tokens with fixed spellings, including punctuation characters like -bracket ‘]’, and keywords like return. + + + + + +

Names like root, expression, number, and +operator specify the type of the nodes. However, not all +nodes in a syntax tree have a type. Nodes that don’t have a type are +known as anonymous nodes, and nodes with a type are named +nodes. Anonymous nodes are tokens with fixed spellings, including +punctuation characters like bracket ‘]’, and keywords like +return.

Field names

+ -

To make the syntax tree easier to -analyze, many language definitions assign field names to child -nodes. For example, a function_definition node could have a -declarator and a body: +

To make the syntax tree easier to analyze, many language definitions +assign field names to child nodes. For example, a +function_definition node could have a declarator and a +body:

(function_definition
@@ -189,39 +232,40 @@ nodes.  For example, a function_definition node could have a
 
 
Command: treesit-inspect-mode
-

This minor mode displays the node that starts at point in -mode-line. The mode-line will display +

This minor mode displays on the mode-line the node that starts +at point. The mode-line will display

-
parent field-name: (child (grand-child (...)))
+
parent field: (child (grandchild (…)))
 
-

child, grand-child, and grand-grand-child, etc, are -nodes that have their beginning at point. And parent is the -parent of child. +

child, grandchild, grand-grandchild, etc., are nodes that +begin at point. parent is the parent node of child.

If there is no node that starts at point, i.e., point is in the middle of a node, then the mode-line only displays the smallest node that -spans point, and its immediate parent. +spans the position of point, and its immediate parent.

This minor mode doesn’t create parsers on its own. It simply uses the first parser in (treesit-parser-list) (see Using Tree-sitter Parser).

Reading the grammar definition

+

Authors of language definitions define the grammar of a -language, and this grammar determines how does a parser construct a -concrete syntax tree out of the text. In order to use the syntax -tree effectively, we need to read the grammar file. +programming language, which determines how a parser constructs a +concrete syntax tree out of the program text. In order to use the +syntax tree effectively, you need to consult the grammar file.

-

The grammar file is usually grammar.js in a language -definition’s project repository. The link to a language definition’s -home page can be found in tree-sitter’s homepage -(https://tree-sitter.github.io/tree-sitter). +

The grammar file is usually grammar.js in a language +definition’s project repository. The link to a language definition’s +home page can be found on +tree-sitter’s +homepage.

-

The grammar is written in JavaScript syntax. For example, the rule -matching a function_definition node looks like +

The grammar definition is written in JavaScript. For example, the +rule matching a function_definition node looks like

function_definition: $ => seq(
@@ -231,12 +275,12 @@ matching a function_definition node looks like
 )
 
-

The rule is represented by a function that takes a single argument +

The rules are represented by functions that take a single argument $, representing the whole grammar. The function itself is -constructed by other functions: the seq function puts together a -sequence of children; the field function annotates a child with -a field name. If we write the above definition in BNF syntax, it -would look like +constructed by other functions: the seq function puts together +a sequence of children; the field function annotates a child +with a field name. If we write the above definition in the so-called +Backus-Naur Form (BNF) syntax, it would look like

function_definition :=
@@ -252,66 +296,77 @@ would look like
   body: (compound_statement))
 
-

Below is a list of functions that one will see in a grammar -definition. Each function takes other rules as arguments and returns -a new rule. +

Below is a list of functions that one can see in a grammar definition. +Each function takes other rules as arguments and returns a new rule.

-
    -
  • seq(rule1, rule2, ...) matches each rule one after another. - -
  • choice(rule1, rule2, ...) matches one of the rules in its -arguments. - -
  • repeat(rule) matches rule for zero or more times. +
    +
    seq(rule1, rule2, …)
    +

    matches each rule one after another. +

    +
    choice(rule1, rule2, …)
    +

    matches one of the rules in its arguments. +

    +
    repeat(rule)
    +

    matches rule zero or more times. This is like the ‘*’ operator in regular expressions. - -

  • repeat1(rule) matches rule for one or more times. +

    +
    repeat1(rule)
    +

    matches rule one or more times. This is like the ‘+’ operator in regular expressions. - -

  • optional(rule) matches rule for zero or one time. +

    +
    optional(rule)
    +

    matches rule zero or one time. This is like the ‘?’ operator in regular expressions. - -

  • field(name, rule) assigns field name name to the child -node matched by rule. - -
  • alias(rule, alias) makes nodes matched by rule appear as -alias in the syntax tree generated by the parser. For example, - +

    +
    field(name, rule)
    +

    assigns field name name to the child node matched by rule. +

    +
    alias(rule, alias)
    +

    makes nodes matched by rule appear as alias in the syntax +tree generated by the parser. For example, +

    alias(preprocessor_call_exp, call_expression)
     
    -

    makes any node matched by preprocessor_call_exp to appear as +

    makes any node matched by preprocessor_call_exp appear as call_expression. -

+

+ -

Below are grammar functions less interesting for a reader of a +

Below are grammar functions of lesser importance for reading a language definition.

-
    -
  • token(rule) marks rule to produce a single leaf node. -That is, instead of generating a parent node with individual child -nodes under it, everything is combined into a single leaf node. - -
  • Normally, grammar rules ignore preceding whitespaces, -token.immediate(rule) changes rule to match only when -there is no preceding whitespaces. - -
  • prec(n, rule) gives rule a level n precedence. - -
  • prec.left([n,] rule) marks rule as left-associative, -optionally with level n. - -
  • prec.right([n,] rule) marks rule as right-associative, -optionally with level n. - -
  • prec.dynamic(n, rule) is like prec, but the precedence -is applied at runtime instead. -
- -

The tree-sitter project talks about writing a grammar in more detail: -https://tree-sitter.github.io/tree-sitter/creating-parsers. -Read especially “The Grammar DSL” section. +

+
token(rule)
+

marks rule to produce a single leaf node. That is, instead of +generating a parent node with individual child nodes under it, +everything is combined into a single leaf node. +

+
token.immediate(rule)
+

Normally, grammar rules ignore preceding whitespace; this +changes rule to match only when there is no preceding +whitespace. +

+
prec(n, rule)
+

gives rule a level-n precedence. +

+
prec.left([n,] rule)
+

marks rule as left-associative, optionally with level n. +

+
prec.right([n,] rule)
+

marks rule as right-associative, optionally with level n. +

+
prec.dynamic(n, rule)
+

this is like prec, but the precedence is applied at runtime +instead. +

+
+ +

The documentation of the tree-sitter project has +more +about writing a grammar. Read especially “The Grammar DSL” +section.
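
The idea that each grammar function takes rules and returns a new rule can be illustrated with a toy model. The sketch below defines stand-ins for seq, choice, repeat, optional, and field, plus a stand-in for the `$` argument. The object shapes, the `show` helper, and the `pairList` rule are all hypothetical and exist only for this illustration; this is not how tree-sitter implements these functions.

```javascript
// Toy stand-ins for the grammar functions described above. The object
// shapes are assumptions for illustration, not tree-sitter's implementation.
const seq = (...rules) => ({ kind: "seq", rules });
const choice = (...rules) => ({ kind: "choice", rules });
const repeat = (rule) => ({ kind: "repeat", rule });
const optional = (rule) => ({ kind: "optional", rule });
const field = (name, rule) => ({ kind: "field", name, rule });

// Toy version of the `$` argument: any property access yields a
// reference to the rule with that name.
const $ = new Proxy({}, { get: (_target, name) => ({ kind: "ref", name }) });

// Render a rule in a BNF-like notation.
function show(rule) {
  switch (rule.kind) {
    case "ref": return rule.name;
    case "seq": return rule.rules.map(show).join(" ");
    case "choice": return "(" + rule.rules.map(show).join(" | ") + ")";
    case "repeat": return "(" + show(rule.rule) + ")*";
    case "optional": return "(" + show(rule.rule) + ")?";
    case "field": return rule.name + ": " + show(rule.rule);
    default: throw new Error("unknown rule kind: " + rule.kind);
  }
}

// Hypothetical rule, written the way a rule in grammar.js is written.
const pairList = ($) => seq(
  field("first", $.pair),
  repeat(seq($.comma, $.pair)),
  optional($.comma)
);

console.log(show(pairList($)));
// prints: first: pair (comma pair)* (comma)?
```

Because every function returns an ordinary value, rules nest freely: the argument of repeat here is itself a seq, which is the same composition that real grammar definitions rely on.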


diff --git a/admin/notes/tree-sitter/html-manual/Multiple-Languages.html b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html index 1ee2df7f442..eac142921f1 100644 --- a/admin/notes/tree-sitter/html-manual/Multiple-Languages.html +++ b/admin/notes/tree-sitter/html-manual/Multiple-Languages.html @@ -33,7 +33,7 @@ developing GNU and promoting software freedom." --> - +