From: Yuan Fu
Date: Sun, 23 Oct 2022 01:44:11 +0000 (-0700)
Subject: Resolve FIXME's in tree-sitter manual sections
X-Git-Tag: emacs-29.0.90~1804
X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=773cce640fc5d67cb1a64622defa073d7ec5fcc4;p=emacs.git

Resolve FIXME's in tree-sitter manual sections

Pattern vs query: a query consists of many patterns. I tightened up
the use of pattern vs query in the manual, now there shouldn't be
ambiguities.

* doc/lispref/modes.texi (Parser-based Font Lock):
* doc/lispref/parsing.texi (Language Definitions): Resolve FIXME's.
---

diff --git a/doc/lispref/modes.texi b/doc/lispref/modes.texi
index 24892077d1a..3537d312f2f 100644
--- a/doc/lispref/modes.texi
+++ b/doc/lispref/modes.texi
@@ -3904,10 +3904,17 @@ variables with regexp-based font lock, it uses similar customization
 schemes. The tree-sitter counterpart of @var{font-lock-keywords} is
 @var{treesit-font-lock-settings}.

-@c FIXME: The ``query'' part here and thereafter comes ``out of the
-@c blue''. There should be some text here explaining what those
-@c ``queries'' are and how are they related to fontifications, or a
-@c cross-reference to another place with such an explanation.
+In general, tree-sitter fontification works like the following: a Lisp
+program provides a @dfn{query} consisting of @dfn{patterns} with
+@dfn{capture names}. Tree-sitter finds the nodes in the parse tree
+that match these patterns, tags the corresponding capture names onto
+the nodes, and returns them to the Lisp program. The Lisp program
+takes theses nodes and highlights the corresponding buffer text of
+each node depending on the tagged capture name of the node. For
+example, a node tagged @code{font-lock-keyword} would simply be
+highlighted in @code{font-lock-keyword} face. For more information on
+queries, patterns and capture names, @pref{Pattern Matching}.
+
 @defun treesit-font-lock-rules :keyword value query...
 This function is used to set @var{treesit-font-lock-settings}. It
 takes care of compiling queries and other post-processing, and outputs
@@ -3948,9 +3955,10 @@ Other keywords are optional:
 @item @tab @code{keep} @tab Fill-in regions without an existing face
 @end multitable

-@c FIXME: The ``capture names'' part should be expl,ained before it is
-@c first used: what it is and how it's related to fontifications.
-Capture names in @var{query} should be face names like
+Lisp programs mark patterns in the query with capture names (names
+that starts with @code{@@}), and tree-sitter will return matched nodes
+with capture names tagged onto them. For the purpose of
+fontification, capture names in @var{query} should be face names like
 @code{font-lock-keyword-face}. The captured node will be fontified
 with that face. Capture names can also be function names, in which
 case the function is called with 3 arguments: @var{start}, @var{end},
@@ -3966,9 +3974,8 @@ is a list that represents a decoration level.
 @code{font-lock-maximum-decoration} controls which levels are
 activated.

-@c FIXME: This should be rewritten using our style: ``each element of
-@c the list is a list of the form (FOO BAR BAZ), where FOO...'' etc.
-Inside each sublist are feature symbols, which correspond to the
+Each element of the list is a list of the form @w{@code{(@var{feature}
+@dots{})}}, where each @var{feature} corresponds to the
 @code{:feature} value of a query defined in
 @code{treesit-font-lock-rules}. Removing a feature symbol from this
 list disables the corresponding query during font-lock.
@@ -3992,40 +3999,18 @@ For example, the value of this variable could be:
 Major modes should set this variable before calling
 @code{treesit-font-lock-enable}.

-@c FIXME: ``for further changes''? This should clarify when this
-@c function has to be called.
 @findex treesit-font-lock-recompute-features
-In addition, for further changes to this variable to take effect, call
-@code{treesit-font-lock-recompute-features}.
+For this variable to take effect, a Lisp program should call
+@code{treesit-font-lock-recompute-features} (which resets
+@code{treesit-font-lock-settings} accordingly).
 @end defvar

 @defvar treesit-font-lock-settings
 A list of settings for tree-sitter based font lock. The exact format
 of this variable is considered internal. One should always use
 @code{treesit-font-lock-rules} to set this variable.
-
-@c FIXME: If the format is considered ``internal'', why do we need to
-@c describe it here?
-Each @var{setting} is of form
-
-@example
-(@var{query} @var{enable} @var{feature} @var{override})
-@end example
-
-@var{query} must be a compiled query (@pxref{Pattern Matching}).
-
-For @var{setting} to be activated for font-lock, @var{enable} must be
-@code{t}. To disable this @var{setting}, set @var{enable} to
-@code{nil}.
-
-@var{feature} is the ``feature name'' of the query, users can control
-which features are enabled with @code{font-lock-maximum-decoration}
-and @code{treesit-font-lock-feature-list}.
-
-@var{override} is the override flag for this query. Its value can be
-@code{t}, @code{nil}, @code{append}, @code{prepend}, or @code{keep}.
-@c FIXME: See where?
-See more in @code{treesit-font-lock-rules}.
+@c Because the format is internal, we don't document them here.
+@c Though We do have explanations in the docstring.
 @end defvar

 Multi-language major modes should provide range functions in
@@ -4790,27 +4775,26 @@ a list of the form: @w{@code{(@var{language} . @var{rules})}}, where
 @var{language} is a language symbol, and @var{rules} is a list of the
 form @w{@code{(@var{matcher} @var{anchor} @var{offset})}}.

-@c FIXME: ``node''?
-First, Emacs passes the node at point to @var{matcher}; if it returns
-non-@code{nil}, this rule is applicable. Then Emacs passes the node
-to @var{anchor}, which returns a buffer position. Emacs takes the
-column number of that position, adds @var{offset} to it, and the
-result is the indentation column for the current line.
+First, Emacs passes the smallest tree-sitter node at the beginning of
+the current line to @var{matcher}; if it returns non-@code{nil}, this
+rule is applicable. Then Emacs passes the node to @var{anchor}, which
+returns a buffer position. Emacs takes the column number of that
+position, adds @var{offset} to it, and the result is the indentation
+column for the current line.

 The @var{matcher} and @var{anchor} are functions, and Emacs provides
 convenient defaults for them.

-@c FIXME: Clarify the following description. In particular, how to
-@c find/compute ``the largest node'' and its ``parent''?
 Each @var{matcher} or @var{anchor} is a function that takes three
 arguments: @var{node}, @var{parent}, and @var{bol}. The argument
 @var{bol} is the buffer position whose indentation is required: the
 position of the first non-whitespace character after the beginning of
 the line. The argument @var{node} is the largest (highest-in-tree)
 node that starts at that position; and @var{parent} is the parent of
-@var{node}. @var{matcher} should return non-@code{nil} if the rule is
-applicable, and @var{anchor} should return a buffer position that is
-the basis of the indentation.
+@var{node}. Emacs finds @var{bol}, @var{node} and @var{parent} and
+passes them to each @var{matcher} and @var{anchor}. @var{matcher}
+should return non-@code{nil} if the rule is applicable, and
+@var{anchor} should return a buffer position.
 @end defvar

 @defvar treesit-simple-indent-presets
@@ -4821,63 +4805,69 @@ available default functions are:

 @ftable @code
 @item no-node
-This matcher is a symbol that matches the case where @var{node} is
+This matcher is a function that matches the case where @var{node} is
 @code{nil}, i.e., there is no node that starts at @var{bol}. This is
 the case when @var{bol} is on an empty line or inside a multi-line
 string, etc.

 @item parent-is
-This matcher is a function of one argument, @var{type}; it matches if
-the type of the parent node is @var{type}.
+This matcher is a function of one argument, @var{type}; it return a
+function that given @w{@code{(@var{node} @var{parent} @var{bol})}},
+matches if @var{parent}'s type is @var{type}.

 @item node-is
-This matcher is a function of one argument, @var{type}; it matches if
-the node's type is @var{type}.
+This matcher is a function of one argument, @var{type}; it returns a
+function that given @w{@code{(@var{node} @var{parent} @var{bol})}},
+matches if @var{node}'s type is @var{type}.

-@c FIXME: The description of this matcher is unclear. What is
-@c ``parent'' and what does it mean ``captures NODE''?
 @item query
-This matcher is a function of one argument, @var{query}; it matches if
-querying @var{parent} with @var{query} captures @var{node}. The
-capture name does not matter. @c Why is this bit important?
+This matcher is a function of one argument, @var{query}; it returns a
+function that given @w{@code{(@var{node} @var{parent} @var{bol})}},
+matches if querying @var{parent} with @var{query} captures @var{node}
+(@pxref{Pattern Matching}).

 @item match
 This matcher is a function of 5 arguments: @var{node-type},
 @var{parent-type}, @var{node-field}, @var{node-index-min}, and
-@var{node-index-max}). It matches if @var{node}'s type is @var{node-type},
-@var{parent}'s type is @var{parent-type}, @var{node}'s field name in
-@var{parent} is @var{node-field}, and @var{node}'s index among its
-siblings is between @var{node-index-min} and @var{node-index-max}. If
-@c FIXME: ``constraint''?
-the value of a constraint is nil, this matcher doesn't check for that
-constraint. For example, to match the first child where parent is
+@var{node-index-max}). It returns a function that given
+@w{@code{(@var{node} @var{parent} @var{bol})}}, matches if
+@var{node}'s type is @var{node-type}, @var{parent}'s type is
+@var{parent-type}, @var{node}'s field name in @var{parent} is
+@var{node-field}, and @var{node}'s index among its siblings is between
+@var{node-index-min} and @var{node-index-max}. If the value of an
+argument is @code{nil}, this matcher doesn't check for that argument.
+For example, to match the first child where parent is
 @code{argument_list}, use

 @example
 (match nil "argument_list" nil nil 0 0)
 @end example

-@c FIXME: ``PARENT''? is that an argument of the anchor function
 @item first-sibling
-This anchor returns the start of the first child of @var{parent}.
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the start of the first child of @var{parent}.

 @item parent
-This anchor returns the start of @var{parent}. @c FIXME: Likewise.
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the start of @var{parent}.

 @item parent-bol
-This anchor returns the first non-space character on the line of
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the first non-space character on the line of
 @var{parent}.

-@c FIXME: ``NODE''?
 @item prev-sibling
-This anchor returns the start of the previous sibling of @var{node}.
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the start of the previous sibling of @var{node}.

 @item no-indent
-This anchor returns the start of @var{node}, i.e., no indent. @c ???
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the start of @var{node}.

 @item prev-line
-This anchor returns the first non-whitespace charater on the previous
-line.
+This anchor is a function that given @w{@code{(@var{node} @var{parent}
+@var{bol})}}, returns the first non-whitespace charater on the
+previous line.
 @end ftable
 @end defvar

diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi
index 9079e0f7817..502a0e4f264 100644
--- a/doc/lispref/parsing.texi
+++ b/doc/lispref/parsing.texi
@@ -95,7 +95,7 @@ This means Emacs could not find the language definition library.
 @item (symbol-error @var{error-msg})
 This means Emacs could not find in the library the expected function
 that every language definition library should export.
-@item (version_mismatch @var{error-msg})
+@item (version-mismatch @var{error-msg})
 This means the version of language definition library is incompatible
 with that of the tree-sitter library.
 @end table
@@ -253,7 +253,7 @@ syntax tree effectively, you need to consult the @dfn{grammar file}.
 The grammar file is usually @file{grammar.js} in a language
 definition's project repository. The link to a language definition's
 home page can be found on
-@uref{https://tree-sitter.github.io/tree-sitter, the tree-sitter's
+@uref{https://tree-sitter.github.io/tree-sitter, tree-sitter's
 homepage}.

 The grammar definition is written in JavaScript. For example, the
@@ -405,11 +405,11 @@ returns non-@code{nil} if it is, @code{nil} otherwise.
 @end defun

 There is no need to explicitly parse a buffer, because parsing is done
-automatically and lazily. A parser only parses when the mode queris
-for a node in its syntax tree. Therefore, when a parser is first
-created, it doesn't parse the buffer; it waits until the mode queries
-for a node for the first time. Similarly, when some change is made in
-the buffer, a parser doesn't re-parse immediately.
+automatically and lazily. A parser only parses when a Lisp program
+queris for a node in its syntax tree. Therefore, when a parser is
+first created, it doesn't parse the buffer; it waits until the Lisp
+program queries for a node for the first time. Similarly, when some
+change is made in the buffer, a parser doesn't re-parse immediately.

 @vindex treesit-buffer-too-large
 When a parser does parse, it checks for the size of the buffer.
@@ -510,7 +510,7 @@ Example:
 @group
 ;; Find the node at point in a C parser's syntax tree.
 (treesit-node-at (point) 'c)
     @result{} #
 @end group
 @end example
 @end defun
@@ -606,7 +606,7 @@ This function finds the child of @var{node} whose field name is
 @group
 ;; Get the child that has "body" as its field name.
 (treesit-child-by-field-name node "body")
     @result{} #
 @end group
 @end example
 @end defun
@@ -644,20 +644,24 @@ does.

 By default, this function only traverses named nodes, but if @var{all}
 is non-@code{nil}, it traverses all the nodes. If @var{backward} is
-@c FIXME: What does it mean to ``traverse backward''?
-non-nil, it traverses backwards. If @var{limit} is non-@code{nil}, it
+non-nil, it traverses backwards (meaning visiting the last child first
+when traversing down the tree). If @var{limit} is non-@code{nil}, it
 must be a number that limits the tree traversal to that many levels
 down the tree.
 @end defun

 @defun treesit-search-forward start predicate &optional all backward up
-@c FIXME: Explain better what is the differencve between this function
-@c and the previous one.
-This function is somewhat similar to @code{treesit-search-subtree}.
-It also traverse the parse tree and matches each node with
-@var{predicate} (except for @var{start}), where @var{predicate} can be
-a (case-insensitive) regexp or a function. For a tree like the below
-where @var{start} is marked 1, this function traverses as numbered:
+While @code{treesit-search-subtree} traverses the subtree of a node,
+this function usually starts with a leaf node and traverses every node
+comes after it in terms of buffer position. It is useful for
+answering questions like ``what is the first node after @var{start} in
+the buffer that satisfies some condition?''
+
+Like @code{treesit-search-subtree}, this function also traverse the
+parse tree and matches each node with @var{predicate} (except for
+@var{start}), where @var{predicate} can be a (case-insensitive) regexp
+or a function. For a tree like the below where @var{start} is marked
+1, this function traverses as numbered:

 @example
 @group
@@ -830,7 +834,7 @@ is not yet in its final form.

 @cindex tree-sitter extra node
 @cindex extra node, tree-sitter
-A node can be ``extra'': extra nodes represent things like comments,
+A node can be ``extra'': such nodes represent things like comments,
 which can appear anywhere in the text.

 @cindex tree-sitter node that has changes
@@ -1007,9 +1011,9 @@ root node with @var{query}, and returns the result.

 @heading More query syntax

-Besides node type and capture, tree-sitter's query syntax can express
-anonymous node, field name, wildcard, quantification, grouping,
-alternation, anchor, and predicate.
+Besides node type and capture, tree-sitter's pattern syntax can
+express anonymous node, field name, wildcard, quantification,
+grouping, alternation, anchor, and predicate.

 @subheading Anonymous node

@@ -1022,9 +1026,9 @@ pattern matching (and capturing) keyword @code{return} would be

 @subheading Wild card

-In a query pattern, @samp{(_)} matches any named node, and @samp{_}
-matches any named and anonymous node. For example, to capture any
-named child of a @code{binary_expression} node, the pattern would be
+In a pattern, @samp{(_)} matches any named node, and @samp{_} matches
+any named and anonymous node. For example, to capture any named child
+of a @code{binary_expression} node, the pattern would be

 @example
 (binary_expression (_) @@in_biexp)
@@ -1032,10 +1036,10 @@ named child of a @code{binary_expression} node, the pattern would be

 @subheading Field name

-It is possible to capture child nodes that have specific field names:
+It is possible to capture child nodes that have specific field names.
+In the pattern below, @code{declarator} and @code{body} are field
+names, indicated by the colon following them.

-@c FIXME: The significance of ``:'' should be explained, and also what
-@c are ``declarator'' and ``body''.
 @example
 @group
 (function_definition
@@ -1059,7 +1063,6 @@ Tree-sitter recognizes quantification operators @samp{*}, @samp{+} and
 @samp{*} matches the preceding pattern zero or more times, @samp{+}
 matches one or more times, and @samp{?} matches zero or one time.

-@c FIXME: ``pattern'' or :''query''? Or maybe ``query pattern''?
 For example, the following pattern matches @code{type_declaration}
 nodes that has @emph{zero or more} @code{long} keyword.

@@ -1087,9 +1090,9 @@ express a comma separated list of identifiers, one could write
 @subheading Alternation

 Again, similar to regular expressions, we can express ``match anyone
-from this group of patterns'' in the query pattern. The syntax is a
-list of patterns enclosed in square brackets. For example, to capture
-some keywords in C, the query pattern would be
+from this group of patterns'' in a pattern. The syntax is a list of
+patterns enclosed in square brackets. For example, to capture some
+keywords in C, the pattern would be

 @example
 @group
@@ -1136,7 +1139,7 @@ nodes.
 @subheading Predicate

 It is possible to add predicate constraints to a pattern. For
-example, with the following query pattern:
+example, with the following pattern:

 @example
 @group
@@ -1170,11 +1173,11 @@ names in other patterns.

 @heading S-expression patterns

-@cindex query patterns as sexps
+@cindex patterns as sexps
 @cindex patterns, tree-sitter, in sexp form
-Besides strings, Emacs provides a s-expression based syntax for query
+Besides strings, Emacs provides a s-expression based syntax for
 patterns. It largely resembles the string-based syntax. For example,
-the following pattern
+the following query

 @example
 @group