at point. The mode-line will display
@example
-@var{parent} @var{field}: (@var{child} (@var{grandchild} (@dots{})))
+@var{parent} @var{field}: (@var{node} (@var{child} (@dots{})))
@end example
-@var{child}, @var{grand}, @var{grand-grandchild}, etc., are nodes that
-begin at point. @var{parent} is the parent node of @var{child}.
+where @var{node}, @var{child}, etc, are nodes which begin at point.
+@var{parent} is the parent of @var{node}. @var{node} is displayed in
+bold typeface. @var{field-name}s are field names of @var{node} and
+@var{child}, etc.
-If there is no node that starts at point, i.e., point is in the middle
-of a node, then the mode-line only displays the smallest node that
-spans the position of point, and its immediate parent.
+If no node starts at point, i.e., point is in the middle of a node,
+then the mode line displays the earliest node that spans point, and
+its immediate parent.
-This minor mode doesn't create parsers on its own. It simply uses the
-first parser in @code{(treesit-parser-list)} (@pxref{Using Parser}).
+This minor mode doesn't create parsers on its own. It uses the first
+parser in @code{(treesit-parser-list)} (@pxref{Using Parser}).
@end deffn
@heading Reading the grammar definition
need to be assigned different parsers. Traditionally, this is
achieved by using narrowing. While tree-sitter works with narrowing
(@pxref{tree-sitter narrowing, narrowing}), the recommended way is
-instead to set regions of buffer text in which a parser will operate.
+instead to set regions of buffer text (i.e., ranges) in which a parser
+will operate. This section describes functions for setting and
+getting ranges for a parser.
-@c FIXME: This text should be expanded, as it doesn't explain who do
-@c the other functions in this node related to supporting multiple
-@c languages.
-Specifically, a multi-language major mode should set
-@code{treesit-language-at-point-function} and
-@code{treesit-range-settings} for Emacs to handle multiple languages
-in the same buffer.
+Lisp programs should call @code{treesit-update-ranges} to make sure
+the ranges for each parser are correct before using parsers in a
+buffer, and call @code{treesit-language-at} to figure out the language
+responsible for the text at some position. Multi-language major modes
+sets @code{treesit-range-settings} and
+@code{treesit-language-at-point-function} respectively to power these
+two functions. These functions and variables are explained in more
+detail towards the end of the section.
+
+@heading Getting and setting ranges
@defun treesit-parser-set-included-ranges parser ranges
This function sets up @var{parser} to operate on @var{ranges}. The
@end example
@end defun
-@defun treesit-set-ranges parser-or-lang ranges
-Like @code{treesit-parser-set-included-ranges}, this function sets
-the ranges of @var{parser-or-lang} to @var{ranges}. Conveniently,
-@var{parser-or-lang} could be either a parser or a language. If it is
-a language, this function looks for the first parser in
-@code{(treesit-parser-list)} for that language in the current buffer,
-and sets the ranges for it.
-@end defun
-
-@defun treesit-get-ranges parser-or-lang
-This function returns the ranges of @var{parser-or-lang}, like
-@code{treesit-parser-included-ranges}. And like
-@code{treesit-set-ranges}, @var{parser-or-lang} can be a parser or
-a language symbol.
-@end defun
-
@defun treesit-query-range source query &optional beg end
This function matches @var{source} with @var{query} and returns the
ranges of captured nodes. The return value is a list of cons cells of
@code{treesit-query-error} error if @var{query} is malformed.
@end defun
+@heading Supporting multiple languages as Lisp programs
+
+It should suffice for general Lisp programs to call these two
+functions in order to support program sources that mixes multiple
+languages.
+
@defun treesit-update-ranges &optional beg end
-This function is used by fontifications and indentation to update
-ranges before using any parser. It makes sure the parsers' ranges are
-set correctly between @var{beg} and @var{end}, according to
-@code{treesit-range-settings}. If omitted, @var{beg} defaults to the
-beginning of the buffer, and @var{end} defaults to the end of the
-buffer.
+This function updates ranges for parsers in the buffer. It makes sure
+the parsers' ranges are set correctly between @var{beg} and @var{end},
+according to @code{treesit-range-settings}. If omitted, @var{beg}
+defaults to the beginning of the buffer, and @var{end} defaults to the
+end of the buffer.
+
+For example, fontification functions uses this function before
+querying for nodes in a region.
@end defun
-@vindex treesit-language-at-point-function
@defun treesit-language-at pos
-This function tries to figure out which language is responsible for
-the text at buffer position @var{pos}. Under the hood it just calls
-@code{treesit-language-at-point-function}, which is a function that
-takes the same argument as this function.
-
-Various Lisp programs use this function. For example, the indentation
-program uses this function to determine which language's rule to use
-in a multi-language buffer.
+This function returns the language of the text at buffer position
+@var{pos}. Under the hood it calls
+@code{treesit-language-at-point-function} and returns its return
+value. If @code{treesit-language-at-point-function} is @code{nil},
+this function returns the language of the first parser in the returned
+value of @code{treesit-parser-list}. If there is no parser in the
+buffer, it returns @code{nil}.
@end defun
-@heading An example
+@heading Supporting multiple languages as major modes
+@cindex host language, tree-sitter
+@cindex tree-sitter host language
+@cindex embedded language, tree-sitter
+@cindex tree-sitter embedded language
Normally, in a set of languages that can be mixed together, there is a
-main language and several embedded languages. A Lisp program usually
-first parses the whole document with the major language's parser, sets
-ranges for the embedded languages, and then parses the embedded
+host language and one or more embedded languages. A Lisp program
+usually first parses the whole document with the host language's
+parser, retrieve some information, sets ranges for the embedded
+languages with that information, and then parses the embedded
languages.
-Suppose we need to parse a very simple document that mixes
-@acronym{HTML}, @acronym{CSS} and JavaScript:
+Take a buffer containing @acronym{HTML}, @acronym{CSS} and JavaScript
+as an example. A lisp program will first parse the whole buffer with
+an @acronym{HTML} parser, then query the parser for
+@code{style_element} and @code{script_element} nodes, which
+corresponds to @acronym{CSS} and JavaScript text, respectively. Then
+it sets the range of the @acronym{CSS} and JavaScript parser to the
+range in which their corresponding nodes span.
+
+Given a simple @acronym{HTML} document:
@example
@group
@end group
@end example
-We first parse with @acronym{HTML}, then set ranges for @acronym{CSS}
-and JavaScript:
+A Lisp program will first parse with a @acronym{HTML} parser, then set
+ranges for @acronym{CSS} and JavaScript parsers:
@example
@group
@end group
@end example
-We use a query pattern @w{@code{(style_element (raw_text) @@capture)}}
-to find @acronym{CSS} nodes in the @acronym{HTML} parse tree. For how
-to write query patterns, @pxref{Pattern Matching}.
-
-Emacs can automate the above process in @code{treesit-update-ranges}.
-For it to work, a Lisp program should set
-@code{treesit-range-settings} to the output of
-@code{treesit-range-rules}, like in the following example:
+Emacs automates this process in @code{treesit-update-ranges}. A
+multi-language major mode should set @code{treesit-range-settings} so
+that this function knows how to perform this process automatically.
+Major modes should use the helper function @code{treesit-range-rules}
+to generate the value that @code{treesit-range-settings} can have.
+The settings in the following example directly translates to
+operations shown above.
@example
@group
(treesit-range-rules
:embed 'javascript
:host 'html
- '((script_element (raw_text) @@cap))
+ '((script_element (raw_text) @@capture))
@end group
@group
:embed 'css
:host 'html
- '((style_element (raw_text) @@cap))))
+ '((style_element (raw_text) @@capture))))
@end group
@end example
-@c FIXME: This is NOT how we document series of 3 arguments! It is
-@c better to use ``&rest query-specs'' instead, and then tell that
-@c each query-spec is a triplet of :keyword, value, and the query
-@c itself. But then the doc string of the function and its advertised
-@c calling sequence, should be changed accordingly.
-@defun treesit-range-rules :keyword value query...
+@defun treesit-range-rules &rest query-specs
This function is used to set @var{treesit-range-settings}. It
takes care of compiling queries and other post-processing, and outputs
a value that @var{treesit-range-settings} can have.
-It takes a series of one or more @var{query}s in either the string,
-s-expression or compiled form. Each @var{query} should be preceded by
-a pair of @var{:keyword} and @var{value} that configure the query (and
-only that query).
-
-@c FIXME: The notion of ``host language'' was never explained. We do
-@c mention ``embedded language'', but without a @dfn and without an
-@c index entry to find it; that should also be fixed.
-For each query, the @code{:embed} keyword specifies the embedded
-language, and the @code{:host} keyword specified the host language.
-@c FIXME: The next sentence is not clear: what does it mean ``to set
-@c ranges in the embedded language''?
-Emacs queries the @var{query} in the host language and uses the result
-to set ranges for the embedded language.
-
-@c FIXME: ``It only needs to ensure...'' is not clear. What does
-@c ``only'' refer to? does it mean that's the only purpose of such a
-@c function?
-A @var{query} can also be a function that takes two arguments,
-@var{start} and @var{end}, and sets the range for parsers. It only
-needs to ensure ranges between @var{start} and @var{end} is correct.
-@c FIXME: This should be at the beginning of the description.
-When @var{query} is a function, it doesn't need keywords before it.
+It takes a series of @var{query-spec}s, where each @var{query-spec} is
+of the form @w{@code{@var{:keyword} @var{value} @dots{} @var{query}}}.
+Each @var{query} is a tree-sitter query in either the string,
+s-expression or compiled form, or a function.
+
+If @var{query} is a tree-sitter qurey, it should be preceded by 2
+pairs of @var{:keyword} and @var{value}, where the @code{:embed}
+keyword specifies the @dfn{embedded language}, and the @code{:host}
+keyword specified the @dfn{host language}.
+
+@code{treesit-update-ranges} uses @var{query} to figure out how to set
+the ranges for parsers for the embedded language. It queries
+@var{query} in a host language parser, computes the ranges in which
+the captured nodes span, and applies these ranges to embedded
+language parsers.
+
+If @var{query} is a function, it doesn't need any @var{keyword} and
+@var{value} pair. It should be a function that takes 2 arguments,
+@var{start} and @var{end}, and sets the ranges for parsers in the
+current buffer in the region between @var{start} and @var{end}. It is
+fine for this function to set ranges in a larger region that
+encompasses the region between @var{start} and @var{end}.
@end defun
+@defvar treesit-range-settings
+This variable helps @code{treesit-update-ranges} to update ranges for
+parsers in the buffer. It is a list of @var{setting}s where the exact
+format of a @var{setting} is considered internal. You should use
+@code{treesit-range-rules} to generate a value that this variable can
+have.
+
+@c Because the format is internal, we don't document them here. Though
+@c we do have it explained in the docstring. We also expose the fact
+@c that it is a list of settings, so one could combine two of them with
+@c append.
+@end defvar
+
+
+@defvar treesit-language-at-point-function
+This variable's value should be a function that takes a single
+argument, @var{pos}, which is a buffer position, and returns the
+language of the buffer text at @var{pos}. This variable is used by
+@code{treesit-language-at}.
+@end defvar
+
@node Tree-sitter major modes
@section Developing major modes with tree-sitter
@cindex major mode, developing with tree-sitter
This section covers some general guidelines on developing tree-sitter
integration for a major mode.
-In general, a major mode supporting tree-sitter features should
-roughly follow this pattern:
+A major mode supporting tree-sitter features should roughly follow
+this pattern:
@example
@group
checks whether the user turned on tree-sitter for @var{mode}
(according to @code{treesit-settings}), whether Emacs was built with
tree-sitter, whether the buffer's size is not too large for
-tree-sitter to handle it, and whether support for @var{language} is
-available in tree-sitter.
+tree-sitter to handle it, and whether language definition for
+@var{language} is available on the system (@pxref{Language
+Definitions}).
When the user sets @var{mode} to @var{demand} in @code{treesit-settings},
this function emits a warning if tree-sitter cannot be activated. If
@code{treesit-settings}.
If all the necessary conditions are met, this function returns
-non-@code{nil}; otherwise it return @code{nil}.
+non-@code{nil}; otherwise it returns @code{nil}.
@end defun
Next, the major mode should set up tree-sitter variables and call
@pxref{Parser-based Font Lock}, @pxref{Parser-based Indentation}, and
@pxref{List Motion}.
+For supporting mixing of multiple languages in a major mode,
+@pxref{Multiple Languages}.
+
@node Tree-sitter C API
@section Tree-sitter C API Correspondence
arguments, START and END. It should ensure parsers' ranges are
correct in the region between START and END.
-The exact form of the variable is considered internal and subject
+The exact form of each setting is considered internal and subject
to change. Use `treesit-range-rules' to set this variable.")
-(defun treesit-range-rules (&rest args)
+(defun treesit-range-rules (&rest query-specs)
"Produce settings for `treesit-range-settings'.
-Take a series of QUERYs in either the string, s-expression or
-compiled form.
+QUERY-SPECS contains a series of QUERY-SPEC of the form
-Each QUERY should be preceded by a :KEYWORD VALUE pair that
-configures the query (and only that query). For example,
+ :KEYWORD VALUE... QUERY
+
+Each QUERY is a tree-sitter query in either the string,
+s-expression or compiled form.
+
+Each QUERY should be preceded by :KEYWORD VALUE pairs that
+configures this query. For example,
(treesit-range-rules
:embed \\='javascript
:host \\='html
\\='((script_element (raw_text) @cap)))
-For each query, the `:embed' keyword specifies the embedded language,
-the `:host' keyword specified the host (main) language. Emacs queries
-the QUERY in the host language and uses the result to set ranges for
-the embedded language.
+The `:embed' keyword specifies the embedded language, and the
+`:host' keyword specifies the host language. They are used in
+this way: Emacs queries QUERY in the host language's parser,
+computes the ranges spanned by the captured nodes, and applies
+these ranges to parsers for the embedded language.
QUERY can also be a function that takes two arguments, START and
-END, and sets the range for parsers. The function only needs to
-ensure ranges between START and END is correct. If QUERY is a
-function, it doesn't need to have the :KEYWORD VALUE pair before it.
-
-\(:KEYWORD VALUE QUERY...)"
+END. If QUERY is a function, it doesn't need :KEYWORD VALUE
+pairs preceding it. This function should set the ranges for
+parsers in the current buffer in the region between START and
+END. It is OK for this function to set ranges in a larger region
+that encompasses the region between START and END."
(let (host embed result)
- (while args
- (pcase (pop args)
- (:host (let ((host-lang (pop args)))
+ (while query-specs
+ (pcase (pop query-specs)
+ (:host (let ((host-lang (pop query-specs)))
(unless (symbolp host-lang)
(signal 'treesit-error (list "Value of :host option should be a symbol" host-lang)))
(setq host host-lang)))
- (:embed (let ((embed-lang (pop args)))
+ (:embed (let ((embed-lang (pop query-specs)))
(unless (symbolp embed-lang)
(signal 'treesit-error (list "Value of :embed option should be a symbol" embed-lang)))
(setq embed embed-lang)))
(defvar-local treesit-font-lock-settings nil
"A list of SETTINGs for treesit-based fontification.
-The exact format of this variable is considered internal. One
-should always use `treesit-font-lock-rules' to set this variable.
+The exact format of each SETTING is considered internal. Use
+`treesit-font-lock-rules' to set this variable.
Each SETTING has the form:
t, nil, append, prepend, keep. See more in
`treesit-font-lock-rules'.")
-(defun treesit-font-lock-rules (&rest args)
+(defun treesit-font-lock-rules (&rest query-specs)
"Return a value suitable for `treesit-font-lock-settings'.
-Take a series of QUERIES in either string, s-expression or
-compiled form. For each query, captured nodes are highlighted
+QUERY-SPECS is made of a series of QUERY-SPECs of the form
+
+ :KEYWORD VALUE... QUERY
+
+QUERY is a tree-sitter query in either the string, s-expression
+or compiled form. For each query, captured nodes are highlighted
with the capture name as its face.
Before each QUERY there could be :KEYWORD VALUE pairs that
If a capture name is both a face and a function, the face takes
priority. If a capture name is not a face name nor a function
-name, it is ignored.
-
-\(fn :KEYWORD VALUE QUERY...)"
+name, it is ignored."
;; Other tree-sitter function don't tend to be called unless
;; tree-sitter is enabled, which means tree-sitter must be compiled.
;; But this function is usually call in `defvar' which runs
current-feature
;; The list this function returns.
(result nil))
- (while args
- (let ((token (pop args)))
+ (while query-specs
+ (let ((token (pop query-specs)))
(pcase token
;; (1) Process keywords.
(:language
- (let ((lang (pop args)))
+ (let ((lang (pop query-specs)))
(when (or (not (symbolp lang)) (null lang))
(signal 'treesit-font-lock-error
`("Value of :language should be a symbol"
,lang)))
(setq current-language lang)))
(:override
- (let ((flag (pop args)))
+ (let ((flag (pop query-specs)))
(when (not (memq flag '(t nil append prepend keep)))
(signal 'treesit-font-lock-error
`("Value of :override should be one of t, nil, append, prepend, keep"
,flag)))
(setq current-override flag)))
(:feature
- (let ((var (pop args)))
+ (let ((var (pop query-specs)))
(when (or (not (symbolp var))
(memq var '(t nil)))
(signal 'treesit-font-lock-error
(message "%s" treesit--inspect-name)
(message "No node at point")))))
-;; FIXME: in the next doc string, what is FIELD-NAME?
(define-minor-mode treesit-inspect-mode
"Minor mode that displays in the mode-line the node which starts at point.
PARENT FIELD-NAME: (NODE FIELD-NAME: (CHILD (...)))
-where NODE, CHILD, etc, are nodes which begin at point.
-PARENT is the parent of NODE. NODE is displayed in bold typeface.
+where NODE, CHILD, etc, are nodes which begin at point. PARENT
+is the parent of NODE. NODE is displayed in bold typeface.
+FIELD-NAMEs are field names of NODE and CHILD, etc (see Info
+node `(elisp)Language Definitions', heading \"Field names\").
If no node starts at point, i.e., point is in the middle of a
node, then the mode line displays the earliest node that spans point,