in Emacs's use cases.
- Many tree-sitter functions take a TSPoint, which is basically a
- row and column. Emacs uses a gap buffer and does not keep
- information about the row and column position of a buffer.
- According to the author of tree-sitter, those functions only take
- a TSPoint so that it can be moved alongside the byte position and
- returned to the caller afterwards, and the position actually used
- is the specified byte position. He also said that he _thinks_
- that just passing a byte position will also work. As a result, a
- dummy value is used in place of each TSPoint. Judging by the
- nature of parsing algorithms, I think it is safe to use only the
- byte position, and I don't think this will change in the future.
-
- See: https://github.com/tree-sitter/tree-sitter/issues/445
+ line and column. Emacs uses a gap buffer and does not keep
+ information about the line and column positions in a buffer, so
+ it's hard for us to pass it to tree-sitter. Instead we just give
+ it dummy values. But there are certain languages that does need
+ the line and column positions to work right, like Haskell. So we
+ added optional line and column tracking. See the linecol section
+ below.
treesit.h has some commentary on the two main data structure for
the parser and node. treesit_sync_visible_region has some
Tree-sitter-related code in other files:
- src/alloc.c for gc for parser and node
- - src/casefiddle.c & src/insdel.c for notifying tree-sitter
- parser of buffer changes.
+ - src/casefiddle.c, src/insdel.c, src/editfns.c for notifying
+ tree-sitter parser of buffer changes.
- lisp/emacs-lisp/cl-preloaded.el & data.c & lisp.h for parser and
node type.
- print.c for printing tree-sitter objects (node, parser, query).
from the user's POV, each buffer, regardless of indirect or not,
appears to have their own parser list. A discussion can be found in
bug#59693. Note that that discussion led to an earlier design, which
- is different from the current one. */
+ is different from the current one.
+
+ Line and column reporting to tree-sitter: technically we had to send
+ tree-sitter the line and column position of each edit. But in
+ practice we just send it dummy values, because tree-sitter doesn't
+ use it for parsing and mostly just carries the line and column
+ positions around and return it when e.g. reporting node positions[1].
+ This has been working fine until we encountered grammars that
+ actually utilizes the line and column information for parsing
+ (Haskell)[2].
+
+ [1] https://github.com/tree-sitter/tree-sitter/issues/445
+ [2] https://github.com/tree-sitter/tree-sitter/issues/4001
+
+ So now we have to keep track of line and column positions and pass
+ valid values to tree-sitter. (It adds quite some complexity, but
+ only linearly; one can ignore all the linecol stuff when trying to
+ understand treesit code and then come back to it later.) Eli
+ convinced me to disable tracking by default, and only enable it for
+ languages that needs it. So the buffer starts out not tracking
+ linecol. And when a parser is created, if the language is in
+ treesit-languages-require-line-column-tracking, we enable tracking in
+ the buffer, and enable tracking for the parser. To simplify things,
+ once a buffer starts tracking linecol, it never disables tracking,
+ even if parsers that need tracking are all deleted; and for parsers,
+ tracking is determined at creation time, if it starts out
+ tracking/non-tracking, it stays that way, regardless of later changes
+ to treesit-languages-require-line-column-tracking.
+
+ To make calculating line/column positons fast, we store linecol
+ caches for begv, point, and zv in the buffer
+ (buf->ts_linecol_cache_xxx); and in the parser object, we store
+ linecol cache for visible beg/end of that parser.
+
+ In buffer editing functions, we need the linecol for
+ start/old_end/new_end, those can be calculated by scanning newlines
+ (treesit_linecol_of_pos) from the buffer point cache, which should be
+ always near the point. And we usually set the calculated linecol of
+ new_end back to the buffer point cache.
+
+ We also need to calculate linecol for the visible_beg/end for each
+ parser, and linecol for the buffer's begv/zv, these positions are
+ usually far from point, so we have caches for all of them (in either
+ the parser object or the buffer). These positions are far from
+ point, so it's inefficient to scan newlines from point to there to
+ get up-to-date linecol for them; but in the same time, because
+ they're far and outside the changed region, we can calculate their
+ change in line and column number by simply counting how much newlines
+ are added/removed in the changed region
+ (compute_new_linecol_by_change). */
+
+\f
+/*** Constants */
+
+/* A linecol_cache that points to BOB, this is always valid. */
+const struct ts_linecol TREESIT_BOB_LINECOL = { 1, 1, 0 };
+/* An uninitialized linecol. */
+const struct ts_linecol TREESIT_EMPTY_LINECOL = { 0, 0, 0 };
+const TSPoint TREESIT_TS_POINT_1_0 = { 1, 0 };
+
\f
/*** Initialization */
}
\f
+/*** Linecol functions */
+
+#define TREESIT_DEBUG_LINECOL false
+
+void treesit_debug_print_linecol (struct ts_linecol);
+
+void
+treesit_debug_print_linecol (struct ts_linecol linecol)
+{
+ printf ("{ line=%ld col=%ld bytepos=%ld }\n", linecol.line, linecol.col, linecol.bytepos);
+}
+
+/* Returns true if BUF tracks linecol. */
+bool treesit_buf_tracks_linecol_p (struct buffer *buf)
+{
+ return BUF_TS_LINECOL_BEGV (buf).bytepos != 0;
+}
+
+static void
+restore_restriction_and_selective_display (Lisp_Object record)
+{
+ save_restriction_restore (Fcar (record));
+ BVAR (current_buffer, selective_display) = Fcdr (record);
+ return;
+}
+
+/* Similar to display_count_lines, but behaves differently when
+ searching backwards: when found a newline, stop at the newline,
+ return count as normal (display_count_lines stops after the newline
+ and subtracts one from count). When searching forward, stop at the
+ position after the newline. Another difference is this function
+ disregards narrowing, so it works on bytepos outside of the visible
+ range. */
+static ptrdiff_t
+treesit_count_lines (ptrdiff_t start_byte,
+ ptrdiff_t limit_byte, ptrdiff_t count,
+ ptrdiff_t *byte_pos_ptr)
+{
+ /* I don't think display_count_lines signals, so the unwind-protect
+ technically isn't necessary. Also treesit_count_lines aren't
+ suppose to signal either since it's used in functions that aren't
+ supposed to signal (treesit_record_change and friends). */
+ Lisp_Object record = Fcons (save_restriction_save (),
+ BVAR (current_buffer, selective_display));
+
+
+ specpdl_ref pdl_count = SPECPDL_INDEX ();
+ record_unwind_protect (restore_restriction_and_selective_display, record);
+
+ BVAR (current_buffer, selective_display) = Qnil;
+ labeled_restrictions_remove_in_current_buffer ();
+ Fwiden ();
+ ptrdiff_t counted = display_count_lines (start_byte, limit_byte,
+ count, byte_pos_ptr);
+
+ unbind_to (pdl_count, Qnil);
+
+ /* If searching backwards and we found COUNT newlines, countermand the
+ different logic in display_count_lines. */
+ if (count < 0 && limit_byte != *byte_pos_ptr)
+ {
+ counted += 1;
+ *byte_pos_ptr -= 1;
+ }
+
+ return counted;
+}
+
+static void
+treesit_debug_validate_linecol (struct ts_linecol linecol)
+{
+ eassert (linecol.bytepos <= Z_BYTE);
+
+ /* We can't use count_lines as ground truth because it respects
+ narrowing, and calling it with a bytepos outside of the visible
+ portion results in infloop. */
+ ptrdiff_t _unused;
+ ptrdiff_t true_line_count = treesit_count_lines (BEG_BYTE, linecol.bytepos,
+ Z_BYTE, &_unused) + 1;
+ eassert (true_line_count == linecol.line);
+}
+
+/* Calculate and return the line and column number of BYTE_POS by
+ scanning newlines from CACHE. CACHE must be valid. */
+static struct ts_linecol
+treesit_linecol_of_pos (ptrdiff_t target_bytepos,
+ struct ts_linecol cache)
+{
+ if (TREESIT_DEBUG_LINECOL)
+ {
+ treesit_debug_validate_linecol (cache);
+ }
+
+ /* When we finished searching for newlines between CACHE and
+ TARGET_POS, BYTE_POS_2 is at TARGET_POS, and BYTE_POS_1 is at the
+ previous newline. If TARGET_POS happends to be on a newline,
+ BYTE_POS_1 will be on that position. BYTE_POS_1 is used for
+ calculating the column. (If CACHE and TARGET_POS are in the same
+ line, BYTE_POS_1 is unset and we don't use it.) */
+ ptrdiff_t byte_pos_1 = 0;
+ ptrdiff_t byte_pos_2 = 0;
+ /* Number of lines between CACHE and TARGET_POS. */
+ ptrdiff_t line_delta = 0;
+
+ if (target_bytepos == cache.bytepos)
+ return cache;
+
+ /* Search forward. */
+ if (cache.bytepos < target_bytepos)
+ {
+ byte_pos_2 = cache.bytepos;
+ while (byte_pos_2 < target_bytepos)
+ {
+ ptrdiff_t counted = treesit_count_lines (byte_pos_2, target_bytepos,
+ 1, &byte_pos_2);
+
+ if (counted > 0)
+ {
+ byte_pos_1 = byte_pos_2;
+ }
+ line_delta += counted;
+ }
+ eassert (byte_pos_2 == target_bytepos);
+ /* At this point, byte_pos_2 is at target_pos, and byte_pos_1 is
+ at the previous newline if we went across any. */
+
+ struct ts_linecol target_linecol;
+ target_linecol.bytepos = target_bytepos;
+ target_linecol.line = cache.line + line_delta;
+ /* If we moved across any newline, use the previous newline to
+ calculate the column; if we stayed at the same line, use the
+ cached column to calculate the new column. */
+ target_linecol.col = line_delta > 0
+ ? target_bytepos - byte_pos_1
+ : target_bytepos - cache.bytepos + cache.col;
+
+ if (TREESIT_DEBUG_LINECOL)
+ {
+ treesit_debug_validate_linecol (target_linecol);
+ }
+
+ return target_linecol;
+ }
+
+ /* Search backward. */
+ byte_pos_2 = cache.bytepos;
+ while (byte_pos_2 > target_bytepos)
+ {
+ ptrdiff_t counted = treesit_count_lines (byte_pos_2, target_bytepos,
+ -1, &byte_pos_2);
+ line_delta -= counted;
+ }
+ eassert (byte_pos_2 == target_bytepos);
+ /* At this point, pos_2 is at target_pos. */
+
+ struct ts_linecol target_linecol;
+ target_linecol.bytepos = target_bytepos;
+ target_linecol.line = cache.line + line_delta;
+ eassert (cache.line + line_delta > 0);
+
+ /* Calculate the column. */
+ if (line_delta == 0)
+ {
+ target_linecol.col = cache.col - (cache.bytepos - target_bytepos);
+ }
+ else
+ {
+ /* We need to find the previous newline in order to calculate the
+ column. */
+ ptrdiff_t counted = treesit_count_lines (byte_pos_2, BEG_BYTE, -1, &byte_pos_2);
+ target_linecol.col
+ = target_bytepos - (byte_pos_2 + counted == 1 ? 1 : 0);
+ }
+
+ if (TREESIT_DEBUG_LINECOL)
+ {
+ treesit_debug_validate_linecol (target_linecol);
+ }
+
+ return target_linecol;
+}
+
+/* Return a TSPoint given POS and VISIBLE_BEG. VISIBLE_BEG must be
+ before POS. */
+static TSPoint
+treesit_make_ts_point (struct ts_linecol visible_beg,
+ struct ts_linecol pos)
+{
+ TSPoint point;
+ if (visible_beg.line == pos.line)
+ {
+ point.row = 0;
+ point.column = pos.col - visible_beg.col;
+ eassert (point.column >= 0);
+ }
+ else
+ {
+ point.row = pos.line - visible_beg.line;
+ eassert (point.row > 0);
+ point.column = pos.col;
+ }
+ return point;
+}
+
+DEFUN ("treesit-tracking-line-column-p",
+ Ftreesit_tracking_line_column_p,
+ Streesit_tracking_line_column_p, 0, 1, 0,
+ doc : /* Return non-nil if BUFFER is tracking line and column.
+
+Return nil otherwise. BUFFER defaults to the current buffer. */)
+ (Lisp_Object buffer)
+{
+ struct buffer *buf = current_buffer;
+ if (!NILP (buffer))
+ {
+ CHECK_BUFFER (buffer);
+ buf = XBUFFER (buffer);
+ }
+
+ return treesit_buf_tracks_linecol_p (buf) ? Qt : Qnil;
+}
+
+DEFUN ("treesit-parser-tracking-line-column-p",
+ Ftreesit_parser_tracking_line_column_p,
+ Streesit_parser_tracking_line_column_p, 1, 1, 0,
+ doc : /* Return non-nil if PARSER is tracking line and column.
+
+Return nil otherwise.*/)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ return XTS_PARSER (parser)->visi_beg_linecol.bytepos == 0 ? Qnil : Qt;
+}
+
+
/*** Parsing functions */
static void
larger than UINT32_MAX. */
static inline void
treesit_tree_edit_1 (TSTree *tree, ptrdiff_t start_byte,
- ptrdiff_t old_end_byte, ptrdiff_t new_end_byte)
+ ptrdiff_t old_end_byte, ptrdiff_t new_end_byte,
+ TSPoint start_point, TSPoint old_end_point,
+ TSPoint new_end_point)
{
eassert (start_byte >= 0);
eassert (start_byte <= old_end_byte);
eassert (start_byte <= new_end_byte);
- TSPoint dummy_point = {0, 0};
eassert (start_byte <= UINT32_MAX);
eassert (old_end_byte <= UINT32_MAX);
eassert (new_end_byte <= UINT32_MAX);
TSInputEdit edit = {(uint32_t) start_byte,
(uint32_t) old_end_byte,
(uint32_t) new_end_byte,
- dummy_point, dummy_point, dummy_point};
+ start_point, old_end_point, new_end_point};
ts_tree_edit (tree, &edit);
}
-/* Update each parser's tree after the user made an edit. This
- function does not parse the buffer and only updates the tree, so it
- should be very fast. */
-void
-treesit_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
- ptrdiff_t new_end_byte)
+/* Given a position at POS_LINECOL, and the linecol of a buffer change
+ (START_LINECOL, OLD_END_LINECOL, and NEW_END_LINCOL), compute the new
+ linecol for that position, then scan from this now valid linecol to
+ TARGET_BYTEPOS and return the linecol at TARGET_BYTEPOS.
+
+ When POS_LINECOL is outside of the range between START_LINECOL and
+ OLD_END_LINECOL, we can calculate the change in line and column
+ number of POS_LINECOL by simply counting how many newlines are
+ removed/added in the change. Once we have the up-to-date line and
+ column number at POS_LINECOL.bytepos, we can just scan to
+ TARGET_BYTEPOS to get a linecol for it. The assumption is that
+ TARGET_BYTEPOS is far from START_LINECOL, etc, but close to
+ POS_LINECOL. So we avoids scanning longs distance from
+ START_LINECOL, etc.
+
+ However, this optimization only works when POS_LINECOL is outside the
+ range between START_LINECOL and OLD_END_LINECOL. If not, we've have
+ to scan from START_LINECOL or NEW_END_LINECOL to TARGET_BYTEPOS. */
+static struct ts_linecol
+compute_new_linecol_by_change (struct ts_linecol pos_linecol,
+ struct ts_linecol start_linecol,
+ struct ts_linecol old_end_linecol,
+ struct ts_linecol new_end_linecol,
+ ptrdiff_t target_bytepos)
+{
+ struct ts_linecol new_linecol = { 0, 0, 0 };
+
+ /* 1. Even start is behind pos, pos isn't affected. */
+ if (start_linecol.bytepos >= pos_linecol.bytepos)
+ {
+ new_linecol = pos_linecol;
+ }
+ /* 2. When old_end (oe) is before pos, the differnce between pos and
+ pos' is the difference between old_end and new_end (ne).
+
+ | | | | | |
+ s oe pos s oe pos
+ OR
+ | | | | |
+ s ne pos' s ne pos'
+
+ */
+ else if (old_end_linecol.bytepos <= pos_linecol.bytepos)
+ {
+ ptrdiff_t line_delta = new_end_linecol.line - old_end_linecol.line;
+ new_linecol.line = pos_linecol.line + line_delta;
+ new_linecol.bytepos
+ = pos_linecol.bytepos + new_end_linecol.bytepos - old_end_linecol.bytepos;
+
+ /* Suppose # is text, | is cursor:
+
+ ################
+ ########|########|
+ oe pos
+
+ Now, if we insert something:
+
+ ################
+ ########|OOOOO
+ OOOOOOOOOO|########|
+ ne pos'
+
+ Clearly, col for pos' is just the col of new_end plus the
+ distance between old_end and pos. The same goes for deletion.
+ */
+ if (old_end_linecol.line == pos_linecol.line)
+ {
+ eassert (old_end_linecol.col <= pos_linecol.col);
+ ptrdiff_t old_end_to_pos = pos_linecol.col - old_end_linecol.col;
+ new_linecol.col = new_end_linecol.col + old_end_to_pos;
+ }
+ else
+ {
+ new_linecol.col = pos_linecol.col;
+ }
+ }
+ /* 3. At this point, start < pos < old_end. We're kinda cooked, there
+ aren't much we can do other than scan the buffer from new_end or
+ start. */
+ else if (target_bytepos - start_linecol.bytepos
+ < eabs (target_bytepos - new_end_linecol.bytepos))
+ {
+ new_linecol = treesit_linecol_of_pos (target_bytepos, start_linecol);
+ }
+ else
+ {
+ new_linecol = treesit_linecol_of_pos (target_bytepos, new_end_linecol);
+ }
+
+ /* Now new_linecol is a valid linecol, scan from it to target_bytepos. */
+ if (new_linecol.bytepos != target_bytepos)
+ {
+ new_linecol = treesit_linecol_of_pos (target_bytepos, new_linecol);
+ }
+
+ if (TREESIT_DEBUG_LINECOL)
+ treesit_debug_validate_linecol (new_linecol);
+
+ return new_linecol;
+}
+
+/* Update each parser's tree after the user made an edit. This function
+ does not parse the buffer and only updates the tree, so it should be
+ very fast. If the caller knows there's no parser in the current
+ buffer, they can pass empty linecol for
+ START/OLD_END/NEW_END_linecol.
+
+ If the current buffer doesn't track linecol, start_linecol,
+ old_end_linecol, and new_end_linecol will be empty. In that case,
+ don't process linecols. */
+static void
+treesit_record_change_1 (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
+ ptrdiff_t new_end_byte,
+ struct ts_linecol start_linecol,
+ struct ts_linecol old_end_linecol,
+ struct ts_linecol new_end_linecol)
{
struct buffer *base_buffer = current_buffer;
if (current_buffer->base_buffer)
base_buffer = current_buffer->base_buffer;
Lisp_Object parser_list = BVAR (base_buffer, ts_parser_list);
+ bool buf_tracks_linecol = start_linecol.bytepos != 0;
+
FOR_EACH_TAIL_SAFE (parser_list)
{
CHECK_CONS (parser_list);
/* See comment (ref:visible-beg-null) if you wonder why we don't
update visible_beg/end when tree is NULL. */
+ bool parser_tracks_linecol
+ = XTS_PARSER (lisp_parser)->visi_beg_linecol.bytepos != 0;
+
if (tree != NULL)
{
eassert (start_byte <= old_end_byte);
eassert (start_byte <= new_end_byte);
- /* Think the recorded change as a delete followed by an
- insert, and think of them as moving unchanged text back
- and forth. After all, the whole point of updating the
- tree is to update the position of unchanged text. */
- ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
- ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
+ /* Before sending the edit to tree-sitter, we need to first
+ clip the beg/end to visible_beg and visible_end of the
+ parser. A tip for understanding the code below: think the
+ recorded change as a delete followed by an insert, and
+ think of them as moving unchanged text back and forth.
+ After all, the whole point of updating the tree is to
+ update the position of unchanged text. */
+ const ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
+ const ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
eassert (visible_beg >= 0);
eassert (visible_beg <= visible_end);
eassert (start_offset <= old_end_offset);
eassert (start_offset <= new_end_offset);
- treesit_tree_edit_1 (tree, start_offset, old_end_offset,
- new_end_offset);
- XTS_PARSER (lisp_parser)->need_reparse = true;
-
/* VISIBLE_BEG/END records tree-sitter's range of view in
the buffer. We need to adjust them when tree-sitter's
view changes. */
visi_beg_delta = (old_end_byte < visible_beg
? new_end_byte - old_end_byte : 0);
- XTS_PARSER (lisp_parser)->visible_beg = visible_beg + visi_beg_delta;
- XTS_PARSER (lisp_parser)->visible_end = (visible_end
- + visi_beg_delta
- + (new_end_offset
- - old_end_offset));
+ const ptrdiff_t new_visible_beg = visible_beg + visi_beg_delta;
+ const ptrdiff_t new_visible_end
+ = (visible_end + visi_beg_delta
+ + (new_end_offset - old_end_offset));
+
+ XTS_PARSER (lisp_parser)->visible_beg = new_visible_beg;
+ XTS_PARSER (lisp_parser)->visible_end = new_visible_end;
+
+ eassert (BEG_BYTE <= new_visible_beg);
+ eassert (new_visible_beg <= new_visible_end);
+ eassert (new_visible_end <= Z_BYTE);
+
+ /* (Optionally) calculate the point for start/old_end/new_end
+ to be sent to tree-sitter. Also update parser cache for
+ linecol. */
+ TSPoint start_point = TREESIT_TS_POINT_1_0;
+ TSPoint old_end_point = TREESIT_TS_POINT_1_0;
+ TSPoint new_end_point = TREESIT_TS_POINT_1_0;
+ if (parser_tracks_linecol)
+ {
+ eassert (buf_tracks_linecol);
+ struct ts_linecol old_visi_beg_linecol
+ = XTS_PARSER (lisp_parser)->visi_beg_linecol;
+ struct ts_linecol old_visi_end_linecol
+ = XTS_PARSER (lisp_parser)->visi_end_linecol;
+
+ const struct ts_linecol new_visi_beg_linecol
+ = compute_new_linecol_by_change (old_visi_beg_linecol,
+ start_linecol,
+ old_end_linecol,
+ new_end_linecol,
+ new_visible_beg);
+ const struct ts_linecol new_visi_end_linecol
+ = compute_new_linecol_by_change (old_visi_end_linecol,
+ start_linecol,
+ old_end_linecol,
+ new_end_linecol,
+ new_visible_end);
+ XTS_PARSER (lisp_parser)->visi_beg_linecol
+ = new_visi_beg_linecol;
+ XTS_PARSER (lisp_parser)->visi_end_linecol
+ = new_visi_end_linecol;
+
+ /* Now, calculate TSPoints and finally update the tree. */
+ struct ts_linecol new_begv_linecol
+ = XTS_PARSER (lisp_parser)->visi_beg_linecol;
+ old_end_point = treesit_make_ts_point (old_visi_beg_linecol,
+ old_end_linecol);
+ start_point = treesit_make_ts_point (new_begv_linecol,
+ start_linecol);
+ new_end_point = treesit_make_ts_point (new_begv_linecol,
+ new_end_linecol);
+ }
- eassert (XTS_PARSER (lisp_parser)->visible_beg >= 0);
- eassert (XTS_PARSER (lisp_parser)->visible_beg
- <= XTS_PARSER (lisp_parser)->visible_end);
+ treesit_tree_edit_1 (tree, start_offset, old_end_offset,
+ new_end_offset, start_point, old_end_point,
+ new_end_point);
+ XTS_PARSER (lisp_parser)->need_reparse = true;
}
}
}
+/* Return the linecol of POS, calculated from CACHE. But if there's no
+ parser in the current buffer, or line-column tracking is disabled,
+ skip calculation and return an empty linecol instead. */
+struct ts_linecol
+treesit_linecol_maybe (ptrdiff_t pos, ptrdiff_t pos_byte,
+ struct ts_linecol cache)
+{
+ if (NILP (BVAR (current_buffer, ts_parser_list))
+ || !treesit_buf_tracks_linecol_p (current_buffer))
+ return TREESIT_EMPTY_LINECOL;
+
+ return treesit_linecol_of_pos (pos_byte, cache);
+}
+
+/* Update each parser's tree after the user made an edit. This function
+ does not parse the buffer and only updates the tree, so it should be
+ very fast.
+
+ This is a wrapper over treesit_record_change that does a bit more
+ boilerplate work: it (optionally) calculates linecol for new_end,
+ pass all the positions into treesit_record_change_1 which does the
+ real work, and finally (optionally) sets buffer's linecol cache to
+ new_end's linecol.
+
+ If NEW_END is next to NEW_END_BYTE in the arglist, caller might
+ accidentally swap them, so I placed NEW_END at the end of the
+ arglist.
+
+ If the current buffer doesn't track linecol, start_linecol and
+ old_end_linecol will be empty. In that case, don't process
+ linecols. */
+void
+treesit_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
+ ptrdiff_t new_end_byte,
+ struct ts_linecol start_linecol,
+ struct ts_linecol old_end_linecol,
+ ptrdiff_t new_end)
+{
+ struct ts_linecol new_end_linecol
+ = treesit_linecol_maybe (new_end, new_end_byte, start_linecol);
+
+ treesit_record_change_1 (start_byte, old_end_byte, new_end_byte,
+ start_linecol, old_end_linecol, new_end_linecol);
+
+ if (new_end_linecol.bytepos != 0)
+ {
+ const struct ts_linecol new_begv_linecol
+ = compute_new_linecol_by_change (BUF_TS_LINECOL_BEGV (current_buffer),
+ start_linecol,
+ old_end_linecol,
+ new_end_linecol,
+ BEGV_BYTE);
+ const struct ts_linecol new_zv_linecol
+ = compute_new_linecol_by_change (BUF_TS_LINECOL_ZV (current_buffer),
+ start_linecol,
+ old_end_linecol,
+ new_end_linecol,
+ ZV_BYTE);
+
+ SET_BUF_TS_LINECOL_BEGV (current_buffer, new_begv_linecol);
+ SET_BUF_TS_LINECOL_POINT (current_buffer, new_end_linecol);
+ SET_BUF_TS_LINECOL_ZV (current_buffer, new_zv_linecol);
+ }
+}
+
static TSRange *treesit_make_ts_ranges (Lisp_Object, Lisp_Object,
uint32_t *);
{
TSTree *tree = XTS_PARSER (parser)->tree;
struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+ const bool track_linecol = treesit_buf_tracks_linecol_p (buffer);
/* If we are setting visible_beg/end for the first time, we can skip
the offset acrobatics and updating the tree below. */
ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg;
ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end;
+
eassert (0 <= visible_beg);
eassert (visible_beg <= visible_end);
from ________|xxxx|__
to |xxxx|__________ */
+ struct ts_linecol visi_beg_linecol = track_linecol
+ ? XTS_PARSER (parser)->visi_beg_linecol : TREESIT_EMPTY_LINECOL;
+ struct ts_linecol visi_end_linecol = track_linecol
+ ? XTS_PARSER (parser)->visi_end_linecol : TREESIT_EMPTY_LINECOL;
+
+ struct ts_linecol buffer_begv_linecol = track_linecol
+ ? treesit_linecol_of_pos (BUF_BEGV_BYTE (buffer), BUF_TS_LINECOL_BEGV (buffer))
+ : TREESIT_EMPTY_LINECOL;
+ struct ts_linecol buffer_zv_linecol = track_linecol
+ ? treesit_linecol_of_pos (BUF_ZV_BYTE (buffer), BUF_TS_LINECOL_ZV (buffer))
+ : TREESIT_EMPTY_LINECOL;
+
+ if (track_linecol) eassert (visi_beg_linecol.bytepos == visible_beg);
+
/* 1. Make sure visible_beg <= BUF_BEGV_BYTE. */
if (visible_beg > BUF_BEGV_BYTE (buffer))
{
+ TSPoint point_new_end = track_linecol
+ ? treesit_make_ts_point (buffer_begv_linecol, visi_beg_linecol)
+ : TREESIT_TS_POINT_1_0;
/* Tree-sitter sees: insert at the beginning. */
- treesit_tree_edit_1 (tree, 0, 0, visible_beg - BUF_BEGV_BYTE (buffer));
+ treesit_tree_edit_1 (tree, 0, 0, visible_beg - BUF_BEGV_BYTE (buffer),
+ TREESIT_TS_POINT_1_0, TREESIT_TS_POINT_1_0,
+ point_new_end);
visible_beg = BUF_BEGV_BYTE (buffer);
+ visi_beg_linecol = buffer_begv_linecol;
eassert (visible_beg <= visible_end);
}
/* 2. Make sure visible_end = BUF_ZV_BYTE. */
if (visible_end < BUF_ZV_BYTE (buffer))
{
+ TSPoint point_start = track_linecol
+ ? treesit_make_ts_point (visi_beg_linecol, visi_end_linecol)
+ : TREESIT_TS_POINT_1_0;
+ TSPoint point_new_end = track_linecol
+ ? treesit_make_ts_point (visi_beg_linecol, buffer_zv_linecol)
+ : TREESIT_TS_POINT_1_0;
/* Tree-sitter sees: insert at the end. */
treesit_tree_edit_1 (tree, visible_end - visible_beg,
visible_end - visible_beg,
- BUF_ZV_BYTE (buffer) - visible_beg);
+ BUF_ZV_BYTE (buffer) - visible_beg,
+ point_start, point_start, point_new_end);
visible_end = BUF_ZV_BYTE (buffer);
+ visi_end_linecol = buffer_zv_linecol;
eassert (visible_beg <= visible_end);
}
else if (visible_end > BUF_ZV_BYTE (buffer))
{
+ TSPoint point_start = track_linecol
+ ? treesit_make_ts_point (visi_beg_linecol, buffer_zv_linecol)
+ : TREESIT_TS_POINT_1_0;
+ TSPoint point_old_end = track_linecol
+ ? treesit_make_ts_point (visi_beg_linecol, visi_end_linecol)
+ : TREESIT_TS_POINT_1_0;
/* Tree-sitter sees: delete at the end. */
treesit_tree_edit_1 (tree, BUF_ZV_BYTE (buffer) - visible_beg,
visible_end - visible_beg,
- BUF_ZV_BYTE (buffer) - visible_beg);
+ BUF_ZV_BYTE (buffer) - visible_beg,
+ point_start, point_old_end, point_start);
visible_end = BUF_ZV_BYTE (buffer);
+ visi_end_linecol = buffer_zv_linecol;
eassert (visible_beg <= visible_end);
}
/* 3. Make sure visible_beg = BUF_BEGV_BYTE. */
if (visible_beg < BUF_BEGV_BYTE (buffer))
{
+ TSPoint point_old_end = track_linecol
+ ? treesit_make_ts_point (visi_beg_linecol, buffer_begv_linecol)
+ : TREESIT_TS_POINT_1_0;
/* Tree-sitter sees: delete at the beginning. */
- treesit_tree_edit_1 (tree, 0, BUF_BEGV_BYTE (buffer) - visible_beg, 0);
+ treesit_tree_edit_1 (tree, 0, BUF_BEGV_BYTE (buffer) - visible_beg, 0,
+ TREESIT_TS_POINT_1_0, point_old_end,
+ TREESIT_TS_POINT_1_0);
visible_beg = BUF_BEGV_BYTE (buffer);
+ visi_beg_linecol = buffer_begv_linecol;
eassert (visible_beg <= visible_end);
}
eassert (0 <= visible_beg);
XTS_PARSER (parser)->visible_beg = visible_beg;
XTS_PARSER (parser)->visible_end = visible_end;
+ XTS_PARSER (parser)->visi_beg_linecol = visi_beg_linecol;
+ XTS_PARSER (parser)->visi_end_linecol = visi_end_linecol;
+
+ if (track_linecol)
+ {
+ eassert (visi_beg_linecol.bytepos == visible_beg);
+ eassert (visi_end_linecol.bytepos == visible_end);
+ }
/* Fix ranges so that the ranges stays with in visible_end. Here we
try to do minimal work so that the ranges is minimally correct and
Lisp_Object
make_treesit_parser (Lisp_Object buffer, TSParser *parser,
TSTree *tree, Lisp_Object language_symbol,
- Lisp_Object tag)
+ Lisp_Object tag, bool tracks_linecol)
{
struct Lisp_TS_Parser *lisp_parser;
lisp_parser->need_to_gc_buffer = false;
lisp_parser->within_reparse = false;
eassert (lisp_parser->visible_beg <= lisp_parser->visible_end);
+
+ if (tracks_linecol)
+ {
+ struct buffer *old_buf = current_buffer;
+ set_buffer_internal (XBUFFER (buffer));
+
+ /* treesit_linecol_of_pos doesn't signal, so no need to
+ unwind-protect. */
+ lisp_parser->visi_beg_linecol
+ = treesit_linecol_of_pos (BEGV_BYTE, TREESIT_BOB_LINECOL);
+ lisp_parser->visi_end_linecol
+ = treesit_linecol_of_pos (ZV_BYTE, lisp_parser->visi_beg_linecol);
+
+ set_buffer_internal (old_buf);
+ }
+ else
+ {
+ lisp_parser->visi_beg_linecol = TREESIT_EMPTY_LINECOL;
+ lisp_parser->visi_end_linecol = TREESIT_EMPTY_LINECOL;
+ }
+
return make_lisp_ptr (lisp_parser, Lisp_Vectorlike);
}
always succeed. */
ts_parser_set_language (parser, lang);
+ const bool lang_need_linecol_tracking
+ = !NILP (Fmemq (remapped_lang,
+ Vtreesit_languages_require_line_column_tracking));
+
/* Create parser. Use the unmapped LANGUAGE symbol, so the nodes
created by this parser (and the parser itself) identify themselves
as the unmapped language. This makes the grammar mapping
completely transparent. */
Lisp_Object lisp_parser = make_treesit_parser (buf_orig,
parser, NULL,
- language, tag);
+ language, tag,
+ lang_need_linecol_tracking);
+
+ /* Enable line-column tracking if this language requires it. */
+ if (lang_need_linecol_tracking && !treesit_buf_tracks_linecol_p (buf))
+ {
+ /* We can use TREESIT_BOB_LINECOL for begv and zv since these
+ cache doesn't need to be always in sync with BEGV and ZV. */
+ SET_BUF_TS_LINECOL_BEGV (buf, TREESIT_BOB_LINECOL);
+ SET_BUF_TS_LINECOL_POINT (buf, TREESIT_BOB_LINECOL);
+ SET_BUF_TS_LINECOL_ZV (buf, TREESIT_BOB_LINECOL);
+ }
/* Update parser-list. */
BVAR (buf, ts_parser_list) = Fcons (lisp_parser, BVAR (buf, ts_parser_list));
}
}
+DEFUN ("treesit--linecol-at", Ftreesit__linecol_at,
+ Streesit__linecol_at, 1, 1, 0,
+ doc: /* Test buffer-local linecol cache.
+
+Calculate the line and column at POS using the buffer-local cache,
+return the line and column in the form of
+
+ (LINE . COL)
+
+This is used for internal testing and debugging ONLY. */)
+ (Lisp_Object pos)
+{
+ CHECK_NUMBER (pos);
+ struct ts_linecol pos_linecol
+ = treesit_linecol_of_pos (CHAR_TO_BYTE (XFIXNUM (pos)),
+ BUF_TS_LINECOL_POINT (current_buffer));
+ return Fcons (make_fixnum (pos_linecol.line), make_fixnum (pos_linecol.col));
+}
+
+DEFUN ("treesit--linecol-cache-set", Ftreesit__linecol_cache_set,
+ Streesit__linecol_cache_set, 3, 3, 0,
+ doc: /* Set the linecol cache for the current buffer.
+
+This is used for internal testing and debugging ONLY. */)
+ (Lisp_Object line, Lisp_Object col, Lisp_Object bytepos)
+{
+ CHECK_FIXNUM (line);
+ CHECK_FIXNUM (col);
+ CHECK_FIXNUM (bytepos);
+
+ struct ts_linecol linecol;
+ linecol.line = XFIXNUM (line);
+ linecol.col = XFIXNUM (col);
+ linecol.bytepos = XFIXNUM (bytepos);
+
+ SET_BUF_TS_LINECOL_POINT (current_buffer, linecol);
+
+ return Qnil;
+}
+
+DEFUN ("treesit--linecol-cache", Ftreesit__linecol_cache,
+ Streesit__linecol_cache, 0, 0, 0,
+ doc: /* Return the buffer-local linecol cache for debugging.
+
+Return a plist (:line LINE :col COL :pos POS :bytepos BYTEPOS). This is
+used for internal testing and debugging ONLY. */)
+ (void)
+{
+ struct ts_linecol cache = BUF_TS_LINECOL_POINT (current_buffer);
+
+ Lisp_Object plist = (list4 (QCcol, make_fixnum (cache.col),
+ QCbytepos, make_fixnum (cache.bytepos)));
+ plist = Fcons (make_fixnum (cache.line), plist);
+ plist = Fcons (QCline, plist);
+
+ return plist;
+}
+
+
#endif /* HAVE_TREE_SITTER */
DEFUN ("treesit-available-p", Ftreesit_available_p,
DEFSYM (QCequal, ":equal");
DEFSYM (QCmatch, ":match");
DEFSYM (QCpred, ":pred");
+ DEFSYM (QCline, ":line");
+ DEFSYM (QCcol, ":col");
+ DEFSYM (QCpos, ":pos");
+ DEFSYM (QCbytepos, ":bytepos");
+
DEFSYM (Qnot_found, "not-found");
DEFSYM (Qsymbol_error, "symbol-error");
DEFSYM (Qtreesit_language_remap_alist, "treesit-language-remap-alist");
Fmake_variable_buffer_local (Qtreesit_language_remap_alist);
+ DEFVAR_LISP ("treesit-languages-require-line-column-tracking",
+ Vtreesit_languages_require_line_column_tracking,
+ doc:
+ /* A list of languages that need line-column tracking.
+
+Most tree-sitter language grammars don't require line and column
+tracking to work, but some languages do. When creating a parser, if the
+language is in this list, Emacs enables line-column tracking for the
+buffer. */);
+ Vtreesit_languages_require_line_column_tracking = Qnil;
+
staticpro (&Vtreesit_str_libtree_sitter);
Vtreesit_str_libtree_sitter = build_string ("libtree-sitter-");
staticpro (&Vtreesit_str_tree_sitter);
defsubr (&Streesit_language_abi_version);
defsubr (&Streesit_grammar_location);
+ defsubr (&Streesit_parser_tracking_line_column_p);
+ defsubr (&Streesit_tracking_line_column_p);
+
defsubr (&Streesit_parser_p);
defsubr (&Streesit_node_p);
defsubr (&Streesit_compiled_query_p);
defsubr (&Streesit_induce_sparse_tree);
defsubr (&Streesit_node_match_p);
defsubr (&Streesit_subtree_stat);
+
+ defsubr (&Streesit__linecol_at);
+ defsubr (&Streesit__linecol_cache);
+ defsubr (&Streesit__linecol_cache_set);
#endif /* HAVE_TREE_SITTER */
defsubr (&Streesit_available_p);
#ifdef WINDOWSNT