Add line-column tracking for tree-sitter
Add line-column tracking for tree-sitter parsers. Copied from
comments in treesit.c:
Technically we have to send tree-sitter the line and column
position of each edit. But in practice we just sent it dummy
values, because tree-sitter doesn't use them for parsing; it
mostly just carries the line and column positions around and
returns them when e.g. reporting node positions[1]. This
worked fine until we encountered grammars that actually
use the line and column information for
parsing (Haskell)[2].
[1] https://github.com/tree-sitter/tree-sitter/issues/445
[2] https://github.com/tree-sitter/tree-sitter/issues/4001
So now we have to keep track of line and column positions and
pass valid values to tree-sitter. (This adds quite a bit of
complexity, but only linearly; one can ignore all the linecol
stuff when trying to understand the treesit code and then come
back to it later.) Eli convinced me to disable tracking by
default, and only enable it for languages that need it. So
the buffer starts out not tracking linecol. When a
parser is created, if its language is in
treesit-languages-require-line-column-tracking, we enable
tracking in the buffer and for the parser.
To simplify things, once a buffer starts tracking linecol, it
never disables tracking, even if all the parsers that need
tracking are deleted. For parsers, tracking is determined at
creation time: a parser that starts out tracking (or not
tracking) stays that way, regardless of later changes to
treesit-languages-require-line-column-tracking.
To make calculating line/column positions fast, we store
linecol caches for begv, point, and zv in the
buffer (buf->ts_linecol_cache_xxx); and in the parser object,
we store linecol caches for the visible beg/end of that parser.
In buffer editing functions, we need the linecol for
start/old_end/new_end. Those can be calculated by scanning
newlines (treesit_linecol_of_pos) from the buffer point
cache, which should always be near point. We usually
set the calculated linecol of new_end back to the buffer
point cache.
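To illustrate the idea of scanning newlines from a nearby cached
position, here is a minimal standalone sketch. It is not the code
in treesit.c: the struct layout, the function name
linecol_of_pos, the flat char array standing in for buffer text,
the 0-based byte offsets, and byte-counted columns are all
assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical cache entry: 1-based line, 0-based byte column,
   and the 0-based byte offset that the pair describes.  */
struct linecol { long line; long col; long bytepos; };

/* Return the linecol of TARGET by scanning TEXT from CACHE,
   forward or backward, instead of from the start of the text.  */
static struct linecol
linecol_of_pos (const char *text, struct linecol cache, long target)
{
  assert (target >= 0);
  struct linecol result = cache;

  if (target >= cache.bytepos)
    {
      /* Forward: each newline starts a new line at column 0.  */
      for (long i = cache.bytepos; i < target; i++)
        {
          if (text[i] == '\n')
            {
              result.line++;
              result.col = 0;
            }
          else
            result.col++;
        }
    }
  else
    {
      /* Backward: subtract the newlines between TARGET and the
         cache, then measure the column from the line start.  */
      for (long i = target; i < cache.bytepos; i++)
        if (text[i] == '\n')
          result.line--;
      long bol = target;
      while (bol > 0 && text[bol - 1] != '\n')
        bol--;
      result.col = target - bol;
    }

  result.bytepos = target;
  return result;
}
```

The cost is proportional to the distance between the cache and
the target, which is why a cache that follows point keeps these
lookups cheap.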
We also need to calculate linecol for the visible_beg/end of
each parser, and for the buffer's begv/zv. These
positions are usually far from point, so we have caches for
all of them (in either the parser object or the buffer).
Scanning newlines from point to get an up-to-date linecol
for them would be inefficient; but at the same time, because
they're far from and outside the changed region, we can
calculate their change in line and column number by simply
counting how many newlines are added/removed in the changed
region (compute_new_linecol_by_change).
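A rough sketch of that adjustment follows, for a cached position
at or after the old end of the change. It is an illustration
only, not the actual compute_new_linecol_by_change: the struct,
the name shift_linecol_after_change, and the column rule for a
position sharing a line with old_end are assumptions of the
example.

```c
/* Hypothetical line/column pair: 1-based line, 0-based column.  */
struct linecol { long line; long col; };

/* POS is the cached linecol of a position at or after OLD_END.
   OLD_END and NEW_END are the linecols of the end of the changed
   region before and after the edit.  Return the linecol of the
   same position after the edit, without rescanning the text.  */
static struct linecol
shift_linecol_after_change (struct linecol pos,
                            struct linecol old_end,
                            struct linecol new_end)
{
  struct linecol result = pos;

  /* Net newlines added (positive) or removed (negative).  */
  result.line = pos.line + (new_end.line - old_end.line);

  /* Only a position on the same line as the old end has its
     column affected; it keeps its distance from the end of
     the changed region.  */
  if (pos.line == old_end.line)
    result.col = new_end.col + (pos.col - old_end.col);

  return result;
}
```

For example, replacing "b\nc" with "XY" in "ab\ncde\nf" moves
old_end (line 2, column 1) to new_end (line 1, column 3), so
the cached position of "e" goes from line 2, column 2 to line 1,
column 4.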
* doc/lispref/parsing.texi (Using Parser): Mention line-column
tracking in manual.
* etc/NEWS: Add news.
* lisp/treesit.el:
(treesit-languages-require-line-column-tracking): New variable.
* src/buffer.c: Include treesit.h (for TREESIT_EMPTY_LINECOL).
(Fget_buffer_create):
(Fmake_indirect_buffer): Initialize new buffer fields.
(Fbuffer_swap_text): Add new buffer fields.
* src/buffer.h (ts_linecol): New struct.
(buffer): New buffer fields.
(BUF_TS_LINECOL_BEGV):
(BUF_TS_LINECOL_POINT):
(BUF_TS_LINECOL_ZV):
(SET_BUF_TS_LINECOL_BEGV):
(SET_BUF_TS_LINECOL_POINT):
(SET_BUF_TS_LINECOL_ZV): New inline functions.
* src/casefiddle.c (casify_region): Record linecol info.
* src/editfns.c (Fsubst_char_in_region):
(Ftranslate_region_internal):
(Ftranspose_regions): Record linecol info.
* src/insdel.c (insert_1_both):
(insert_from_string_1):
(insert_from_gap_1):
(insert_from_buffer):
(replace_range):
(del_range_2): Record linecol info.
* src/treesit.c (TREESIT_BOB_LINECOL):
(TREESIT_EMPTY_LINECOL):
(TREESIT_TS_POINT_1_0): New constants.
(treesit_debug_print_linecol):
(treesit_buf_tracks_linecol_p):
(restore_restriction_and_selective_display):
(treesit_count_lines):
(treesit_debug_validate_linecol):
(treesit_linecol_of_pos):
(treesit_make_ts_point):
(Ftreesit_tracking_line_column_p):
(Ftreesit_parser_tracking_line_column_p): New functions.
(treesit_tree_edit_1): Accept real TSPoint and pass to
tree-sitter.
(compute_new_linecol_by_change): New function.
(treesit_record_change_1): Rename from treesit_record_change,
handle linecol if tracking is enabled.
(treesit_linecol_maybe): New function.
(treesit_record_change): New wrapper around
treesit_record_change_1 that handles some boilerplate and sets
buffer state.
(treesit_sync_visible_region): Handle linecol if tracking is
enabled.
(make_treesit_parser): Set up the parser's linecol cache if
tracking is enabled.
(Ftreesit_parser_create): Enable tracking if the parser's
language requires it.
(Ftreesit__linecol_at):
(Ftreesit__linecol_cache_set):
(Ftreesit__linecol_cache): New functions for debugging and
testing.
(syms_of_treesit): New variable
Vtreesit_languages_require_line_column_tracking.
* src/treesit.h (Lisp_TS_Parser): New fields.
(TREESIT_BOB_LINECOL):
(TREESIT_EMPTY_LINECOL): New constants.
* test/src/treesit-tests.el (treesit-linecol-basic):
(treesit-linecol-search-back-across-newline):
(treesit-linecol-col-same-line):
(treesit-linecol-enable-disable): New tests.
* src/lisp.h: Declare display_count_lines.
* src/xdisp.c (display_count_lines): Remove static keyword.
(cherry picked from commit
1897da0b599cc3ea1e4aa626e47ac8943a7b6833)