From: Yuan Fu
Date: Thu, 29 Jun 2023 00:05:29 +0000 (-0700)
Subject: ; * admin/notes/tree-sitter/treesit_record_change: Update.
X-Git-Tag: emacs-29.1-rc1~91^2
X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=1d2ba6b363b;p=emacs.git

; * admin/notes/tree-sitter/treesit_record_change: Update.
---

diff --git a/admin/notes/tree-sitter/treesit_record_change b/admin/notes/tree-sitter/treesit_record_change
index 0dc6491e2d1..e80df4adfa7 100644
--- a/admin/notes/tree-sitter/treesit_record_change
+++ b/admin/notes/tree-sitter/treesit_record_change
@@ -3,10 +3,10 @@ NOTES ON TREESIT_RECORD_CHANGE
 It is vital that Emacs informs tree-sitter of every change made to the
 buffer, lest tree-sitter's parse tree would be corrupted/out of sync.
 
-All buffer changes in Emacs are made through functions in insdel.c
-(and casefiddle.c), I augmented functions in those files with calls to
-treesit_record_change. Below is a manifest of all the relevant
-functions in insdel.c as of Emacs 29:
+Almost all buffer changes in Emacs are made through functions in
+insdel.c (see below for exceptions), I augmented functions in insdel.c
+with calls to treesit_record_change. Below is a manifest of all the
+relevant functions in insdel.c as of Emacs 29:
 
 Function Calls
 ----------------------------------------------------------------------
@@ -43,8 +43,176 @@ insert_from_buffer but not insert_from_buffer_1. I also left a
 reminder comment.
 
 
-As for casefiddle.c, do_casify_unibyte_region and
+EXCEPTIONS
+
+
+There are a couple of functions that replaces characters in-place
+rather than insert/delete. They are in casefiddle.c and editfns.c.
+
+In casefiddle.c, do_casify_unibyte_region and
 do_casify_multibyte_region modifies buffer, but they are static
 functions and are called by casify_region, which calls
 treesit_record_change. Other higher-level functions calls
-casify_region to do the work.
\ No newline at end of file
+casify_region to do the work.
+
+In editfns.c, subst-char-in-region and translate-region-internal might
+replace characters in-place, I made them to call
+treesit_record_change. transpose-regions uses memcpy to move text
+around, it calls treesit_record_change too.
+
+I found these exceptions by grepping for signal_after_change and
+checking each caller manually. Below is all the result as of Emacs 29
+and some comment for each one. Readers can use
+
+(highlight-regexp "^[^[:space:]]+?\\.c:[[:digit:]]+:[^z-a]+?$" 'highlight)
+
+to make things easier to read.
+
+grep [...] --color=auto -i --directories=skip -nH --null -e signal_after_change *.c
+
+callproc.c:789: calling prepare_to_modify_buffer and signal_after_change.
+callproc.c:793: is one call to signal_after_change in each of the
+callproc.c:800: signal_after_change hasn't. A continue statement
+callproc.c:804: again, and this time signal_after_change gets called,
+
+Not code.
+
+callproc.c:820: signal_after_change (PT - nread, 0, nread);
+callproc.c:863: signal_after_change (PT - process_coding.produced_char,
+
+Both are called in call-process. I don’t think we’ll ever use
+tree-sitter in call-process’s stdio buffer, right? I didn’t check
+line-by-line, but it seems to only use insert_1_both and del_range_2.
+
+casefiddle.c:558: signal_after_change (start, end - start - added, end - start);
+
+Called in casify-region, calls treesit_record_change.
+
+decompress.c:195: signal_after_change (data->orig, data->start - data->orig,
+
+Called in unwind_decompress, uses del_range_2, insdel function.
+
+decompress.c:334: signal_after_change (istart, iend - istart, unwind_data.nbytes);
+
+Called in zlib-decompress-region, uses del_range_2, insdel function.
+
+editfns.c:2139: signal_after_change (BEGV, size_a, ZV - BEGV);
+
+Called in replace-buffer-contents, which calls del_range and
+Finsert_buffer_substring, both are ok.
+
+editfns.c:2416: signal_after_change (changed,
+
+Called in subst-char-in-region, which either calls replace_range (a
+insdel function) or modifies buffer content by itself (need to call
+treesit_record_change).
+
+editfns.c:2544: /* Reload as signal_after_change in last iteration may GC. */
+
+Not code.
+
+editfns.c:2604: signal_after_change (pos, 1, 1);
+
+Called in translate-region-internal, which has three cases:
+
+if (nc != oc && nc >= 0) {
+  if (len != str_len) {
+    replace_range()
+  } else {
+    while (str_len-- > 0)
+      *p++ = *str++;
+  }
+}
+else if (nc < 0) {
+  replace_range()
+}
+
+replace_range is ok, but in the case where it manually modifies buffer
+content, it needs to call treesit_record_change.
+
+editfns.c:4779: signal_after_change (start1, end2 - start1, end2 - start1);
+
+Called in transpose-regions. It just uses memcpy’s and doesn’t use
+insdel functions; needs to call treesit_record_change.
+
+fileio.c:4825: signal_after_change (PT, 0, inserted);
+
+Called in insert_file_contents. Uses insert_1_both (very first in the
+function); del_range_1 and del_range_byte (the optimized way to
+implement replace when decoding isn’t needed); del_range_byte and
+insert_from_buffer (the optimized way used when decoding is needed);
+decode_coding_gap or insert_from_gap_1 (I’m not sure the condition for
+this, but anyway it’s safe). The function also calls memcpy and
+memmove, but they are irrelevant: memcpy is used for decoding, and
+memmove is moving stuff inside the gap for decode_coding_gap.
+
+I’d love someone to verify this function, since it’s so complicated
+and large, but from what I can tell it’s safe.
+
+fns.c:3998: signal_after_change (XFIXNAT (beg), 0, inserted_chars);
+
+Called in base64-decode-region, uses insert_1_both and del_range_both,
+safe.
+
+insdel.c:681: signal_after_change (opoint, 0, len);
+insdel.c:696: signal_after_change (opoint, 0, len);
+insdel.c:741: signal_after_change (opoint, 0, len);
+insdel.c:757: signal_after_change (opoint, 0, len);
+insdel.c:976: signal_after_change (opoint, 0, PT - opoint);
+insdel.c:996: signal_after_change (opoint, 0, PT - opoint);
+insdel.c:1187: signal_after_change (opoint, 0, PT - opoint);
+insdel.c:1412: signal_after_change. */
+insdel.c:1585: signal_after_change (from, nchars_del, GPT - from);
+insdel.c:1600: prepare_to_modify_buffer and never call signal_after_change.
+insdel.c:1603: region once. Apart from signal_after_change, any caller of this
+insdel.c:1747: signal_after_change (from, to - from, 0);
+insdel.c:1789: signal_after_change (from, to - from, 0);
+insdel.c:1833: signal_after_change (from, to - from, 0);
+insdel.c:2223:signal_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins)
+insdel.c:2396: signal_after_change (begpos, endpos - begpos - change, endpos - begpos);
+
+I’ve checked all insdel functions. We can assume insdel functions are
+all safe.
+
+json.c:790: signal_after_change (PT, 0, inserted);
+
+Called in json-insert, calls either decode_coding_gap or
+insert_from_gap_1, both are safe. Calls memmove but it’s for
+decode_coding_gap.
+
+keymap.c:2873: /* Insert calls signal_after_change which may GC. */
+
+Not code.
+
+print.c:219: signal_after_change (PT - print_buffer.pos, 0, print_buffer.pos);
+
+Called in print_finish, calls copy_text and insert_1_both, safe.
+
+process.c:6365: process buffer is changed in the signal_after_change above.
+search.c:2763: (see signal_before_change and signal_after_change). Try to error
+
+Not code.
+
+search.c:2777: signal_after_change (sub_start, sub_end - sub_start, SCHARS (newtext));
+
+Called in replace_match. Calls replace_range, upcase-region,
+upcase-initials-region (both calls casify_region in the end), safe.
+Calls memcpy but it’s for string manipulation.
+
+textprop.c:1261: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1272: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1283: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1458: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1652: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1661: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1672: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1750: before changes are made and signal_after_change when we are done.
+textprop.c:1752: and call signal_after_change before returning if MODIFIED. */
+textprop.c:1764: signal_after_change (XFIXNUM (start),
+textprop.c:1778: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1791: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1810: signal_after_change (XFIXNUM (start),
+
+We don’t care about text property changes.
+
+Grep finished with 51 matches found at Wed Jun 28 15:12:23
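
The in-place case that the notes flag for subst-char-in-region and
translate-region-internal can be illustrated outside Emacs. What follows is a
minimal standalone sketch, not Emacs code: record_change is a hypothetical
stand-in for treesit_record_change, its (start, old end, new end) byte
arguments are an assumption made for illustration, and subst_byte_in_place is
an invented helper. The point it tries to show is that a same-length overwrite
bypasses every insert/delete helper, so the edited range has to be reported
explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for treesit_record_change.  It only remembers
   the last reported edit so that a caller can inspect it.  */
struct change
{
  ptrdiff_t start, old_end, new_end;
  int reported;
};
static struct change last_change;

static void
record_change (ptrdiff_t start, ptrdiff_t old_end, ptrdiff_t new_end)
{
  last_change = (struct change) { start, old_end, new_end, 1 };
}

/* Replace every FROM byte with TO in BUF[START, END), in place.
   No insert/delete helper runs here, so the change is reported by hand.
   Returns the number of bytes that were changed.  */
static ptrdiff_t
subst_byte_in_place (char *buf, ptrdiff_t start, ptrdiff_t end,
                     char from, char to)
{
  assert (start <= end);
  ptrdiff_t changed = 0;
  for (ptrdiff_t i = start; i < end; i++)
    if (buf[i] == from)
      {
        buf[i] = to;
        changed++;
      }
  /* Same-length edit: the old end and the new end coincide.  */
  if (changed > 0)
    record_change (start, end, end);
  return changed;
}
```

Calling subst_byte_in_place on a small array and then reading last_change shows
the reported range. If the record_change call were dropped, the bytes would
still change but nothing would be reported, which is the failure mode the notes
guard against.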