From: Jacob S. Gordon
Date: Mon, 19 May 2025 19:05:37 +0000 (-0400)
Subject: calc: Allow strings with character codes above Latin-1
X-Git-Url: http://git.eshelyaron.com/gitweb/?a=commitdiff_plain;h=377ddee6b102f5bf4124710b9d5f33fa3f8a04ce;p=emacs.git

calc: Allow strings with character codes above Latin-1

The current behavior of the functions 'calc-display-strings',
'strings', and 'bstrings' is to skip any vector containing integers
outside the Latin-1 range (0x00-0xFF). We introduce a custom variable
'calc-string-maximum-character' to replace this hard-coded maximum, and
to allow vectors containing higher character codes to be displayed as
strings. The default value of 0xFF preserves the existing behavior.

* lisp/calc/calc.el (calc-string-maximum-character): Add custom
variable 'calc-string-maximum-character'.
* lisp/calc/calccomp.el (math-vector-is-string): Replace hard-coded
maximum with 'calc-string-maximum-character', and the 'natnump'
assertion with 'characterp'. The latter guards against the maximum
being larger than '(max-char)', but not on invalid types of the maximum
such as strings.
* test/lisp/calc/calc-tests.el (calc-math-vector-is-string): Add tests
for 'math-vector-is-string' using different values of
'calc-string-maximum-character'.
* doc/misc/calc.texi (Quick Calculator, Strings, Customizing Calc): Add
variable definition for 'calc-string-maximum-character' and reference
thereof when discussing 'calc-display-strings'. Generalize a comment
about string display and availability of 8-bit fonts. (Bug#78528)

(cherry picked from commit 5bd9fa084dcf0ce8efaaf9212c24addec48d824f)
---

diff --git a/doc/misc/calc.texi b/doc/misc/calc.texi
index 61466b55201..eda442ecb38 100644
--- a/doc/misc/calc.texi
+++ b/doc/misc/calc.texi
@@ -10179,7 +10179,7 @@ result @samp{[120]} (because 120 is the ASCII code of the lower-case
 is displayed only according to the current mode settings.
 But running Quick Calc again and entering @samp{120} will produce the
 result @samp{120 (16#78, 8#170, x)} which shows the number in its
-decimal, hexadecimal, octal, and ASCII forms.
+decimal, hexadecimal, octal, and character forms.
 
 Please note that the Quick Calculator is not any faster at loading
 or computing the answer than the full Calculator; the name ``quick''
@@ -10836,11 +10836,11 @@ from 1 to @samp{n}.
 @cindex Strings
 @cindex Character strings
 Character strings are not a special data type in the Calculator.
-Rather, a string is represented simply as a vector all of whose
-elements are integers in the range 0 to 255 (ASCII codes). You can
-enter a string at any time by pressing the @kbd{"} key. Quotation
-marks and backslashes are written @samp{\"} and @samp{\\}, respectively,
-inside strings. Other notations introduced by backslashes are:
+Rather, a string is represented simply as a vector all of whose elements
+are integers in the Latin-1 range 0 to 255. You can enter a string at
+any time by pressing the @kbd{"} key. Quotation marks and backslashes
+are written @samp{\"} and @samp{\\}, respectively, inside strings.
+Other notations introduced by backslashes are:
 
 @example
 @group
@@ -10857,21 +10857,24 @@ inside strings. Other notations introduced by backslashes are:
 
 @noindent
 Finally, a backslash followed by three octal digits produces any
-character from its ASCII code.
+character from its code.
 
 @kindex d "
 @pindex calc-display-strings
 Strings are normally displayed in vector-of-integers form. The
 @w{@kbd{d "}} (@code{calc-display-strings}) command toggles a mode in
 which any vectors of small integers are displayed as quoted strings
-instead.
+instead. The display of strings containing higher character codes can
+be enabled by increasing the custom variable
+@code{calc-string-maximum-character} (@pxref{Customizing Calc}).
 
 The backslash notations shown above are also used for displaying
-strings. Characters 128 and above are not translated by Calc; unless
-you have an Emacs modified for 8-bit fonts, these will show up in
-backslash-octal-digits notation. For characters below 32, and
-for character 127, Calc uses the backslash-letter combination if
-there is one, or otherwise uses a @samp{\^} sequence.
+strings. For ASCII control characters (below 32), and for the
+@code{DEL} character (127), Calc uses the backslash-letter combination
+if there is one, or otherwise uses a @samp{\^} sequence. Control
+characters above 127 are not translated by Calc, and will show up in
+backslash-octal-digits notation. The display of higher character codes
+will depend on your display settings and system font coverage.
 
 The only Calc feature that uses strings is @dfn{compositions};
 @pxref{Compositions}. Strings also provide a convenient
@@ -35684,6 +35687,33 @@ choose from, or the user can enter their own date.
 The default value of @code{calc-gregorian-switch} is @code{nil}.
 @end defvar
 
+@defvar calc-string-maximum-character
+@xref{Strings}.@*
+
+The variable @code{calc-string-maximum-character} is the maximum value
+of a vector's elements for @code{calc-display-strings}, @code{string},
+and @code{bstring} to display the vector as a string. This maximum
+@emph{must} represent a character, i.e. it's a non-negative integer less
+than or equal to @code{(max-char)} or @code{0x3FFFFF}. Any negative
+value effectively disables the display of strings, and for values larger
+than @code{0x3FFFFF} the display acts as if the maximum were
+@code{0x3FFFFF}. Some natural choices (and their resulting ranges) are:
+
+@itemize
+@item
+@code{0x7F} or 127 (ASCII),
+@item
+@code{0xFF} or 255 (Latin-1, the default),
+@item
+@code{0x10FFFF} (Unicode),
+@item
+@code{0x3FFFFF} (Emacs).
+@end itemize
+
+The default value of @code{calc-string-maximum-character} is @code{0xFF}
+or 255.
+@end defvar
+
 @node Reporting Bugs
 @appendix Reporting Bugs
 
diff --git a/lisp/calc/calc.el b/lisp/calc/calc.el
index c1aee896851..3eec89320f4 100644
--- a/lisp/calc/calc.el
+++ b/lisp/calc/calc.el
@@ -628,6 +628,37 @@ Otherwise, 1 / 0 is changed to uinf (undirected infinity).")
 (defcalcmodevar calc-display-strings nil
   "If non-nil, display vectors of byte-sized integers as strings.")
 
+(defcustom calc-string-maximum-character #xFF
+  "Maximum value of vector contents to be displayed as a string.
+
+If a vector consists of characters up to this maximum value, the
+function `calc-display-strings' will toggle displaying the vector as a
+string. This maximum value must represent a character (see `characterp').
+Some natural choices (and their resulting ranges) are:
+
+- `0x7F' (`ASCII'),
+- `0xFF' (`Latin-1', the default),
+- `0x10FFFF' (`Unicode'),
+- `0x3FFFFF' (`Emacs').
+
+Characters for low control codes are either caret or backslash escaped,
+while others without a glyph are displayed in backslash-octal notation.
+The display of strings containing higher character codes will depend on
+your display settings and system font coverage.
+
+See the following for further information:
+
+- info node `(calc)Strings',
+- info node `(elisp)Text Representations',
+- info node `(emacs)Text Display'."
+  :version "31.1"
+  :type '(choice (restricted-sexp :tag "Character Code"
+                                  :match-alternatives (characterp))
+                 (const :tag "ASCII" #x7F)
+                 (const :tag "Latin-1" #xFF)
+                 (const :tag "Unicode" #x10FFFF)
+                 (const :tag "Emacs" #x3FFFFF)))
+
 (defcalcmodevar calc-matrix-just 'center
   "If nil, vector elements are left-justified.
 If `right', vector elements are right-justified.
diff --git a/lisp/calc/calccomp.el b/lisp/calc/calccomp.el
index faa9516682a..11bcf73323b 100644
--- a/lisp/calc/calccomp.el
+++ b/lisp/calc/calccomp.el
@@ -907,13 +907,20 @@
                  (concat " " math-comp-right-bracket)))))
 
 (defun math-vector-is-string (a)
+  "Return t if A can be displayed as a string, and nil otherwise.
+
+Elements of A must either be a character (see `characterp') or a complex
+number with only a real character part, each with a value less than or
+equal to the custom variable `calc-string-maximum-character'."
   (while (and (setq a (cdr a))
-              (or (and (natnump (car a))
-                       (<= (car a) 255))
+              (or (and (characterp (car a))
+                       (<= (car a)
+                           calc-string-maximum-character))
                   (and (eq (car-safe (car a)) 'cplx)
-                       (natnump (nth 1 (car a)))
+                       (characterp (nth 1 (car a)))
                        (eq (nth 2 (car a)) 0)
-                       (<= (nth 1 (car a)) 255)))))
+                       (<= (nth 1 (car a))
+                           calc-string-maximum-character)))))
   (null a))
 
 (defconst math-vector-to-string-chars '( ( ?\" . "\\\"" )
diff --git a/test/lisp/calc/calc-tests.el b/test/lisp/calc/calc-tests.el
index 42eb6077b04..2fd6a6be45e 100644
--- a/test/lisp/calc/calc-tests.el
+++ b/test/lisp/calc/calc-tests.el
@@ -879,5 +879,72 @@ An existing calc stack is reused, otherwise a new one is created."
   (should-error (math-read-preprocess-string nil))
   (should-error (math-read-preprocess-string 42)))
 
+(ert-deftest calc-math-vector-is-string ()
+  "Test `math-vector-is-string' with varying `calc-string-maximum-character'.
+
+All tests operate on both an integer vector and the corresponding
+complex vector. The sets covered are:
+
+1. `calc-string-maximum-character' is a valid character. The last case
+with `0x3FFFFF' is borderline, as integers above it will not make it
+past the `characterp' test.
+2. `calc-string-maximum-character' is negative, so the test always fails.
+3. `calc-string-maximum-character' is above `(max-char)', so only the
+first `characterp' test is active.
+4. `calc-string-maximum-character' has an invalid type, which triggers
+an error in the comparison."
+  (cl-flet* ((make-vec (lambda (contents) (append (list 'vec) contents)))
+             (make-cplx (lambda (x) (list 'cplx x 0)))
+             (make-cplx-vec (lambda (contents)
+                              (make-vec (mapcar #'make-cplx contents)))))
+    ;; 1: calc-string-maximum-character is a valid character
+    (dolist (maxchar '(#x7F #xFF #x10FFFF #x3FFFFD #x3FFFFF))
+      (let* ((calc-string-maximum-character maxchar)
+             (small-chars (number-sequence (- maxchar 2) maxchar))
+             (large-chars (number-sequence maxchar (+ maxchar 2)))
+             (small-real-vec (make-vec small-chars))
+             (large-real-vec (make-vec large-chars))
+             (small-cplx-vec (make-cplx-vec small-chars))
+             (large-cplx-vec (make-cplx-vec large-chars)))
+        (should (math-vector-is-string small-real-vec))
+        (should-not (math-vector-is-string large-real-vec))
+        (should (math-vector-is-string small-cplx-vec))
+        (should-not (math-vector-is-string large-cplx-vec))))
+    ;; 2: calc-string-maximum-character is negative
+    (let* ((maxchar -1)
+           (calc-string-maximum-character maxchar)
+           (valid-contents (number-sequence 0 2))
+           (invalid-contents (number-sequence (- maxchar 2) maxchar))
+           (valid-real-vec (make-vec valid-contents))
+           (invalid-real-vec (make-vec invalid-contents))
+           (valid-cplx-vec (make-cplx-vec valid-contents))
+           (invalid-cplx-vec (make-cplx-vec invalid-contents)))
+      (should-not (math-vector-is-string valid-real-vec))
+      (should-not (math-vector-is-string invalid-real-vec))
+      (should-not (math-vector-is-string valid-cplx-vec))
+      (should-not (math-vector-is-string invalid-cplx-vec)))
+    ;; 3: calc-string-maximum-character is larger than (max-char)
+    (let* ((maxchar (+ (max-char) 3))
+           (calc-string-maximum-character maxchar)
+           (valid-chars (number-sequence (- (max-char) 2) (max-char)))
+           (invalid-chars (number-sequence (1+ (max-char)) maxchar))
+           (valid-real-vec (make-vec valid-chars))
+           (invalid-real-vec (make-vec invalid-chars))
+           (valid-cplx-vec (make-cplx-vec valid-chars))
+           (invalid-cplx-vec (make-cplx-vec invalid-chars)))
+      (should (math-vector-is-string valid-real-vec))
+      (should-not (math-vector-is-string invalid-real-vec))
+      (should (math-vector-is-string valid-cplx-vec))
+      (should-not (math-vector-is-string invalid-cplx-vec)))
+    ;; 4: calc-string-maximum-character has the wrong type
+    (let* ((calc-string-maximum-character "wrong type")
+           (contents (number-sequence 0 2))
+           (real-vec (make-vec contents))
+           (cplx-vec (make-cplx-vec contents)))
+      (should-error (math-vector-is-string real-vec)
+                    :type 'wrong-type-argument)
+      (should-error (math-vector-is-string cplx-vec)
+                    :type 'wrong-type-argument))))
+
 (provide 'calc-tests)
 ;;; calc-tests.el ends here
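
The acceptance rule implemented by the patched 'math-vector-is-string' can be tried outside Calc with a standalone sketch. This is an illustration only, not part of the commit: the function name 'demo-vector-is-string-p' and its MAXIMUM argument are hypothetical, standing in for the value the patch reads from 'calc-string-maximum-character'.

```elisp
;; Standalone sketch of the check described above.  The name
;; `demo-vector-is-string-p' and the MAXIMUM argument are hypothetical;
;; the patched Calc function reads `calc-string-maximum-character'.
(defun demo-vector-is-string-p (vec maximum)
  "Return t if every element of Calc vector VEC is a character <= MAXIMUM.
VEC has the form (vec E1 E2 ...), where each element is either an
integer or a complex number (cplx RE 0) with a zero imaginary part."
  (let ((rest (cdr vec))                ; skip the leading `vec' tag
        (ok t))
    (while (and ok rest)
      (let* ((elt (car rest))
             ;; Extract the candidate character code, or nil.
             (code (cond ((integerp elt) elt)
                         ((and (eq (car-safe elt) 'cplx)
                               (eq (nth 2 elt) 0))
                          (nth 1 elt)))))
        ;; `characterp' rejects nil, negative integers, and anything
        ;; above (max-char) before the comparison with MAXIMUM runs.
        (setq ok (and (characterp code) (<= code maximum))
              rest (cdr rest))))
    ok))

(demo-vector-is-string-p '(vec 104 105) #xFF)            ; => t
(demo-vector-is-string-p '(vec 955 120) #xFF)            ; => nil
(demo-vector-is-string-p '(vec 955 120) #x10FFFF)        ; => t
(demo-vector-is-string-p '(vec (cplx 955 0)) #x10FFFF)   ; => t
(demo-vector-is-string-p '(vec 1 2) -1)                  ; => nil
```

With the default limit of #xFF, a vector containing 955 (Greek small letter lambda) is rejected, and raising the limit to #x10FFFF accepts it. A negative limit rejects every non-empty vector, which matches the second case in the test above.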