From 2efe8ea7dc0ddc8d08e686e56be4403dfcd35856 Mon Sep 17 00:00:00 2001 From: Eric Abrahamsen Date: Mon, 15 Apr 2024 20:14:50 -0700 Subject: [PATCH] ; Improvements to PEG documentation * doc/lispref/peg.texi: Make more use of defmac/defmacro, and try to clarify the relationships between the various macros and functions. * lisp/progmodes/peg.el (peg-parse): Remove claim that PEXS can also be a single list of rules. (cherry picked from commit 930c578c1042e6372e5433e31b2ea801315c01c9) --- doc/lispref/peg.texi | 128 +++++++++++++++--------------------------- lisp/progmodes/peg.el | 7 ++- 2 files changed, 48 insertions(+), 87 deletions(-) diff --git a/doc/lispref/peg.texi b/doc/lispref/peg.texi index fbf57852ee0..90aa76988db 100644 --- a/doc/lispref/peg.texi +++ b/doc/lispref/peg.texi @@ -1,78 +1,31 @@ -@c -*-texinfo-*- -@c This is part of the GNU Emacs Lisp Reference Manual. -@c Copyright (C) 1990--1995, 1998--1999, 2001--2023 Free Software -@c Foundation, Inc. -@c See the file elisp.texi for copying conditions. -@node Parsing Expression Grammars -@chapter Parsing Expression Grammars -@cindex text parsing -@cindex parsing expression grammar -@cindex PEG - - Emacs Lisp provides several tools for parsing and matching text, -from regular expressions (@pxref{Regular Expressions}) to full -left-to-right (a.k.a.@: @acronym{LL}) grammar parsers (@pxref{Top,, -Bovine parser development,bovine}). @dfn{Parsing Expression Grammars} -(@acronym{PEG}) are another approach to text parsing that offer more -structure and composibility than regular expressions, but less -complexity than context-free grammars. - -A Parsing Expression Grammar (@acronym{PEG}) describes a formal language -in terms of a set of rules for recognizing strings in the language. In -Emacs, a @acronym{PEG} parser is defined as a list of named rules, each -of which matches text patterns and/or contains references to other -rules. Parsing is initiated with the function @code{peg-run} or the -macro @code{peg-parse} (see below), and parses text after point in the -current buffer, using a given set of rules. - -@cindex parsing expression -@cindex root, of parsing expression grammar -@cindex entry-point, of parsing expression grammar -Each rule in a @acronym{PEG} is referred to as a @dfn{parsing -expression} (@acronym{PEX}), and can be specified a a literal string, a -regexp-like character range or set, a peg-specific construct resembling -an Emacs Lisp function call, a reference to another rule, or a -combination of any of these. A grammar is expressed as a tree of rules -in which one rule is typically treated as a ``root'' or ``entry-point'' -rule. For instance: - -@example -@group -((number sign digit (* digit)) - (sign (or "+" "-" "")) - (digit [0-9])) -@end group -@end example - -Once defined, grammars can be used to parse text after point in the -current buffer, in the following ways: - -@defmac peg-parse &rest pexs -Match @var{pexs} at point. If @var{pexs} is a list of PEG rules, the -first rule is considered the ``entry-point'': +struct makes a set of rules available within its +body. The actual parsing is initiated with @code{peg-run}: + +@defun peg-run peg-matcher &optional failure-function success-function +This function accepts a single @var{peg-matcher}, which is the result of +calling @code{peg} (see below) on a named rule, usually the entry-point +of a larger grammar. + +At the end of parsing, one of @var{failure-function} or +@var{success-function} is called, depending on whether the parsing +succeeded or not. If @var{success-function} is called, it is passed a +lambda form that runs all the actions collected on the stack during +parsing -- by default this lambda form is simply executed. If parsing +fails, the @var{failure-function} is called with a list of @acronym{PEG} +expressions that failed during parsing; by default this list is +discarded. +@end defun + +The @var{peg-matcher} passed to @code{peg-run} is produced by a call to +@code{peg}: + +@defmac peg &rest pexs +Convert @var{pexs} into a single peg-matcher suitable for passing to +@code{peg-run}. @end defmac -@example -@group -(peg-parse - ((number sign digit (* digit)) - (sign (or "+" "-" "")) - (digit [0-9]))) -@end group -@end example - -@c FIXME: These two should be formally defined using @defmac and @defun. -@findex with-peg-rules -@findex peg-run -The @code{peg-parse} macro represents the simplest use of the -@acronym{PEG} library, but also the least flexible, as the rules must be -written directly into the source code. A more flexible approach -involves use of three macros in conjunction: @code{with-peg-rules}, a -@code{let}-like construct that makes a set of rules available within the -macro body; @code{peg-run}, which initiates parsing given a single rule; -and @code{peg}, which is used to wrap the entry-point rule name. In -fact, a call to @code{peg-parse} expands to just this set of calls. The -above example could be written as: +The @code{peg-parse} example above expands to just this set of calls, +and could be written as: @example @group @@ -84,14 +37,19 @@ above example could be written as: @end group @end example -This allows more explicit control over the ``entry-point'' of parsing, -and allows the combination of rules from different sources. +This approach allows more explicit control over the ``entry-point'' of +parsing, and allows the combination of rules from different sources. -@c FIXME: Use @defmac. -@findex define-peg-rule Individual rules can also be defined using a more @code{defun}-like syntax, using the macro @code{define-peg-rule}: +@defmac define-peg-rule name args &rest pexs +Define @var{name} as a PEG rule that accepts @var{args} and matches +@var{pexs} at point. +@end defmac + +For instance: + @example @group (define-peg-rule digit () @@ -99,14 +57,16 @@ syntax, using the macro @code{define-peg-rule}: @end group @end example -This also allows for rules that accept an argument (supplied by the -@code{funcall} PEG rule, @pxref{PEX Definitions}). +Arguments can be supplied to rules by the @code{funcall} PEG rule +(@pxref{PEX Definitions}). -@c FIXME: Use @defmac. -@findex define-peg-ruleset Another possibility is to define a named set of rules with @code{define-peg-ruleset}: +@defmac define-peg-ruleset name &rest rules +Define @var{name} as an identifier for @var{rules}. +@end defmac + @example @group (define-peg-ruleset number-grammar @@ -240,10 +200,10 @@ Returns non-@code{nil} if parsing @acronym{PEX} @var{e} from point fails Treats the value of the Lisp expression @var{exp} as a boolean. @end table -@c FIXME: peg-char-classes should be mentioned in the text below. @vindex peg-char-classes -Character class matching can use the same named character classes as -in regular expressions (@pxref{Top,, Character Classes,elisp}) +Character-class matching can refer to the classes named in +@code{peg-char-classes}, equivalent to character classes in regular +expressions (@pxref{Top,, Character Classes,elisp}) @node Parsing Actions @section Parsing Actions diff --git a/lisp/progmodes/peg.el b/lisp/progmodes/peg.el index bb57650d883..938f8da910d 100644 --- a/lisp/progmodes/peg.el +++ b/lisp/progmodes/peg.el @@ -316,13 +316,14 @@ EXPS is a list of rules/expressions that failed.") "Match PEXS at point. PEXS is a sequence of PEG expressions, implicitly combined with `and'. Returns STACK if the match succeed and signals an error on failure, -moving point along the way. -PEXS can also be a list of PEG rules, in which case the first rule is used." +moving point along the way." (if (and (consp (car pexs)) (symbolp (caar pexs)) (not (ignore-errors (not (eq 'call (car (peg-normalize (car pexs)))))))) - ;; `pexs' is a list of rules: use the first rule as entry point. + ;; The first of `pexs' has not been defined as a rule, so assume + ;; that none of them have been and they should be fed to + ;; `with-peg-rules' `(with-peg-rules ,pexs (peg-run (peg ,(caar pexs)) #'peg-signal-failure)) `(peg-run (peg ,@pexs) #'peg-signal-failure))) -- 2.39.5