- Subject: Re: Overloading and extending operators, (l)PEGs and grammars
- From: Sam Roberts <sroberts@...>
- Date: Wed, 11 Apr 2007 10:26:27 -0700
I can appreciate how difficult it is to make a small language with flexible
operator extensions. It's too bad; things like lpeg could benefit from it.
The PEG operators (*, +, /, etc.) are easy and mnemonic:
''      Literal string
""      Literal string
[]      Character class
.       Any character
(e)     Grouping
e?      Optional
e*      Zero-or-more
e+      One-or-more
e1 e2   Sequence
e1/e2   Prioritized Choice
&e      And-predicate
!e      Not-predicate
Anybody who has used a regex, or one of dozens of EBNF variants, can
remember this easily.
With lpeg, we have:
Operator          Description
lpeg.P(string)    Matches string literally
lpeg.P(number)    Matches exactly number characters
lpeg.S(string)    Matches any character in string (set)
lpeg.R("xy")      Matches any character between x and y (range)
patt^n            Matches at least n repetitions of patt
patt^-n           Matches at most n repetitions of patt
patt1 * patt2     Matches patt1 followed by patt2
patt1 + patt2     Matches patt1 or patt2 (ordered choice)
patt1 - patt2     Matches patt1 if patt2 does not match
-patt             Equivalent to "" - patt
patt1 / ...       Used to capture matches? Why not have the same meaning as PEG?
There isn't any commonality here; I find it quite anti-mnemonic (all the
operators are used for different purposes than in the original PEG grammars). I
can't read an lpeg pattern without the table above taped to my monitor.
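
For anyone who hasn't looked at where these operators come from: Lua only lets
a library overload a fixed set of operators through metamethods (__add, __mul,
__unm, __pow and so on), so a prefix ! or a postfix ? or * cannot be defined at
all. The sketch below is a toy in plain Lua, not lpeg's implementation; the
names Patt and P and the function-based matcher representation are assumptions
made purely for illustration. It only shows how *, +, unary - and ^ can be
given pattern meanings.

```lua
-- Toy pattern objects (NOT lpeg). Each wraps a function
-- (subject, pos) -> next position, or nil when the match fails.
local Patt = {}
Patt.__index = Patt

local function new(fn) return setmetatable({ fn = fn }, Patt) end

-- Hypothetical constructor: a string becomes a literal pattern.
local function P(v)
  if getmetatable(v) == Patt then return v end
  assert(type(v) == "string", "this sketch only accepts strings")
  return new(function(s, i)
    if s:sub(i, i + #v - 1) == v then return i + #v end
  end)
end

-- a * b : sequence
Patt.__mul = function(a, b)
  a, b = P(a), P(b)
  return new(function(s, i)
    local j = a.fn(s, i)
    return j and b.fn(s, j)
  end)
end

-- a + b : ordered choice
Patt.__add = function(a, b)
  a, b = P(a), P(b)
  return new(function(s, i) return a.fn(s, i) or b.fn(s, i) end)
end

-- -a : not-predicate, succeeds without consuming input when a fails
Patt.__unm = function(a)
  return new(function(s, i)
    if not a.fn(s, i) then return i end
  end)
end

-- a ^ n : at least n repetitions (this sketch handles n >= 0 only)
Patt.__pow = function(a, n)
  return new(function(s, i)
    local count = 0
    while true do
      local j = a.fn(s, i)
      if not j then break end
      count = count + 1
      if j == i then break end -- empty match: stop instead of looping forever
      i = j
    end
    if count >= n then return i end
  end)
end

function Patt:match(s) return self.fn(s, 1) end

-- PEG notation:  ("0" / "1")+ !"2"
local digit = P"0" + P"1"
local bits = digit^1 * -P"2"
print(bits:match("0110")) --> 5
print(bits:match("012"))  --> nil
```

The last three lines are the point: the PEG spelling and the overloaded
spelling of the same pattern share no operators at all.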
With boost::spirit (which looks pretty similar to PEGs, though there
might be theoretical differences in capability), you can use C++'s much
more flexible operator overloading to get:
Unary:
!P          Matches P or an empty string
*P          Matches P zero or more times
+P          Matches P one or more times
~P          Matches anything that does not match P
Binary:
P1 | P2     Matches P1 or P2
P1 - P2     Matches P1 but not P2
P1 >> P2    Matches P1 followed by P2
P1 % P2     Matches one or more P1 separated by P2
P1 & P2     Matches both P1 and P2
P1 ^ P2     Matches P1 or P2, but not both
P1 && P2    Synonym for P1 >> P2
P1 || P2    Matches P1 | P2 | P1 >> P2
It starts off well: the unary operators are pretty familiar, as is |.
After that it gets successively worse, as various boolean and mathematical
operators are stolen for things with no particular relation to their common
usage.
I have mixed feelings about (ab)using operator overloading to support inline
expression of grammars. I can see the appeal of somehow adding grammars as
elements of the language, rather than strings, but strings aren't so hard to
use, and are fairly flexible. I wonder if it wouldn't be better to use lpeg to
write something like:
equalcount = lpeg.grammar[[
S = "0" B
/ "1" A
/ ""
A = "0" S
/ "1" A A
B = "1" S
/ "0" B B
]]
instead of:
local S, A, B = 1, 2, 3
equalcount = lpeg.P{
[S] = "0" * lpeg.V(B) + "1" * lpeg.V(A) + "",
[A] = "0" * lpeg.V(S) + "1" * lpeg.V(A) * lpeg.V(A),
[B] = "1" * lpeg.V(S) + "0" * lpeg.V(B) * lpeg.V(B),
} * -1
or
pascalcomment = lpeg.grammar[[
C = "(*" N* "*)"
N = C
/ !"(*" !"*)" .
]]
instead of
...
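
To make the string-based idea a bit more concrete, here is a rough sketch of
what such a helper could look like. Everything in it is an assumption: the
function is called grammar only to echo the lpeg.grammar spelling above, it is
written in plain Lua rather than on top of lpeg, and it understands only quoted
literals, rule names, ".", prefix "!", postfix "*", sequence and "/". It does
no captures, no grouping, and no left-recursion checks.

```lua
-- Hypothetical helper, NOT part of lpeg: compile a PEG given as a string.
-- Matchers are functions (subject, pos) -> next position, or nil on failure.
local function grammar(src)
  -- Tokenize into quoted strings, rule names, and one-character operators.
  local toks, pos = {}, 1
  while true do
    pos = src:find("%S", pos)
    if not pos then break end
    local str = src:match('^"([^"]*)"', pos)
    local name = src:match("^[%a_][%w_]*", pos)
    if str then
      toks[#toks + 1] = { kind = "str", value = str }
      pos = pos + #str + 2
    elseif name then
      toks[#toks + 1] = { kind = "name", value = name }
      pos = pos + #name
    else
      toks[#toks + 1] = { kind = src:sub(pos, pos) }
      pos = pos + 1
    end
  end

  local i, rules = 1, {}
  local function peek(k)
    local t = toks[i + (k or 0)]
    return t and t.kind
  end

  local function parse_primary()
    local t = toks[i] or {}
    i = i + 1
    if t.kind == "str" then
      local lit = t.value
      return function(s, p)
        if s:sub(p, p + #lit - 1) == lit then return p + #lit end
      end
    elseif t.kind == "name" then
      local name = t.value
      return function(s, p) return rules[name](s, p) end -- looked up at match time
    elseif t.kind == "." then
      return function(s, p) if p <= #s then return p + 1 end end
    end
    error("unexpected token in grammar")
  end

  local function parse_prefix()
    if peek() == "!" then -- not-predicate: succeed, consuming nothing, if e fails
      i = i + 1
      local e = parse_prefix()
      return function(s, p) if not e(s, p) then return p end end
    end
    local e = parse_primary()
    while peek() == "*" do -- greedy zero-or-more, no backtracking
      i = i + 1
      local inner = e
      e = function(s, p)
        while true do
          local q = inner(s, p)
          if not q or q == p then return p end
          p = q
        end
      end
    end
    return e
  end

  -- A name followed by "=" starts the next rule, so it ends the sequence.
  local function starts_primary()
    local k = peek()
    if k == "name" then return peek(1) ~= "=" end
    return k == "str" or k == "." or k == "!"
  end

  local function parse_sequence()
    local items = {}
    while starts_primary() do items[#items + 1] = parse_prefix() end
    return function(s, p)
      for _, e in ipairs(items) do
        p = e(s, p)
        if not p then return nil end
      end
      return p
    end
  end

  local function parse_choice()
    local alts = { parse_sequence() }
    while peek() == "/" do
      i = i + 1
      alts[#alts + 1] = parse_sequence()
    end
    return function(s, p)
      for _, e in ipairs(alts) do
        local q = e(s, p)
        if q then return q end
      end
    end
  end

  local first
  while peek() == "name" do
    local name = toks[i].value
    assert(peek(1) == "=", "expected '=' after rule name")
    i = i + 2
    rules[name] = parse_choice()
    first = first or name
  end
  assert(i > #toks, "trailing input in grammar")
  return function(s) return rules[first](s, 1) end -- first rule is the start rule
end

local equalcount = grammar [[
  S = "0" B / "1" A / ""
  A = "0" S / "1" A A
  B = "1" S / "0" B B
]]
print(equalcount("0110")) --> 5 (the whole subject matched)
print(equalcount("011"))  --> 3 (only the prefix "01" is balanced)
```

Comparing the result with #subject + 1 gives the same whole-string check that
the trailing * -1 provides in the operator version.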
Cheers,
Sam