Captures and locales are not yet implemented, but the rest works quite well.
I wrote it because I was curious about the kind of performance you could get out of a pure Lua implementation. It relies heavily on memoization and performs some algebraic simplifications. As a consequence:
P"a" * "b" == P"ab" -- (the same table/proxy, actually)
S"ab" + S"cd" == S"abcd" -- ditto.
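A minimal sketch of how this kind of memoization (hash-consing) can work in plain Lua. The node layout {kind = ..., value = ...}, the cache, intern and the folding rules are hypothetical illustrations, not plpeg's actual internals. Every constructor goes through one cache keyed by a canonical string, so structurally equal patterns come back as the same table and == is a plain identity check:

```lua
-- Hypothetical sketch of hash-consed pattern constructors.
local cache = {}   -- canonical key -> the unique node for that pattern
local mt = {}      -- shared metatable providing * and +

local function intern(kind, value)
  local key = kind .. "\0" .. value
  local node = cache[key]
  if not node then
    node = setmetatable({ kind = kind, value = value }, mt)
    cache[key] = node
  end
  return node
end

-- Literal string.
local function P(s) return intern("lit", s) end

-- Character set (single-byte characters only in this sketch): sort and
-- deduplicate, so S"ba" and S"ab" share the same key.
local function S(chars)
  local seen, list = {}, {}
  for c in chars:gmatch(".") do
    if not seen[c] then seen[c] = true; list[#list + 1] = c end
  end
  table.sort(list)
  return intern("set", table.concat(list))
end

local function topattern(x)
  if type(x) == "string" then return P(x) end
  return x
end

-- Sequence: two literals fold into one literal.
mt.__mul = function(a, b)
  a, b = topattern(a), topattern(b)
  assert(a.kind == "lit" and b.kind == "lit", "sketch only folds literals")
  return P(a.value .. b.value)
end

-- Alternative: two sets merge into their union.
mt.__add = function(a, b)
  a, b = topattern(a), topattern(b)
  assert(a.kind == "set" and b.kind == "set", "sketch only merges sets")
  return S(a.value .. b.value)
end

print(P"a" * "b" == P"ab")        --> true
print(S"ab" + S"cd" == S"abcd")   --> true
```

Because equal patterns share one table, later passes can also use them directly as keys in memo tables (for instance to cache compiled matchers) without any structural comparison.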
Sequences (*) and alternatives (+) are also flattened. Much more could be done, but I don't think a full factorization would benefit the current back end: patterns are compiled to anonymous functions, and factoring too far would add more levels of recursion.
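To illustrate flattening and closure compilation together, here is a sketch assuming a hypothetical AST in which sequence and alternative nodes keep their children in the array part. flatten merges nested nodes of the same kind, and compile turns each node into a closure that takes (subject, position) and returns the next position or nil. The flattened three-element sequence runs in one loop rather than two nested closures, which is the recursion depth concern mentioned above:

```lua
-- Hypothetical AST: { kind = "lit", value = "..." },
-- { kind = "seq", child1, child2, ... }, { kind = "alt", child1, ... }.

-- Merge nested nodes of the same kind: seq(a, seq(b, c)) -> seq(a, b, c).
local function flatten(node)
  if node.kind ~= "seq" and node.kind ~= "alt" then return node end
  local out = { kind = node.kind }
  for _, child in ipairs(node) do
    child = flatten(child)
    if child.kind == node.kind then
      for _, grandchild in ipairs(child) do out[#out + 1] = grandchild end
    else
      out[#out + 1] = child
    end
  end
  return out
end

-- Each matcher takes (subject, i) and returns the position after the
-- match, or nil on failure.
local function compile(node)
  if node.kind == "lit" then
    local s, len = node.value, #node.value
    return function(subject, i)
      if subject:sub(i, i + len - 1) == s then return i + len end
      return nil
    end
  end
  local parts = {}
  for k, child in ipairs(node) do parts[k] = compile(child) end
  if node.kind == "seq" then
    return function(subject, i)
      for _, m in ipairs(parts) do
        i = m(subject, i)
        if not i then return nil end
      end
      return i
    end
  end
  -- "alt": ordered choice, the first successful branch wins
  return function(subject, i)
    for _, m in ipairs(parts) do
      local j = m(subject, i)
      if j then return j end
    end
    return nil
  end
end

local function lit(s) return { kind = "lit", value = s } end
local tree = { kind = "seq", lit"a", { kind = "seq", lit"b", lit"c" } }
local flat = flatten(tree)
print(#flat)                      --> 3
print(compile(flat)("abcd", 1))   --> 4
```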
On the other hand, the front end could be used as an optimizing preprocessor for LPeg itself, and in that case a more aggressive factorization may come in handy. Things like
P"prefix" + "prefax" --> P"pref" * (P"i" + P"a") * "x"
or even
P"prefix" + "prefax" --> P"pref" * (S"ai") * "x"
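A sketch of the literal case, using a hypothetical helper named factor that splits two strings into a common prefix, the two differing middles, and a common suffix. One caveat under PEG ordered choice: pulling the suffix out, (X + Y) * s, is only equivalent to X * s + Y * s when X and Y can never both match at the same position. For literals that means neither middle may be a prefix of the other, so the helper keeps the suffix inside in that case. When both middles are single bytes, as with "i" and "a", the choice can further become a set such as S"ai":

```lua
-- Hypothetical helper: split two literal alternatives into
-- prefix, middle of a, middle of b, suffix.
local function isprefix(x, y) return y:sub(1, #x) == x end

local function factor(a, b)
  local n = math.min(#a, #b)
  local p = 0                               -- common prefix length
  while p < n and a:byte(p + 1) == b:byte(p + 1) do p = p + 1 end
  local s = 0                               -- common suffix length
  -- stop before the suffix would overlap the prefix in the shorter string
  while s < n - p and a:byte(#a - s) == b:byte(#b - s) do s = s + 1 end
  local ma, mb = a:sub(p + 1, #a - s), b:sub(p + 1, #b - s)
  -- (X + Y) * s equals X * s + Y * s under ordered choice only if X and Y
  -- cannot both match at one position; otherwise keep the suffix inside.
  if s > 0 and (isprefix(ma, mb) or isprefix(mb, ma)) then
    s = 0
    ma, mb = a:sub(p + 1), b:sub(p + 1)
  end
  return a:sub(1, p), ma, mb, a:sub(#a - s + 1)
end

local pre, x, y, suf = factor("prefix", "prefax")
-- pre == "pref", x == "i", y == "a", suf == "x"
```

For example, factor("ax", "abx") returns "a", "x", "bx", "" rather than pulling out the shared "x": P"a" * (P"" + P"b") * "x" would commit to the empty branch and fail on "abx", while the original choice matches it.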
UTF-8 is supported out of the box:
plpeg.set_charset"UTF-8"
s = plpeg.S"ß∂ƒ©˙"
s:match"©" --> 3 (since © is two bytes wide).
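The byte arithmetic can be reproduced with the standard utf8 library of Lua 5.3 and later. In the sketch below (utf8_set is a hypothetical name, not plpeg's API) the set string is split into whole characters with utf8.charpattern, and matching reads one whole character at the current position, so the returned position advances by that character's width in bytes:

```lua
-- Requires Lua 5.3+ (standard utf8 library). Hypothetical sketch only.
local CHAR = "^" .. utf8.charpattern   -- one UTF-8 character, anchored at init

-- Build a matcher for a set of UTF-8 characters.
local function utf8_set(chars)
  local members = {}
  for ch in chars:gmatch(utf8.charpattern) do members[ch] = true end
  return function(subject, init)
    init = init or 1
    local ch = subject:match(CHAR, init)   -- whole character at init
    if ch and members[ch] then
      return init + #ch                    -- skip all of its bytes
    end
    return nil
  end
end

local s = utf8_set("ß∂ƒ©˙")
print(s("©"))    --> 3   (© is encoded as the two bytes C2 A9)
print(s("a"))    --> nil
```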
The patterns are validated at creation time. The input is expected to be valid UTF-8.
A validating matcher could be added, at the expense of performance.
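Such a validating matcher could be as simple as a wrapper that checks the subject with utf8.len (Lua 5.3+) before matching; the price is an extra full pass over the subject on every call. validating and one_char are hypothetical names used only for this sketch:

```lua
-- Requires Lua 5.3+. Hypothetical sketch: reject invalid UTF-8 up front.
local function validating(matcher)
  return function(subject, init)
    -- utf8.len returns nil plus the position of the first invalid byte
    local n, badpos = utf8.len(subject)
    if not n then
      error(("invalid UTF-8 at byte %d"):format(badpos), 2)
    end
    return matcher(subject, init)
  end
end

-- Stand-in matcher: consumes exactly one UTF-8 character.
local function one_char(subject, init)
  init = init or 1
  local ch = subject:match("^" .. utf8.charpattern, init)
  if ch then return init + #ch end
  return nil
end

local checked = validating(one_char)
print(checked("©"))                  --> 3
print((pcall(checked, "\xC2")))      --> false
```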
More encodings can easily be added by supplying a few appropriate functions (see the charset section).
I hope that Roberto is ok with the name. Otherwise, I'll change it, of course.
This should work as expected.
P(function) -- works as expected, but any captures returned by the function are discarded.
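For reference, LPeg calls the function given to P(function) with the subject and the current position: a number result becomes the new position (it must lie in [i, #subject + 1], otherwise LPeg raises an error), true succeeds without consuming input, and false, nil or no value fails. Any extra return values become captures. A closure-compiling back end could honor those position rules while dropping the extra values, as in this hypothetical sketch (compile_function_pattern is an illustrative name, not plpeg's code):

```lua
-- Hypothetical sketch: compile P(f) into a matcher closure that takes
-- (subject, i) and returns the new position or nil.
local function compile_function_pattern(f)
  return function(subject, i)
    -- Only the first return value is kept; extra values, which LPeg
    -- would turn into captures, are dropped here.
    local r = f(subject, i)
    if r == true then
      return i                        -- succeed without consuming input
    elseif type(r) == "number" then
      if r < i or r > #subject + 1 then
        error("invalid position returned by function pattern")
      end
      return r                        -- succeed, continue at position r
    end
    return nil                        -- anything else (false, nil, no value) fails
  end
end

-- Consumes a run of digits and also returns the number it read;
-- that second value is silently discarded by the wrapper.
local digits = compile_function_pattern(function(s, i)
  local j = s:match("^%d+()", i)
  if j then return j, tonumber(s:sub(i, j - 1)) end
end)

print(digits("123abc", 1))   --> 4
print(digits("abc", 1))      --> nil
```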