- Subject: Re: LPeg - new version
- From: Philippe Lhoste <PhiLho@...>
- Date: Fri, 23 Mar 2007 12:23:58 +0100
Roberto Ierusalimschy wrote:
I have released a new version of LPeg (0.5). The main changes are
several optimizations, which should make LPeg much faster for several
common tasks.
(On the other hand, these optimizations make patterns
less regular, and therefore more difficult to test...)
Mike Pall wrote:
I would have picked the low-hanging fruit first:
- Remove the s<e test from IChar for non-NUL chars
and add ICharZ which checks for s<e and NUL.
- Merge 2 or 4 successive IChars to IChar2 and IChar4.
- Let IAny check for more than one char.
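Mike's first two suggestions can be sketched roughly as below. This is only an illustration, not LPeg's actual code: the function names are hypothetical, and the unchecked variants assume the subject has a readable '\0' at the end pointer e (as Lua strings do), so that reading *s when s == e is safe and can never equal a non-NUL char.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not taken from LPeg's source. */

/* Bounds-checked match: tests s < e before reading the char. */
const char *match_char_checked(const char *s, const char *e, char c) {
    return (s < e && *s == c) ? s + 1 : NULL;
}

/* Unchecked match, valid only for c != '\0'.
   Assumes *e == '\0': at the end of the subject, *s is '\0',
   which differs from c, so the match fails without an s < e test. */
const char *match_char_nonnul(const char *s, char c) {
    return (*s == c) ? s + 1 : NULL;
}

/* Two merged chars (both non-NUL), under the same assumption.
   s[1] is read only if s[0] matched, so s[1] is at worst the '\0' at e. */
const char *match_char2(const char *s, char c1, char c2) {
    return (s[0] == c1 && s[1] == c2) ? s + 2 : NULL;
}
```

A pattern char that is itself '\0' would still need the checked form, which is the role Mike gives to ICharZ.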
Interesting. I haven't seen the code yet, but it seems to go in the
direction I chose for my own library, which is of course the way shown
by Roberto, which I tried to push further.
Let me explain: I have been working for some time now on the
specification (no code yet!) of a Lua-independent Peg library in C.
I chose to stick to Bryan Ford's more classical syntax, using a textual
format, but changing some rules and adding some operators (just
syntactic sugar, language is still regular).
Advantage: a more familiar look, not restricted by Lua's set of
overloadable operators and precedence rules. Drawbacks: less
flexibility, and I can't rely on Lua code to parse (I will use the VM
for that, of course) or to store captures.
The purpose is to allow embedding in other languages (Lua might be a
first target!) or in other programs (text editor, search/replace utility...)
Of course, I had a close look at Roberto's engine, which was very
instructive, both on implementation ideas and on C optimization.
Dumping simple expressions showed the purpose of the opcodes, and finally
made me understand the semi-cryptic notations at the top of the
code (implementation of operators in opcodes).
So I went ahead and created my own opcodes, reusing most of Roberto's
ideas and creating new codes to make some common expressions optimal. I
understand that's what Roberto did in the new version.
I currently have around 25 opcodes, and I was even reluctant to create
more: my OP_CHARS can handle one or two chars, at the cost of
comparing a flag to 0. It shouldn't be too costly, as single chars (or
pairs of chars) aren't that common in purely repetitive parts of Pegs
(i.e. we rarely write 'a'+ in real expressions), and other repetitive
operations like (!'c' .)*, which reaches a char in a non-greedy way,
have their own opcode which iterates independently of the VM (i.e. it
has its own internal loop, it doesn't loop on opcodes).
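To make the two ideas above concrete, here is a minimal sketch in C. It is not the design from the specification: the struct layout and the names CharsInsn, op_chars and op_skip_until are assumptions made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical instruction layout for an OP_CHARS-like opcode. */
typedef struct {
    char c1;         /* first char to match */
    char c2;         /* second char, used only when has_second != 0 */
    int  has_second; /* the flag compared to 0 at match time */
} CharsInsn;

/* Match one or two literal chars at s; return the new position or NULL. */
const char *op_chars(const CharsInsn *in, const char *s, const char *e) {
    if (s >= e || *s != in->c1) return NULL;
    s++;
    if (in->has_second != 0) {
        if (s >= e || *s != in->c2) return NULL;
        s++;
    }
    return s;
}

/* Equivalent of (!'c' .)* done in one opcode: the loop lives here,
   not in the VM dispatch. Stops at the first c or at the end of the
   subject; like any p*, it never fails. */
const char *op_skip_until(const char *s, const char *e, char c) {
    while (s < e && *s != c) s++;
    return s;
}
```

For a subject "key=value", op_skip_until with '=' stops on the '=' after consuming "key", in a single VM step instead of one dispatch per char.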
That's why I see Mike's comments with interest...
On the other hand, I couldn't resist and made opcodes to loop on
patterns, implementing the repetition functionality I wanted (I don't
have a language to build Pegs anyway): I know it would be faster to
repeat the pattern's code internally, but I chose more compact code, at
what I assume is a small performance cost.
Anyway, I am finishing the specification (which also serves as user's
manual, design document, etc.) and I will start coding soon. I still
have to do more work on capture design...
It is a bit early, but if anybody expresses an interest, I can make the
document (some 1300 lines of pure text...) available. Having early
remarks might help... :-)
Oh, well, it is there:
http://www.autohotkey.net/~PhiLho/Docs/PegTop.txt
Comments should be sent privately, I think, unless they are on topic for Lua.
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --