- Subject: Argh! Regular expressions in Lua
- From: Philippe Lhoste <PhiLho@...>
- Date: Wed, 22 Nov 2006 13:33:53 +0100
For a long time, I didn't use regular expressions (REs), except perhaps
in simple cases with Unix tools.
Then, I (re-)discovered them in Lua, and some simple examples showed me
they are useful and not so hard to use.
Then SciTE, my favorite editor, added them and I found myself using them
more and more.
And I started using them in programming, with PHP, JavaScript, Java, and
recently AutoHotkey (which added PCRE support).
I even looked at various engines: SciTE's (simple and easy to follow),
Henry Spencer's (more complicated, less readable), Gnu's (Java,
bloated!), PCRE's (very complex), etc. I also hacked SciTE's engine a
bit to add support for the \d \s \w \xHH notations (to be submitted).
I know that Lua's RE engine is hand-made by Roberto (IIRC) and
intentionally kept simple and small. That's why it has no alternatives
(foo|bar), which is annoying but we can live without, nor advanced
bells and whistles like lookaround assertions.
I recently wrote a program in AutoHotkey to parse a script in this
language and extract a list of function definitions.
It is a simple automaton, using simple REs to match the expected syntax
in each line. So I thought it would be easy to rewrite it in Lua, so I
can use it in SciTE.
Alas, it did not work, as I forgot an important limitation: repetition
symbols apply only to character classes, not to sub-patterns!
So to take a classical stupid example, I can't write (ab)* to match
ababab...
And, more practically, I can't write something like: %s*(%s;.*)?$
Nor: (%)%s*(%{)?)?
The (%{)? can be rewritten as ({?) of course, but still I can't write
the complete expression.
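
To make the limitation concrete, here is a minimal sketch, assuming Lua
5.1's standard string library. In a Lua pattern the * after a closing
parenthesis is not a quantifier, so (ab)* means "ab" followed by a
literal '*'. The helper match_repeated is a hypothetical name, not part
of Lua; it emulates (unit)* by anchoring the unit pattern at successive
positions.

```lua
-- The '*' after ')' is taken as a literal character, not a repetition.
local no_match = string.find("ababab", "^(ab)*$")   -- nil
local literal  = string.find("ab*", "^(ab)*$")      -- 1

-- Hypothetical helper: count how many times unit_pattern matches
-- back-to-back, starting at position init (default 1).
-- Returns the repetition count and the position just past the last match.
local function match_repeated(s, unit_pattern, init)
  local pos = init or 1
  local count = 0
  while true do
    -- '^' anchors the match at pos when an init argument is given
    local first, last = string.find(s, "^" .. unit_pattern, pos)
    -- stop on failure, or on an empty match (which would loop forever)
    if not first or last < first then break end
    pos = last + 1
    count = count + 1
  end
  return count, pos
end

print(match_repeated("abababX", "ab"))   --> 3   7
```

This only covers the zero-or-more case; the caller still has to continue
matching the rest of the line from the returned position.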
I can, and will, work around this, by searching after the last match
and so on. Yet this is frustrating, as it adds complexity to the script.
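
The kind of workaround meant here can be sketched as follows, assuming
Lua 5.1. The optional tail %s*(%s;.*)?$ is checked in a second step,
starting just after the first match. The function-definition pattern and
the names find_function and tail_is_blank_or_comment are hypothetical,
chosen only for illustration; they are not the actual script.

```lua
-- Hypothetical check standing in for %s*(%s;.*)?$ :
-- the rest of the line is blank, or whitespace followed by a ';' comment.
local function tail_is_blank_or_comment(line, init)
  local rest = string.sub(line, init)
  if string.find(rest, "^%s*$") then
    return true
  end
  -- %s*%s; requires at least one whitespace character before ';'
  return string.find(rest, "^%s+;") ~= nil
end

-- Hypothetical first step: match "Name(...)" at the start of the line,
-- then validate the tail from the position after that match.
local function find_function(line)
  local first, last, name = string.find(line, "^%s*([%w_]+)%(.-%)")
  if first and tail_is_blank_or_comment(line, last + 1) then
    return name
  end
  return nil
end

print(find_function("MyFunc(a, b) ; a comment"))   --> MyFunc
print(find_function("MyFunc(a, b) x"))             --> nil
```

The extra function and the position bookkeeping are exactly the added
complexity that an optional sub-pattern would avoid.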
So, the point of my message is: is there a compelling reason for such a
limitation?
I can understand reasons like a lack of time to make a better engine,
or that such a feature would have grown the engine too much.
And the second point is: would the Lua team accept a patch in this
domain, for the future v.5.2?
If that's 'no', I won't even try. Instead of hacking, I would write a
wrapper for PCRE, for example. But it wouldn't be usable in SciTE.
If that's 'maybe', I can take a look, no promises, but I believe I
have some time before 5.2 is out anyway...
I can accept my patch being rejected if it is badly written, buggy or
too big, but I won't spend time on it if it would be rejected on
principle.
Well, if patches are accepted, I might as well attempt to implement
repetition ranges, like {m,n}.
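
For comparison, a range on a single character class can already be
emulated without touching the engine, by building the pattern with
string.rep. This is only a sketch, assuming Lua 5.1; repeat_class is a
hypothetical helper name, and it does nothing for sub-patterns.

```lua
-- Hypothetical helper: emulate class{min,max} for ONE character class
-- by writing it min times, followed by (max - min) optional copies.
local function repeat_class(class, min, max)
  return string.rep(class, min) .. string.rep(class .. "?", max - min)
end

local digits = repeat_class("%d", 2, 4)    -- "%d%d%d?%d?"

print(string.match("12345", "^" .. digits))        --> 1234
print(string.match("1", "^" .. digits))            --> nil
print(string.match("123x", "^" .. digits .. "x"))  --> 123x
```

The generated pattern grows with max, so this is practical only for
small ranges.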
Alternatives are out of the question, as they would require completely
rewriting the engine...
BTW, I found myself wishing for a continue keyword to use in the
parsing loop, to avoid excessive 'if' nesting and indenting.
I recall having read many debates on its usefulness, and still can't
recall an official reason why it isn't in the language. I am too lazy
(read 'no time for that') to search the mailing list archive.
Perhaps somebody will be courageous / bored enough to dig out the
reasons / arguments and put them in the FAQ on the Wiki.
That would be useful too for the global-by-default / local-by-default
discussions, and perhaps the ++ -- += -= and bitwise operators... :-)
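
On the continue point, one workaround that works in Lua 5.1 is to wrap
the loop body in repeat ... until true, so that break leaves only the
inner block and the outer loop moves on to the next item. The sketch
below is an illustration under that assumption; collect_code_lines and
its skip rules are hypothetical, not the actual parsing loop.

```lua
-- Hypothetical example: keep only lines that are neither blank
-- nor ';' comments, without nesting the 'if' tests.
local function collect_code_lines(lines)
  local kept = {}
  for _, line in ipairs(lines) do
    repeat
      -- each 'break' exits the one-pass repeat block,
      -- which behaves like 'continue' for the enclosing for loop
      if string.find(line, "^%s*$") then break end
      if string.find(line, "^%s*;") then break end
      kept[#kept + 1] = line
    until true
  end
  return kept
end

local kept = collect_code_lines({ "a := 1", "", "  ; note", "b := 2" })
print(#kept, kept[1], kept[2])   --> 2   a := 1   b := 2
```

The trade-off is that a real break out of the outer loop is no longer
available inside the wrapped body without an extra flag.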
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --