- Subject: Re: OR, quantifier support in Lua patterns
- From: nobody <nobody+lua-list@...>
- Date: Tue, 9 Oct 2018 00:06:49 +0200
On 2018-09-27 06:39, Sai Manoj Kumar Yadlapati wrote:
> Hi all,
> Lua supports its own version of regular expression matching. But it
> doesn't have the | (pipe symbol) support and the quantifier support -
> a{1,5} meaning a can occur anywhere from 1 to 5 times.
> Both of these are present in PCRE. I am curious to know why these are
> not supported. Is it not supported intentionally or was it never
> considered?
There's been plenty said on this already, but one thing is missing: In
Lua, the substitution doesn't have to be a string – it can be a table or
even a function. (I *think* that's not possible with PCRE – never used
it, just looked at the manpages.) What this means is that your pattern
only needs to be an approximate pre-filter and doesn't have to match
_exactly_ what you want. Some examples…
Matching a bunch of fixed words (a common use of | in REs), e.g.
/(TODO|NYI|BUG)/FIXME/? Do a fuzzy match (say, "%u%u%u+", a.k.a.
("%u"):rep(3).."+", or even just "%u+") and handle the details with the table:
str:gsub( "%u+", { TODO = "FIXME", NYI = "FIXME", BUG = "FIXME" } )
(A nil lookup means "leave as-is"; the match only gets substituted if
the table has a value for it. So any other matches don't matter.)
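To make the table trick concrete, here is a small runnable sketch. The input string and the variable names are made up for illustration; only string.gsub and its table-replacement behaviour come from Lua itself.

```lua
-- Replacement table: only matches that are exact keys get substituted.
local fixme = { TODO = "FIXME", NYI = "FIXME", BUG = "FIXME" }

-- Illustrative input (not from any real code base).
local input = "TODO: parse args; NYI: Unicode; see BUG 42 in the API docs"

-- "%u+" over-matches on purpose: it also hits "U" and "API",
-- but those keys are absent from the table, so they stay untouched.
local output, nmatches = input:gsub("%u+", fixme)

print(output)   --> FIXME: parse args; FIXME: Unicode; see FIXME 42 in the API docs
print(nmatches) --> 5 (counts every match, replaced or not)
```

Note that this is not quite identical to the RE: a word like "TODOS" matches "%u+" as a whole and is therefore left alone, whereas /(TODO|NYI|BUG)/ would rewrite the "TODO" inside it.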
A table isn't enough? Use a function. It can do arbitrary filtering
and decide not to do anything (return nil), it can recursively match on
the match, ... so just do the same approximation trick. The a{1,5}
example can probably be done by matching "a+" and then checking the
match length in the function (it could even split the match internally
and then treat it as multiple consecutive matches… but that might get
too complicated, so…)
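A rough sketch of what that could look like for a{1,5}, under my own assumptions: the helper names replace_bounded and replace_split are hypothetical, and "replace each run with a fixed string" is just one possible reading of what you'd want to do with the matches.

```lua
-- Variant 1: only touch runs of at most `max` "a"s, leave longer runs alone.
local function replace_bounded(s, repl, max)
  -- Outer parentheses drop gsub's second return value (the match count).
  return (s:gsub("a+", function(run)
    if #run <= max then
      return repl   -- within bounds: substitute
    end
    return nil      -- too long: nil means "keep the original match"
  end))
end

-- Variant 2: treat a long run as consecutive chunks of up to `max` "a"s,
-- which is what a greedy global a{1,max} substitution would do.
local function replace_split(s, repl, max)
  return (s:gsub("a+", function(run)
    local chunks = math.ceil(#run / max)
    return repl:rep(chunks)
  end))
end

print(replace_bounded("a aaa aaaaaaa b", "X", 5)) --> X X aaaaaaa b
print(replace_split("a aaa aaaaaaa b", "X", 5))   --> X X XX b
```

Strings returned from the function are inserted literally, so there's no need to escape "%" in repl the way you would with a plain replacement string.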
There's also LPeg, which does whole grammars, can produce arbitrary
structured data, happily does the same "run matches through a function"
trick, etc. ...and if I counted correctly, it's still ~12% the size of
PCRE. (Lol.)
-- nobody