[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: OR, quantifier support in Lua patterns
- From: Roberto Ierusalimschy <roberto@...>
- Date: Wed, 10 Oct 2018 16:13:15 -0300
> > See https://swtch.com/~rsc/regexp/regexp1.html for implementation in less than
> > 400 lines of C (probably less rewritten in a "modern" style).
>
> That's some really nice code, but it's missing *at least* one
> essential feature that people expect
> from a regex engine: captures.
>
> The author calls captures "submatch extraction" and claims that
> "Thompson-style algorithms
> can be adapted to track submatch boundaries without giving up
> efficient performance". If true,
> it would be nice to see how simple and small a fast implementation can be.
It also misses character classes. Although it shouldn't be difficult
to add them, I don't think the proposed implementation would handle
them nicely. Translating '[a-z]' to 'a|b|c|d|...|z' and then running
that non-deterministically would involve a lot of states. It also
doesn't show the 're2post' function, doesn't handle errors (e.g.,
out of memory), etc.
And, of course, it cannot implement back-references, as they are
NP-complete.
-- Roberto