[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Safe user-facing pattern matching.
- From: Martin <eden_martin_fuhrspam@...>
- Date: Sat, 22 Apr 2017 13:40:54 -0700
On 04/20/2017 05:48 PM, Gé Weijers wrote:
> On Mon, Apr 10, 2017 at 13:38 Martin <eden_martin_fuhrspam@gmx.de
> <mailto:eden_martin_fuhrspam@gmx.de>> wrote:
>
> On 04/05/2017 06:14 PM, Gé Weijers wrote:
> > I don't yet see how to do this in linear time by bolting a recognizer
> > for the "%b()" pattern onto a finite state automaton.
>
> I think this is impossible for canonic finite state automata (which is
> just a series of triples (state, cur_char, next_state) from limited
> alphabet).
>
> But is possible if we add some "condition" and "action" fields
> that contain functions and procedures:
>
> FSA (finite state automata):
>
> state | cur_char | cond | action | next_state
> -------+----------+----------+----------------------------+-----------
> SCAN | EOL | | | DONE
> SCAN | ( | | starts[++deep] = POS | SCAN
> SCAN | ) | deep > 0 | capt = seg(starts[deep--], POS) | SCAN
> SCAN | ) | | | SCAN
> SCAN | ANY | | | SCAN
>
>
> A regular expression like ".*%b()" will match "(((((((())", which your
> automaton will not unless it's a nondeterministic one.
> You cannot know how many '('s are matched by ".*" until you get to the
> end of the string.
>
> I don't think there's a linear time algorithm to match expressions like
> this in the general case. You either backtrack, or your deterministic
> state representation is a list only bounded by the input size.
>
> -- Gė
Yes, AFAIK it's impossible to match ".*%b()" in linear time. That
message was regarding FSA to match "%b()".
This is due dreaded left recursion: in ('(((((((())'):match('.*%b()')
for any unprocessed char we don't know whether it belongs to "%b()" or
to ".*" part.
So we're diving, moving all chars to ".*" part until end of string,
where is no space for "%b()". Then, one level up we're moving last ")"
to %b part just to fail again. Finally we succeed in matching "()" as
"%b()" and return this result:
Lua 5.3.3 Copyright (C) 1994-2016 Lua.org, PUC-Rio
> ('(((((((())'):match('.*%b()')
(((((((()
BTW there is tricky part: probably you want "(())" in %b part.
In this case pattern should be ".-%b()":
> ('(((((((())'):match('(.-)(%b())')
(((((( (())
-- Martin