[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Safe user-facing pattern matching.
- From: Coda Highland <chighland@...>
- Date: Thu, 20 Apr 2017 18:54:23 -0700
On Thu, Apr 20, 2017 at 5:48 PM, Gé Weijers <ge@weijers.org> wrote:
> On Mon, Apr 10, 2017 at 13:38 Martin <eden_martin_fuhrspam@gmx.de> wrote:
>>
>> On 04/05/2017 06:14 PM, Gé Weijers wrote:
>> > I don't yet see how to do this in linear time by bolting a recognizer
>> > for the "%b()" pattern onto a finite state automaton.
>>
>> I think this is impossible for canonic finite state automata (which is
>> just a series of triples (state, cur_char, next_state) from limited
>> alphabet).
>>
>> But is possible if we add some "condition" and "action" fields
>> that contain functions and procedures:
>>
>> FSA (finite state automata):
>>
>> state | cur_char | cond | action | next_state
>> -------+----------+----------+----------------------------+-----------
>> SCAN | EOL | | | DONE
>> SCAN | ( | | starts[++deep] = POS | SCAN
>> SCAN | ) | deep > 0 | capt = seg(starts[deep--], POS) | SCAN
>> SCAN | ) | | | SCAN
>> SCAN | ANY | | | SCAN
>
>
> A regular expression like ".*%b()" will match "(((((((())", which your
> automaton will not unless it's a nondeterministic one.
> You cannot know how many '('s are matched by ".*" until you get to the end
> of the string.
>
> I don't think there's a linear time algorithm to match expressions like this
> in the general case. You either backtrack, or your deterministic state
> representation is a list only bounded by the input size.
>
> Gé
> --
> -- Gė
In the general case, no, you're right. Context-free languages require
polynomial time.
However, the LR subset of the context-free languages CAN be matched in
linear time. A great many grammars can be transformed into LR. The
language of matched parentheses, for example, is in LR. XML is in LR.
Javascript was in LR at one point in its history. (I don't know if all
of the newer features still keep it there.)
You ARE right that LR parsers need an internal stack, and that stack
in the worst case can grow in the size of the input. (That said, this
is true for backtracking parsers as well.)
/s/ Adam