[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: LPEG documentation needs more clarification
- From: Roberto Ierusalimschy <roberto@...>
- Date: Tue, 19 Feb 2019 16:57:59 -0300
> local lpeg = require "lpeg"
> local Cg = lpeg.Cg
> local Cc = lpeg.Cc
> local Cb = lpeg.Cb
> local P = lpeg.P
>
> local cnt = Cg(Cc(0),'count')
> * (P(1) * Cg(Cb'count' / function(c) return c + 1 end,'count'))^0
> * Cb'count'
>
> print(cnt:match(string.rep("x",512)))
> print(cnt:match(string.rep("x",512+128))) -- CRASH at some point past this line
> [...]
>
> What I did not expect was the 2,900 C callstack entries this produced (thus
> causing the crash).
You are doing a kind of lazy evaluation. Each new definition of 'count'
depends on the previous one. When LPeg goes on to evaluate the last
definition, it needs to evaluate (using recursion in the C code)
the previous one, and so and so. So, you have many recursion levels
multiplied by the number of intermediate functions inside the indirect
recursion.
Nevertheless, this is a bug in LPeg. It should limit its recursion level
and give a decent error. (This is always a nightmare in C.)
As a side note, folding was created exaclty for that kind of use:
local cnt = lpeg.Cf(Cc(0) * lpeg.C(1)^0, function(c) return c + 1 end)
> When reading the LPEG documentation trying to find a
> warning about this (especially the large call stack) I didn't find much.
> There is this line:
>
> Therefore, captures should avoid side effects.
>
> but the previous parenthetical gives an example of a "side effect":
>
> (As an example, consider the pattern lpeg.P"a" / func / 0. Because
> the "division" by 0 instructs LPeg to throw away the results from
> the pattern, LPeg may or may not call func.)
That is exactly the point: to show that if there is a side effect, final
result may be undefined.
> Then there's this bit about lpeg.Cb():
>
> Creates a *back capture*. This pattern matches the empty string and
> produces the values produced by the *most recent* group capture
> named name (where name can be any Lua value).
>
> *Most recent* means the last *complete outermost group* capture with
> the given name. A *Complete* capture means that the entire pattern
> corresponding to the capture has matched. An *Outermost* capture
> means that the capture is not inside another complete capture.
>
> In the same way that LPeg does not specify when it evaluates
> captures, it does not specify whether it reuses values previously
> produced by the group or re-evaluates them.
>
> That last bit *could* mean (in the sample code above) that lpeg.Cb()
> re-evaluates the lpeg.Cc(0) each instance and thus, my attempt to count
> using lpeg.Cg() and lpeg.Cb() could return 0 or 1 just as well as the actual
> count [4].
Note the "previously produced by *the* group". Two groups named 'count'
are not the same group, so there is nothing previously produced by each
group there. Each new group will use the previous one, which is the most
recent from its point of view. Your original pattern has a well-defined
behavior, without any side effect. The problem is that it creates a long
list of group captures, all called 'count' and each one depending on the
previous one.
-- Roberto