- Subject: Re: lpeg.U ?
- From: Sean Conner <sean@...>
- Date: Thu, 25 Jan 2018 20:03:05 -0500
It is at this point I'm going to ask what you are trying to accomplish
here, because it's coming across as an XY problem [1]. I also saw this,
which you posted on the forum linked to earlier [2]:
    All of my code is for learning lpeg re, since many claimed that lpeg
    can do everything lua pattern can do.
I do believe that LPeg can do everything that patterns can do, but nothing
in that statement says anything about speed or ease.
With that said ...
It was thus said that the Great albertmcchan once stated:
> pat = re.compile "{g <- .g / &'and' }" -- lua pattern "(.*)and"
> = lpeg.pcode( pat ) -- using debug version of lpeg
>
> i noticed its pcode has a "behind 3" instruction to not consume the last 'and'
I didn't find this lpeg.pcode() function. There is an lpeg.ptree()
function, and the re expression above generates:
    [1 = g ]
    capture kind: 'simple' key: 0
      grammar 1
        rule n: 0 key: 1
          choice
            seq
              any
              call key: 1 (rule: 0)
            and
              seq
                char 'a'
                seq
                  char 'n'
                  char 'd'
I'm not sure what you mean by "behind 3" instruction.
> there is a lpeg.B function to do look-behind, but how to go back to it if B matched ?
>
> Is there a lpeg.U(n) (for undo n characters) or something similar ?
I think that lpeg.Cmt() can do something like that, but as I wrote in a
previous message [3] to Jonathan Goble:
> I think you are thinking about LPeg with the wrong mindset---yes, you can
> look for patterns in text with LPeg [1] but it's for *parsing*---pulling out
> semantic information from text, rather than just patterns. I've written a
> lot of LPeg code [2], and not once have I needed a greedy, non-possessive
> repetition to parse text.
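
For what it's worth, here is a minimal sketch of the lpeg.Cmt() idea. It is an assumption-laden illustration, not a tested recommendation: going by the LPeg manual, a match-time function may return a new position, but only one at or after the current position, so it can skip forward to the last "and" rather than undo characters already consumed. The helper name before_last_and is made up for this example.

```lua
local lpeg = require "lpeg"
local P, Cmt = lpeg.P, lpeg.Cmt

-- Hypothetical helper (not part of LPeg): called at match time with the
-- whole subject and the current position.
local function before_last_and(subject, pos)
  local last
  local init = pos
  while true do
    local s = subject:find("and", init, true)  -- plain-text search
    if not s then break end
    last = s
    init = s + 1
  end
  if not last then return false end            -- no "and": the match fails
  -- New position, plus one extra value that becomes the capture.
  return last, subject:sub(pos, last - 1)
end

-- Roughly the Lua pattern "(.*)and": everything before the last "and".
local pat = Cmt(P(0), before_last_and) * P"and"

print(pat:match("this and that and more"))
```

This hands the search to string.find(), so it sidesteps LPeg's backtracking entirely; whether that counts as "doing it in LPeg" is debatable.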
Now, back to your message:
> As an example of its usefulness, say # is lpeg re for undo 1 character
>
> REDO above re pattern, but without backtrack stack overflow problem:
> NOTE: I want to capture ALL except LAST 'and'
>
> pat = re.compile "{ (g <- 'and' / . [^a]* g)+ ### }"
>
> Without UNDO, I have to do this (likely much slower):
>
> pat = re.compile( "(g <- 'and' / . [^a]* g)+ -> drop3", {drop3 = function(s) return s:sub(1,-4) end} )
If you are looking for a final "and" (which ends the input), then this
works:
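    local lpeg = require "lpeg"
    local P, R, C = lpeg.P, lpeg.R, lpeg.C

    last_and = P"and" * P(-1)         -- "and" followed by end of input
    char     = R("\0\96","b\255")^1   -- a run of bytes other than 'a'
             + -last_and * P"a"       -- or an 'a' not starting the final "and"
    pat      = C((char)^0) * last_and
    print(pat:match(string.rep("this and that land",400) .. "and"))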
The weird production of 'char' is to burn through large sequences of
characters that don't contain the letter 'a'. It's probably faster than this
version:
    last_and = P"and" * P(-1)
    pat      = C((P(1) - last_and)^0) * last_and
but I did not bother to benchmark it. Personally, I would probably use the
second, simpler version since it's easier to understand. If speed became an
issue, then I might go with the version with 'char', and if that was still
slow, then I would take stock of what I'm really trying to accomplish and
adjust accordingly.
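
A rough sketch of how the two versions could be compared, should it ever matter. The bench() helper and the iteration count of 1000 are assumptions made for illustration. os.clock() measures CPU time only coarsely, so treat the numbers as a hint rather than a proper benchmark.

```lua
local lpeg = require "lpeg"
local P, R, C = lpeg.P, lpeg.R, lpeg.C

local last_and = P"and" * P(-1)

-- Version with 'char': skips runs of non-'a' bytes in one step.
local char = R("\0\96", "b\255")^1 + -last_and * P"a"
local fast = C(char^0) * last_and

-- Simple version: tests for the final "and" before every character.
local simple = C((P(1) - last_and)^0) * last_and

local subject = string.rep("this and that land", 400) .. "and"

-- Hypothetical helper: CPU time for n matches of pat against subject.
local function bench(pat, n)
  local start = os.clock()
  for _ = 1, n do pat:match(subject) end
  return os.clock() - start
end

-- Both versions should capture the same text before timing them.
assert(fast:match(subject) == simple:match(subject))
print("char version:  ", bench(fast, 1000))
print("simple version:", bench(simple, 1000))
```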
-spc (So, what is it you are really trying to do?)
[1] http://xyproblem.info/
[2] http://www.gammon.com.au/forum/?id=14149&reply=31#reply31
[3] http://lua-users.org/lists/lua-l/2017-10/msg00143.html