  At this point I'm going to ask what you are trying to accomplish here,
because it's coming across as an XY problem [1].  I also saw this, which you
posted on the forum linked to earlier [2]:

	All of my code is for learning lpeg re, since many claimed that lpeg
	can do everything lua pattern can do.

  I do believe that LPeg can do everything that patterns can do, but nothing
in that statement says anything about speed or ease.

  With that said ... 

It was thus said that the Great albertmcchan once stated:
> pat = re.compile "{g <- .g / &'and' }"   -- lua pattern "(.*)and"
> = lpeg.pcode( pat )                             -- using debug version of lpeg
> 
> i noticed its pcode has a "behind 3" instruction to not consume the last 'and'

  I didn't find this lpeg.pcode() function.  There is an lpeg.ptree()
function, and the re expression above generates:
	
	[1 = g  ]
	capture kind: 'simple'  key: 0
	  grammar 1
	    rule n: 0  key: 1
	      choice
	        seq
	          any
	          call key: 1  (rule: 0)
	        and
	          seq
	            char 'a'
	            seq
	              char 'n'
	              char 'd'

  I'm not sure what you mean by a "behind 3" instruction.
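
  For reference, here is a minimal sketch comparing the quoted re expression
with the Lua pattern it is meant to mirror, on a short input.  It assumes the
re module that ships with LPeg is installed; the subject string is made up
for illustration.

```lua
local re = require "re"

local pat     = re.compile "{g <- .g / &'and' }"
local subject = "x and y and z"       -- illustrative input, not from the thread

-- Both forms capture everything before the last "and".
local from_re  = pat:match(subject)
local from_lua = subject:match("(.*)and")

print(("[%s]"):format(from_re))
print(("[%s]"):format(from_lua))
assert(from_re == from_lua)
```

  The input is kept short on purpose, since the quoted message reports a
backtrack stack overflow problem with this recursive form.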

> there is a lpeg.B function to do look-behind, but how to go back to it if B matched ?
> 
> Is there a lpeg.U(n) (for undo n characters) or something similar ?

  I think that lpeg.Cmt() can do something like that, but as I wrote in a
previous message [3] to Jonathan Goble:

>   I think you are thinking about LPeg with the wrong mindset---yes, you can
> look for patterns in text with LPeg [1] but it's for *parsing*---pulling out
> semantic information from text, rather than just patterns.  I've written a
> lot of LPeg code [2], and not once have I needed a greedy, non-possessive
> repetition to parse text.

  Now, back to your message:
  
> As an example of its usefulness, say # is lpeg re for undo 1 character
> 
> REDO above re pattern, but without backtrack stack overflow problem:
> NOTE: I want to capture ALL except LAST 'and'
> 
> pat = re.compile "{ (g <- 'and' / . [^a]* g)+ ### }"
> 
> Without UNDO, I have to do this (likely much slower):
> 
> pat = re.compile( "(g <- 'and' / . [^a]* g)+ -> drop3", {drop3 = function(s) return s:sub(1,-4) end} )

  If you are looking for a final "and" (which ends the input), then this
works:

	local lpeg    = require "lpeg"
	local P, R, C = lpeg.P, lpeg.R, lpeg.C

	last_and = P"and" * P(-1)
	char     = R("\0\96","b\255")^1
	         + -last_and * P"a"
	pat      = C((char)^0) * last_and

	print(pat:match(string.rep("this and that land",400) .. "and"))
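
  To see what this does on inputs small enough to check by eye, here is a
self-contained sketch of the same pattern.  The test strings are my own and
only illustrative; the comments on 'char' are my reading of the production.

```lua
local lpeg = require "lpeg"
local P, R, C = lpeg.P, lpeg.R, lpeg.C

local last_and = P"and" * P(-1)          -- "and" followed by end of input
local char     = R("\0\96", "b\255")^1   -- a run of bytes other than 'a'
               + -last_and * P"a"        -- or an 'a' that does not start the final "and"
local pat      = C(char^0) * last_and

print(pat:match("this and that and"))    -- everything before the final "and"
print(pat:match("bandand"))              -- the earlier "and" is kept in the capture
print(pat:match("and then"))             -- no "and" at the end of input, so no match
```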

  The weird production of 'char' is to burn through large sequences of
characters that don't contain the letter 'a'.  It's probably faster than this
version:

	last_and = P"and" * P(-1)
	pat      = C((P(1) - last_and)^0) * last_and

but I did not bother to benchmark it.  Personally, I would probably use the
second, simpler version since it's easier to understand.  If speed became an
issue, then I might go with the version with 'char', and if that was still
slow, then I would take stock of what I'm really trying to accomplish and
adjust accordingly.
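
  If someone did want to measure the difference, a rough timing harness might
look like the sketch below.  The helper name 'elapsed' and the round count
are arbitrary choices for illustration, and os.clock() only gives a coarse
CPU-time figure, so treat any numbers as indicative at best.

```lua
local lpeg = require "lpeg"
local P, R, C = lpeg.P, lpeg.R, lpeg.C

local last_and = P"and" * P(-1)

-- Version with the 'char' production.
local char      = R("\0\96", "b\255")^1 + -last_and * P"a"
local with_char = C(char^0) * last_and

-- Simpler version.
local simple = C((P(1) - last_and)^0) * last_and

local subject = string.rep("this and that land", 400) .. "and"

-- Hypothetical helper: CPU time spent matching 'subject' 'rounds' times.
local function elapsed(pattern, rounds)
  local start = os.clock()
  for _ = 1, rounds do
    pattern:match(subject)
  end
  return os.clock() - start
end

-- Both versions should agree before their timings are compared.
assert(with_char:match(subject) == simple:match(subject))

print("with char:", elapsed(with_char, 1000))
print("simple:   ", elapsed(simple, 1000))
```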

  -spc (So, what is it you are really trying to do?)

[1]	http://xyproblem.info/

[2]	http://www.gammon.com.au/forum/?id=14149&reply=31#reply31

[3]	http://lua-users.org/lists/lua-l/2017-10/msg00143.html