Hi, my name is Ben and I'm working with LabLua on a parser generator tool that makes parsers easier to write in Lua. The main focus of the project is to build an LPegLabel grammar that automatically handles whitespace characters and generates error labels.
The problem I'm facing now is that whitespace characters are captured together with terminals, and I have not found a workaround that leaves the syntax of the tool unchanged. This is what I've tried:
local SPACES = lpeg.P ' '
solution 1)
patt * SPACES^0 -- captures spaces when inside a capture (undesired)
solution 2)
lpeg.Cs( patt * (SPACES^0 / '') ) -- throws an error when patt is a table capture. Also, this creates a new capture, which is undesired
solution 3)
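local spaces = {} -- positions just past each matched space
local function registerspaces(s, i, ...)
  spaces[i] = true
  return i -- no capture
end
-- built after registerspaces is defined, so lpeg.Cmt receives the function and not nil
local x = patt * lpeg.Cmt(SPACES, registerspaces)^0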
Then, whenever lpeg.C(x) is used, we apply a match-time function "removespaces" that strips the spaces from the captured values based on spaces[] -- very inefficient!
solution 4)
Changing lpeg.C(patt) to lpeg.Cg( lpeg.C(term1) * SPACES^0 * lpeg.C(term2) * ... ) where patt = term1 * term2 * ...
I expected this to work, but it does not join the captures into a single string; each term comes back as a separate value.
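To make the difference between solutions 1 and 4 concrete, here is a minimal sketch of what each one returns. The `word` pattern and the input strings are placeholders made up for illustration, standing in for patt, term1 and term2. As far as I can tell from the LPeg manual, an anonymous lpeg.Cg passes its values through individually instead of concatenating them, so joining them needs an explicit step.

```lua
local lpeg = require "lpeg"

local SPACES = lpeg.P ' '
local word = lpeg.R("az")^1  -- hypothetical stand-in for patt / term1 / term2

-- solution 1: trailing spaces end up inside the enclosing capture
local s1 = lpeg.C(word * SPACES^0):match("foo  ")

-- solution 4: the anonymous group yields two separate values, not one string
local a, b = lpeg.Cg(lpeg.C(word) * SPACES^0 * lpeg.C(word)):match("foo bar")

-- joining the values needs an explicit step, e.g. a table capture plus table.concat
-- (this creates a new capture, so it may not fit the tool)
local joined = (lpeg.Ct(lpeg.C(word) * SPACES^0 * lpeg.C(word)) / table.concat):match("foo bar")

print(s1, a, b, joined)
```

The last variant gives one string without the spaces, but only because the terms are captured separately and glued back together afterwards.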
I would appreciate it if anyone could help me find a better solution.
I would also like to suggest an improvement to LPeg:
1) A new function for a "silent match", lpeg.Sm, that consumes the input but leaves it out of any enclosing capture.
Say we have patt1 and patt2 that do not produce any captures.
Current behaviour:
lpeg.C(patt1 * lpeg.P ' ' * patt2) --> captures patt1 .. " " .. patt2
Wanted behaviour:
lpeg.C(patt1 * lpeg.Sm(lpeg.P ' ') * patt2) --> captures patt1 .. patt2
patt1 * lpeg.Sm(lpeg.P ' ') * patt2 --> no capture
In a way, it is similar to lpeg.P ' ' / '', but it does not create a new capture. I think this function could be useful for other purposes as well, to create simpler grammars.
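To pin down the wanted behaviour, here is a rough sketch that emulates it with what LPeg already offers. `Sm` below is a hypothetical local helper, not an LPeg function, and `word` is a made-up stand-in for patt1 and patt2. It only gives the wanted result inside lpeg.Cs, and it still creates a capture, which is exactly what the proposed lpeg.Sm would avoid.

```lua
local lpeg = require "lpeg"

local word = lpeg.R("az")^1  -- hypothetical stand-in for patt1 / patt2

-- hypothetical helper: replace the matched text by the empty string
local function Sm(p)
  return p / ''
end

-- current behaviour: the space is part of the captured string
local current = lpeg.C(word * lpeg.P ' ' * word):match("foo bar")

-- emulated wanted behaviour: Cs substitutes the empty string for the space
local wanted = lpeg.Cs(word * Sm(lpeg.P ' ') * word):match("foo bar")

print(current, wanted)
```

A built-in lpeg.Sm would differ from this helper in that it would also work under a plain lpeg.C and would add nothing to the capture list when used outside any capture.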
I don't know whether this is possible with the current implementation of LPeg, so I'm open to other suggestions as well.