- Subject: Problems with lpeg.Cmt captures
- From: Glenn McAllister <glenn@...>
- Date: Wed, 09 Apr 2008 14:02:10 -0400
I've got an awkward grammar that requires me to reach back to a
previously parsed element to see what it is (a newline or not) before
I can accept the match. Because of the way I need to implement things,
the newline is actually captured in a previous match, and its capture
should be a NewlineChunk object. That object isn't directly available
to me in the next match, so I decided to try using Cmt and Cb to figure
out what I need to do. My function is called correctly, but after I
return true (or i), LPEG re-evaluates the same text and hands me a
different back reference to compare, which fails and then sends me down
the wrong path.
Some examples will clear this up, I hope. I need to implement what
amounts to an if statement, with very specific rules about what
whitespace I must strip. Specifically, if I have
'one$if(foo)$ two \n$endif$\n three'
the newlines before and after the $endif$ have to be stripped. The
resulting text would be 'one two three'.
If I have
'one$if(foo)$ two $endif$\n three'
the resulting text would be 'one two\n three'. Note that I'm not
supposed to chew the newline after $endif$, as it isn't on a line by itself.
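To pin down the newline rule from the two examples, here is a minimal sketch in plain Lua string patterns. It is not a parser and not the grammar discussed here: the helper name strip_conditionals is hypothetical, every $if(...)$ is assumed to be true, and spaces are left untouched because the examples only spell out what happens to the newlines.

```lua
-- Hypothetical helper: treats every $if(...)$ as true and applies only
-- the newline rule from the two examples. Spaces are left as they are.
local function strip_conditionals(text)
  -- $endif$ alone on its line: drop it together with both newlines.
  text = text:gsub("\n%$endif%$\n", "")
  -- Any other $endif$: drop the marker only, keep a following newline.
  text = text:gsub("%$endif%$", "")
  -- Drop the opening $if(...)$ marker itself.
  text = text:gsub("%$if%b()%$", "")
  return text
end

local alone  = strip_conditionals("one$if(foo)$ two \n$endif$\n three")
local inline = strip_conditionals("one$if(foo)$ two $endif$\n three")

assert(not alone:find("\n"))  -- both newlines around $endif$ are gone
assert(inline:find("\n"))     -- the newline after $endif$ survives
```

A flat substitution like this cannot evaluate the condition or build chunk objects, which is why the real implementation needs a grammar.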
The 'two' bit can obviously be more than just text, and in particular it
can be more $if(..)$ blah $endif$ statements, meaning I need to treat
blah as an arbitrary series of chunks to be parsed. Because of the way
I need to handle newlines, they are treated as a separate flyweight object.
So in the first case where I have to chew that newline object, I can't
just do (ignoring captures, etc.):
EndifExpr = s.NEWLINE * ExprStart * s.ENDIF * ExprEnd * s.NEWLINE
The first newline is captured in the parsing of the chunks. So the rule
I came up with is (sorry for any bad wrapping):
EndifExpr = Cmt(Cb(1) * ExprStart * s.ENDIF * ExprEnd * s.NEWLINE,
                function(s, i, a)
                  print('checking if we need to kill a newline in \'' ..
                        s .. '\' at position ' .. i)
                  if a.isA then
                    if a:isA(NewlineChunk) then
                      print('really need to kill newline')
                      return i, "kill"
                    else
                      print('not a NewlineChunk class')
                      return false
                    end
                  else
                    print('not a ST class (not expected)', a)
                    return false
                  end
                end) +
            Cs((ExprStart * C(s.ENDIF) * ExprEnd) / "dontkill"),
Basically I want to see "kill" or "dontkill" to decide what to do with
the last chunk in my table of chunks. When the overall match is
determined, a function is called to create the if chunk, which is where
I make my decision.
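The decision step described above could be sketched roughly as follows. All names here (NEWLINE, make_if_chunk, the shape of the chunk tables) are hypothetical stand-ins for the real classes, which are not shown in this post.

```lua
-- All names here are hypothetical stand-ins for the real chunk classes.
local NEWLINE = { kind = "newline" }   -- shared flyweight newline chunk

local function make_if_chunk(chunks, verdict)
  -- "kill": the $endif$ sat on its own line, so the newline chunk that
  -- ended the body has to be dropped as well.
  if verdict == "kill" and chunks[#chunks] == NEWLINE then
    table.remove(chunks)               -- removes the last element
  end
  return { kind = "if", body = chunks }
end

local killed = make_if_chunk({ "two", NEWLINE }, "kill")
local kept   = make_if_chunk({ "two", NEWLINE }, "dontkill")
assert(#killed.body == 1 and #kept.body == 2)
```

Comparing against the flyweight by identity stands in for the isA check used in the match-time function.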
And yes, this match does actually execute. The problem for me is that
it executes twice. Before you ask, if I replace 'return i, "kill"' with
'return i+1, "kill"' in an attempt to advance the match position, I
still see this same behavior. The following is debug output from LPEG:
|| s: |$endif$
|| three| stck: 9 c: 14 195: choice -> 198 (0)
|| s: |$endif$
|| three| stck: 10 c: 14 196: call -> 69
|| s: |$endif$
|| three| stck: 11 c: 14 69: choice -> 90 (0)
|| s: |$endif$
|| three| stck: 12 c: 14 70: opencapture runtime(n = 0) (off = 8)
|| s: |$endif$
|| three| stck: 12 c: 15 71: emptycapture backref(n = 0) (off = 1)
|| s: |$endif$
|| three| stck: 12 c: 16 72: call -> 309
|| s: |$endif$
|| three| stck: 13 c: 16 309: set [(24)]
|| s: |endif$
|| three| stck: 13 c: 16 318: ret
|| s: |endif$
|| three| stck: 12 c: 16 73: char 'e'
|| s: |ndif$
|| three| stck: 12 c: 16 74: char 'n'
|| s: |dif$
|| three| stck: 12 c: 16 75: char 'd'
|| s: |if$
|| three| stck: 12 c: 16 76: char 'i'
|| s: |f$
|| three| stck: 12 c: 16 77: char 'f'
|| s: |$
|| three| stck: 12 c: 16 78: call -> 307
|| s: |$
|| three| stck: 13 c: 16 307: char '$'
|| s: |
|| three| stck: 13 c: 16 308: ret
|| s: |
|| three| stck: 12 c: 16 79: set [(0a)(0d)]
|| s: | three| stck: 12 c: 16 88: closeruntime close(n = 0) (off = 0)
|| checking if we need to kill a newline in 'one$if(foo)$ two
|| $endif$
|| three' at position 27
|| really need to kill newline
|| s: | three| stck: 12 c: 16 89: commit -> 102
|| s: | three| stck: 11 c: 16 102: ret
|| s: | three| stck: 10 c: 16 197: failtwice
|| s: |$endif$
|| three| stck: 7 c: 14 435: closecapture close(n = 0) (off = 0)
|| s: |$endif$
|| three| stck: 7 c: 15 436: call -> 69
|| s: |$endif$
|| three| stck: 8 c: 15 69: choice -> 90 (0)
|| s: |$endif$
|| three| stck: 9 c: 15 70: opencapture runtime(n = 0) (off = 8)
|| s: |$endif$
|| three| stck: 9 c: 16 71: emptycapture backref(n = 0) (off = 1)
|| s: |$endif$
|| three| stck: 9 c: 17 72: call -> 309
|| s: |$endif$
|| three| stck: 10 c: 17 309: set [(24)]
|| s: |endif$
|| three| stck: 10 c: 17 318: ret
|| s: |endif$
|| three| stck: 9 c: 17 73: char 'e'
|| s: |ndif$
|| three| stck: 9 c: 17 74: char 'n'
|| s: |dif$
|| three| stck: 9 c: 17 75: char 'd'
|| s: |if$
|| three| stck: 9 c: 17 76: char 'i'
|| s: |f$
|| three| stck: 9 c: 17 77: char 'f'
|| s: |$
|| three| stck: 9 c: 17 78: call -> 307
|| s: |$
|| three| stck: 10 c: 17 307: char '$'
|| s: |
|| three| stck: 10 c: 17 308: ret
|| s: |
|| three| stck: 9 c: 17 79: set [(0a)(0d)]
|| s: | three| stck: 9 c: 17 88: closeruntime close(n = 0) (off = 0)
|| checking if we need to kill a newline in 'one$if(foo)$ two
|| $endif$
|| three' at position 27
|| not a ST class (not expected) table: 0x809ad88
So for some reason, even though my function reported that the match
succeeded, the match appears to have failed and the function is called
again. On the second call I'm not sure which captured element is being
provided, unless it's the entire table of chunks.
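For reference, a minimal sketch of one way a match-time capture can run twice on the same text. A Cmt function is called every time its pattern matches, including when the pattern is only being tried as a lookahead. The grammar below is a hypothetical stand-in, not the real one: it assumes the body text is written as "anything that is not an endif", which may or may not be how the real chunks are defined. The failtwice instruction in the trace is, as far as I understand the LPeg VM, what a not-predicate or a pattern difference compiles to, so this is only a guess at the shape of the problem.

```lua
local lpeg = require "lpeg"
local P, Cmt = lpeg.P, lpeg.Cmt

local calls = 0

-- Hypothetical stand-in for EndifExpr: matches "$endif$" and counts
-- how often the match-time function is invoked.
local endif = Cmt(P"$endif$", function(subject, pos)
  calls = calls + 1
  return pos                      -- accept the match, keep the position
end)

-- Body text is "any character that does not start an endif". The
-- difference (1 - endif) tries endif as a lookahead at each position.
local text  = (1 - endif)^0
local block = text * endif

assert(block:match("two $endif$") == 12)
assert(calls == 2)  -- once in the lookahead, once in the real match
```

The first call happens inside the lookahead, where success makes the enclosing difference fail; the second call is the real match, which sees whatever captures exist at that point.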
I'm stumped. Anyone have any pointers as to why my Cmt capture isn't
working as I expected?
--
Glenn McAllister <glenn@somanetworks.com> +1 416 348 1594
SOMA Networks, Inc. http://www.somanetworks.com/ +1 416 977 1414