Problems with lpeg.Cmt captures

lua-l archive


Subject: Problems with lpeg.Cmt captures
From: Glenn McAllister <glenn@...>
Date: Wed, 09 Apr 2008 14:02:10 -0400

I've got an awkward grammar that requires me to reach back to a previously parsed element to see what it is (is it a newline or not) before I can accept the match. Because of the way I need to implement things, the newline is actually captured in a previous match, and there should be a NewlineChunk object as the capture. That object isn't available for me to use in the next match, so I decided to try using Cmt and Cb to figure out what I need to do. My function is getting called correctly, but after I return true (or i), LPEG re-evaluates the same text but provides me with a different back reference to compare, which fails and then sends me down the wrong path.

Some examples will clear this up, I hope. I need to implement what amounts to an if statement, with very specific rules about what whitespace I must strip. Specifically, if I have


'one$if(foo)$ two \n$endif$\n three'

The newline before and after the $endif$ has to be stripped. The resulting text would be 'one two three'.


If I have

'one$if(foo)$ two $endif$\n three'

the resulting text would be 'one two\n three'. Note that I'm not supposed to chew the newline after $endif$, as it isn't on a line by itself.
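The newline rule in the two examples above can be modelled without any parser at all. The following is only a sketch in plain Lua: `strip_endif` is a hypothetical helper, it assumes the condition is true, Unix newlines, and no nesting, and it leaves every space in place, so its output differs from the results quoted above in incidental spaces only.

```lua
-- Hypothetical helper (not from the original grammar): models only the
-- newline-stripping rule, assuming the $if(...)$ condition is true.
local function strip_endif(text)
  -- Case 1: $endif$ sits on a line by itself; drop it together with the
  -- newline before it and the newline after it.
  text = text:gsub("\n%$endif%$\n", "")
  -- Case 2: any $endif$ still present is inline; drop only the marker and
  -- leave the following newline alone.
  text = text:gsub("%$endif%$", "")
  -- Drop the opening $if(...)$ marker; %b() matches the balanced parentheses.
  text = text:gsub("%$if%b()%$", "")
  return text
end

local first  = strip_endif('one$if(foo)$ two \n$endif$\n three')
local second = strip_endif('one$if(foo)$ two $endif$\n three')

print(first:find("\n") == nil)   -- no newline survives in the first case
print(second:find("\n") ~= nil)  -- the newline is kept in the second case
```

A real implementation still needs a grammar, because the body between the markers can contain further $if(..)$ ... $endif$ statements, which simple substitutions cannot pair up correctly.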

The 'two' bit can obviously be more than just text, and in particular it can be more $if(..)$ blah $endif$ statements, meaning I need to treat blah as an arbitrary series of chunks to be parsed. Because of the way I need to handle newlines, they are treated as a separate flyweight object.

So in the first case where I have to chew that newline object, I can't just do (ignoring captures, etc.):


EndifExpr = s.NEWLINE * ExprStart * s.ENDIF * ExprEnd * s.NEWLINE

The first newline is captured in the parsing of the chunks. So the rule I came up with is (sorry for any bad wrapping):


 EndifExpr = Cmt(Cb(1) * ExprStart * s.ENDIF * ExprEnd * s.NEWLINE,
                 function(s, i, a)
                     -- s is the subject, i the position after the match,
                     -- a the value produced by the back capture
                     print('checking if we need to kill a newline in \'' .. s ..
                           '\' at position ' .. i)
                     if a.isA then
                         if a:isA(NewlineChunk) then
                             print('really need to kill newline')
                             return i, "kill"
                         else
                             print('not a NewlineChunk class')
                             return false
                         end
                     else
                         print('not a ST class (not expected)', a)
                         return false
                     end
                 end) +
             Cs((ExprStart * C(s.ENDIF) * ExprEnd) / "dontkill"),

Basically I want to see "kill" or "dontkill" to decide what to do with the last chunk in my table of chunks. When the overall match is determined, a function is called to create the if chunk, which is where I make my decision.

And yes, this match does actually execute. The problem for me is that it executes twice. Before you ask, if I replace 'return i, "kill"' with 'return i+1, "kill"' in an attempt to advance the match position, I still see this same behavior. The following is debug output from LPEG:
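The decision step described above might look roughly like this. It is a sketch only: `make_if_chunk`, the `kind` and `body` fields, and this minimal `NewlineChunk` class are hypothetical stand-ins, since the real chunk classes are not shown here.

```lua
-- Hypothetical stand-in for the NewlineChunk class mentioned above.
local NewlineChunk = {}
NewlineChunk.__index = NewlineChunk

function NewlineChunk.new()
  return setmetatable({}, NewlineChunk)
end

function NewlineChunk:isA(class)
  return getmetatable(self) == class
end

-- Hypothetical constructor for the if chunk: 'flag' is the "kill" or
-- "dontkill" value produced by the EndifExpr rule.
local function make_if_chunk(chunks, flag)
  local last = chunks[#chunks]
  -- Only drop the trailing chunk when the rule asked for it and the chunk
  -- really is a newline object.
  if flag == "kill" and type(last) == "table"
     and last.isA and last:isA(NewlineChunk) then
    table.remove(chunks)
  end
  return { kind = "if", body = chunks }
end

local killed = make_if_chunk({ " two ", NewlineChunk.new() }, "kill")
local kept   = make_if_chunk({ " two ", NewlineChunk.new() }, "dontkill")
print(#killed.body, #kept.body)
```

Checking the class again at this point keeps the constructor safe even if the flag and the last chunk ever disagree.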


|| s: |$endif$
||  three| stck: 9 c: 14  195: choice -> 198 (0)
|| s: |$endif$
||  three| stck: 10 c: 14  196: call -> 69
|| s: |$endif$
||  three| stck: 11 c: 14  69: choice -> 90 (0)
|| s: |$endif$
||  three| stck: 12 c: 14  70: opencapture runtime(n = 0)  (off = 8)
|| s: |$endif$
||  three| stck: 12 c: 15  71: emptycapture backref(n = 0)  (off = 1)
|| s: |$endif$
||  three| stck: 12 c: 16  72: call -> 309
|| s: |$endif$
||  three| stck: 13 c: 16  309: set [(24)]
|| s: |endif$
||  three| stck: 13 c: 16  318: ret
|| s: |endif$
||  three| stck: 12 c: 16  73: char 'e'
|| s: |ndif$
||  three| stck: 12 c: 16  74: char 'n'
|| s: |dif$
||  three| stck: 12 c: 16  75: char 'd'
|| s: |if$
||  three| stck: 12 c: 16  76: char 'i'
|| s: |f$
||  three| stck: 12 c: 16  77: char 'f'
|| s: |$
||  three| stck: 12 c: 16  78: call -> 307
|| s: |$
||  three| stck: 13 c: 16  307: char '$'
|| s: |
||  three| stck: 13 c: 16  308: ret
|| s: |
||  three| stck: 12 c: 16  79: set [(0a)(0d)]
|| s: | three| stck: 12 c: 16  88: closeruntime close(n = 0)  (off = 0)
|| checking if we need to kill a newline in 'one$if(foo)$ two
|| $endif$
||  three' at position 27
|| really need to kill newline
|| s: | three| stck: 12 c: 16  89: commit -> 102
|| s: | three| stck: 11 c: 16  102: ret
|| s: | three| stck: 10 c: 16  197: failtwice
|| s: |$endif$
||  three| stck: 7 c: 14  435: closecapture close(n = 0)  (off = 0)
|| s: |$endif$
||  three| stck: 7 c: 15  436: call -> 69
|| s: |$endif$
||  three| stck: 8 c: 15  69: choice -> 90 (0)
|| s: |$endif$
||  three| stck: 9 c: 15  70: opencapture runtime(n = 0)  (off = 8)
|| s: |$endif$
||  three| stck: 9 c: 16  71: emptycapture backref(n = 0)  (off = 1)
|| s: |$endif$
||  three| stck: 9 c: 17  72: call -> 309
|| s: |$endif$
||  three| stck: 10 c: 17  309: set [(24)]
|| s: |endif$
||  three| stck: 10 c: 17  318: ret
|| s: |endif$
||  three| stck: 9 c: 17  73: char 'e'
|| s: |ndif$
||  three| stck: 9 c: 17  74: char 'n'
|| s: |dif$
||  three| stck: 9 c: 17  75: char 'd'
|| s: |if$
||  three| stck: 9 c: 17  76: char 'i'
|| s: |f$
||  three| stck: 9 c: 17  77: char 'f'
|| s: |$
||  three| stck: 9 c: 17  78: call -> 307
|| s: |$
||  three| stck: 10 c: 17  307: char '$'
|| s: |
||  three| stck: 10 c: 17  308: ret
|| s: |
||  three| stck: 9 c: 17  79: set [(0a)(0d)]
|| s: | three| stck: 9 c: 17  88: closeruntime close(n = 0)  (off = 0)
|| checking if we need to kill a newline in 'one$if(foo)$ two
|| $endif$
||  three' at position 27
|| not a ST class (not expected)        table: 0x809ad88

So for some reason, despite my returning a value that says the match succeeded, it appears to have failed and is calling the function again. At this point, I'm not sure what captured element is being provided, unless it's the entire table of chunks.

I'm stumped. Anyone have any pointers as to why my Cmt capture isn't working as I expected?
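A double call of this kind can be reproduced in isolation. One possible reading of the trace, not confirmed in this message, is that the rule is first tried inside a not-predicate (`failtwice` at instruction 197 is the instruction LPeg emits to close `-patt`) and then matched again for real. The sketch below assumes that shape; `EndTag`, `Block`, and `calls` are hypothetical names, and the grammar is far simpler than the real one.

```lua
local lpeg = require "lpeg"  -- LPeg must be installed separately
local P, Cmt = lpeg.P, lpeg.Cmt

local calls = 0  -- hypothetical counter, for illustration only

-- Match-time capture: the function runs every time the inner pattern
-- matches, whether or not the enclosing match is later discarded.
local EndTag = Cmt(P"$endif$", function(subject, pos)
  calls = calls + 1
  return pos  -- accept the match at the current position
end)

-- Body characters are "anything that does not start an EndTag", so EndTag
-- is first tried inside the not-predicate and then matched for real.
local Block = (P(1) - EndTag)^0 * EndTag

local stop = Block:match("ab$endif$")
print(calls, stop)
```

Under this reading, the first call happens inside the predicate, whose success makes the enclosing loop stop; the second call is the real match, made after the enclosing capture has closed, so a back capture could then see a different previous value.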


--
Glenn McAllister     <glenn@somanetworks.com>      +1 416 348 1594
SOMA Networks, Inc.  http://www.somanetworks.com/  +1 416 977 1414

Follow-Ups:
- Re: Problems with lpeg.Cmt captures, Roberto Ierusalimschy
