- Subject: Questions about Lpeg (semantics of captures)
- From: "Eduardo Ochs" <eduardoochs@...>
- Date: Wed, 14 Mar 2007 15:49:12 -0300
A few questions about lpeg...
I still don't understand lpeg very well, and I have the (naive?)
impression that patterns-with-captures are implemented on top of
patterns-without-captures in a way that even allows "projecting" a
pattern-with-captures onto the lower level, by discarding all the
information about captures. Also, matching a pattern-with-captures
involves some backtracking, and some operations on the captures - like
"patt / function" - should only be performed after the (super)pattern
succeeds. So, at first, lpeg.match keeps backtracking
information and instructions for performing captures; at some point
the pattern is "closed", the backtracking information is dropped, and
the instructions for performing captures are executed...
Is that mental model correct? Is there a way to force a subpattern to
be closed, and its captures performed?
Now let me show why I stumbled on that question, and why I was
somewhat surprised when I discovered that the execution of the
function in "patt / function" is delayed.
I am trying to htmlize some files that have lots of "Elisp hyperlinks"
embedded in comments. For example, in
  # (info "(bash)Shell Parameter Expansion")
the "(info ...)" can be used as a hyperlink inside Emacs - executing
it as Lisp opens a page of the Bash manual. Not all sexps are
hyperlinks, and only a few of the sexps that work as hyperlinks inside
Emacs can be htmlized in meaningful ways. I have a table whose keys
are the symbols that can be heads of htmlizable hyperlink sexps, and I
was trying to build a pattern that would fail immediately when it
noticed that it was processing a sexp that is not htmlizable.
My first attempts to build patterns that would match only the "head
symbols" were more or less like this (I'm reconstructing that from
memory - it didn't work...):
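  -- A symbol is one or more alphanumerics or "-", "+", "_".
  SSymbol = (lpeg.R("AZ", "az", "09") + lpeg.S("-+_"))^1
  headsymbols = { ["info"]=true, ["man"]=true }
  -- Function capture: stores the matched text in the global "symbol".
  setsymbol = function (str) symbol = str end
  -- Function pattern: succeeds only if "symbol" is a known head symbol.
  isheadsymbol = function (subj, pos)
    return headsymbols[symbol] and pos
  end
  SHeadSymbol = (SSymbol / setsymbol) * lpeg.P(isheadsymbol)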
but then I discovered that the "/ setsymbol" part was being
executed after the "lpeg.P(isheadsymbol)", not before...
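
The ordering described above can be reproduced with a small
self-contained script. This is only a sketch: the "log" table and the
"check" function are hypothetical names added for illustration, and
SSymbol is assumed to match one or more characters (hence the "^1").

```lua
local lpeg = require("lpeg")

-- Hypothetical helper: records the order in which the functions run.
local log = {}

-- Assumption: a symbol is one or more alphanumerics or "-", "+", "_".
local SSymbol = (lpeg.R("AZ", "az", "09") + lpeg.S("-+_"))^1

-- Function capture: called with the text matched by SSymbol.
local function setsymbol(str)
  log[#log + 1] = "setsymbol:" .. str
end

-- Function pattern: called with the subject and the current position.
local function check(subj, pos)
  log[#log + 1] = "check"
  return pos   -- always succeed, so that the whole match completes
end

local patt = (SSymbol / setsymbol) * lpeg.P(check)
patt:match("info")
print(table.concat(log, " "))
```

If the function capture is deferred until the whole match has
succeeded, "check" is logged before "setsymbol:info".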
My current solution (which works!) is like this - again, I'm
reconstructing this from memory; the real implementation is more
complex:
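  -- A symbol is one or more alphanumerics or "-", "+", "_".
  SSymbol = (lpeg.R("AZ", "az", "09") + lpeg.S("-+_"))^1
  headsymbols = { ["info"]=true, ["man"]=true }
  -- Runs during matching: remember where the symbol starts.
  setmark = function (subj, pos)
    mark = pos
    return pos
  end
  -- Runs during matching: extract the symbol and look it up.
  isheadsymbol = function (subj, pos)
    local symbol = string.sub(subj, mark, pos - 1)
    return headsymbols[symbol] and pos
  end
  SHeadSymbol = lpeg.P(setmark) * SSymbol * lpeg.P(isheadsymbol)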
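
A minimal way to exercise this approach on two sample sexps is
sketched below. The "SexpStart" wrapper and the second test string are
hypothetical, and SSymbol is assumed to match one or more characters
(hence the "^1"); the real patterns are more complex.

```lua
local lpeg = require("lpeg")

-- Assumption: a symbol is one or more alphanumerics or "-", "+", "_".
local SSymbol = (lpeg.R("AZ", "az", "09") + lpeg.S("-+_"))^1
local headsymbols = { info = true, man = true }

local mark  -- position where the current symbol starts

-- Runs during matching: remember where the symbol starts.
local function setmark(subj, pos)
  mark = pos
  return pos
end

-- Runs during matching: succeed only for known head symbols.
local function isheadsymbol(subj, pos)
  local symbol = string.sub(subj, mark, pos - 1)
  return headsymbols[symbol] and pos
end

local SHeadSymbol = lpeg.P(setmark) * SSymbol * lpeg.P(isheadsymbol)

-- Hypothetical wrapper: an opening paren followed by a head symbol.
local SexpStart = lpeg.P("(") * SHeadSymbol

-- "info" is a head symbol, so this matches up to the end of the symbol.
print(SexpStart:match('(info "(bash)Shell Parameter Expansion")'))
-- "find-file" is not in the table, so this fails right after the symbol.
print(SexpStart:match('(find-file "/tmp/")'))
```

Both functions run while the match is in progress, so the lookup does
not depend on any capture having been evaluated yet.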
Cheers, more later, thanks in advance, etc,
Eduardo Ochs
eduardoochs@gmail.com
http://angg.twu.net/