Lazy Tree |
|
(This is part of LazyKit.)
LazyTree constructs an conventional-seeming XmlTree as its contents are referenced, parsing more source document as necessary to fill out the contents on demand.
Why is this interesting? Trees are a natural data model to process XML documents. A simple tree implementation reads the entire document into memory at once. For large documents, this can be too expensive. Although callback and event APIs are memory-efficient, they are painful to program to.
Given stylized iterators, memory consumption can be limited to particular subtrees. Consider:
for i,child in xnpairs_c(tree) do if child.attr.href then print(child.name) table.insert(references, child) end end
where the _c
family of iterators nils out nodes
from their parent before returning them. If the body of the
loop does not retain a reference to the child elsewhere, it will
become eligible for garbage collection as soon as the next
iteration begins. See ConsumingXml.
Although not currently implemented, other consuming forms may interact with the XML parser for greater savings:
<document> <firstname>Jay</firstname> <lastname>Carlson</lastname> <bodytext>Spending too much time listening to <ref>In Utero</ref> can be [...] <title>I Think I'm DOM</title>
lastname, title = xmlextract.strings_consume(tree, "lastname", "title")
strings_consume
filter can potentially turn
off character data and other events inside any node it knows it
doesn't need (like bodytext
), as references to them
cannot possibly affect the rest of the program.
lazytree
depends on lxpevent
to
generate event queues, which depends on LuaExpat?.
Calling the normal ipairs
iterator on a lazytree
that has not been completely loaded will not work, as
ipairs(lz)
does not directly reference
lz.n
. Use the XmlIter iterators
xpairs{_c
} and xnpairs{_c
}; the second
is more convenient anyway. See XmlIter.
Returns a tree lazily parsed from the string s
.
Returns a tree lazily parsed from file
. If
file
is a string, it is interpreted as a filename
and opened; otherwise, file
is treated as an io
library file object.
Returns a tree lazily parsed from the lxpevent
event_source
.
Force the entire contents of tree
to be read in.
It is safe to call this on non-lazy trees.
Indicate tree
is no longer needed and may be
destroyed. tree
may be either a lazytree or a
conventional tree, and should be the last reference to it.
Although not currently implemented, calling consume on the part
of the lazytree currently being built could tell the lazy parser
not to bother populating that portion of the tree. This is not
intended as a general user tool; rather, it is a primitive that
can be used by consuming filtering iterators such as
xmliter.switch
when they notice that a tree they
have encountered will be skipped and have no visibility in the
application.
Prints the current contents of a lazytree without further parsing. Useful for demonstration purposes.
<paragraph justify='centered'>first child<b>bold</b>second child</paragraph>
lz = {name="paragraph", attr={justify="centered"}, "first child", {name="b", "bold", n=1}, "second child", n=3 }
lz = {name="paragraph", attr={justify="centered"}, _read_so_far=0 }
lz[1]
, it will contain:
lz = {name="paragraph", attr={justify="centered"}, "first child", _read_so_far=1 }
lz[2]
:
lz = {name="paragraph", attr={justify="centered"}, "first child", {name="b", _read_so_far=0} }
lz[3]
will force all of
lz[2]
to be completed:
lz = {name="paragraph", attr={justify="centered"}, "first child", {name="b", "bold", n=1} "second child", _read_so_far=3 }
lz[4]
(which is nil) or
lz.n
will force the completion of the tree.
Note that reading from lz.n
will force the
remainder of the tree to be read, as we don't know how long it's
going to be until it closes.