Meta Lua Abstract Syntax Tree |
|
Trees are written below with some Metalua syntax sugar, which increases their readability. the backquote symbol introduces a `tag`, i.e. a string stored in the `"tag"` field of a table:
`Foo{ 1, 2, 3 }
is a shortcut for {tag="Foo", 1, 2, 3}
;
`Foo
is a shortcut for {tag="Foo"}
;
`Foo 123
is a shortcut for `Foo{ 123 }
, and therefore {tag="Foo", 123 }
; the expression after the tag must be a literal number or string: accepting tables would lead to ambiguities.
When using a Metalua interpreter or compiler, the backtick syntax is supported and can be used directly. Metalua's pretty-printing helpers also try to use backtick syntax whenever applicable.
Tree elements are mainly categorized into statements stat
,
expressions expr
and lists of statements block
. Auxiliary
definitions include function applications/method invocation apply
,
are both valid statements and expressions, expressions admissible on
the left-hand-side of an assignment statement lhs
.
block: { stat* } stat: `Do{ stat* } | `Set{ {lhs+} {expr+} } -- lhs1, lhs2... = e1, e2... | `While{ expr block } -- while e do b end | `Repeat{ block expr } -- repeat b until e | `If{ (expr block)+ block? } -- if e1 then b1 [elseif e2 then b2] ... [else bn] end | `Fornum{ ident expr expr expr? block } -- for ident = e, e[, e] do b end | `Forin{ {ident+} {expr+} block } -- for i1, i2... in e1, e2... do b end | `Local{ {ident+} {expr+}? } -- local i1, i2... = e1, e2... | `Localrec{ ident expr } -- only used for 'local function' | `Goto{ <string> } -- goto str | `Label{ <string> } -- ::str:: | `Return{ <expr*> } -- return e1, e2... | `Break -- break | apply expr: `Nil | `Dots | `True | `False | `Number{ <number> } | `String{ <string> } | `Function{ { ident* `Dots? } block } | `Table{ ( `Pair{ expr expr } | expr )* } | `Op{ opid expr expr? } | `Paren{ expr } -- significant to cut multiple values returns | apply | lhs apply: `Call{ expr expr* } | `Invoke{ expr `String{ <string> } expr* } ident: `Id{ <string> } lhs: ident | `Index{ expr expr } opid: 'add' | 'sub' | 'mul' | 'div' | 'mod' | 'pow' | 'concat'| 'eq' | 'lt' | 'le' | 'and' | 'or' | 'not' | 'len'
ASTs also embed some metadata, allowing to map them to their source
representation. Those informations are stored in a "lineinfo"
field
in each tree node, which points to the range of characters in the
source string which represents it, and to the content of any comment
that would appear immediately before or after that node.
Lineinfo objects have two fields, "first"
and "last"
, describing
respectively the beginning and the end of the subtree in the
sources. For instance, the sub-node `Number{123}
produced by parsing
[[return 123]]
will have lineinfo.first
describing offset 8, and
lineinfo.last
describing offset 10:
> mlc = require 'metalua.compiler'.new() > ast = mlc :src_to_ast "return 123 -- comment" > print(ast[1][1].lineinfo) <?|L1|C8-10|K8-10|C> >
A lineinfo keeps track of character offsets relative to the beginning of the source string/file ("K8-10" above), line numbers (L1 above; a lineinfo spanning on several lines would read something like "L1-10"), columns i.e. offset within the line ("C8-10" above), and a filename if available (the "?" mark above indicating that we have no file name, as the AST comes from a string). The final "|C>" indicates that there's a comment immediately after the node; an initial "<C|" would have meant that there was a comment immediately before the node.
Positions represent either the end of a token and the beginning of an
inter-token space ("last"
fields) or the beginning of a token, and
the end of an inter-token space ("first"
fields). Inter-token spaces
might be empty. They can also contain comments, which might be useful
to link with surrounding tokens and AST subtrees.
Positions are chained with their "dual" one: a position at the
beginning of and inter-token space keeps a reference to the position at
the end of that inter-token space in its "facing"
field, and
conversely, end-of-inter-token positions keep track of the inter-token
space beginning, also in "facing"
. An inter-token space can be
empty, e.g. in "2+2"
, in which case lineinfo==lineinfo.facing
.
Comments are also kept in the "comments"
field. If present, this
field contains a list of comments, with a "lineinfo"
field
describing the span between the first and last comment. Each comment
is represented by a list of one string, with a "lineinfo"
describing
the span of this comment only. Consecutive lines of --
comments are
considered as one comment: "-- foo\n-- bar\n"
parses as one comment
whose text is "foo\nbar"
, whereas "-- foo\n\n-- bar\n"
parses as
two comments "foo"
and "bar"
.
So for instance, if f
is the AST of a function and I want to
retrieve the comment before the function, I'd do:
f_comment = f.lineinfo.first.comments[1][1]
The informations in lineinfo positions, i.e. in each "first"
and
"last"
field, are held in the following fields:
"source"
the filename (optional);
"offset"
the 1-based offset relative to the beginning of the string/file;
"line"
the 1-based line number;
"column"
the 1-based offset within the line;
"facing"
the position at the opposite end of the inter-token space.
"comments"
the comments in the associated inter-token space (optional).
"id"
an arbitrary number, which uniquely identifies an inter-token space within a given tokens stream.