Detecting Undefined Variables
"How can access to [undefined variables] (undeclared variables) be caught in Lua?" is a frequently asked question.
Various approaches have been used, as described below. These approaches differ in when and how access to undefined global variables is detected. First, let's consider the nature of the problem...
In Lua programs, typos in variable names can be hard to spot because, in general, Lua will not complain that a variable is undefined. For example, consider this program that defines two functions:
function f(x) print(X) end
function g(x) print(X + 1) end
Lua gives no error when loading this code. Both functions might be wrong (e.g. "x" mistyped as "X"), or they might not be (maybe X is some other global variable). In fact, Lua has no way of knowing whether the code is wrong. The reason is that if a variable is not recognized by Lua as a local variable (e.g. through a static declaration using the "local" keyword, or as a function parameter), the variable is instead interpreted as a global variable (as is the case for "X"). Now, whether a global variable is defined is not as easy to determine or describe. X has the value t['X'], where t = getfenv() is the "environment table" of the currently running function (in Lua 5.2 and later, t is the table held in _ENV). X always has a value, though it is probably nil if X was a typo. We might interpret X being nil as X being undefined, but whether X is nil can only be determined at run-time. For example:
-- X is "undefined"
f(X) -- prints nil
X = 2 -- X is defined
f(X) -- prints 2
X = nil -- X is "undefined" again
f(X) -- prints nil
Even the above runs without error. When X is nil, print(X) becomes print(nil), and it is valid to print a nil value. However, consider calling the function g:
g(X)
This fails with the error "attempt to perform arithmetic on global 'X' (a nil value)". The reason is that print(X + 1) becomes print(nil + 1), and it is invalid to add nil to a number. The error is not observed, however, until the code nil + 1 actually executes.
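The difference between the two calls can be observed directly by running them under pcall, which catches the run-time error instead of aborting the program. This is a small sketch assuming Lua 5.1 or later; f and g are made local here so that the example itself adds no globals. The exact error text varies by version (Lua 5.4, for instance, reports "attempt to perform arithmetic on a nil value (global 'X')").

```lua
-- Both functions refer to the global X, which has never been assigned.
local function f(x) print(X) end      -- print(nil) is perfectly legal
local function g(x) print(X + 1) end  -- nil + 1 is an error, but only when it runs

local ok_f = pcall(f, X)          -- prints "nil"; no error is raised
local ok_g, err_g = pcall(g, X)   -- the arithmetic error is caught here

print("f succeeded:", ok_f)
print("g succeeded:", ok_g, err_g)
```

Nothing in the source of f or g changes between the two cases; only executing the addition exposes the problem.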
Obviously, we may want to detect undefined global variables more proactively, such as detecting them at compile time or at least prior to production release (e.g. inside a test suite). The following methods have been devised.
Reads from and writes to undefined globals can be detected when they happen, at run-time. These approaches operate by overriding the __index and __newindex metamethods in the metatable of the environment table of the currently running function. Lua routes reads of globals that are absent from the environment table (i.e. whose value is nil) to __index, and assignments to such globals to __newindex; these metamethods can in turn be programmed to raise run-time errors.
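As a rough illustration of the metamethod technique, here is a minimal sketch for Lua 5.1 and later. It is not the code of etc/strict.lua; the declared table and the declare helper are hypothetical names chosen for this example. A global counts as defined once it has been passed to declare (which also lets a declared global hold nil); reading or first assigning any other absent global raises an error.

```lua
-- Hypothetical registry of names that may be used as globals even
-- while they are absent from _G (i.e. while their value is nil).
local declared = {}

-- Hypothetical helper: declare a global, optionally with an initial value.
local function declare(name, value)
  declared[name] = true
  rawset(_G, name, value)  -- bypasses __newindex below
end

setmetatable(_G, {
  -- Called only for assignments to keys currently absent from _G.
  __newindex = function(t, k, v)
    if not declared[k] then
      error("assignment to undeclared global '" .. tostring(k) .. "'", 2)
    end
    rawset(t, k, v)
  end,
  -- Called only for reads of keys currently absent from _G.
  __index = function(_, k)
    if not declared[k] then
      error("read of undeclared global '" .. tostring(k) .. "'", 2)
    end
    return nil  -- declared but currently nil
  end,
})
```

Because the metatable is installed on _G itself, it affects every chunk sharing that environment, including libraries loaded afterwards that create globals of their own. The real strict.lua relaxes this by permitting assignments made from the main chunk or from C code.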
This approach is taken by the "strict" module in the Lua distribution (etc/strict.lua; downloads for [Lua 5.1] and [Lua 5.2]). Alternatively, see [LuaStrict] by ThomasLauer for an extension of the strict approach.
Here are some advantages and disadvantages of this approach:
Advantages:
Disadvantages:
Local Declaration
The code below, written by Niklas Frykholm, was found in the Lua mail archive. I thought it would be nice to document it in the wiki, as gems like this can easily be lost or forgotten amongst the hundreds of mails. The idea behind enforcing local variable declaration is to stop yourself from using a variable that hasn't been declared. In effect, this also stops you from accidentally using an undeclared variable that was meant to be local in scope but gets treated as global, which can come back to haunt you while debugging.
There are many effective solutions for enforcing variable declaration; however, I have personally found Niklas Frykholm's solution to be the most elegant and unobtrusive. It also has hardly any performance cost, since most variables declared in programs are local, and the code only runs when a new global variable is created.
Basically, any time you call GLOBAL_lock(_G) (note that _G is the global variables table) somewhere in your code, from that point onwards any attempt to assign to a variable that has not been declared 'local' (that is, to create a new global variable) will raise an error.
I have made a slight modification to the code so that one can also explicitly allow global declarations by prefixing variables with a double underscore (e.g. __name, __global_count); however, you may choose to change the code to use another naming scheme to suit your own taste (e.g. G_name, G_global_count). (Question from a reader: does this on-the-fly declaration of "__"-prefixed global variables not once again enable typos, since __valueX and any misspelling of it are both accepted as legal, defeating (a large part of) the original idea?)
--===================================================
--=  Niklas Frykholm
-- basically if user tries to create global variable
-- the system will not let them!!
-- call GLOBAL_lock(_G)
--
--===================================================
function GLOBAL_lock(t)
  local mt = getmetatable(t) or {}
  mt.__newindex = lock_new_index
  setmetatable(t, mt)
end

--===================================================
--  call GLOBAL_unlock(_G)
-- to change things back to normal.
--===================================================
function GLOBAL_unlock(t)
  local mt = getmetatable(t) or {}
  mt.__newindex = unlock_new_index
  setmetatable(t, mt)
end

-- __newindex handler while locked: only "_" and names starting with "__"
-- may be created as new globals (note: unlocks _G before raising the error).
function lock_new_index(t, k, v)
  if (k ~= "_" and string.sub(k, 1, 2) ~= "__") then
    GLOBAL_unlock(_G)
    error("GLOBALS are locked -- " .. k ..
          " must be declared local or prefix with '__' for globals.", 2)
  else
    rawset(t, k, v)
  end
end

-- __newindex handler while unlocked: plain assignment.
function unlock_new_index(t, k, v)
  rawset(t, k, v)
end
--SamLie
An alternative method is to detect undefined globals at compile time. Of course, Lua can be used as an interpreted language without an explicit compilation step (though internally it does compile to bytecode). What we mean by this, however, is that undefined globals are detected before the code executes as normal. It can be done without really executing all the code but rather only parsing it. This is sometimes called "static analysis" of source code.
To detect these at compile time you may (under a *nix-like operating system) use the following command-line trick with the Lua 5.1 compiler (luac), where myprogram.lua stands for your source file:
luac -p -l myprogram.lua | grep ETGLOBAL
Fully automated command for 5.1 [analyzelua.sh]:
For Lua 5.2/5.3 instead:
Fully automated command for 5.2/5.3 [analyzelua.sh]:
This lists all gets and sets of global variables (both defined and undefined ones). You may find that some gets/sets are interpreted as globals when you really wanted them to be locals (a missing "local" statement or a misspelled variable name). The above approach works well if you follow a coding style of "avoiding globals like the plague" (i.e. using locals (lexicals) whenever possible).
An extension to this approach is in tests/globals.lua in the Lua 5.1.2 distribution, which implements the *nix pipe " | grep ETGLOBAL" in Lua instead, and does so more effectively by filtering out predefined globals (e.g. print, math, string, etc.). See also LuaList:2006-05/msg00306.html, as well as LuaLint. Also see Egil Hjelmeland's [globals]. A more advanced version of globals.lua is [globalsplus.lua] (DavidManura), which also looks in fields of global tables. A yet more advanced bytecode analysis is done in [lglob] [3] (SteveDonovan).
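The filtering idea behind tests/globals.lua can be sketched in Lua roughly as follows. This is not the distribution's globals.lua: parse_listing and check_file are hypothetical names, and the Lua patterns assume the luac -l listing format of Lua 5.1 (GETGLOBAL/SETGLOBAL, with the name after ";") and of Lua 5.2 to 5.4 (GETTABUP/SETTABUP on the _ENV upvalue). If your luac formats its listing differently, the patterns will need adjusting. The driver also assumes io.popen is available on your platform.

```lua
-- Assumed listing line shapes:
--   5.1:  "\t2\t[1]\tGETGLOBAL\t0 -1\t; print"
--   5.2+: "\t2\t[1]\tGETTABUP \t0 0 -1\t; _ENV \"print\""
local function parse_listing(listing, predefined)
  local found = {}
  for line in listing:gmatch("[^\n]+") do
    -- Lua 5.1 style: the global name follows the ";" comment marker.
    local lnum, op, name = line:match("%[(%d+)%]%s+([GS]ETGLOBAL)%s.-;%s*([%w_]+)")
    if not lnum then
      -- Lua 5.2+ style: globals are fields of the _ENV upvalue.
      lnum, op, name = line:match('%[(%d+)%]%s+([GS]ETTABUP)%s.-;%s*_ENV%s+"([%w_]+)"')
    end
    -- Keep only names that are not predefined (print, math, ...).
    if lnum and not predefined[name] then
      found[#found + 1] = { line = tonumber(lnum), op = op, name = name }
    end
  end
  return found
end

-- Hypothetical driver: run luac on one file and report non-predefined globals.
local function check_file(path)
  local predefined = {}
  for k in pairs(_G) do predefined[k] = true end
  -- Note: path is not shell-quoted here.
  local pipe = assert(io.popen("luac -p -l " .. path))
  local listing = pipe:read("*a")
  pipe:close()
  for _, g in ipairs(parse_listing(listing, predefined)) do
    print(string.format("%s:%d: %s %s", path, g.line, g.op, g.name))
  end
end

if arg and arg[1] then check_file(arg[1]) end
```

Usage would be something like lua checkglobals_sketch.lua myprogram.lua (both file names hypothetical). Treating every name present in _G as predefined is a simplification; a real tool would let you supply your own list.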
An external "linter" tool or semantically aware text editor (like [Lua for IntelliJ IDEA], LuaInspect, the older LuaFish, or the Metalua code below) that parses and statically analyzes Lua code can achieve a similar effect, as well as detect other classes of coding errors or questionable coding practices. For example, LuaFish (which is fairly experimental) can even detect that string:length() or math.cos("hello") are invalid.
[Lua Checker] (5.1) is one such tool, which analyzes Lua source for common programming errors, much as the "lint" program does for C. It contains a Lua 5.1 bison parser.
love-studio [OptionalTypeSystem] allows type annotations in regular Lua comments:
-- this is a description
-- @param(a : number) some parameter
-- @ret(number) first return value
-- @ret(string) second return value
function Thing:Method(a)
  return 3, "blarg"
end

--@var(number) The x coordinate
--@var(number) The y coordinate
local x, y = 0, 0
It describes itself as an optional type system: "An optional type system (as defined by Gilad Bracha in his paper Pluggable Type Systems) is a type system that a.) has no effect on the run-time semantics of the programming language, and b.) does not mandate type annotations in the syntax."
Another approach is to patch the Lua parser itself. See LuaList:2006-10/msg00206.html for such an example.
/* based on 5.1.4 */
static void singlevar (LexState *ls, expdesc *var) {
  TString *varname;
  FuncState *fs;
  check(ls, TK_NAME);
  varname = ls->t.seminfo.ts;
  fs = ls->fs;
  singlevaraux(fs, varname, var, 1);
  luaX_next(ls);  /* luaX_next should occur after any luaX_syntaxerror */
}
Here are some advantages and disadvantages of this approach:
Advantages:
Disadvantages:
The following utility will lint Lua source code, detecting undefined variables (and could be expanded to do other interesting things).
-- lint.lua - A lua linter.
--
-- Warning: This is a work in progress. Not currently well tested.
--
-- This relies on Metalua 0.2 ( http://metalua.luaforge.net/ )
-- libraries (but doesn't need to run under Metalua).
-- The metalua parsing is a bit slow, but does the job well.
--
-- Usage:
--   lua lint.lua myfile.lua
--
-- Features:
-- - Outputs list of undefined variables used.
--   (note: this works well for locals, but globals requires
--    some guessing)
-- - TODO: add other lint stuff.
--
-- David Manura, 2007-03
-- Licensed under the same terms as Lua itself.

-- Capture default list of globals.
local globals = {}; for k,v in pairs(_G) do globals[k] = "global" end

-- Metalua imports
require "mlp_stat"
require "mstd"  --debug
require "disp"  --debug

local filename = assert(arg[1])

-- Load source.
local fh = assert(io.open(filename))
local source = fh:read("*a")
fh:close()

-- Convert source to AST (syntax tree).
local c = mlp.block(mll.new(source))

--Display AST.
--print(tostringv(c))
--print(disp.ast(c))
--print("---")
--for k,v in pairs(c) do print(k,disp.ast(v)) end

-- Helper function: Parse current node in AST recursively.
function traverse(ast, scope, level)
  level = level or 1
  scope = scope or {}

  local blockrecurse

  if ast.tag == "Local" or ast.tag == "Localrec" then
    local vnames, vvalues = ast[1], ast[2]
    for i,v in ipairs(vnames) do
      assert(v.tag == "Id")
      local vname = v[1]
      --print(level, "deflocal",v[1])
      local parentscope = getmetatable(scope).__index
      parentscope[vname] = "local"
    end
    blockrecurse = 1
  elseif ast.tag == "Id" then
    local vname = ast[1]
    --print(level, "ref", vname, scope[vname])
    if not scope[vname] then
      print(string.format("undefined %s at line %d", vname, ast.line))
    end
  elseif ast.tag == "Function" then
    local params = ast[1]
    local body = ast[2]
    for i,v in ipairs(params) do
      local vname = v[1]
      assert(v.tag == "Id" or v.tag == "Dots")
      if v.tag == "Id" then
        scope[vname] = "local"
      end
    end
    blockrecurse = 1
  elseif ast.tag == "Let" then
    local vnames, vvalues = ast[1], ast[2]
    for i,v in ipairs(vnames) do
      local vname = v[1]
      local parentscope = getmetatable(scope).__index
      parentscope[vname] = "global" -- note: imperfect
    end
    blockrecurse = 1
  elseif ast.tag == "Fornum" then
    local vname = ast[1][1]
    scope[vname] = "local"
    blockrecurse = 1
  elseif ast.tag == "Forin" then
    local vnames = ast[1]
    for i,v in ipairs(vnames) do
      local vname = v[1]
      scope[vname] = "local"
    end
    blockrecurse = 1
  end

  -- recurse (depth-first search through AST)
  for i,v in ipairs(ast) do
    if i ~= blockrecurse and type(v) == "table" then
      local scope = setmetatable({}, {__index = scope})
      traverse(v, scope, level+1)
    end
  end
end

-- Default list of defined variables.
local scope = setmetatable({}, {__index = globals})

traverse(c, scope) -- Start check.
Example:
-- test1.lua
local y = 5
local function test(x)
  print("123",x,y,z)
end

local factorial
function factorial(n)
  return n == 1 and 1 or n * factorial(n-1)
end

g = function(w) return w*2 end

for k=1,2 do print(k) end

for k,v in pairs{1,2} do print(v) end

test(2)
print(g(2))
Output:
$ lua lint.lua test1.lua
undefined z at line 4
A much more extensive version is in LuaInspect. Another, more Metalua-ish (and possibly better) Metalua implementation given by Fabien is in [1], and an even simpler one is below. See also MetaLua info.
Something similar could be done using other Lua parsers (see LuaGrammar and in particular LpegRecipes), such as Leg [2].
This piece of Metalua code uses the standard walker libraries to print a list of all global variables used in the program where it's inserted:
-{ block:
   require 'walk.id' -- Load scope-aware walker library

   -- This function lists all the free variables used in `ast'
   function list_globals (ast)
      -- Free variable names will be accumulated as keys in table `globals'
      local walk_cfg, globals = { id = { } }, { }
      function walk_cfg.id.free(v) globals[v[1]] = true end

      walk_id.block(walk_cfg, ast) -- accumulate global var names in the table "globals"
      print "Global vars used in this chunk:"
      for v in keys(globals) do print(" - "..v) end
   end

   -- Hook the globals lister after the generation of a chunk's AST:
   mlp.chunk.transformers:add(list_globals)
}
"Metalint [4] is a utility that checks Lua and Metalua source files for global variables usage. Beyond checking toplevel global variables, it also checks fields in modules: for instance, it will catch typos such as taable.insert(), but also table.iinsert(). Metalint works with declaration files, which list which globals are declared, and what can be done with them...." [4]
Hybrid approaches are possible. Note that detection of global variable accesses (at least direct ones, not through _G or getfenv()) is best done at compile time, while determination of whether those global variables are defined may best be done at run-time (or possibly, sufficiently so, at "load time", about when loadfile is done). So, a compromise would be to split these two concerns and do each when most appropriate. Such a mixed approach is taken by the ["checkglobals" module+patch], which provides a checkglobals(f, env) function (implemented entirely in Lua). In short, checkglobals validates that the function f (which by default is taken to be the calling function) uses only global variables defined in the table env (which by default is taken to be the environment of f). checkglobals requires a small patch that adds an additional 'g' option to the debug library's debug.getinfo / lua_getinfo function, to list the global variable accesses lexically inside the function f.
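To illustrate splitting the two concerns, here is a small sketch of the load-time half only. It assumes the list of global names has already been produced by some static pass (a luac listing or a parser, for instance); undefined_globals is a hypothetical helper and is not part of checkglobals.

```lua
-- Given global names found statically in a chunk, return (sorted) those
-- that are not defined in the environment the chunk will run in.
local function undefined_globals(names, env)
  local missing = {}
  for _, name in ipairs(names) do
    if env[name] == nil then  -- honours __index, e.g. fallback to _G
      missing[#missing + 1] = name
    end
  end
  table.sort(missing)
  return missing
end

-- Example: names a static pass might have found in some chunk.
local names = { "print", "string", "SomeVariabl" }
local env = setmetatable({ SomeVariable = 0 }, { __index = _G })
local missing = undefined_globals(names, env)
print(table.concat(missing, ", "))  --> SomeVariabl
```

Because the check happens against the actual environment just before the code runs, globals defined by earlier-loaded scripts are accepted, while the list of accesses itself never requires executing the chunk.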
See editors/IDEs under ProgramAnalysis for editors that highlight undefined variables. This can be implemented by static analysis and/or by invoking the Lua interpreter. This is convenient because any errors are immediately displayed in context on the screen, without invoking an external build tool and browsing through its output.
A few syntax extensions have been proposed to handle undefined variables more automatically by the Lua compiler:
Here's a quick and crude solution to prevent assignment to undefined globals, in Lua 4.0:
function undefed_global(varname, newvalue)
  error("assignment to undefined global " .. varname)
end

function guard_globals()
  settagmethod(tag(nil), "setglobal", undefed_global)
end
Once guard_globals() has been called, any assignment to a global with a nil value will generate an error. So typically you would call guard_globals() after you've loaded your scripts, and before you run them. For example:
SomeVariable = 0

function ClearVariable()
  SomeVariabl = 1 -- typo here
end

-- now demonstrate that we catch the typo
guard_globals()
ClearVariable() -- generates an error at the typo line
The "getglobal" tag method can similarly be used to catch reads of undefined globals. Also, with more code, a separate table can be used to distinguish between "defined" globals that happen to have a nil value, and "undefined" globals which have never been accessed before.