- Subject: Re: [ANN] lglob 0.8 Extended Globals Checker for Lua
- From: Philipp Janda <siffiejoe@...>
- Date: Tue, 30 Apr 2013 16:15:52 +0200
On 30.04.2013 08:48, steve donovan wrote:
On Tue, Apr 30, 2013 at 1:52 AM, Philipp Janda <siffiejoe@gmx.net> wrote:
I propose the following definition of "globals" in the context of static
global checkers:
* Any access to a chunk's _ENV upvalue (not a local variable) is a
globals access, unless the chunk itself or any function sharing the same
_ENV upvalue potentially assigns to the _ENV upvalue.
* Any access to a function's _ENV upvalue (not a local variable) is a
globals access, if the _ENV upvalue of the chunk was the only _ENV in scope
during the function's definition *and* unless any function sharing the same
_ENV upvalue potentially assigns to the _ENV upvalue.
* Anything else not covered above is not a globals access.
OK, I had to read that several times ;)
I'm sorry. I'll try to explain my thinking:
IMO there are three use cases for _ENV:
1) standard access to globals in regular Lua code
2) access to globals in sandboxed Lua code (reduced/slightly modified
set of default globals)
3) a Lua dialect where you want to use Lua's syntax for a domain
specific language.
For 1) the globals checker is most useful out of the box, because it can
catch typos of predefined globals, and it can assume that no metatable
magic is at work (or that at least it is compatible with the usual Lua
semantics), so you can match reads and writes, etc.
The sandbox case 2) is similar to 1) except you have to supply a
different list of predefined globals. The point here is that sandboxes
usually load code via one of the load* functions, so no lexical _ENV
tampering takes place. And sandboxed code is stored in a separate chunk,
so you *can* change the list of predefined globals via a command-line switch.
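A minimal sketch of case 2), assuming Lua 5.2 or later. The table `sandbox_env` and the source string are hypothetical; the only library call relied on is `load` with its fourth (environment) argument. The sandboxed chunk contains no lexical _ENV tricks, so a checker could treat it like regular code with a different list of predefined globals.

```lua
-- Hypothetical sandbox: only these names are visible as globals to the loaded chunk.
local sandbox_env = { string = { upper = string.upper } }

-- Hypothetical sandboxed source; it reads `string` and writes `greeting`.
local source = "greeting = string.upper('hello') return greeting"

-- load(chunk, chunkname, mode, env): `env` becomes the chunk's _ENV upvalue.
local chunk = assert(load(source, "=sandboxed", "t", sandbox_env))
local result = chunk()

print(result)                 -- HELLO
print(sandbox_env.greeting)   -- HELLO (the write landed in the sandbox table)
print(rawget(_G, "greeting")) -- nil (the real globals table is untouched)
```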
The third case is the most interesting, because here the above rules
come into play. You use _ENV to specify a program in a DSL, e.g. an
LPeg grammar[1], meaning that you probably provide a completely
different set of predefined globals, and/or that you catch global
accesses via a metatable. In that scenario the usual rules like
"anything that you read you must write before" often don't apply, and
the usual Lua library is not available as globals (that would interfere
with the metatable magic, and it probably isn't that useful for the DSL
anyway). Additionally such code is usually embedded in a chunk of normal
Lua code, so you have no (easy) way of telling the globals checker about
the different set of globals. In short: For such "dialects" a globals
checker probably produces more false hits than it catches actual typos,
so it should leave those dialects alone and concentrate on the
surrounding regular Lua code.
The rules above are for figuring out which parts of the code are
considered regular Lua code, and for which parts a customized _ENV has
been set lexically (-> "dialect"), so that all bets are off anyway.
[1]: http://siffiejoe.github.io/lua-luaepnf/#Basic_Usage
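To make case 3) concrete, here is a small sketch, assuming Lua 5.2 or later. The names `seen`, `rule`, `digit` and `letter` are made up for illustration and have nothing to do with the linked library. Inside the `do` block a local _ENV with an `__index` metamethod catches every global read, so names that are never written are still perfectly valid there.

```lua
local setmetatable = setmetatable
local seen = {}   -- names read by the "dialect" code

local rule
do
  -- Hypothetical dialect environment: every global read yields a symbol table.
  local _ENV = setmetatable({}, {
    __index = function(_, name)
      seen[#seen + 1] = name
      return { symbol = name }
    end,
  })
  -- Neither `digit` nor `letter` is ever assigned; both reads are legitimate here.
  rule = { digit, letter }
end

print(rule[1].symbol, rule[2].symbol)  -- digit   letter
print(#seen)                           -- 2
```

A checker applying "write before read" to the `do` block would report two false hits, which is why the rules treat it as a dialect and skip it.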
Right, globals are usually upvalue references to the special symbol _ENV.
It's not guaranteed that this upvalue actually points to _G, of course,
Right. My assumption is that you mostly apply a globals checker to a Lua
file which contains regular Lua code at the topmost level. If not, you
can always *not* use the globals checker or supply a different list of
predefined global names for this file.
and
_ENV may not be an upvalue if defined as a local (look at code for
That's the point: If _ENV is a local, the programmer has modified the
environment (typically to add/remove/replace/collect globals), which
means that some embedded Lua dialect is in effect, and the globals
checker should react to this (by shutting up, at least for the
predefined globals).
print(boo()) here)
local print = print
Here the _ENV in _ENV.print refers to the chunk's upvalue (AFAICT from
the code snippet), so the globals checker should check `print`.
local _ENV = {X = 'hoo'}
function boo() return X end
Here, a local _ENV is in effect, so the globals checker should *not*
report a write to `boo`. Since a local _ENV (not the chunk's upvalue)
was in scope during the definition of `boo`, the globals checker should
not report access to X either.
print(boo())
Local _ENV still in effect, so don't report access to `boo` here either.
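For reference, here is the snippet under discussion reassembled into one runnable piece, assuming Lua 5.2 or later. The `do ... end` wrapper, the `result` variable and the final `rawget` line are my additions, so that the local _ENV ends at a visible point; the comments restate which accesses the rules would check.

```lua
local print = print            -- reads _ENV.print via the chunk's _ENV upvalue: checked

local result
do
  local _ENV = { X = 'hoo' }   -- from here on a local _ENV is in scope
  function boo() return X end  -- writes _ENV.boo, reads _ENV.X: not reported
  result = boo()               -- reads _ENV.boo: not reported
  print(result)                -- hoo
end

print(rawget(_G, "boo"))       -- nil: nothing was added to the real globals
```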
Static checkers _could_ be taught to handle this case, but in general _ENV
might be assigned to something dynamically.
But you can check if there is any assignment to the _ENV upvalue, which
in some circumstances would change the environment for all functions
that share the same upvalue, and you must assume (ignoring a programmer
error) that all those functions might use the globals in this changed
environment. So the Lua code in those functions is a Lua "dialect", and
the default list of globals is incomplete or wrong. You are right that
you cannot detect with certainty if two _ENV accesses actually use the
same _ENV value (at least not statically), but IMHO that doesn't matter
because the usual rules like "write before read" don't apply anyway.
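A small sketch of the sharing effect, assuming Lua 5.2 or later. To keep it self-contained the shared _ENV is a local of a hypothetical function `make` rather than the chunk's own upvalue, but the mechanism is the same: one closure assigns to _ENV, and every closure sharing that variable sees the new environment.

```lua
local print = print

local function make()
  local _ENV = { name = "first" }        -- one _ENV variable shared by both closures
  local function get() return name end   -- reads _ENV.name
  local function switch()
    _ENV = { name = "second" }           -- assigns to the shared _ENV upvalue
  end
  return get, switch
end

local get, switch = make()
local before = get()
switch()
local after = get()
print(before, after)   -- first   second
```

Statically you can see that `switch` potentially assigns to _ENV, even though you cannot tell when (or whether) it is called; under the rules that is enough to stop checking `get` as well.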
[...]
So we're going beyond plain 'global' access here. Just finding globals is
fine and dandy, but David M's insight was that we could track _fields_ of
known globals as well. Further, lglob tracks _aliases_ to known globals
and imported modules.
My rules above are only concerned with globals accesses.
lglob does get plain module() right (as its 5.2 friend '_ENV={}') because
it regards everything after module() or _ENV as a separate scope, and then
tracks accesses in that scope specially. So 'Answer' here is considered a
problem:
_ENV = {} -- or spell it 'module(...)' ;)
function answer() return 42 end
function life() return Answer() end
return _ENV
Module code using `module` (or _ENV) is a special variant of the
"dialect" case: You have metatable magic (if using package.seeall) or a
completely different (empty) set of global library functions (if not
using package.seeall). And you have fields you never write to (like _M,
_PACKAGE, _NAME, etc.). The only thing is that `module`-using module
code is still mostly regular Lua code, so it feels like you could almost
support it in a globals checker ...
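To show why the `Answer` typo is tempting to report, here is a runnable variant of the module snippet, assuming Lua 5.2 or later. It uses a local _ENV inside a hypothetical wrapper function `define_module` instead of assigning to the chunk's _ENV, so the rest of the chunk keeps its normal globals.

```lua
local print, pcall = print, pcall

local function define_module()
  local _ENV = {}                       -- empty environment, as with plain module()
  function answer() return 42 end       -- stored as _ENV.answer
  function life() return Answer() end   -- typo: _ENV.Answer is nil
  return _ENV
end

local M = define_module()
local ok_answer, value = pcall(M.answer)
local ok_life = pcall(M.life)

print(ok_answer, value)  -- true    42
print(ok_life)           -- false (attempt to call a nil value)
```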
Now PA's case involves tracking multiple scopes. This is a silly example,
but it shows the issue.
local function private_business(val)
  local _ENV = {}
  X = val + 1
  Y = val - 1
  return X + Y
end
Again, this can be done, by tracking the scope of local _ENV in functions,
but it seemed a lot of work for a case I did not particularly find
interesting.
I would actually consider this a misuse of _ENV, and I agree that Lua
"dialects" are rare, but they will come up from time to time.
Applying my rules would result in the globals checker shutting up for
this private_business function (and for any chunk using `module` or
`_ENV = {}`), thus reducing the number of false hits.
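For completeness, the private_business example does run as intended under Lua 5.2 or later; the call with 5 and the `total` variable are just for illustration. X and Y live only in the function's private table, so a checker that reported them as undeclared globals would produce two false hits.

```lua
local function private_business(val)
  local _ENV = {}   -- private environment for this call only
  X = val + 1       -- writes _ENV.X in the private table
  Y = val - 1       -- writes _ENV.Y in the private table
  return X + Y
end

local total = private_business(5)
print(total)                             -- 10
print(rawget(_G, "X"), rawget(_G, "Y"))  -- nil     nil
```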
And as for Tim Hill's point - yes, Lua is the best parser, and that's
exactly why we're using the output of the Lua compiler for checking.
The way I understood it, Tim Hill wants to detect free variable accesses
(as opposed to _ENV field accesses), and IMO this just doesn't make a
difference. The current luac and lbci are good enough by catching _ENV
accesses.
steve d.
Philipp