Lua Module Function Critiqued
The module function [1] has design flaws that encourage poor practices in module design and can lead to bugs and ambiguities through side effects on global variables. This function should be avoided. It is hoped that this article will further deter the use of the module function and that the function will be either removed or improved in a future version of Lua. (It is acknowledged that there are proponents of this view, as well as detractors and those indifferent, e.g. in the thread [15].)
Before detailing the perils of the module function, we'll note that the choice of whether or not to use the module function is more than a personal one: it affects other authors. It is quite easy for a Lua module author to avoid writing module calls. Indeed, this function is never required for defining modules, as it is just a simple helper function that wraps common behaviors that themselves are required by neither Lua nor the other, much more useful parts of the Lua 5.1 module system such as require. [*A] However, since modules often use other modules written by other authors who themselves might have used the module function, and the module function causes global side effects, its effects cannot be entirely avoided by choice alone, short of modifying the implementation of those other modules. In practice, the use of the module function is somewhat common, likely because the module function is included in the Lua standard libraries, presumably as a convenience and standardized best practice for module definition, and because a number of official or reputable Lua sources, such as the Lua Reference Manual [2] and Programming in Lua (PiL) [3], encourage its use and even suggest it is good practice. Therefore, new users quickly become accustomed to using the module function.
The usual way to define a module with the module function is like this:
    -- hello/world.lua
    module(..., package.seeall)
    local function test(n) print(n) end
    function test1() test(123) end
    function test2() test1(); test1() end
and it is used like this:
    require "hello.world"
    require "anothermodule"
    hello.world.test2()
There are two main complaints about the module function, both of which are seen if anothermodule is defined like this:
    -- anothermodule.lua
    module(..., package.seeall)
    assert(hello.world.hello.world.print == _G.print) -- weird
    assert(hello ~= nil) -- where'd this come from anyway?
First, the global namespace is accessible by indexing the module table; second, hello is visible in this module even though it was not requested by it.
The first complaint is less inherent to the module function and is rather due only to the package.seeall option. package.seeall allows a module to see global variables, which are normally hidden since the module function replaces the current environment of the module with a local one. What package.seeall does is muck with the metatable of the module's environment so that it falls back to _G. This allows not only the module itself to access _G, but the variables in _G also become part of the module's interface. Among other things, exposing the global environment through the module table could be detrimental to sandboxing (see SandBoxes), and these variables might be used accidentally, but more glaringly it's just plain weird.
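The leak through the module table can be reproduced without calling the module function at all. The following is a minimal sketch, assuming only that package.seeall amounts to giving the module table a metatable whose __index is _G, as described above. The names M and test are illustrative and not taken from any library.

```lua
-- Stand-in for a module table after package.seeall has been applied:
-- lookups that miss in M fall back to the global table.
local M = setmetatable({}, { __index = _G })

function M.test() return 123 end

-- The module's own function is there, as expected.
assert(M.test() == 123)

-- But so is every global: print now looks like part of the module's interface.
assert(M.print == print)

-- rawget bypasses the metatable and shows that print was never stored in M.
assert(rawget(M, "print") == nil)
```

Any client holding M can therefore reach the whole global environment through it, which is the sandboxing concern raised above.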
Luckily, package.seeall is only a convenience option and can be avoided as such:
    -- hello/world.lua
    local _G = _G
    module(...)
    function test() _G.print(123) end
or
    -- hello/world.lua
    local print = print
    module(...)
    function test() print(123) end
Those are a bit awkward, but there may be other more syntactically pleasing ways to avoid it, such as by recognizing that the module table and the module environment table need not be the same (e.g. see ModuleDefinition -- "Module System with Public/Private Namespaces"). We won't go into further detail on this first point.
The second point is that the module function has the side effect of creating global variables named in ways the programmer doesn't fully control. On executing module("hello.world"), the function creates a table named "hello" in the global environment (the initial global environment, not the current environment set through setfenv), and stores the module table under the key "world" in that table. However, if any of those variables already exists with a non-table value (e.g. someone else placed it there), the function raises an error, which at least provides some level of safety. The behavior of the module function can best be understood with the following representation of it in Lua taken from LuaCompat [4] (the real version is in loadlib.c).
    local _LOADED = package.loaded
    function _G.module (modname, ...)
      local ns = _LOADED[modname]
      if type(ns) ~= "table" then
        ns = findtable (_G, modname)
        if not ns then
          error (string.format ("name conflict for module '%s'", modname))
        end
        _LOADED[modname] = ns
      end
      if not ns._NAME then
        ns._NAME = modname
        ns._M = ns
        ns._PACKAGE = gsub (modname, "[^.]*$", "")
      end
      setfenv (2, ns)
      for i, f in ipairs (arg) do
        f (ns)
      end
    end
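The listing above calls findtable without showing it. What follows is a rough sketch of what such a helper might do, assuming it walks the dotted name, creates any missing intermediate tables, and returns nil when it meets an existing non-table value. The real helper may differ in details. The sketch operates on an ordinary table (env, a hypothetical stand-in for _G) so that it can be run without side effects.

```lua
-- Hypothetical re-creation of a findtable-like helper (an assumption, not the
-- LuaCompat source): resolve "a.b.c" inside root, creating tables as needed.
local function findtable(root, name)
  local t = root
  for key in string.gmatch(name, "[^.]+") do
    local v = rawget(t, key)
    if v == nil then
      v = {}            -- missing component: create it (this is the side effect)
      t[key] = v
    elseif type(v) ~= "table" then
      return nil        -- existing non-table value: name conflict
    end
    t = v
  end
  return t
end

local env = {}          -- stand-in for the global environment

-- "hello.world" creates env.hello and env.hello.world.
local ns = findtable(env, "hello.world")
assert(env.hello.world == ns)

-- A plain function named test blocks any module named "test.more".
env.test = function() return 1 + 2 end
assert(findtable(env, "test.more") == nil)
```

The second assertion corresponds to the "name conflict" error raised by the module function when a package name collides with an ordinary variable.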
The problem arises because different modules maintained by different people all write to the global environment. Furthermore, an application using those modules may be writing to the global environment as well. Due to information hiding, [5] the modules and the application should have no knowledge of the internal workings / implementation of those modules--nor, possibly, even the names of the modules those modules require. The result is that a program lacks control over which global variables get set. Several variants of this problem are illustrated below.
In the following examples, we will as a convenience define modules inline rather than in separate files. For example, rather than creating two files like this
    -- mymodule.lua
    module(...)
    function test() return 1+2 end

    -- mymodule_test.lua
    require "mymodule"
    print(mymodule.test())
we will simply write
    (function()
      module("mymodule")
      function test() return 1+2 end
    end)();
    print(mymodule.test())
Here is the first example:
    (function()
      local require = require
      local print = print
      local module = module
      module("yourmodule");
      (function() module("mymodule") end)()
      print(mymodule ~= nil) -- prints false (where is it?)
    end)();
    print(mymodule ~= nil) -- prints true (where did this come from?)
As shown, loading modules like "mymodule" always populates the global environment rather than the current environment where the module is used. This is the reverse of what is needed. Many such module loads can fill the global environment with variables intended to be private.
Another problem, as Mark Hamburg notes [16], is that putting modules into the global namespace hides dependencies. Assume your program loads module "bar" and loading module "bar" also loads module "foo". Now module "foo" will also be available in the global namespace. In your program you start using module "foo" from the global namespace. If module "bar" later removes its dependency on module "foo", then "foo" will no longer be available in the global namespace, and your program breaks. It is not immediately apparent where foo in the global namespace came from, nor that it is actually a module (that used to be a dependency of module "bar").
The following two examples are related to each other:
    function test() return 1+2 end
    (function()
      module("mymodule", package.seeall);
      (function()
        module("test.more") -- fails: name conflict for module 'test.more'
        function hello() return 1+2 end
      end)()
    end)()
and
    (function()
      module("test")
      function check() return true end
    end)();
    (function()
      module("test.check") -- fails: name conflict for module 'test.check'
      function hello() return 1+2 end
    end)();
As seen, package names and regular variable names conflict.
The module function does detect and raise an error if a global variable it would overwrite already exists. That's what we want, right? Well, this also means that it's particularly indeterminate whether loading a module will succeed, since the module may load other modules whose names (and names of their members) we might not know and that conflict with global variables.
As a side note, in some other languages (e.g. Perl), variables and package names are maintained in separate namespaces and so are prevented from conflicting. [*3] It's also noteworthy that module naming conventions affect if and how names conflict. For example, Java package names [6] are conventionally prefixed by a (unique) domain name under the author's control, which is verbose but provides a mechanism to avoid conflict. In Perl, CPAN provides a central naming registry to prevent conflicts, and modules with the same prefix indicate a common function rather than a common maintainer (e.g. "CGI" [7] and "CGI::Minimal" [8] are maintained independently by different authors, and "CGI::Minimal" is not stored inside the "CGI" table).
    (function()
      module("mymodule", package.seeall);
      (function()
        module("test.more")
        function hello() return 1+2 end
      end)()
      function greet()
        test.more.hello() -- fails -- attempt to index global 'test' (a function value)
      end
    end)();
    function test() mymodule.greet() end
    test()
Here, the program inadvertently overwrites a global variable set by the module function. The module function does not detect this. Rather, there is program failure (possibly a silent one) when a module that depends on this global variable attempts to access this variable.
    (function()
      local require = require
      local module = module
      local print = print
      local _P = package.loaded
      module('yourmodule.two');
      (function() module('mymodule.one') end)()
      print(_P['mymodule.one'] ~= nil) -- prints true
    end)();
    local _P = package.loaded
    print(_P['mymodule.one'] ~= nil) -- prints true
Storing modules in the global environment is in fact somewhat redundant since they are also stored in package.loaded (though without creating nested tables for the periods in the module name).
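That package.loaded alone is enough to find a loaded module can be checked with a module that never touches the global environment. This is a small sketch using package.preload to define the module inline. The module name mymodule.one is reused from the example above, and the field answer is purely illustrative.

```lua
-- Define a module inline through package.preload; the loader simply
-- returns a table and never calls the module function.
package.preload["mymodule.one"] = function()
  return { answer = 42 }
end

local one = require "mymodule.one"

-- require records the module under its full dotted name, as a flat key.
assert(package.loaded["mymodule.one"] == one)

-- No nested table is created inside package.loaded for the "mymodule" prefix.
assert(package.loaded["mymodule"] == nil)

-- A second require returns the very same table.
assert(require "mymodule.one" == one)
```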
~~~
The problems above can be avoided by not using the module function but instead defining modules in the following simple way: [*1][*2]
    -- hello/world.lua
    local M = {}
    local function test(n) print(n) end
    function M.test1() test(123) end
    function M.test2() M.test1(); M.test1() end
    return M
and importing modules this way:
    local MT = require "hello.world"
    MT.test2()
Note that the public functions are clearly indicated with the M. prefix. Unlike when using module, the global environment is not visible through the MT table (i.e. MT.print == nil), the hello.world table has not been exported to (or polluted) the global environment but is rather held in a lexical (local) variable, and modules with the same prefix (e.g. hello.world.again) would not alter the hello.world table. In the client code, the module hello.world can be given a short abbreviation local to that module (e.g. MT). The approach also works well with DetectingUndefinedVariables. This is great. The one complaint is that public functions need to be prefixed with M. in the module itself, but the other solutions often proposed introduce their own problems and complexities, such as package.seeall noted above. It does not particularly hurt to be explicit with M. (two characters), especially when code size gets larger.
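The properties claimed for the "M" table style can be checked directly. The sketch below defines the module inline via package.preload instead of a separate hello/world.lua file. The functions return values rather than printing so that the assertions have something to compare, which is a deviation from the listing above made only for illustration.

```lua
-- Inline equivalent of a hello/world.lua written in the "M" table style.
package.preload["hello.world"] = function()
  local M = {}
  local function test(n) return n end          -- private: plain local
  function M.test1() return test(123) end      -- public: stored in M
  function M.test2() return M.test1() + M.test1() end
  return M
end

local MT = require "hello.world"

assert(MT.test2() == 246)

-- The global environment does not leak through the module table.
assert(MT.print == nil)

-- The private helper is not reachable from outside.
assert(MT.test == nil)

-- Loading the module created no global named hello.
assert(rawget(_G, "hello") == nil)
```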
A related note on C code: The luaL_register [9] function in C is somewhat analogous to the module function in Lua, so luaL_register shares similar problems, at least when a non-NULL libname is used. Furthermore, the luaL_newmetatable/luaL_getmetatable/luaL_checkudata functions use a C string as a key into the global registry. This poses some potential for name conflicts--either because the modules were written by different people or because they are different versions of the same module loaded simultaneously. To address this, one may instead use a lightuserdata (pointer to a variable of static linkage to ensure global uniqueness) for this key or store the metatable as an upvalue--either way is a bit more efficient and less error prone.
The module function (and its ilk) may introduce more problems than it solves.
[*1] (Advocates of the above style include RiciLake, DavidManura, others who have mentioned it on IRC, MikePall [17][18][19], ... (add your name here))
[*2] There has also been the suggestion to move the standard libraries in this direction [20].
[*3] Example in Perl where modules and variables of the same name do not conflict:
    package One;
    our $Two = 2;
    package One::Two;
    our $Three = 3;
    package main;
    print "$One::Two,$One::Two::Three" # prints 2,3
Many of the additional points below were taken from the Oct 2011 discussion on module [21][22].
With modules defined using the module function, we can sometimes just concatenate them (cat *.lua > bundle.lua) if it's desired to bundle them into a single file [21]. However, this does not work in the general case:
    module("one", package.seeall)
    require "two" -- This fails unless you sort the modules according to their dependency graph
                  -- (assuming, as is best design, it has no cycles and can be computed statically)
    local function foo() print 'one.foo' end
    function bar() foo() two.foo() end

    module("two", package.seeall)
    function foo() print 'two.foo' end -- This overwrites a previous local

    module("main", package.seeall)
    require "one"
    one.bar()
A general solution, which works for modules both with and without module, involves package.preload as follows:
    package.preload['one'] = function()
      module("one", package.seeall)
      require "two"
      local function foo() print 'one.foo' end
      function bar() foo() two.foo() end
    end
    package.preload['two'] = function()
      module("two", package.seeall)
      function foo() print 'two.foo' end
    end
    package.preload['main'] = function()
      module("main", package.seeall)
      require "one"
      one.bar()
    end
    require 'main'
A number of bundling utilities listed on the bottom of BinToCee utilize approaches like this.
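The same package.preload bundling works for modules written in the "M" table style. Below is a minimal sketch with the module names one and two reused from the example above. The functions return strings instead of printing, an adaptation made here so the result can be asserted. Note that the order of the preload entries does not matter, since each loader runs only when first required.

```lua
-- Bundle of two "M"-style modules in one file. "one" is listed first even
-- though it depends on "two"; loaders run lazily on require.
package.preload["one"] = function()
  local two = require "two"
  local M = {}
  local function foo() return "one.foo" end    -- private to module one
  function M.bar() return foo() .. " " .. two.foo() end
  return M
end

package.preload["two"] = function()
  local M = {}
  function M.foo() return "two.foo" end
  return M
end

local one = require "one"
assert(one.bar() == "one.foo two.foo")

-- Each loader body is its own function scope, so the local foo in "one"
-- cannot be overwritten by the definition in "two".
assert(require("two").foo() == "two.foo")
```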
One criticism of the "M" table style of module definition is that if a function definition in the module is changed from public to private then all references to that function must be renamed (e.g. M.foo to foo) [23].
    function M.foo() end -- change to "local function foo() end"
    function M.bar() M.foo() end -- and also change "M.foo()" to "foo()"
A mitigating factor is that references to M.foo() are localized to the current module and are typically few in number. The refactoring required here is also the same as for renaming a function, a capability you'll need anyway. Text editors can assist in this refactoring, and some editors with knowledge of the Lua language can also rename variables quite robustly. In some languages, e.g. Python, private variables are informally differentiated from public variables with leading underscores, so the same criticism would apply.
One technique to avoid renaming is to keep all functions local, and insert any functions that should be public into the public table right after their definition:
local function foo() end; M.foo = foo
Some performance-critical code does that anyway for the small performance advantage. The triplicate use of foo in the definition is unfortunate, and workarounds to avoid this (such as localmodule in ModuleDefinition or token filters) are likely not worth it.
Finally, note that changing a function from public (table or global variable) to private (local variable) may also require moving the function definition. Local variables, unlike table or global variables, are lexically scoped, so they must be declared (or forward declared) prior to use. New users not versed in lexical scoping can be confused by this. We can avoid this by declaring all variables (public and private) uniformly with either locals (as in the example above) or table/global variables (as will be shown below). The latter can involve a Python-like technique of prefixing private variables with underscores or using two tables:
    local M = {}
    function M._foo() print 'foo' end
    function M.bar() M._foo() end
    return M

    local M = {} -- public
    local V = {} -- private
    function V.foo() print 'foo' end
    function M.bar() V.foo() end
    return M
Neither of those addresses the problem, however, of needing to replace references when a function is changed between public and private. We may also solve that problem by using a technique like this:
    local M = {} -- public
    local V = setmetatable({}, {__index = M}) -- private and public
    function V.foo() print 'foo' end
    function M.bar() V.foo() end
    return M
Now, we can always safely change a function between public and private by changing only one character in the file. If we wanted to avoid some of the cruft, we could move some of it into the module loader so that modules need only be written as
    function V.foo() print 'foo' end
    function M.bar() V.foo() end
It may not be the tersest, but the differentiation between public and private scopes (V/M) is explicit.
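The V/M technique can be exercised as follows. This is a small sketch in which the functions return values instead of printing, and baz is an extra illustrative function added to show that V also resolves public names. Internal code always refers to V.name, so moving a definition from V to M (or back) changes one character at the definition and nothing at the call sites.

```lua
local M = {}                                   -- public
local V = setmetatable({}, { __index = M })    -- private, falls back to public

function V.foo() return "foo" end              -- private
function M.bar() return V.foo() end            -- public, calls a private function
function M.baz() return V.bar() .. "!" end     -- V.bar resolves to M.bar via __index

-- Clients holding M cannot see the private function.
assert(M.foo == nil)

-- Internal calls through V reach both private and public functions.
assert(M.bar() == "foo")
assert(M.baz() == "foo!")

-- Making foo public later is a one-character change at its definition
-- ("V.foo" becomes "M.foo"); the call V.foo() inside bar keeps working.
M.foo, V.foo = V.foo, nil
assert(M.bar() == "foo")
```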
"Despite our “mechanisms, not policy” rule — which we have found valuable in guiding the evolution of Lua — we should have provided a precise set of policies for modules and packages earlier. The lack of a common policy for building modules and installing packages prevents different groups from sharing code and discourages the development of a community code base. Lua 5.1 provides a set of policies for modules and packages that we hope will remedy this situation." -- The Evolution of Lua, http://www.lua.org/doc/hopl.pdf
"Usually, Lua does not set policies. Instead, Lua provides mechanisms that are powerful enough for groups of developers to implement the policies that best suit them. However, this approach does not work well for modules. One of the main goals of a module system is to allow different groups to share code. The lack of a common policy impedes this sharing." -- http://www.inf.puc-rio.br/~roberto/pil2/chapter15.pdf
See also MechanismNotPolicy.
The majority of pure-Lua modules in repositories currently use the module function:
LuaList:2011-10/msg00686.html argues that it's preferable for the name of the module to be explicitly specified in the module text so that it's clear how to load it:
    module("foo.bar") -- encouraged
    module(...) -- discouraged
    -- module: foo.bar -- name in informal comment better than nothing
    local M = {} return M -- anonymous and likewise discouraged
    local M = {}; M._NAME = "foo.bar"; return M -- better than above
On the other hand, this doesn't make the package as easily relocatable.
The Lua 5.1 module function can be used in Lua 5.0 via [LuaCompat]. Lua 5.2.0-beta has a compatibility mode, and furthermore "It is quite easy to write a 'module' function in 5.2, using the debug library. (But it will not allow multiple modules in a single file, which is a kind of hack anyway)" (Roberto, LuaList:2011-10/msg00488.html).
The "M" table style of module definition is also compatible in 5.0, 5.1, and 5.2.0-beta.
_ENV is not supported directly in 5.1, so its use can prevent a module from remaining compatible with 5.1. Maybe you can simulate _ENV with setfenv and by trapping gets/sets to it via __index/__newindex metamethods, or just avoid _ENV.
In 5.2.0-beta, you can just continue to use the "M" table style of module definition, and there are those that recommend it [24]. On the other hand, some have suggested that in Lua 5.2, modules will be written like this, using the new _ENV variable (which largely supplants setfenv):
    _ENV = module(...)
    function foo() end
Some argue against new users needing to be aware of the obscure-looking _ENV. Although that might be avoided by having someone else set up _ENV when the chunk is loaded (e.g. by require or the searcher function), others continue to argue that the module's private environment and public tables should not be mixed, and there is no need for _ENV at all.
The argument in LuaModuleFunctionCritiqued was that module has technical defects (side-effects and obscure corner cases), which hinder a core property of modularity: composability. In practice, this means that an application loading two different modules written by two different authors should not experience any surprising interactions between the two modules. On the other hand, the "M" table style (if properly used) does not have these defects, due to its very simple semantics without side-effects (formally, the module loader can often be thought of as a [pure function] in the functional programming sense).
Hisham has argued [25] that even though module has technical defects, these can largely be fixed and they are minor compared to its success in promoting a more standard policy for module definition (absent in 5.0). This success appears to be on a sociological rather than technical level. module is said to have fostered code sharing and development of a community code base, and most modules in LuaRocks use module. Moreover, the use of module (which is a built-in keyword in some other languages) has a certain self-documenting property, announcing the intention of the code (I am a module, this is my name, and here are my public functions) with minimal boilerplate. We also all seem to agree that obscure boilerplate (setfenv/_ENV things) in modules is a negative for readability.
It can be argued, however, that some years after the introduction of the 5.1 module system, complaints are still heard with some frequency about the quantity, quality, and consistency of Lua modules, even for the basics like StandardLibraries. There are other factors in play, and efforts like LuaRocks are addressing some of these areas, but the more difficult question is separating out whether module has helped or hurt and whether changes in 5.2 will help.
Given that a "standard library" like penlight [10] has removed module calls from its implementation [26], it's not apparent this has negatively affected anyone. However, the fact that Lua 5.2.0-beta has deprecated module, suggesting that existing modules using it might no longer work without loading a compatibility function or being rewritten, has caused some concern [27].
In putting globals (e.g. print) and module functions (e.g. foo.print) in the same namespace, as package.seeall does, a module function may overshadow a global of the same name. This is one reason some prefer to be explicit by giving module functions a unique prefix (e.g. M.print). This may also help readability in that it's obvious that M.print is a public variable exported from the current module and not a local, global, or imported package name. Sometimes this type of prefixing is needed anyway in other parts of the code (e.g. "self." or "ClassName:"). This explicitness also avoids any issues or overhead with merging the two namespaces with a metatable. This practice does, however, introduce some repetition (see "Switching between private and public" above).
A similar debate has occurred in the C++ community concerning things like "using namespace std;" [11], which imports possibly conflicting names into the current namespace and is therefore safest to avoid. Moreover, in C, there's a common practice of prepending "g_" or "s_" to global or static variables (and similarly "m_" for members in C++), although intelligent IDEs can mitigate some of the need for this.
Python has an issue similar to package.seeall [23]:
    # bar.py
    import logging
    # logging is now available as bar.logging
Perl can also have this issue:
    package One;
    use Carp qw(carp);
    # carp is now available via One::carp
    carp("test");
    1
unless you avoid symbol imports (and instead fully qualify names):
    package One;
    use Carp qw();
    # carp is not imported, so One::carp does not exist
    Carp::carp("test");
    1
or use [namespace::clean], [namespace::autoclean], or [namespace::sweep] modules.
Python (like Lua module) imposes a relationship between the modules foo and foo.baz. If your module loads foo and another module loads foo.baz, then baz will be placed inside your module. This likely accounts for why the Hitchhiker's Guide to Packaging [12] suggests that the first part of the module name ("foo.") be globally unique, and it seems that Python packages tend to share the same prefix only if they are managed by the same entity (e.g. numerous, but not all, Zope packages/modules are under a "zope." prefix). The same guideline should apply to Lua in its current state.
Perl is not quite the same since you can have a package Foo with variable $Foo::Baz and another package Foo::Bar, and these do not conflict since packages exist in their own namespace. Perl does, however, share the following issue with Lua module:
    # One.pm
    package One;
    sub test { };
    1

    # Two.pm
    package Two;
    One::test();
    1

    # main.pl
    use One;
    use Two; # This succeeds as written but fails if the lines are later reversed
require globally registers modules in package.loaded. So, regardless of how modules are defined, you still have global access to loaded modules if you want it:
    local L = package.loaded
    .....
    L['foo.bar'].baz()
    L.foo.bar.baz() -- if require'foo'.bar == require 'foo.bar'
Doing this may be considered laborious:
    local FBAR = require 'foo.bar'
    local FBAZ = require 'foo.baz'
    local FBUZ = require 'foo.buz'
    ...
You may just accept it, or you could find ways to simplify it:
    -- foo.lua
    return {
      bar = require 'foo.bar',
      baz = require 'foo.baz',
      buz = require 'foo.buz'
    }
or ways to automate it:
    local _G = require 'autoload' -- under appropriate definition of autoload
    _G.foo.bar.qux()
    -- note: penlight offers something like this
The latter does not have problems with missing hidden dependencies since modules are always loaded on demand if needed. On the other hand, module loading is not localized to the module loader function but rather can occur later wherever the module functions are used, which means that error detection may be delayed and usages of these functions may be more prone to fail (e.g. module loading failure due to a module not being installed), complicating error handling. There are ways this might be addressed though.
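One possible "appropriate definition of autoload" is sketched below. It is an assumption made for illustration, not penlight's implementation: a proxy table whose __index handler tries require on the dotted name built so far and, if that fails, returns a deeper proxy. The module name demo_ns.bar and its function qux are hypothetical and defined inline through package.preload.

```lua
-- Hypothetical autoload proxy: indexing A.x.y tries require "x", then "x.y".
local function autoload(prefix)
  return setmetatable({}, {
    __index = function(t, key)
      local name = prefix and (prefix .. "." .. key) or key
      local ok, mod = pcall(require, name)
      -- On failure (or an empty result), treat the name as a namespace
      -- and keep descending.
      local value = (ok and mod) or autoload(name)
      rawset(t, key, value)    -- cache so require is attempted only once
      return value
    end,
  })
end

-- A module that exists only under the dotted name "demo_ns.bar".
package.preload["demo_ns.bar"] = function()
  return { qux = function() return "qux" end }
end

local A = autoload()

-- "demo_ns" is not a module, so A.demo_ns is a proxy; ".bar" triggers the load.
assert(A.demo_ns.bar.qux() == "qux")
assert(A.demo_ns.bar == package.loaded["demo_ns.bar"])
```

The drawback noted above is visible here: a misspelled module name does not fail when the proxy is indexed, only later when a function is called on the resulting empty proxy.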
A concern similar to standardizing module definition is standardizing class definition (ObjectOrientedProgramming). Moreover, some modules are also classes.
As an example, ratchet's code [13] does something like this:
    .....
    module("ratchet.http.server", package.seeall)
    local class = getfenv() -- why not _M?
    __index = class
    function new(socket, from, handlers, send_size)
      local self = {}
      setmetatable(self, class)
      .....
      return self
    end
    function handle(self)
      .....
    end
If modules should utilize module, then we should ask if this is how class modules should be defined.
The lua -l command line switch is not useful if it doesn't have some side-effect. With M-style modules in 5.1, -l will at least create a variable in package.loaded, but accessing that is long-winded, and the purpose of -l is typically for short-hand on invoking the interpreter (after all, the same can be achieved via -e "require....."). In 5.2.0-beta, -l will create a global table using lua_pushglobaltable even for M-style modules. The global variable created via lua_pushglobaltable may be longer than desired though, and you might instead want the effect of -e "FB = require 'mypackage.foo.bar'". See the discussion LuaList:2011-11/msg00016.html .
If -l is used to create a global variable, should this be added to _G, or somehow limited to just the main program chunk (e.g. the effect of -lfoo would be to add local foo = require 'foo' or _ENV = setmetatable({foo = require 'foo'}, {__index = _G}) to the top of the main chunk)? The latter is cleaner.
There was also a suggestion that -l should accept parameters [14][19].
Modules defined with "M" tables, at least without metatables, even though they may have some variation in form, can be statically analyzed from first principles (e.g. behavior of lexical variables and tables), as LuaInspect does [28].
The 5.1 module function has more complicated semantics (side-effects and metatable behavior). Nevertheless, we can still infer meaning on a higher level, particularly if conventions are followed, as tools like [LuaDoc] have done. Changes in 5.2 and suggestions for improving the module function may also affect this area.
Some proposals for making module better rather than tossing it out are in the thread http://lua-users.org/lists/lua-l/2011-10/threads.html#00481 . (TODO: post best recommendations here)
Not using the module function means that, by omitting the local keyword, it could be very easy to pollute the global environment (which is bad; that's the point of this article). So we can improve the module function by changing the environment to something private (that can inherit from _G) and defining in it the _M table (as now) that will contain the module's public interface. I was also concerned about these issues, and there is a tricky way to use the module function and not clutter the global environment. Here it is:
    package.loaded[...]={}
    module(...) -- you might want to add package.seeall
However, this does not solve the problem of the global namespace being accessible through the module. For that we would need a modified module function. Hopefully in the next Lua release.
    -- mod.lua
    local _E = setmetatable({}, {__index=_G})
    local _M = {}
    package.loaded[...] = _M
    module(...)
    _E.setfenv(1, _E)
    function _M.test() return math.sqrt(9) end
    test2 = 1

    --modtest.lua
    local m = require "mod"
    assert(not mod)
    assert(m.test() == 3)
    assert(not test)
    assert(not test2)
    assert(not m.print)
    print 'done'

    $ luac -p -l mod.lua | lua /usr/local/lua-5.1.3/test/globals.lua
    setmetatable 1
    _G 1
    package 3
    module 4
    test2 9*
    math 7