- Subject: [Experiment] Module system based on static hashing
- From: Stefan <ste@...>
- Date: Thu, 9 Apr 2020 02:14:37 +0200
Hello,
I've conducted an experiment to speed up Lua state creation with static
hash tables generated by GNU gperf and got some interesting results.
Introduction
============
Lua states offer a very light-weight way to execute independent scripts,
which is a highly desirable feature for programs that execute a large
number of them (e.g. web servers).
Unfortunately, the standard library is too small for many tasks and
adding modules by hand is quite a hassle. Furthermore, dynamic
libraries are a platform-dependent mess.
The goal of this experiment was to find a way to add many more functions
to Lua a) without using dynamic loading and b) without slowing down the
creation of new states.
luaL_openlibs loads all functions, tables and values such as print,
string, _VERSION, math.pi etc. that make up the standard library into
the Lua state so that the script can access them via table lookups.
But rarely does a script use ALL of them, and the more functions get
added, the more unnecessary work luaL_openlibs has to do.
So, the fewer unused Lua values get loaded into RAM, the better.
OK, so what if we don't actually load them and just set a metatable
with a __index metamethod that fetches the values as the script needs
them? The script won't notice absent values it doesn't use -- Great!
...
But wait, where does the __index metamethod get the values from?
A static hash table that just sits idle in the code section!
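The lazy lookup can be sketched in plain C without the Lua API. This is only a model under assumptions: a sorted static array searched with bsearch stands in for the gperf-generated perfect hash, a small per-state cache stands in for the global table, and every name (static_entry, state_cache, lazy_get, the demo_* functions) is made up for illustration and is not part of LX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Lua C function. */
typedef int (*demo_fn)(int);

static int demo_abs(int x)    { return x < 0 ? -x : x; }
static int demo_double(int x) { return 2 * x; }
static int demo_negate(int x) { return -x; }

/* Read-only table shared by all states; it must stay sorted by name
 * because bsearch is used here instead of a perfect hash. */
struct static_entry { const char *name; demo_fn fn; };

static const struct static_entry static_table[] = {
    { "abs",    demo_abs    },
    { "double", demo_double },
    { "negate", demo_negate },
};
#define STATIC_COUNT (sizeof static_table / sizeof static_table[0])

/* Per-state cache: models the global table of a single Lua state. */
struct state_cache {
    const struct static_entry *loaded[STATIC_COUNT];
    size_t count;
};

static int cmp_entry(const void *key, const void *elem) {
    const struct static_entry *e = elem;
    return strcmp((const char *)key, e->name);
}

/* Models the __index fallback: only reached on a cache miss. */
static const struct static_entry *static_lookup(const char *name) {
    return bsearch(name, static_table, STATIC_COUNT,
                   sizeof static_table[0], cmp_entry);
}

/* Returns the function for `name`, or NULL if it does not exist.
 * The first access caches the entry; later accesses skip the fallback. */
static demo_fn lazy_get(struct state_cache *st, const char *name) {
    assert(st != NULL && name != NULL);
    for (size_t i = 0; i < st->count; i++)
        if (strcmp(st->loaded[i]->name, name) == 0)
            return st->loaded[i]->fn;
    const struct static_entry *e = static_lookup(name);
    if (e == NULL)
        return NULL;
    st->loaded[st->count++] = e;
    return e->fn;
}
```

A fresh state_cache starts empty, so creating it costs the same no matter how many entries static_table holds. Only the names a script actually touches end up in the cache.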
How does it work?
=================
To make a static hash table, all keywords that map to Lua values
need to be known at compile time. Therefore every module needs
a description file (module.lua) that contains all keys of the module.
The modules are grouped together in a directory (lx/[library name])
and make up a library.
A script (lx/makelib.lua) reads each module's description (module.lua)
and uses gperf to generate a static hash table that maps all keys to
Lua values (or other modules). Then the C code of the module (module.c),
the static hash table and an exposed entry point (struct lx_module)
are concatenated together into a single file in the output directory
(generatedlib).
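To give a feel for what such a generated file could contain, here is a hand-written sketch. It is not the actual LX layout: struct demo_module is a hypothetical stand-in for struct lx_module, the keys and values are illustrative, and the "hash" is simply the key length, which happens to be collision-free for these few keys (gperf derives a real perfect hash from the key set).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_module;                    /* forward declaration */

enum demo_kind { DEMO_NUMBER, DEMO_MODULE };

struct demo_entry {
    const char *name;                  /* NULL marks an empty hash slot */
    enum demo_kind kind;
    double number;                     /* used when kind == DEMO_NUMBER */
    const struct demo_module *module;  /* used when kind == DEMO_MODULE */
};

/* Hypothetical exposed entry point of one module. */
struct demo_module {
    const char *name;
    const struct demo_entry *slots;
    size_t nslots;
};

/* Submodule "math": slot index == key length ("e" -> 1, "pi" -> 2). */
static const struct demo_entry math_slots[] = {
    [1] = { "e",  DEMO_NUMBER, 2.718281828459045, NULL },
    [2] = { "pi", DEMO_NUMBER, 3.141592653589793, NULL },
};
static const struct demo_module demo_math = {
    "math", math_slots, sizeof math_slots / sizeof math_slots[0]
};

/* Base module: one key maps to a value, another to a submodule. */
static const struct demo_entry base_slots[] = {
    [4] = { "math",    DEMO_MODULE, 0.0, &demo_math },
    [7] = { "version", DEMO_NUMBER, 5.4, NULL },
};
static const struct demo_module demo_base = {
    "base", base_slots, sizeof base_slots / sizeof base_slots[0]
};

/* Hash, bounds check, then one string compare to reject non-members. */
static const struct demo_entry *demo_find(const struct demo_module *m,
                                          const char *key) {
    assert(m != NULL && key != NULL);
    size_t h = strlen(key);            /* toy perfect hash: key length */
    if (h >= m->nslots)
        return NULL;
    const struct demo_entry *e = &m->slots[h];
    if (e->name == NULL || strcmp(e->name, key) != 0)
        return NULL;
    return e;
}
```

Resolving something like math.pi is then two lookups: demo_find on the base module returns the "math" entry, and demo_find on its module pointer returns "pi". Everything is const data, so nothing has to be built at state creation.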
A C program can use a version of the Lua core without libraries
(lua-5.4.0-beta-nolibs) to create Lua states and then add the libraries
by setting a metatable for the global table with:
lx_set_lookup_metatable(L, lx_[base module name]);
There are three different LX libraries that can be selected in the
Makefile:
* base contains only the Lua 5.4 standard libraries
* basex additionally contains LuaFileSystem and some LHF libraries
(complex, base64, ascii85, mathx)
* heavy gets auto-generated by generate_heavy.lua and contains 200
modules with 500 functions each. This is to test how it behaves
with a large amount of ballast. (Warning: GCC chews on it for a while.)
What are the results?
=====================
All files can be found here: https://github.com/evelance/lxlib
(I tested it on a laptop with Linux in a virtual machine, so the results
aren't terribly accurate given how hard it is to get reproducible
measurements on modern CPUs with caching, thermal throttling etc - so
just run them yourself, they only need a Linux with gperf installed.)
Usual time needed to perform the following operations on 100K Lua states:
Operation                  ~Time in seconds
luaL_newstate              1
luaL_openlibs              10
Load bytecode              0.5  (small script, loop and coroutine)
Execute said script        7
Full GC cycle              0.8
Require single module      40
lx_set_lookup_metatable    0.1
lua_close                  1.5
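A rough way to read these totals is per state. Dividing by 100K gives roughly 100 microseconds per state for luaL_openlibs versus roughly 1 microsecond for lx_set_lookup_metatable. The helpers below (per_state_us and setup_ratio are names made up for this sketch) just do that arithmetic, using the approximate figures above as inputs.

```c
#include <assert.h>

/* Average cost per state in microseconds, given a total in seconds. */
static double per_state_us(double total_seconds, double states) {
    assert(states > 0.0);
    return total_seconds * 1e6 / states;
}

/* How many times faster state setup gets when the library-loading step
 * (openlibs_s) is replaced by the lookup-metatable step (lx_s).
 * newstate_s is paid in both cases. All arguments are total seconds. */
static double setup_ratio(double newstate_s, double openlibs_s, double lx_s) {
    return (newstate_s + openlibs_s) / (newstate_s + lx_s);
}
```

With the measured values, setup_ratio(1.0, 10.0, 0.1) compares 11 s against 1.1 s, i.e. about a tenfold reduction in setup time, with the usual caveat that the inputs are only approximate.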
Observations
============
* luaL_openlibs takes a significant part of the total time
for small scripts
* require is really, really slow
* The execution time of the script with lx_set_lookup_metatable
is slightly longer since missing values need to be fetched
using a CFunction
* Lua states with LX need less RAM than with luaL_openlibs (8KB vs 22KB);
100K states with bytecode executed + coroutine need ca. 1GB vs 2.5GB
* With LX the state setup time remains constant, even with 100,000
Lua values
* The many additional hash functions slightly increase the code size
(298KB vs 285KB)
* The startup time of Lua as a CLI remains basically 0 even with 40MB
of bloat caused by the 100,000 functions of lx/heavy
* LX modules are only initialized after they have been accessed. This
means that the string metamethods only work after string has been
accessed
* Is it a good idea to set the global table's metatable?
* There are some other quirks...
Conclusion
==========
With static hashing it is possible to add hundreds of
thousands of functions to Lua that are accessible for all
scripts/states without slowing down state creation at all.
It decreases the memory usage of individual scripts/states but
slightly increases the executable size.
It slows down scripts that really use most functions available but
decreases the strain on the garbage collector a bit for those that
don't.
The generated code is fully portable and does not depend on ldl.
The code generation that is necessary anyway could be upgraded to
a script that easily builds a custom Lua with modules "à la carte".
All feedback (or more test results) welcome :)