lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Hi all,

I have a proposition to enhance the Lua language that can be
interesting to address some shortcomings of the language to be used in
the field of numerical computations.

Let me explain the problem. In Lua you can have two type of variables
local and global variables. The former, local variable are very fast
because are allocated in the Lua stack they does not require heap
memory allocation. For global variable it is a slightly different
affair because they are stored as a key/value pair is a table of
globals so accessing them aways imply a table lookup. This is the
reason why using global is inherently less efficient of local
variable.

It is therefore quite clear that we can compare local variables in Lua
to automatic variables in C that are allocated in the stack because:
- their scope is local like in Lua
- since they are stored in the stack no heap memory allocation is needed (fast)
but there is an important difference between Lua local variables and C
automatic variables since these latters can have any size and you can
store in the stack even a whole struct that comprises several values.
Here a C example of what I mean:

struct foo {
  double x;
  int a, b;
  const char *p;
};

int my_function(const char *str)
{
  /* the variable below is a whole structure allocated on stack */
  struct foo a = {3.14, 2, 3, str};

  /* access to a's member values is fast and can be fully optimized
     by the compiler */
  do_something(a.x, a.p);
}

In Lua is different because local variables can only be elementary
types like number (normally a double), string or objects (pointers).
These latter object are stored in the Lua stack as pointers and their
content is always allocated in the heap. So if I want to write the
equivalent of the code above in Lua I could write:

function my_function(str)
   local a = {x= 3.14, a=2, b=3, p= str}
   do_something(a.x, a.p)
end

The obvious difference is that with the code above there will be un
object created on the fly and allocated on the heap. This pattern can
affect adversely the performance because the JIT compiler may not be
able to eliminate the heap-allocation. If the objects are allocated on
the heap this can create a lot of objects that will further slow down
the execution because of the garbage collection. Note that in the case
of userdata we will have exactly the same problem and it will be even
worst because in this case the JIT will have no chance to eliminate
the memory allocation or optimize the access to his members.

For the sake of clarity I will give a small example that illustrate
how serious the problem can be, the case of complex number
calculation. Let us suppose than we want to implement the mathematical
function 'z -> a * exp(i*z) + b' where a, b and z are complex number
and 'i' is the imaginary unit. In Lua we can implement complex number
either as:
- userdata objects (see LHF lcomplex implementation)
- table with two fields for real and imaginary tables
In either case we will need to setup a metatable to define the
arithmetic operations for the complex numbers.

Let us suppose that all this configuration has been prepared and we
can create a complex number with:

z = complex.new(2,3)

then the function could be implemented like the following one:

local i = complex.i
function f(a, b, z)
   return a * exp(i*z) + b
end

where we suppose that 'exp' have been suitably defined to accept
complex values. The problem of the function above is that it looks
clear and simple but in reality it is a disaster in term of run-time
execution because a lot of small objects have to be created on the
heap every time that it is executed. This is acceptable for a toy
language or for a script language that is used only for glue over some
low-level C routines or for configuration purpose but if you want to
do serious work with it then the execution time will be a disaster.
Also, if the complex number is a userdata it will be even worst
because the Lua/C transition will prevent a lot of optimizations that
the JIT could otherwise perform.

THE PROPOSITION:

So here we came to my proposition: we should allow to declare a type
of vector local variables that encompass multiple values like in the
following:

local a[2]

The meaning is 'allocate two local variables but associate them to a
single symbol 'a'. This modification should be integrated in the
parser and no VM modifications are needed.

Then we need to further define the semantic of this new kind of vector
variables. Let us try to cover all the cases.

ASSIGNMENT

example:
-- we declare two local variables each one with two components
local a[2], b[2]

-- be is initialized with two values
b = 2, 3

-- the following means two assignements!
a = b

The example above should exactly equivalent to:
local a1, a2, b1, b2
b1, b2 = 2, 3
a1, a2 = b1, b2

What we mean with the example is that when a vector variable appears
in the left side of an assignment it is like if there were multiple
comma separated scalar values of the left side. For the other side,
when a vector variable appears on the right side it does evaluate to
multiple values very much like a function call f() that returns
multiple values. This should completely clarify the semantic for
assignment. We remark also that no VM modifications are needed.

Then the other cases are parameter passing for function call and table
assignment. For the function call the same logic of above applies,
when we call a function with a vector argument the argument is
expanded to multiple values and these are all passed to the functions.
For the function declarations it is simple: the function does not need
to know that we give them a vector value because it does just see that
multiple scalar values are passed. Then we can also declare in the
function prototype that some arguments are vectors: in this case the
same logic of assignment applies, a vector value corresponds to
multiple scalar values.

Example:

local function f(x, y)
   return math.sqrt(x*x + y*y)
end

local function g(x[2])
   -- we access to each component of x
   -- NB: the parser *know* that this is not a table lookup, no need to have
  --      a specific syntax
   return math.sqrt(x[1]^2 + x[2]^2)
end

local a[2]
a = 2, 3

-- the function f will be called with two scalar arguments
print(f(a))

-- the function g will be called like f, the only difference that in
-- its declaration it will be aware of the vector nature of its first argument
print(g(a))

In the case of table assignment the rules defined above applies. Example

local a[2] = 2, 3
local t = {}

-- in this case t[1] get only the value of the first component of a.
-- equivalent to: "t[1] = a1, a2" ==> a2 is discarded
t[1] = a

VECTOR ARITHMETIC

Since we have vector values we can extend naturally the vector
arithmetic. For '+' and '-' is straightforward. The operation is
effectuated for each component and multiple results are returned. The
following example should be self-explaining and not ambiguous:

local a[2], b[2], c[2]
c = a + b

For the multiplication is a different matter. In my opinion the best
option is to now allow the multiplication of two vectors except when
the dimension of both is two, in this latter case the complex
multiplication should be performed. In addition scalar by vector
multiplication should be allowed with the standard meaning of scalar
by vector multiplication.

It seems to me that the extension that I propose is coherent and the
meaning of its operations are non ambiguous. In addition we will not
need to modify the VM so there will be a good benefit because the
current VM and JIT can be both used without any modifications. The
advantage is that it will allow a very convenient pattern for fast
execution of function on small struct or vectors like complex numbers,
3d vectors or may be just small structs that pack a few values like
rgb colors like {r= 27, g= 224, b= 0}.

I hope I will have a positive feedback but in any case I will welcome
any critic, comment of remarks.

Best regards,
Francesco