- Subject: Re: Lua as a data description language
- From: Juergen Fuhrmann <fuhrmann@...>
- Date: Fri, 04 Jun 2004 11:30:43 +0200 (CEST)
Hi,
the bitching part of Lua is even worse...
Imagine you want to read in one huge array of doubles which you will
need in a C array. The best you can do is the following.
Define
function load_array(lua_array)
  local n=table.getn(lua_array)
  local c_array=c_array_create(n)
  -- the numeric for loop declares its own local i
  for i=1,n do c_array_set(c_array,i,lua_array[i]) end
  return c_array
end
This uses C code bound to Lua (e.g. using tolua).
c_array * c_array_create(int n);
void c_array_set(c_array *a, int i, double val);
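For illustration, here is a minimal sketch of what the C side behind these two prototypes might look like. The struct layout and the 1-based index convention are my assumptions (chosen to match the Lua loop above); the post itself only gives the prototypes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout: the post only shows the two prototypes. */
typedef struct c_array {
    int n;         /* number of elements */
    double *data;  /* contiguous storage for n doubles */
} c_array;

/* Allocate an array of n zero-initialised doubles; NULL on failure. */
c_array *c_array_create(int n)
{
    if (n < 0) return NULL;
    c_array *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->n = n;
    /* allocate at least one element so n == 0 still yields valid storage */
    a->data = calloc(n > 0 ? (size_t)n : 1, sizeof *a->data);
    if (a->data == NULL) { free(a); return NULL; }
    return a;
}

/* Store val at index i. Assumption: i is 1-based, as in `for i=1,n`. */
void c_array_set(c_array *a, int i, double val)
{
    assert(a != NULL && i >= 1 && i <= a->n);
    a->data[i - 1] = val;
}
```

With tolua (or hand-written glue) each c_array_set call crosses the Lua/C boundary once per element, which is part of why this path is costly for 10^6 or more values.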
Now you can write in the data file
a=load_array{1,2,3,5,12.33}
... lua code using a
But what happens when this file is read by Lua ?
1) The whole file is loaded into memory (?)
2) The file is translated to byte code doubling all the data
3) The bytecode is executed. Only then data gets where it is needed.
So all the data enters memory three (?, at least two) times, while it is
needed only once. We are talking about 10^6 ... 10^7 values here.
What about binary data in strings ? IMHO the proposed ascii representation
is much too long for this case. One could go with base64, though. But this
has IMHO considerable decoding overhead.
My workaround so far is a mechanism which subdivides input files
into chunks separated by $ characters.
So the example above would be
a=c_array_create(5)
Data{a}
$
1
2
3
5
12.33
$
... lua code using a
[EOF]
When executed, the first chunk is loaded, byte-compiled, and executed.
The Data statement internally tells how to parse the next chunk, and
where to put the data. Then the middle chunk is parsed by another
parser (written by hand...) which transfers the data directly into the C
array. The last chunk is again handled by Lua. Lua5 perfectly
supports this chunk handling. For Lua4 I published the lua_dolines
patch on the wiki.
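To make the middle step concrete, here is a rough sketch of a hand-written line parser for a data chunk. It is not the code from my system; the function name read_data_chunk and its signature are made up for this example. It reads one value per line straight into a preallocated C buffer and stops at the closing $ line, so the stream is left positioned at the start of the next Lua chunk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one double per line from `in` into
 * dst[0..cap-1] until a line starting with '$' or end of file.
 * Returns the number of values stored, or -1 on a malformed line
 * or if the chunk holds more than cap values. */
long read_data_chunk(FILE *in, double *dst, long cap)
{
    char line[256];
    long count = 0;

    assert(in != NULL && dst != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '$') break;            /* chunk separator */
        char *end;
        double v = strtod(line, &end);
        if (end == line) {
            /* nothing converted: tolerate blank lines, reject the rest */
            if (line[strspn(line, " \t\r\n")] == '\0') continue;
            return -1;
        }
        if (count >= cap) return -1;          /* more data than announced */
        dst[count++] = v;
    }
    return count;
}
```

The values never pass through the Lua compiler or a Lua table, so each one is held in memory only once, in its final place.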
To handle binary data, I do the following
a=c_array_create(5)
Data{a, encoding="native"}
$
/=)EPEJDPDJP°D!"
$
... lua code using a
[EOF]
where /=)EPEJDPDJP°D!" is _pure_ (not base64-encoded) binary data in
the native format. It is read in directly by fread() without any overhead.
If you want portable binary files, you can use xdr encoding instead of
native. One could imagine base64 as well.
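For the portable variant: an XDR double is the 8 bytes of an IEEE 754 double-precision value, most significant byte first. Below is a small sketch of decoding one such value by hand. The function name is invented for this example, and it assumes the host itself stores doubles as IEEE 754 binary64 with the same byte order as its 64-bit integers, which holds on common platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one XDR-encoded double (8 bytes,
 * big-endian IEEE 754) into a host double. */
double xdr_decode_double(const unsigned char *b)
{
    uint64_t bits = 0;
    double v;

    assert(b != NULL);
    /* assemble the 64-bit pattern, most significant byte first */
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | b[i];
    /* reinterpret the bit pattern without violating aliasing rules */
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

On a big-endian machine this amounts to a plain copy, so there "xdr" and "native" coincide; elsewhere it costs a byte swap per value, still far cheaper than base64 decoding.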
You can also write
Data{a, encoding="native", file="f", pos=12334}
Then data is taken from another file by the very same mechanism.
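As a sketch of what file= and pos= boil down to for the native encoding: open the other file, seek to the byte offset, and fread() the doubles straight into the array. The helper name read_native_at is hypothetical, and I assume here that pos is a byte offset from the start of the file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read n native doubles from `path`, starting at
 * byte offset pos, into dst. Returns the number of doubles read
 * (0 if the file cannot be opened or the seek fails). */
size_t read_native_at(const char *path, long pos, double *dst, size_t n)
{
    size_t got = 0;
    FILE *f;

    assert(path != NULL && dst != NULL);
    f = fopen(path, "rb");
    if (f == NULL) return 0;
    if (fseek(f, pos, SEEK_SET) == 0)
        got = fread(dst, sizeof *dst, n, f);  /* no parsing, no copies */
    fclose(f);
    return got;
}
```

A caller would compare the return value with the expected element count to detect a truncated file.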
In reality, Data sets a linehandler or a binhandler used to handle
the next chunk, which can be written in C or Lua.
[[ ]] strings instead of these chunks would be stored in memory, thus
doubling the needed data space.
Please note that I looked into XML for these topics; it gives no better
solution because you are left alone with pure ASCII data chunks. Pure
binary (not base64) is not even possible.
Matlab and co IMHO have slow parsers. Some communities speak CDF and
HDF, which IMHO are incredibly bloated and opaque. I don't know
about perl, python and ruby as I _love_ Lua.
While I see my approach more as a workaround than as a solution, I
really think that Lua could gain from being able to handle huge data
without bloat. My code is part of a larger system. Time permitting, I
could try to cut out the basics and make them available.
Juergen
Juergen Fuhrmann
__ __ __ __ Numerical Mathematics & Scientific Computing
|W |I |A |S Weierstrass Institute for Applied Analysis and Stochastics
Mohrenstr. 39 10117 Berlin fon:+49 30 20372560 fax:+49 30 2044975
http://www.wias-berlin.de/~fuhrmann mailto:fuhrmann@wias-berlin.de