[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: string optimization ideas [Re: precompiled regex's to lua]
- From: Karel Tuma <ktuma@...>
- Date: Fri, 10 Nov 2006 18:51:55 +0100
hi,
On Fri, Nov 10, 2006 at 05:49:24PM +0100, Andy Stark wrote:
> Lua list <lua@bazar2.conectiva.com.br> on Friday, November 10, 2006 at
> 13:54 +0000 wrote:
> >
> >working
>
> >with strings is generally slower (except comparing and hash keying)
> >in lua because the way theyre implented
> .......
> >ideas, anyone?
>
> Ropes?
> http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf
too complex to implement - i've already tried something
very similiar using userdata and it was _way_ slower than legacy lua
strings, simply because of impossibility to do linear char-by-char
operation on which many c interfaces relies on.
however, having strings in binary search tree would allow us to play some
nice string matching tricks (while on the other hand, would make
string matching exceedingly complex sometimes, depending on the pattern),
but that's not all the fruit we need.
generally, with today heavily pipelined cpus dealing with trees is magnitudes
slower than just plain copying/comparing/hashing, so the paper is more of a theory
rather than practical use for language like lua, but similiar paradigm is
used widely for effective storage of keywords in dictionaries and compression in general.
>
> http://lua-users.org/lists/lua-l/2005-04/msg00182.html
>
> ...but the thread seemed to peter out before the subject was properly
> examined. However, this completely unrelated posting from Roberto:-
> http://lua-users.org/lists/lua-l/2006-06/msg00371.html
>
> ...states that the idea of strings as raw sequences of bytes is a part of
> the spec and not just an implementation detail.
there's no doubt about this. lua_tolstring _must_ return proper string,
that is, array of bytes, otherwise every library implementation known to real man
(read: lua programmer) breaks.
only possible hack is to provide sub-strings of already hashed ones making string
slicing operations much more efficient. however even implementing this breaks the
spec, but just a little, it's the fact that every string returned by lua_tolstring()
has to be implicitly terminated by \0, fortunatelly not much api calls and 3rd party
libraries depend on such behaviour (and if they do, its easy to fix)
i dare to say it's even a bit of inconsistency since lua strings are
technically immutable byte array, can contain \0 everywhere which may
lead to undesired effects when relying on additional \0 appended.
this is not the only issue, another example is how to implement
sub-string equality test and using it as a table key. simply convert
them to real strings at that time and proceed as usual. again, this may
have huge performance hit.
i better write some test code rather than publicly theoretize to figure out
whether its the way to go or not :)
thanks for interesting input and references
//kt