- Subject: Re: Opcode dispatch
- From: "Samuel Greear" <lua@...>
- Date: Thu, 1 Jan 2009 14:51:15 -0700
On Wed, Dec 31, 2008 at 9:55 PM, David Manura <dm.lua@math2.org> wrote:
> On Mon, Dec 29, 2008 at 3:24 PM, Samuel Greear wrote:
>> ...I have seen a 1.5-28% performance increase...
>> http://evilcode.net/sjg/patches/lua.patch
>
> Here are a few runtime comparisons on some of my Lua-only code, with Lua
> compiled using "make linux" (gcc -O2) under Cygwin / P4.
>
> Runtimes for the pure Lua implementation of the DEFLATE/gzip
> algorithm[1] on decompressing lua-5.1.4.tar.gz with CRC checking
> enabled:
>
> unpatched: 10.107 / 10.831 / 10.112 s
> patched: 9.306 / 9.285 / 9.308 s
> (About 10% faster.)
>
> Runtimes for 50 iterations of the LuaMatrix[2] test suite:
>
> unpatched: 10.931 / 10.901 / 10.872 s
> patched: 11.283 / 11.095 / 10.881 s
> (Not conclusive.)
>
> Runtimes for StringLibraryInLua[3] performance test (perftest.lua):
>
> unpatched: 35.615 / 36.759 / 34.527 s
> patched: 36.149 / 35.362 / 33.285 s
> (Not conclusive.)
>
> [1] http://lua-users.org/wiki/ModuleCompressDeflateLua
> [2] http://lua-users.org/wiki/LuaMatrix
> [3] http://lua-users.org/wiki/StringLibraryInLua
>
David,
Thanks for the results. It appears that the best implementation of the
dispatch_op macro depends somewhat on the host CPU (and, to a much
lesser extent, on the compiler). That makes sense, of course, given the
differences between CPU branch predictors. To date I have been testing
on a recent Intel Core 2 Duo (1.66 GHz) and an Intel quad-core Xeon
(2.4 GHz), but I was able to reproduce your results fairly closely on
an Athlon XP. I don't have a P4.
http://evilcode.net/sjg/patches/lua2.patch
This version seems to mitigate most of the performance loss on my
Athlon at the expense of some of the performance gained on the Intel
chips.
Again, I have no idea how interesting this is to the Lua community at
large (probably not terribly), but it would be fairly easy to
modularize the main loop of the VM so that several dispatch variants
could be selected through conditional compilation.
Sam