Hi,

Jay Carlson wrote:
> From http://ftp.debian.org/debian/pool/main/l/lua50/lua50_5.0.2-5.diff.gz :
> 
> -MYCFLAGS= -O2
> +MYCFLAGS= -O3 -g

Ah! Thank you for taking the time to point this out.

> -O3 is a reasonable setting.  If you know of additional flags that provide
> consistently better performance across the range of machines supported by
> Debian, please send them to this list along with the benchmark data!

Yes, -O3 is what I'm using, too. I generally use -fno-crossjumping only
for the core virtual machine loop as recommended in various places (search
for "gcc gforth" in Google Groups). This would be lvm.c in Lua and requires
adding the following two lines to src/Makefile:

lvm.o:	lvm.c
	$(CC) $(CFLAGS) -fno-crossjumping -c -o lvm.o lvm.c

(Careful: tabs required)

The gain depends a lot on how much the benchmark uses the core loop:

ackermann.lua    +8%
ary.lua          +4%
echo.lua        +11%
except.lua       +2%
heapsort.lua     +5%
methcall.lua     +2%
nestedloop.lua   +4%
prodcons.lua     +2%
random.lua       +5%
sieve.lua        +3%
wordfreq.lua     +9%

All others are more or less unchanged (<2%). Results obtained with GCC 3.3
and Lua 5.1-work2 (yes, I know the shootout uses Lua 5.0.2, but lvm.c is
similar enough). Benchmarks taken from the shootout CVS plus the updates
I posted recently.

Since -fomit-frame-pointer is against Debian policies I will mention only
in passing that this improves most tests by another 5% or more. Yes, this
is not applicable to the shootout, since all other languages are compiled
without it. But anyone who really needs the speed could use it.

> More likely the performance difference you see between the Debian
> /usr/bin/lua50 and the one you built by hand is the result of the
> performance penalty of shared libraries.  /usr/bin/lua50 doesn't contain the
> Lua core; /usr/lib/liblua50.so.5.0 does.

Ok, that explains a lot. IMHO this is not the recommended method to build
Lua nor is it the default method. I suggest backing out this change and
reverting to building the binary with static libraries.

Rationale: Lua is used very differently from (say) Perl or Python:
- Anyone embedding Lua is likely to change the core anyway and will bring
  his/her own patched Lua core (see all those games and frameworks using
  Lua as a scripting language).
- Anyone extending Lua is using the 'lua' binary only and would benefit
  from a binary that has the static libraries linked in (faster loading
  and faster execution).
- The Lua core is so small that using dynamic libraries does not noticeably
  improve memory sharing or disk space sharing, anyway.

I.e. the shared libraries are pretty useless for Lua. Upgradability is not
an issue since you can just upgrade the base package and get a new 'lua'
binary which will be used by the 'extenders'. The 'embedders' cannot benefit
from a generic upgrade, anyway (e.g. many of them are still at Lua4).

> [... i386 PIC problem described ...]
> perl doesn't have this issue, as its core is statically linked into
> /usr/bin/perl and therefore can be compiled without PIC.

Ditto for Python AFAIK (at least in the default build). I suggest doing
the same with Lua.

> By the way, there seems to be a large penalty in the default scoreboard for
> missing tests.

This is good to know. Thank you for the hint!

> Zeroing the weight of the missing echo client/server test
> puts Lua ahead of Perl.  (And then zeroing the weight of matrix multiply
> puts Lua ahead of Python as well....)  Luasocket is in Debian, so somebody
> should contribute an implementation.

The echo client/server test requires fork() and wait() which are available
from the lposix library (see http://www.tecgraf.puc-rio.br/~lhf/ftp/lua/ ).
I cannot find this in Debian, but maybe I was looking in the wrong places.

The appended script requires luasocket 2.0 beta and lposix together with
the new require logic (as posted on this list recently).

I've tuned the test as far as possible, but refrained from shortcutting
socket method resolution (because this is an atypical practice).
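
To illustrate what "shortcutting method resolution" means, here is a minimal
sketch in plain Lua. Conn is a hypothetical stand-in for a socket object (it
is not part of Luasocket); it only assumes that methods are found through a
metatable's __index, as is usual for Lua objects.

```lua
-- Hypothetical stand-in for a socket: methods are found via __index.
local Conn = {}
Conn.__index = Conn

function Conn.new()
  return setmetatable({ sent = 0 }, Conn)
end

-- Pretend to send data; just count the bytes.
function Conn:send(data)
  self.sent = self.sent + #data
  return #data
end

local c = Conn.new()
local d = "Hello there sailor\n"

-- Usual style: the method is looked up on every call.
for i = 1, 1000 do c:send(d) end

-- Shortcut style: look the method up once, then call the cached function.
local send = c.send
for i = 1, 1000 do send(c, d) end

print("bytes counted:", c.sent)
```

The second loop saves one table lookup per call, at the price of code that
no longer looks like ordinary method calls.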

The performance is not bad, but unfortunately a little bit slower than Perl
or Python. This is mainly due to the fact that Luasocket has been tuned
for socket operations with timeouts and non-blocking sockets (which is
closer to real world usage -- nobody is using an indefinitely blocking
receive, I hope).

About ackermann.lua: Pity that even my improved version fails for N=10
with a stack overflow. Otherwise the performance would be extremely good
and move Lua up the scale quite a bit. I do not know what to do about it
other than asking the test maintainers to reduce N to 9, since 4096 is
a pretty realistic callstack limit. Redefining LUA_MAXCALLS during the
build process would be the pragmatic alternative. :-/
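
For anyone who wants to see why N=10 is a problem, here is a small sketch
that counts how deeply the non-tail calls of Ack(3, N) nest. It assumes the
usual shootout formulation, where only the inner Ack(m, n-1) call is not a
tail call. The helper name ack_nesting is made up for this example.

```lua
-- Returns Ack(3, n) and the maximum nesting of non-tail calls seen.
local function ack_nesting(n)
  local depth, maxdepth = 0, 0

  local function ack(m, k)
    if m == 0 then return k + 1 end
    if k == 0 then return ack(m - 1, 1) end    -- tail call, no stack growth
    depth = depth + 1
    if depth > maxdepth then maxdepth = depth end
    local inner = ack(m, k - 1)                -- non-tail call, uses a stack slot
    depth = depth - 1
    return ack(m - 1, inner)                   -- tail call, no stack growth
  end

  local result = ack(3, n)
  return result, maxdepth
end

-- Small N only, so this stays well inside any call stack limit.
for n = 1, 6 do
  print(n, ack_nesting(n))
end
```

The nesting grows together with the result, which is 2^(N+3) - 3 for
Ack(3, N), so each increment of N roughly doubles the stack needed.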

As for the other missing tests (which seem to be new entries and have
implementations only for uncommon languages): Sorry, but I cannot figure
out what they really do. Maybe someone who knows Erlang can translate them.

Bye,
     Mike

local posix = require "posix"
local socket = require "socket"

local d = "Hello there sailor\n"

local ls = socket.bind("127.0.0.1", 0)

if posix.fork() == 0 then
  -- Child is client
  local n = tonumber((arg and arg[1]) or 1)
  local cs = socket.connect(ls:getsockname())
  for i=1,n do
    cs:send(d)
    local r, err = cs:receive(19)
    if r ~= d then
      error(r and string.format("client: %q ~= %q\n", r, d) or "client: "..err)
    end
  end
  cs:close()
  os.exit()
else
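  -- Parent is server
  local ss = ls:accept()
  -- Echo until the client closes: receive() then returns nil, so send()
  -- raises an error, which pcall() absorbs to end the loop.
  pcall(function(ss) repeat ss:send(ss:receive(19)) until false end, ss)
  -- Only the first value returned by getstats() is used here.
  io.write("server processed ", ss:getstats(), " bytes\n")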
  posix.wait()
end