Re: New LuaJIT benchmark results (was Re: [ANN] LuaJIT-2.0.0-beta3)

lua-l archive

Subject: Re: New LuaJIT benchmark results (was Re: [ANN] LuaJIT-2.0.0-beta3)
From: Mike Pall <mikelu-1003@...>
Date: Tue, 9 Mar 2010 23:02:01 +0100

Geoff Leyland wrote:
> Given that you said
> 
> > This release includes many fixes and performance enhancements,
> > e.g. for recursive code
> 
> what's the story with the binary-trees benchmark?  Is it a GC
> thing, a recursive thing or is it a bad benchmark?

It's mainly a GC benchmark and only incidentally uses recursion.

> -joff makes it a bit slower, and collectgarbage("stop") at the
> top (for instances that fit in memory) makes it a bit faster, but
> neither's a huge difference.

Since beta2 didn't compile recursion, it ran in the interpreter.
Now with beta3 it runs in (faster) native code, but the GC
overhead stayed the same. With higher N it spends 75% of its time
in the GC and the memory allocator (and suffers from lots of cache
misses).
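
For readers who haven't seen the benchmark, here is a minimal sketch of
the allocation pattern that makes binary-trees GC-bound. It is not the
shootout source; the names bottom_up_tree, count_nodes and churn are
made up for illustration.

```lua
-- Illustrative sketch only, not the shootout program.
-- Every node is a fresh table, so a tree of depth d allocates
-- 2^(d+1) - 1 tables that the GC must later trace and reclaim.
local function bottom_up_tree(depth)
  if depth == 0 then
    return {}                       -- leaf: empty table
  end
  return { bottom_up_tree(depth - 1), bottom_up_tree(depth - 1) }
end

-- Walk the tree recursively and count its nodes.
local function count_nodes(tree)
  if tree[1] == nil then
    return 1                        -- leaf
  end
  return 1 + count_nodes(tree[1]) + count_nodes(tree[2])
end

-- Build and immediately drop many short-lived trees.
local function churn(iterations, depth)
  local total = 0
  for _ = 1, iterations do
    total = total + count_nodes(bottom_up_tree(depth))
  end
  return total
end

print(churn(100, 10))
```

The recursion itself is cheap once compiled; nearly all of the work is
creating and collecting the node tables, which is why compiling the
recursive calls leaves the GC share of the run time untouched.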

I could tune the existing GC a bit more, but I guess I'll need to
completely redesign the GC to score well on this benchmark. But
there's one caveat: this benchmark is not necessarily a good
predictor for typical GC performance.

I have many ideas for a GC redesign, but I realize this is a
bigger undertaking, so I'm postponing it until 2.1.

> On the other hand, since the shootout appears to judge on the
> median, it's probably k-nucleotide you want to make faster.

Actually it's picking fannkuch as the median. And although it
looks really simple, it's the hardest to optimize of them all
(for a trace compiler).

That said, the addition of structured binary data (part of the
work on the FFI) will speed up many of the remaining outliers.
E.g. reverse-complement suffers most from the lack of a mutable
byte-buffer. It would be a trivial program and easy to compile,
if only it could use such a feature.
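
To illustrate the point, here is a rough sketch of what a pure-Lua
reverse-complement has to do without a mutable byte buffer. This is not
the shootout entry; reverse_complement and the complement table are
hypothetical and cover only the four plain bases.

```lua
-- Hypothetical helper, not the shootout entry.
-- Lua strings are immutable, so the result cannot be produced by
-- swapping bytes in place; it has to be assembled from pieces.
local complement = { A = "T", C = "G", G = "C", T = "A" }

local function reverse_complement(seq)
  local out = {}
  local n = #seq
  for i = 1, n do
    -- Read the sequence back to front, one base at a time.
    local base = string.sub(seq, n - i + 1, n - i + 1)
    out[i] = complement[base] or base   -- unknown symbols pass through
  end
  return table.concat(out)              -- one final copy into a new string
end

print(reverse_complement("AAGC"))
```

With a mutable byte buffer the same job would be a single in-place loop
over bytes, with no per-base string values and no final concatenation.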

But I'm not just targeting the current set of shootout benchmarks.
SciMark scores would improve with typed low-level buffers, too.

And since there are no (plain) recursive benchmarks on the shootout
anymore, you can't see that beta3 gave a huge speed boost here
(showing only the x86 results):

$ time lua fib.lua 37
Fib(37): 39088169
7.320
$ time luajit-2.0.0-beta2 fib.lua 37
Fib(37): 39088169
2.044
$ time luajit-2.0.0-beta3 fib.lua 37
Fib(37): 39088169
0.368

Now it's 20x faster than Lua on the dreaded recursive fibonacci
benchmark. Similarly, ack (ackermann function) is 23x faster.
Tail recursion is now more or less the same speed as plain loops.
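
The benchmark sources aren't included in this message. As a sketch of
the three kinds of code being compared, assuming the usual shootout-style
definitions (fib returns 1 for n < 2, which matches the Fib(37) value
printed above), they might look like this. The function names and the
default argument are illustrative.

```lua
-- Sketch under stated assumptions; not the actual fib.lua or ack.lua.

-- Plain (non-tail) recursion: two recursive calls per invocation.
local function fib(n)
  if n < 2 then return 1 end
  return fib(n - 1) + fib(n - 2)
end

-- Ackermann function: deep, nested recursion.
local function ack(m, n)
  if m == 0 then return n + 1 end
  if n == 0 then return ack(m - 1, 1) end
  return ack(m - 1, ack(m, n - 1))
end

-- Tail recursion: the recursive call is the last action, so Lua
-- performs a proper tail call and the stack does not grow.
local function sum_tail(n, acc)
  if n == 0 then return acc end
  return sum_tail(n - 1, acc + n)
end

-- The equivalent plain loop.
local function sum_loop(n)
  local acc = 0
  for i = 1, n do acc = acc + i end
  return acc
end

-- Small default so the script finishes quickly in any interpreter;
-- pass 37 on the command line to mirror the runs above.
local n = tonumber(arg and arg[1]) or 20
io.write(string.format("Fib(%d): %d\n", n, fib(n)))
```

Run as `lua fib.lua 37` or `luajit fib.lua 37` to compare; sum_tail and
sum_loop compute the same result, which makes them a convenient pair
for checking the tail-recursion claim.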

--Mike
