- Subject: Re: New LuaJIT benchmark results (was Re: [ANN] LuaJIT-2.0.0-beta3)
- From: Mike Pall <mikelu-1003@...>
- Date: Tue, 9 Mar 2010 23:02:01 +0100
Geoff Leyland wrote:
> Given that you said
>
> > This release includes many fixes and performance enhancements,
> > e.g. for recursive code
>
> what's the story with the binary-trees benchmark? Is it a GC
> thing, a recursive thing or is it a bad benchmark?

It's mainly a GC benchmark and only incidentally uses recursion.

> -joff makes it a bit slower, and collectgarbage("stop") at the
> top (for instances that fit in memory) makes it a bit faster, but
> neither's a huge difference.

Since beta2 didn't compile recursion, it ran in the interpreter.
Now with beta3 it runs in (faster) native code, but the GC
overhead stayed the same. With higher N it spends 75% of its time in
the GC and the memory allocator (and suffers from lots of cache misses).

I could tune the existing GC a bit more, but I guess I'll need to
completely redesign the GC to score well on this benchmark. But
there's one caveat: this benchmark is not necessarily a good
predictor of typical GC performance.

I have many ideas for a GC redesign, but I realize this is a
bigger undertaking, so I'm postponing it until 2.1.
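
To make the "mainly a GC benchmark" point concrete, here is a minimal
sketch in the spirit of binary-trees. It is not the shootout program; the
function names and the depth are made up for illustration. Every node is a
fresh table, so nearly all of the work ends up in the allocator and the
collector, and the recursion is incidental.

```lua
-- Illustrative only: names and depth are assumptions, not the shootout source.

-- Build a complete binary tree; each node is a newly allocated table.
local function bottom_up(depth)
  if depth == 0 then return {} end
  return { bottom_up(depth - 1), bottom_up(depth - 1) }
end

-- Walk the tree and count its nodes (leaves are empty tables).
local function count(node)
  if node[1] == nil then return 1 end
  return 1 + count(node[1]) + count(node[2])
end

collectgarbage("collect")                        -- start from a clean heap
local before = collectgarbage("count")           -- heap size in KB
local tree = bottom_up(14)
local grown = collectgarbage("count") - before   -- rough growth, tree is still live
print(count(tree) .. " nodes")
print(string.format("heap grew by about %.0f KB", grown))
```

Putting collectgarbage("stop") at the top, as tried above, switches the
collector off for the run, which only helps for instances that fit in
memory.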
> On the other hand, since the shootout appears to judge on the
> median, it's probably knucleotide you want to make faster.

Actually it's picking fannkuch as the median. And although it
looks really simple, it's the hardest to optimize of them all
(for a trace compiler).

That said, the addition of structured binary data (part of the
work on the FFI) will speed up many of the remaining outliers.
E.g. reverse-complement suffers most from the lack of a mutable
byte-buffer. It would be a trivial program and easy to compile,
if only it could use such a feature.
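
As a rough illustration of the byte-buffer problem (a sketch in plain Lua,
not the shootout program; the helper name and the mapping table are
assumptions): Lua strings are immutable, so every transformation step has
to produce a new string instead of updating bytes in place.

```lua
-- Hypothetical helper for illustration, not taken from the shootout source.
local complement = { A = "T", C = "G", G = "C", T = "A" }

local function revcomp(seq)
  -- reverse() allocates a new string, and gsub() allocates another one;
  -- characters missing from the mapping table are left unchanged.
  local out = seq:reverse():gsub(".", complement)
  return out
end

print(revcomp("GATTACA"))  --> TGTAATC
```

With a mutable buffer the same job would be an in-place swap loop over
the bytes, with no intermediate strings.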
But I'm not just targeting the current set of shootout benchmarks.
SciMark scores would improve with typed low-level buffers, too.

And since there are no (plain) recursive benchmarks on the shootout
anymore, you can't see that beta3 gave a huge speed boost here
(showing only the x86 results):

$ time lua fib.lua 37
Fib(37): 39088169
7.320

$ time luajit-2.0.0-beta2 fib.lua 37
Fib(37): 39088169
2.044

$ time luajit-2.0.0-beta3 fib.lua 37
Fib(37): 39088169
0.368

Now it's 20x faster than Lua on the dreaded recursive Fibonacci
benchmark. Similarly, ack (Ackermann function) is 23x faster.
Tail-recursion is now more or less the same speed as plain loops.
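
For reference, a fib.lua along these lines would reproduce the output
shown above. This is a guess at the usual form of the benchmark, not
necessarily the exact file used for the timings. Note the base case
returns 1, which is what makes Fib(37) come out as 39088169. The
fib_tail variant is a hypothetical addition that shows the tail-recursive
shape of the same computation.

```lua
-- Plain (non-tail) recursion: two recursive calls per invocation.
local function fib(n)
  if n < 2 then return 1 end
  return fib(n - 1) + fib(n - 2)
end

-- Tail-recursive variant (hypothetical, for comparison): the recursive
-- call is in tail position, so it behaves like a loop with accumulators.
local function fib_tail(n, a, b)
  a, b = a or 1, b or 1
  if n == 0 then return a end
  return fib_tail(n - 1, b, a + b)
end

-- Default to a small N when no argument is given so the script runs quickly.
local n = tonumber(arg and arg[1]) or 27
print(string.format("Fib(%d): %d", n, fib(n)))
```

Run it as in the transcript, e.g. "lua fib.lua 37".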
--Mike