- Subject: Re: LuaJIT2 performance for number crunching
- From: Francesco Abbate <francesco.bbt@...>
- Date: Sun, 13 Feb 2011 21:21:59 +0100
2011/2/13 Mike Pall <mikelu-1102@mike.de>:
> This is wrong. It anchors code at loops, but it compiles and
> optimizes _all_ kinds of hot code paths.
>
> The region selection heuristics fail for the specific example
> because of the sheer mass of way-too-short loops.
OK, I understand that I oversimplified. Yet the heuristic implemented
in LuaJIT failed to identify the part of the code that needs to be
optimized. It is also very difficult for the user to guess what LuaJIT
is able to optimize well and what it is not.
> I beg to differ. It's a common idiom in Lua to generate code at
> runtime. And it's quite easy, too.
>
> If someone tells you to use templates for C++ because that will
> make your code an order of magnitude faster -- do you tell him
> that you're too lazy to implement that or would you rather take
> that order of magnitude speedup?
Hmm... I understand, but I'm not convinced. An optimizing C or Fortran
compiler does not play this sort of trick on correctly written
algorithms. For me it would be difficult to explain to other people
why they shouldn't write small nested loops like those in the RK4
algorithm.
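For readers unfamiliar with the idiom under discussion, here is a minimal sketch of generating code at runtime so that a short loop is unrolled for a fixed size. It is only an illustration: the function name make_unrolled_axpy and the update it performs are hypothetical and are not taken from the benchmark or from the solution proposed earlier in the thread.

```lua
-- Hypothetical illustration of runtime code generation: build the source of
-- an update  y[i] = y[i] + a * x[i]  unrolled for a fixed n, then compile it.
local load_chunk = loadstring or load  -- loadstring in Lua 5.1/LuaJIT, load in 5.2+

local function make_unrolled_axpy(n)
  local lines = { "return function(y, a, x)" }
  for i = 1, n do
    lines[#lines + 1] = string.format("  y[%d] = y[%d] + a * x[%d]", i, i, i)
  end
  lines[#lines + 1] = "end"
  -- Compile the generated source; the chunk returns the specialized function.
  local chunk = assert(load_chunk(table.concat(lines, "\n")))
  return chunk()
end

local axpy3 = make_unrolled_axpy(3)
local y = { 1, 2, 3 }
axpy3(y, 2, { 10, 20, 30 })
-- y is now { 21, 42, 63 }
```

The generated function contains no loop at all, which is the point of the idiom, but the caller has to know n before the function is built.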
>> Well, this is what I've done but it was not working!!! :-)
>
> Because you didn't try hard enough. Naive translations of programs
> written in other languages almost never perform well. An idiomatic
> rewrite in Lua would look quite different. It could also take
> advantage of multiple-return values and other features the
> language has to offer.
>
> I showed you how to make it even faster than the C version. It's
> your decision, whether you'll heed that advice or not.
No, you didn't show me anything, because your solution is not
acceptable. The algorithm needs to work for ODE systems of arbitrary
dimension N and your solution was only for the case N=2.
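To make the requirement concrete, here is a minimal sketch of an RK4 step written for arbitrary N. It is not the benchmark code itself; the names rk4_step, new_work, integrate and harmonic are hypothetical, and the calling convention f(t, y, dydt) is an assumption made for the example. Note the short loops over 1..n, which are the kind of small loops discussed in this thread.

```lua
-- Hypothetical sketch: one classical RK4 step for dy/dt = f(t, y),
-- where y is an array of length n and f(t, y, dydt) fills dydt in place.
local function rk4_step(f, t, y, h, n, work)
  local k1, k2, k3, k4, ytmp = work.k1, work.k2, work.k3, work.k4, work.ytmp

  f(t, y, k1)
  for i = 1, n do ytmp[i] = y[i] + 0.5 * h * k1[i] end

  f(t + 0.5 * h, ytmp, k2)
  for i = 1, n do ytmp[i] = y[i] + 0.5 * h * k2[i] end

  f(t + 0.5 * h, ytmp, k3)
  for i = 1, n do ytmp[i] = y[i] + h * k3[i] end

  f(t + h, ytmp, k4)
  for i = 1, n do
    y[i] = y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
  end
end

-- Preallocate the work arrays once so the stepping loop does not allocate.
local function new_work(n)
  local w = { k1 = {}, k2 = {}, k3 = {}, k4 = {}, ytmp = {} }
  for _, v in pairs(w) do
    for i = 1, n do v[i] = 0 end
  end
  return w
end

-- Integrate from t0 to t1 with a fixed number of steps; returns a new array.
local function integrate(f, y0, t0, t1, steps)
  local n = #y0
  local y = {}
  for i = 1, n do y[i] = y0[i] end
  local work = new_work(n)
  local h = (t1 - t0) / steps
  local t = t0
  for _ = 1, steps do
    rk4_step(f, t, y, h, n, work)
    t = t + h
  end
  return y
end

-- Example system: harmonic oscillator y1' = y2, y2' = -y1.
-- N happens to be 2 here, but nothing above is specialized for it.
local function harmonic(t, y, dydt)
  dydt[1] = y[2]
  dydt[2] = -y[1]
end

local y = integrate(harmonic, { 1, 0 }, 0, 1, 100)
print(y[1], y[2])  -- should be close to cos(1) and -sin(1)
```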
> Your assumptions are wrong, so your conclusion is wrong. LuaJIT is
> quite competitive on SciMark, which basically consists solely of
> nested loops.
I assume that the Lua translation of the RK4 algorithm was perfectly
fine, as was the original C code, and that LuaJIT2 failed completely to
optimize the implemented algorithm. For me this means that LuaJIT2
has some major limitations for implementing numerical algorithms.
I know perfectly well the performance of LuaJIT2 in the SciMark2
benchmarks and also in the Computer Language Benchmarks Game. For
example, I know about the spectral-norm test, where LuaJIT2 rivals
optimized Fortran code. But you have to be cautious with all these
benchmarks because they test the compiler on relatively simple
algorithms. For example, the FFT algorithm in SciMark2 takes just one
screen page of code. I wrote the RK4 test because I wanted to test
LuaJIT2 with a more complicated algorithm from a real library of
numerical routines.
For me it would be more interesting to know why the heuristic
failed with the RK4 algorithm and whether this could eventually be
improved.
--
Francesco