2011/2/23 Francesco Abbate <francesco.bbt@gmail.com>:
> luajit2/git HEAD / array impl / 0m0.296s
> luajit2/git HEAD / unroll impl / 0m0.109s
> luajit2/ beta6 / array impl / 0m10.860s
> luajit2/ beta6 / unroll impl / 0m0.109s
> C (GSL) / C opt(*) / array impl / 0m0.206s
>
> (*) CFLAGS="-O2 -march=native -mfpmath=sse"
>
> The difference between git HEAD and 2.0-beta6 is huge (~ 100x)
> (compiled trace vs interpreted code I guess). Could you tell us more
> about what you have done in LuaJIT2 ?

Mike,

sorry to bother you again, but I've made a very small change in the code
and I am getting poor results again (10.86 sec instead of 0.296 sec) :(

What I've done is to no longer explicitly unroll the last loop, the one
that calculates the error. You will find the modified code in the
attachment. The reason for not unrolling it explicitly is that if the
dimension of the ODE system is very large, it is better to avoid
unrolling it into a huge number of lines of code. On the other hand, I'm
not able to vectorize the code for the moment, because it involves
arithmetic on absolute values, which is outside the standard BLAS
operations.

To me this looks problematic, because adding one small loop can
completely spoil the results in terms of execution speed. Do you think
this problem could eventually be fixed in LuaJIT2?

For the moment, the only idea I have for this problem is to write a C
routine that executes the specific operation I need, but this is a sort
of defeat... we want to write everything in Lua+FFI :-)

--
Francesco
Attachment:
rkf45vec-v2.lua.in
Description: Binary data
Attachment:
rkf45vec-v2-out.lua
Description: application/binary
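
For readers without the attachments, the kind of change described in the message (an error computation written unrolled versus as a loop) might look roughly like the sketch below. This is not the attached code: the function names, the two-component system, the tolerance parameters eps_abs and eps_rel, and the error formula itself are all assumptions made for illustration, and plain Lua tables are used so the snippet runs under any Lua interpreter.

```lua
-- Hypothetical sketch, not the attached rkf45 code.
-- y: current state, yerr: local error estimate,
-- eps_abs / eps_rel: assumed absolute and relative tolerances.

-- Variant 1: explicitly unrolled for a 2-component system.
local function max_ratio_unrolled(y, yerr, eps_abs, eps_rel)
  local r1 = math.abs(yerr[1]) / (eps_abs + eps_rel * math.abs(y[1]))
  local r2 = math.abs(yerr[2]) / (eps_abs + eps_rel * math.abs(y[2]))
  return math.max(r1, r2)
end

-- Variant 2: the same computation as a loop over n components.
-- The use of absolute values is what keeps it outside plain BLAS calls.
local function max_ratio_loop(y, yerr, n, eps_abs, eps_rel)
  local r = 0
  for i = 1, n do
    local ri = math.abs(yerr[i]) / (eps_abs + eps_rel * math.abs(y[i]))
    if ri > r then r = ri end
  end
  return r
end

-- Example inputs (made up for illustration).
local y, yerr = {1, -2}, {1e-6, -4e-6}
local a = max_ratio_unrolled(y, yerr, 1e-6, 1e-6)
local b = max_ratio_loop(y, yerr, 2, 1e-6, 1e-6)
```

Both variants compute the same value; the loop form scales to large systems without generating one line of code per component, which is the trade-off the message is about.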