
On 25 Sep 2006, at 04:19, Glenn Maynard wrote:


The only other thing I can think of is the Pentium cache alignment
issue. I don't think that could be happening here because you're not doing
any arithmetic, but in case it is, you might want to check by doing
the test reffing k things before you start the loop, for k ranging
from 0 to 5, and see if there are particular values of k which
cause slowdowns. (There was a change to storage format of tables
between 5.0 and 5.1, which causes the alignment problem to show
up for different indices, although it always shows up every sixth
element in a table or stack.)
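
To make the alignment effect described above concrete, here is a minimal sketch in C. It is not taken from the Lua sources. It assumes a hypothetical layout in which each table or stack slot is 12 bytes and starts with an 8-byte double, as one might see on a 32-bit x86 build with 4-byte-aligned doubles, and a 64-byte cache line. The function names, slot size, and line size are all assumptions chosen for illustration, so the period it reports need not match any particular machine.

```c
#include <assert.h>
#include <stddef.h>

#define VALUE_SIZE 8  /* assumed size of the double stored in each slot */

/* Returns 1 if the double at the start of slot `index` crosses a
   cache-line boundary, 0 otherwise. All sizes and offsets are in bytes.
   `base` is the offset of slot 0 from a cache-line-aligned address. */
int double_straddles_line(size_t base, size_t index,
                          size_t slot_size, size_t line_size)
{
    assert(line_size > 0);
    size_t first = base + index * slot_size;  /* first byte of the double */
    size_t last  = first + VALUE_SIZE - 1;    /* last byte of the double  */
    return first / line_size != last / line_size;
}

/* Counts how many of the first `count` slots hold a straddling double. */
size_t count_straddlers(size_t base, size_t count,
                        size_t slot_size, size_t line_size)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        n += (size_t)double_straddles_line(base, i, slot_size, line_size);
    return n;
}
```

Under these assumptions, shifting `base` (which is roughly what reffing k things before the loop does) changes which slot index lands across a line boundary, while a slot size that divides the line size evenly, such as 16, yields no straddling slots when the base is aligned.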

Ick, that was it:

0.65user 0.00system 0:00.66elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0.22user 0.00system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k

That's pretty serious; it's a heisenbug generator, making code randomly
slow.  I assume there's no known good fix (or it'd be used); are there
any tradeoff fixes that will at least eliminate the unpredictability?
I can live with a bit of memory waste and reduced cache efficiency to
avoid this (at least on x86 machines, which have the memory and large
caches to cope with it).

Umm, ouch.

I remember making the reverse of this change, going from 8-byte alignment to 4-byte alignment of doubles on a PowerPC platform. The _architecture_ says 4-byte aligned doubles may not work, but they worked just fine on the silicon we were using with only a 1-cycle penalty when crossing a cache line boundary. We saved space by going to 4-byte alignment, and the (cache) space saved meant we went faster overall.
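
As a rough illustration of the space side of that tradeoff, the sketch below compares a default-layout struct with one whose members are limited to 4-byte alignment. The struct and function names are hypothetical, and `#pragma pack` is a compiler extension (supported by GCC, Clang, and MSVC) rather than standard C. It says nothing about the speed penalty, which depends on the silicon.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged value using the compiler's default layout.
   On ABIs that align doubles to 8 bytes this is typically 16 bytes. */
struct natural_slot {
    double value;
    int    tag;
};

/* Same fields, but members are aligned to at most 4 bytes,
   so no padding is needed after `tag`. */
#pragma pack(push, 4)
struct packed_slot {
    double value;
    int    tag;
};
#pragma pack(pop)

/* Bytes saved over `count` array elements by the tighter layout. */
size_t bytes_saved(size_t count)
{
    assert(sizeof(struct packed_slot) <= sizeof(struct natural_slot));
    return count * (sizeof(struct natural_slot) - sizeof(struct packed_slot));
}
```

On a platform where the default layout already aligns doubles to 4 bytes, the two structs are the same size and `bytes_saved` returns 0.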

drj