On 25 Sep 2006, at 04:19, Glenn Maynard wrote:
> > The only other thing I can think of is the Pentium cache alignment
> > issue; I don't think that could be happening here because you're not
> > doing any arithmetic, but in case it is, you might want to check by
> > doing the test reffing k things before you start the loop, for k
> > ranging from 0 to 5, and see if there are particular values of k which
> > cause slowdowns. (There was a change to the storage format of tables
> > between 5.0 and 5.1, which causes the alignment problem to show up for
> > different indices, although it always shows up every sixth element in
> > a table or stack.)
>
> Ick, that was it:
>
> 0.65user 0.00system 0:00.66elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0.22user 0.00system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
>
> That's pretty serious; it's a heisenbug generator, making code randomly
> slow. I assume there's no known good fix (or it'd be used); are there
> any tradeoff fixes that will at least eliminate the unpredictability?
> I can live with a bit of memory waste and reduced cache efficiency to
> avoid this (at least on x86, which has the memory and large caches to
> cope with it).
Umm, ouch.

I remember making the reverse of this change, going from 8-byte alignment to 4-byte alignment of doubles on a PowerPC platform. The _architecture_ says 4-byte-aligned doubles may not work, but they worked just fine on the silicon we were using, with only a 1-cycle penalty when crossing a cache line boundary. We saved space by going to 4-byte alignment, and the (cache) space saved meant we went faster overall.
drj