--- Begin Message ---
- Subject: Re: Lua 5.0 to 5.1 performance regression?
- From: Glenn Maynard <glenn@...>
- Date: Sun, 24 Sep 2006 23:05:09 -0400
(Did you mean to mail me privately? If not, feel free to reply back to
the list.)
On Sun, Sep 24, 2006 at 09:20:02PM -0500, Rici Lake wrote:
> >>> iFreeListRef = ref;
> >>> if(ref == iMaxReference)
> >>>     --iMaxReference;
> >>
> >>I think that should be:
> >>
> >> if (ref == iMaxReference)
> >>     --iMaxReference;
> >> else
> >>     iFreeListRef = ref;
> >>
> >>Otherwise, you'll give out the same references twice.
> >
> >The logic is the same as in lauxlib; it's just forming a linked list
> >of indexes, where iFreeListRef is the head.
>
> Yes, but lauxlib doesn't try to do that optimization (applicable if you
> are mostly using references as a stack.)
I think iMaxReference should actually never be decremented, since freed
reference indexes are never actually forgotten completely; they're always
either in use or in the free list. (I guess you turned it into an
optimization that does forget them, which I suppose works too.)
(Better to just keep a native array of available indexes, anyway.)
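A minimal sketch of the native-array idea, assuming a fixed capacity. The
names ref_pool, ref_pool_acquire and ref_pool_release are hypothetical and
do not come from the code under discussion. Freed indexes go onto a plain C
stack, and the high-water mark (the counterpart of iMaxReference) only ever
grows:

```c
#include <assert.h>

#define REF_POOL_CAPACITY 1024   /* hypothetical fixed limit for this sketch */
#define REF_NONE (-1)            /* returned when the pool is exhausted */

/* Hypothetical pool of reference indexes in the range 1..REF_POOL_CAPACITY. */
typedef struct {
    int free_stack[REF_POOL_CAPACITY]; /* freed indexes awaiting reuse */
    int free_count;                    /* number of entries in free_stack */
    int next_unused;                   /* next never-used index; never decremented */
} ref_pool;

static void ref_pool_init(ref_pool *p) {
    p->free_count = 0;
    p->next_unused = 1;  /* index 0 is left unused in this sketch */
}

/* Hand out a freed index if one exists, otherwise a fresh one. */
static int ref_pool_acquire(ref_pool *p) {
    if (p->free_count > 0)
        return p->free_stack[--p->free_count];  /* most recently freed first */
    if (p->next_unused > REF_POOL_CAPACITY)
        return REF_NONE;
    return p->next_unused++;
}

/* Return an index to the pool. Each live index must be released only once. */
static void ref_pool_release(ref_pool *p, int ref) {
    assert(p->free_count < REF_POOL_CAPACITY);
    p->free_stack[p->free_count++] = ref;
}
```

Because every index is either live or on free_stack, no index is handed out
twice, and the free list lives in native memory rather than in a Lua table.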
> True, but it may be able to do better than the code generated by
> a cast. Now that I think about it, it probably only applies to
> float->int coercions (so I should have said to use lua_tointeger).
True; lrintf() would be faster. I think the difference is too small to
matter here (we're looking at per-loop timings in microseconds, not
nanoseconds).
> The only other thing I can think of is the Pentium cache alignment
> issue; I don't think that could be happening here because you're not
> doing any arithmetic, but in case it is, you might want to check by doing
> the test reffing k things before you start the loop, for k ranging
> from 0 to 5, and see if there are particular values of k which
> cause slowdowns. (There was a change to storage format of tables
> between 5.0 and 5.1, which causes the alignment problem to show
> up for different indices, although it always shows up every sixth
> element in a table or stack.)
Ick, that was it:
0.65user 0.00system 0:00.66elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0.22user 0.00system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
That's pretty serious; it's a heisenbug generator, making code randomly
slow. I assume there's no known good fix (or it'd be used); are there
any tradeoff fixes that will at least eliminate the unpredictability? I
can live with a bit of memory waste and reduced cache efficiency to avoid
this (at least on x86 machines, which have the memory and large caches to
cope with it).
--
Glenn Maynard
--- End Message ---