[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: LuaJIT - Is ffi.alloca possible?
- From: Mike Pall <mikelu-1103@...>
- Date: Wed, 30 Mar 2011 01:29:08 +0200
Josh Haberman wrote:
> Mike Pall <mikelu-1103 <at> mike.de> writes:
> > Recent JVMs do
> > escape analysis. But from what I hear, this is not such a big win,
> > except for trivial examples.
>
> Someone was claiming on HN a few days ago that escape analysis is a
> major win for PyPy.
PyPy has a relatively slow GC, whereas JVM has one of the fastest.
Also, PyPy uses a boxed representation for almost everything,
including doubles. Eliminating the boxing/unboxing overhead for
basic types is crucial for performance. Java only has to box
arrays and classes (most of the time).
> There was speculation that this isn't as
> important for LuaJIT thanks to its representation that does not
> require numbers to be boxed:
> http://news.ycombinator.com/item?id=2373591
That's correct. Tables and cdata are the only boxes. But Lua
programmers already know that it's important to avoid table
allocations. Now, with cdata gaining some popularity, there's more
demand for eliminating the boxing/unboxing overhead.
Note that I'm already using a common trick (a combined allocation
plus store IR node) to eliminate many cdata boxes. But that
doesn't work for all of them.
> In that same thread someone claimed that you are planning to
> implement escape analysis -- not sure where they got that info. :)
That's correct, too. Notice the difference between an 'analysis'
and an 'optimization' in compiler terminology. Escape analysis is
a requirement for various optimizations:
1) Eliminating stack allocations in C/C++ is done with an
optimization called "Scalar Replacement of Aggregates" (SRA). This
needs alias analysis and escape analysis.
2) Replacing a heap allocation with a stack allocation in Java
requires escape analysis. This is only moderately successful,
since the object often escapes into uncommon paths. Also, it's not
that important for the JVM, since a fast bump allocator needs more
or less the same number of operations as a stack allocation.
3) Allocation sinking requires store forwarding and store sinking.
It needs alias analysis and escape analysis. This is what I'm
planning for LuaJIT. Escape analysis tells us where an allocation
escapes to. If the allocated object is stored somewhere, this
usually means the allocation cannot be eliminated. But if it only
escapes to side exits (e.g. in a local variable), then we can do
better:
Basically an allocation is sunk (moved) to all side exits of a
trace where it's value escapes to. This can be done virtually with
a mark -- there's no need to actually add code for the allocation
anywhere. If the trace never leaves through such an exit, then the
allocation is never performed. In case such a side trace is
eventually compiled, the allocation is merged into the new trace.
This allows us to do the same optimization on this trace and sink
the allocation to a side trace of the side trace. And so on.
Together this effectively provides the effects of SRA. And it
works for a lot more cases than the JVM approach.
Short example:
-- Input 'a'
local t = {} -- Allocation!
t[1] = a
if unlikely_condition_1 then ... 't' escapes here ... end
if unlikely_condition_2 then ... 't' escapes here ... end
local b = t[1]
-- Output 'b' (let's assume 't' is dead here)
We want to optimize it into this:
-- Input 'a'
if unlikely_condition_1 then local t = {}; t[1] = a; ... end
if unlikely_condition_2 then local t = {}; t[1] = a; ... end
local b = a
-- Output 'b'
Here's the data flow of a (simplified) trace for the common path
and its transformation:
a a a
| |\ |
V V \ |
ALLOC->STORE ALLOC->STORE | |
| | / | ____________
| | / |/ V
==t==========> ==t==========> ==X=======*==> ALLOC->STORE side exit #1
| | | | _____________
| | | |/ V
==t==========> ==t==========> ==X=======*==> ALLOC->STORE side exit #2
| | |
t--->LOAD | |
| | |
V V V
b b b
(1) ----------> (2) ----------> (3)
alias analysis escape analysis
store forwarding store sinking
dead-code elim. allocation sinking
--Mike