- Subject: Re: [Benchmark] Chain calls
- From: "Alexander Gladysh" <agladysh@...>
- Date: Mon, 24 Nov 2008 23:13:38 +0300
On Mon, Nov 17, 2008 at 3:38 PM, Peter Cawley <lua@corsix.org> wrote:
> It may be worth looking at the generated Lua opcodes for these benchmarks in
> order to easier see the differences in what is happening in each. For
> example, return true v.s return nil are loadbool,return vs. loadnil,return.
> Then looking at the VM code for these operations, either in C or as the
> assembled output of the C, might make it clearer. Of course, this won't help
> with explaining the luajit results, as it skips the VM when JITing.
Sorry for the late reply.
The opcode listing (via luac -l -l) is indeed very helpful. Chained calls
execute fewer instructions, since they do not need an extra MOVE before
each call:
local function chain_local()
  local chain = chain
  chain () () () () () () () () () () -- 10 calls
end
function <chaincallbench2.lua:9,12> (13 instructions, 52 bytes at 0x100fb0)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
1 [10] GETUPVAL 0 0 ; chain
2 [11] MOVE 1 0
3 [11] CALL 1 1 2
4 [11] CALL 1 1 2
5 [11] CALL 1 1 2
6 [11] CALL 1 1 2
7 [11] CALL 1 1 2
8 [11] CALL 1 1 2
9 [11] CALL 1 1 2
10 [11] CALL 1 1 2
11 [11] CALL 1 1 2
12 [11] CALL 1 1 1
13 [12] RETURN 0 1
plain_local and plain_chain_local, on the other hand, both need a MOVE
before every call to fetch the function to call:
local function plain_local()
  local plain = plain
  plain ()
  ...
  plain () -- 10 calls
end
local function plain_chain_local()
  local chain = chain
  chain ()
  ...
  chain () -- 10 calls
end
function <chaincallbench2.lua:14,26> (22 instructions, 88 bytes at 0x101190)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
1 [15] GETUPVAL 0 0 ; plain
2 [16] MOVE 1 0
3 [16] CALL 1 1 1
4 [17] MOVE 1 0
5 [17] CALL 1 1 1
6 [18] MOVE 1 0
7 [18] CALL 1 1 1
8 [19] MOVE 1 0
9 [19] CALL 1 1 1
10 [20] MOVE 1 0
11 [20] CALL 1 1 1
12 [21] MOVE 1 0
13 [21] CALL 1 1 1
14 [22] MOVE 1 0
15 [22] CALL 1 1 1
16 [23] MOVE 1 0
17 [23] CALL 1 1 1
18 [24] MOVE 1 0
19 [24] CALL 1 1 1
20 [25] MOVE 1 0
21 [25] CALL 1 1 1
22 [26] RETURN 0 1
function <chaincallbench2.lua:28,40> (22 instructions, 88 bytes at 0x101460)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
1 [29] GETUPVAL 0 0 ; chain
2 [30] MOVE 1 0
3 [30] CALL 1 1 1
4 [31] MOVE 1 0
5 [31] CALL 1 1 1
6 [32] MOVE 1 0
7 [32] CALL 1 1 1
8 [33] MOVE 1 0
9 [33] CALL 1 1 1
10 [34] MOVE 1 0
11 [34] CALL 1 1 1
12 [35] MOVE 1 0
13 [35] CALL 1 1 1
14 [36] MOVE 1 0
15 [36] CALL 1 1 1
16 [37] MOVE 1 0
17 [37] CALL 1 1 1
18 [38] MOVE 1 0
19 [38] CALL 1 1 1
20 [39] MOVE 1 0
21 [39] CALL 1 1 1
22 [40] RETURN 0 1
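The definition of chain is not shown above. A minimal sketch of the two call styles, assuming chain simply returns itself so that its result can be called again (this definition and the call counter are assumptions for illustration, not the contents of chaincallbench2.lua):

```lua
-- Assumed definition: the real chaincallbench2.lua is not reproduced here.
local calls = 0

local function chain()
  calls = calls + 1
  return chain -- returning itself makes chain () () () valid
end

-- Chained form: each CALL leaves its result in the register that the
-- next CALL reads its function from, so no MOVE is needed in between.
local function chain_local()
  local chain = chain
  chain () () ()
end

-- Statement form: each statement discards the result, so the function
-- has to be copied into the call register again before every call.
local function plain_chain_local()
  local chain = chain
  chain ()
  chain ()
  chain ()
end

chain_local()
local chained = calls
calls = 0
plain_chain_local()
local separate = calls
print(chained, separate)
```

Both forms make the same number of calls; only the instructions between the CALLs differ.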
Note that in the versions without upvalue caching, each MOVE is replaced
with a GETUPVAL. From a quick look at the Lua VM code, MOVE *looks* a bit
faster, since it involves fewer lookups:
case OP_MOVE: {
  setobjs2s(L, ra, RB(i));
  continue;
}
case OP_GETUPVAL: {
  int b = GETARG_B(i);
  setobj2s(L, ra, cl->upvals[b]->v);
  continue;
}
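For comparison, a sketch of the cached and uncached variants side by side. The name plain_chain_upvalue, the definition of chain and the call counter are hypothetical, chosen only to make the snippet runnable:

```lua
-- Assumed definition of chain, with a counter for illustration.
local calls = 0
local function chain()
  calls = calls + 1
  return chain
end

-- Cached: the upvalue is read once (GETUPVAL) into a local, and each
-- call statement then copies that local with MOVE.
local function plain_chain_local()
  local chain = chain
  chain ()
  chain ()
end

-- Uncached: there is no local copy, so every call statement has to
-- read the upvalue again (GETUPVAL instead of MOVE).
local function plain_chain_upvalue()
  chain ()
  chain ()
end

plain_chain_local()
plain_chain_upvalue()
print(calls)
```

Running luac -l -l on such a file should show the MOVE / GETUPVAL difference directly.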
Still, the difference is in tenths of microseconds, and it looks like
both of my benchmark runs used too few iterations to be trusted
(seconds in total time)...
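One way to get steadier numbers is to raise the iteration count and compare against an empty function call. A rough sketch only: bench, noop, N and the definition of chain are hypothetical and are not the harness that produced the numbers in this thread.

```lua
-- Hypothetical harness: run fn `iterations` times and return the CPU
-- seconds used, as reported by os.clock.
local function bench(fn, iterations)
  local start = os.clock()
  for _ = 1, iterations do
    fn()
  end
  return os.clock() - start
end

-- Baseline: loop overhead plus one call of an empty function.
local function noop() end

-- Assumed definition of chain (returns itself).
local function chain() return chain end

local function chain_local()
  local chain = chain
  chain () () () () () () () () () () -- 10 calls
end

local N = 1e6 -- raise this until a single run takes several seconds
local baseline = bench(noop, N)
local total = bench(chain_local, N)
print(string.format("baseline %.3f s, chain_local %.3f s, %.3f us per iteration",
  baseline, total, total / N * 1e6))
```

Repeating each measurement a few times and keeping the minimum would reduce noise further.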
Alexander.