- Subject: Re: Threaded vm core loop (Was: Re: how does lua arrange vmcase?)
- From: Roberto Ierusalimschy <roberto@...>
- Date: Wed, 5 Apr 2017 19:04:10 -0300
> It uses jmp *%rax for the switch case. The trick only removes the unused
> default case.
The real goal of the trick would be to have a separate copy of
'jmp *%rax' after each opcode (since 'vmdispatch' is expanded at the end
of each instruction).
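To make the idea concrete, here is a minimal sketch of a dispatch loop in
which every handler ends with its own indirect jump. It is not Lua's actual
code: the opcodes, the 'run' function, and the DISPATCH macro are all
hypothetical, and it assumes the "labels as values" extension of GCC and
Clang ('&&label' and 'goto *ptr').

```c
#include <assert.h>
#include <stddef.h>

enum { OP_ADD, OP_LOOP, OP_HALT };  /* hypothetical opcodes */

/* Hypothetical mini-VM: OP_ADD adds 4 to an accumulator, OP_LOOP jumps
   back to the start of 'code' until 'iterations' passes have been made,
   OP_HALT returns the accumulator. */
long run(const unsigned char *code, long iterations) {
    /* One label address per opcode (GCC/Clang extension). */
    static void *const targets[] = { &&do_add, &&do_loop, &&do_halt };
    const unsigned char *pc = code;
    long acc = 0;

    assert(code != NULL);

/* Expanded at the end of every handler, so each handler gets its own
   copy of the indirect jump (unless the compiler merges them). */
#define DISPATCH() goto *targets[*pc++]

    DISPATCH();
do_add:
    acc += 4;
    DISPATCH();
do_loop:
    if (--iterations > 0)
        pc = code;          /* back to the first instruction */
    DISPATCH();
do_halt:
    return acc;
#undef DISPATCH
}
```

With the program { OP_ADD, OP_LOOP, OP_HALT } and three iterations, 'run'
executes ADD and LOOP alternately three times and returns 12.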
In a typical loop, the VM rarely executes the same opcode twice in a
row, so a single shared jump is predicted very poorly. However, we can
expect that a given instruction is often followed by one particular
other instruction. For instance, consider the following loop:
        5       [2]     FORPREP         1 1     ; to 7
        6       [3]     ADD             0 0 4
        7       [2]     FORLOOP         1 -2    ; to 6
ADD is always followed by FORLOOP, which is always followed by ADD. So,
if we have a different jump instruction ('jmp *%rax') at the end of
each opcode, those jumps would have a 100% hit rate. By contrast, a
single 'jmp *%rax' shared by both instructions (which is what we get
with a switch) would have a 0% hit rate.
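The two hit rates can be checked with a toy model. The sketch below
assumes a deliberately simple predictor in which each indirect jump
remembers only its last target; real hardware predictors are more
elaborate. The function name 'count_hits' and the opcode constants are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_FORPREP, OP_ADD, OP_FORLOOP, NUM_OPS };

/* Toy "last target" predictor. 'trace' is the sequence of executed
   opcodes. If 'shared_jump' is nonzero, all dispatches go through one
   jump (the switch case); otherwise each opcode has its own jump.
   Returns how many of the n-1 dispatches were predicted correctly. */
size_t count_hits(const int *trace, size_t n, int shared_jump) {
    int last_target[NUM_OPS];
    size_t hits = 0;

    for (int i = 0; i < NUM_OPS; i++)
        last_target[i] = -1;            /* nothing predicted yet */

    for (size_t i = 0; i + 1 < n; i++) {
        assert(trace[i] >= 0 && trace[i] < NUM_OPS);
        int slot = shared_jump ? 0 : trace[i];  /* which jump is taken */
        int next = trace[i + 1];                /* where it really goes */
        if (last_target[slot] == next)
            hits++;
        last_target[slot] = next;       /* remember the latest target */
    }
    return hits;
}
```

For a trace of 1000 opcodes alternating ADD, FORLOOP, ADD, ..., the
shared jump gets 0 hits out of 999 dispatches, while the per-opcode
jumps get 997 (everything except the first use of each jump).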
Unfortunately, many compilers "optimize" this kind of code, merging
the different copies of 'vmdispatch' into one to reduce code size,
and therefore they kill that kind of gain. :-(
-- Roberto