- Subject: Re: Computed goto optimization of vanilla Lua
- From: Roberto Ierusalimschy <roberto@...>
- Date: Thu, 4 Feb 2016 13:13:51 -0200
> > Computed gotos are a feature of modern compilers [1]. They can be used
> > as a faster replacement for a switch-based VM [2]. Many programming
> > languages have VMs implemented with computed gotos (at least Ruby and
> > Python).
> >
> > The computed goto version is faster for two reasons [2]:
> >
> > * The switch does a bit more work per iteration because of bounds checking.
> > * The effects of hardware branch prediction.
> >
> > I have applied this optimization to the VM of Lua 5.3.2, file
> > src/lvm.c [3]. It was very easy, because the VM uses the macros
> > vmdispatch, vmcase and vmbreak. I have redefined these macros and
> > created a dispatch table.
> >
> > [...]
>
> Thanks for the results.
>
> These macros are there exactly for this reason. However, my earlier
> tests (a few years ago, when we introduced the macros) did not show
> any perceptible improvement. In particular, the GCC compiler insisted
> on applying a space optimization that merged the common code at the
> end of different branches in a conditional (most of 'vmbreak', in our
> case), therefore throwing away any possibility of optimized branch
> predictions. (All opcodes ended up using the same indirect jump in the
> final code.) I will try it again.
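
The macro redefinition discussed above can be sketched roughly as follows.
This is a toy interpreter, not Lua's lvm.c: the opcodes, the run() function
and the accumulator are hypothetical stand-ins, and only the macro names
vmdispatch, vmcase and vmbreak come from this thread. It assumes GCC or
clang, which provide the labels-as-values extension (&&label and goto *),
and falls back to a plain switch elsewhere.

```c
/* Hypothetical toy opcodes; Lua's real instruction set is different. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT, NUM_OPCODES };

#if defined(__GNUC__)            /* GCC and clang: labels as values */
#define USE_COMPUTED_GOTO 1
#else
#define USE_COMPUTED_GOTO 0
#endif

#if USE_COMPUTED_GOTO
/* Jump straight to the handler through the table (no bounds check, so
   the bytecode is assumed to contain only valid opcodes). */
#define vmdispatch(o)  goto *disptab[o];
#define vmcase(l)      L_##l:
/* Fetch the next opcode and dispatch again, so that in the source each
   handler ends with its own indirect jump. */
#define vmbreak        vmdispatch(*pc++)
#else
/* Portable fallback: the usual switch inside a loop. */
#define vmdispatch(o)  switch (o)
#define vmcase(l)      case l:
#define vmbreak        break
#endif

/* Runs the bytecode until OP_HALT and returns the accumulator. */
int run(const unsigned char *code) {
    const unsigned char *pc = code;
    int acc = 0;
#if USE_COMPUTED_GOTO
    /* One label address per opcode, in the same order as the enum. */
    static const void *const disptab[NUM_OPCODES] = {
        &&L_OP_INC, &&L_OP_DEC, &&L_OP_DOUBLE, &&L_OP_HALT
    };
#endif
    for (;;) {
        unsigned char op = *pc++;
        vmdispatch(op) {
            vmcase(OP_INC)    { acc += 1; vmbreak; }
            vmcase(OP_DEC)    { acc -= 1; vmbreak; }
            vmcase(OP_DOUBLE) { acc *= 2; vmbreak; }
            vmcase(OP_HALT)   { return acc; }
        }
    }
}
```

For example, run() on the program { OP_INC, OP_INC, OP_DOUBLE, OP_DEC,
OP_HALT } returns 3. Note that the separate indirect jumps exist only at
the source level: if the compiler merges the identical tails of the
handlers, as described above for GCC, the generated code goes back to a
single shared indirect jump, so it is worth checking the assembly output.
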
I (re)did some quick tests. For your particular test, I got a "speedup"
of 5%. In general, I got "speedups" of around 5-8%. With clang (3.6), I
got "speedups" of around 2% in all my (few) tests. (The quotes mean that
I am not sure whether these speedups are real, that is, due only to this
change and consistent across several compilers, versions, platforms,
tests, etc.) It would be great if other people could report their
results for diverse environments and tests.
-- Roberto