- Subject: single instruction multiple data (SIMD) in the Lua VM
- From: "David Manura" <dm.lua@...>
- Date: Fri, 2 Jan 2009 00:29:54 -0500
The following patch to Lua provides an experimental implementation of
a type of Single Instruction Multiple Data (SIMD) capability in the
Lua VM for increased performance on specialized computations:
http://lua-users.org/wiki/SimdExperiment
Importantly, the initial implementation here is ANSI C, and it does
not use any SSE instructions or multithreading. How can this be? The
opcode dispatch in the Lua VM imposes a non-negligible overhead. If,
however, we interpret each opcode (instruction) once and execute that
opcode on multiple data elements, we could expect to reduce the
relative overhead of the opcode dispatch, even if the data is
processed serially in each opcode.
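
As a rough illustration of the dispatch argument above, here is a minimal C sketch of a toy register interpreter whose registers each hold several elements. It is not the patch itself: the names LANES, Instr, Packed, and run_packed are hypothetical and chosen only for this example. Each trip through the switch counts as one dispatch, and that single dispatch does LANES element operations in a plain serial loop.

```c
#include <assert.h>

#define LANES 4  /* hypothetical stand-in for the patch's _SIMD_LEN */

typedef enum { OP_ADD, OP_MUL, OP_HALT } OpCode;

/* One toy instruction: reg[dst] = reg[dst] <op> reg[src]. */
typedef struct { OpCode op; int dst; int src; } Instr;

/* A "packed" register: LANES ordinary numbers stored side by side. */
typedef struct { double lane[LANES]; } Packed;

/* Interprets code until OP_HALT and returns the number of opcode
 * dispatches performed (OP_HALT included). */
static unsigned long run_packed(const Instr *code, Packed *reg)
{
    unsigned long dispatches = 0;
    const Instr *pc;
    int k;

    assert(code != 0 && reg != 0);

    for (pc = code; ; ++pc) {
        ++dispatches;                 /* decode/dispatch cost paid once ... */
        switch (pc->op) {
        case OP_ADD:                  /* ... then amortized over LANES elements */
            for (k = 0; k < LANES; ++k)
                reg[pc->dst].lane[k] += reg[pc->src].lane[k];
            break;
        case OP_MUL:
            for (k = 0; k < LANES; ++k)
                reg[pc->dst].lane[k] *= reg[pc->src].lane[k];
            break;
        case OP_HALT:
        default:
            return dispatches;
        }
    }
}
```

With LANES set to 4, a program of two OP_ADD instructions followed by OP_HALT performs eight element additions in three dispatches. A scalar interpreter doing the same work would need eight add dispatches plus the halt.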
In a simple test that sums integers, the SIMD version reduced the
runtime by 70% (roughly a 3.3x speedup). Adding native SSE2 support
might improve runtimes further.
-- test1-standard.lua (Standard version)
local sum = 0
for i=1,2^28 do sum = sum + i end
print(sum)
-- test1-simd.lua (SIMD version)
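local N=_SIMD_LEN                               -- elements per packed value
local j; for k=1,N do packed(j,k) = k end       -- (j is not used below)
local psum; for k=1,N do packed(psum,k)= 0 end  -- per-element partial sums
local fi; for k=1,N do packed(fi,k) = k end     -- loop start: element k starts at k
local fs; for k=1,N do packed(fs,k) = N end     -- loop step: every element advances by N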
for i=fi,2^28,fs do
  psum = psum + i
end
local sum = 0
for i=1,N do
  -- print('partial sum', i, packed(psum,i))
  sum = sum + packed(psum,i)
end
print('sum:', sum)
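
The decomposition used in test1-simd.lua can be checked without the patch. The following C sketch is only a model of the arithmetic, not of the VM: the function name strided_sum and the MAX_LANES bound are hypothetical, and it assumes the upper limit is a multiple of the lane count so that every lane runs the same number of iterations. Lane k starts at k and advances by the lane count, and the partial sums are added together at the end.

```c
#include <assert.h>

#define MAX_LANES 16  /* arbitrary bound for this sketch */

/* Sums 1..n by keeping `lanes` strided partial sums, then reducing them.
 * Assumes n is a multiple of lanes. */
static double strided_sum(unsigned long n, unsigned lanes)
{
    double psum[MAX_LANES];
    double sum = 0.0;
    unsigned long i;
    unsigned k;

    assert(lanes >= 1 && lanes <= MAX_LANES);
    assert(n % lanes == 0);

    for (k = 0; k < lanes; ++k)
        psum[k] = 0.0;

    /* One outer iteration models one packed loop iteration:
     * lane k adds the value i + k, then all lanes step by `lanes`. */
    for (i = 1; i <= n; i += lanes)
        for (k = 0; k < lanes; ++k)
            psum[k] += (double)(i + k);

    /* Final reduction, as in the last loop of test1-simd.lua. */
    for (k = 0; k < lanes; ++k)
        sum += psum[k];
    return sum;
}
```

For example, strided_sum(1000, 4) and strided_sum(1000, 1) both give 500500. For sums too large to be represented exactly in a double, the two orderings need not round identically.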
Additional details are on the wiki page. I consider this a
proof-of-concept, with the hope it will inspire other implementations.