Erik Lindroos wrote:
> I'm trying to patch in reproducible (strictly IEEE 754) floating point
> arithmetic in LuaJIT 2. Only the interpreter is really affected as far as I
> can tell, since the JIT uses SSE2 exclusively.

The interpreter stores all intermediate results to stack slots,
i.e. to IEEE 754 double precision FP numbers. This is equivalent
to the -ffloat-store option of GCC or Java with strictfp.

This does not solve the double rounding problem, though.
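
[A minimal sketch of what double rounding means, not taken from the
LuaJIT sources. It uses double -> float as a stand-in for the x87
extended -> double case, since that pair behaves the same way and is
portable. The function names are made up for illustration, and the
"correct" value is derived by hand rather than computed.]

```c
/* Exact sum used below: 1 + 2^-24 + 2^-60.
 * It lies just above the midpoint between the floats 1 and 1 + 2^-23,
 * so a single rounding to float must give 1 + 2^-23. */

/* Round a + b to double first, then round that result to float. */
float round_twice(double a, double b)
{
    /* First rounding: the 2^-60 term is below half an ulp of a double
     * near 1.0, so it is dropped and the sum becomes exactly 1 + 2^-24.
     * volatile forces the store even if the FPU evaluates wider. */
    volatile double wide = a + b;

    /* Second rounding: 1 + 2^-24 is now an exact tie between two
     * floats, and round-to-nearest-even picks 1.0. */
    return (float)wide;
}

/* Result obtained by going through the wider format. */
float double_rounded_example(void)
{
    double a = 1.0 + 0x1p-24;  /* exactly representable as a double */
    double b = 0x1p-60;
    return round_twice(a, b);
}

/* Result a single rounding of the exact sum would give (by hand). */
float correctly_rounded_example(void)
{
    return 1.0f + 0x1p-23f;    /* next float above 1.0 */
}
```

The two functions return different values, which is the effect that
storing intermediates to double precision slots cannot remove.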

> To patch the interpreter, I need to store some constants to scale down and
> up x87 floating point regs before/after multiplication and division.

Wouldn't it be easier to modify the interpreter to use SSE2, too?
I could certainly add a compile-time option to select either of
the two code paths (like the -DLUAJIT_CPU_NOCMOV option).

[BTW: That still doesn't solve the various issues with the x87
transcendental functions or with pow().]

> However, while patching lj_vm_foldarith, I noticed DynASM doesn't seem to
> relocate loads from globals correctly, e.g.:

DynASM does not deal with relocation. It's the driver program
that's responsible for this (i.e. buildvm for LJ2). I simply
didn't bother to add relocations for internal data references.

It wouldn't be a real solution anyway, because it either doesn't
work in a shared library or necessitates text relocations (which
are frowned upon in many distros). And I really don't want to add
GOT relocations, since these are quite system-specific.

> How would I solve this without e.g. writing the constant to the stack and
> load it from there every time?

The current interpreter is pure position-independent code and I
want to keep it that way. I've used loads from upvalues or
construction of constants on the stack to get around these
limitations in some cases. I think this is fast enough. If you
really need this to be maximally fast, I'd keep them in the
global_State, which can be addressed via the DISPATCH register.
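
[A rough C illustration of "construction of constants on the stack",
not the actual interpreter code. The idea is that the bit pattern of
the constant is carried as integer immediates and assembled in a local
variable, so no address of a data section entry is needed. The helper
names are hypothetical. The sketch assumes IEEE 754 binary64 doubles
and matching integer/FP endianness, as on x86. A C compiler is free to
lower this differently; in assembler you would push the two words
yourself.]

```c
#include <stdint.h>
#include <string.h>

/* Assemble a double from the high and low 32 bit words of its
 * IEEE 754 bit pattern, using only a local (stack) temporary. */
double const_from_words(uint32_t hi, uint32_t lo)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;
    memcpy(&d, &bits, sizeof d);  /* reinterpret bits without aliasing issues */
    return d;
}

/* Example use: multiply by 2.0 without referencing a global constant.
 * 0x40000000:00000000 is the bit pattern of 2.0. */
double scale_by_two(double x)
{
    return x * const_from_words(0x40000000u, 0u);
}
```

For example, const_from_words(0x3ff00000u, 0u) yields 1.0 and
const_from_words(0x3fe00000u, 0u) yields 0.5.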

But IMHO the better solution is to (optionally) use SSE2 in the
interpreter, too. The intersection of the set of people who still
have an x87-only box and the set of people who really need
reproducible FP arithmetic is probably empty.

--Mike