[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: 5.1 rc mod giving -0 (never just 0)
- From: Mike Pall <mikelu-0601@...>
- Date: Mon, 23 Jan 2006 11:15:39 +0100
Hi,
D Burgess wrote:
> Seems that this is the 14/15 digit problem with doubles. How does
> fmod do it? At 2^54 we are beyond double capacity to show all digits.
> So how does it know what the remainder is? Some form of
> optimized arithmetic?
By using an iterative subtract-and-shift division algorithm.
A good example is the x87 FPU instruction 'fprem': it calculates
64 bits(*) in one pass. A flag is set if this was not sufficient.
You're supposed to run it in a loop, like this:
|1:
| fprem // [x y] -> [rem y] or [partial-rem y]+C2 flag set
| fnstsw ax
| sahf
| jp <1 // Retry if C2 flag is set.
(*) The x87 FPU registers have a higher internal precision.
The situation is worse for all the platforms that don't have good
hardware support for it. They are probably using a variant of:
http://www.netlib.org/fdlibm/e_fmod.c
... which is dead slow unless rewritten in native assembler.
Alas, glibc and the BSD libc's only have native versions for
x86/x87, x64 and ia64. Seems floating point mod is not very
popular (unlike integer mod).
My guess: This was the reason why the Lua authors decided to use
x-floor(x/y)*y. It's a lot faster on most platforms and more or
less the same for x86/x87 (since fmod is usually not inlined).
But it's very imprecise if the exponents of the two arguments
differ by more than the precision of the mantissa. I still think
it's 'good enough' for most use cases (args in the int32 range).
Anyone who really needs the precision should use math.fmod (it's
not going away -- but the compatibility math.mod alias is).
BTW: Rici mentioned a copysign() workaround. But IMHO this
doesn't work?
print(2%-3, -2%3, math.mod(2,-3), math.mod(-2,3))
--> -1 1 2 -2
It would also be pretty slow because copysign modifies the
highest bit of a 64 bit double with 32 bit integer ops (at least
on x86/x87). This incurs storing and retrieving it from memory
(no direct access to FPU regs from the integer path) and several
store-to-load forwarding stalls.
Bye,
Mike