- Subject: Re: Lua 5.1 (beta) now available
- From: Rici Lake <lua@...>
- Date: Fri, 18 Nov 2005 18:45:14 -0500
On 18-Nov-05, at 12:18 PM, Mike Pall wrote:
>> On the other hand, I've certainly seen C compilers (though I
>> admit not for a long time) which would cheerfully optimize away the
>> (a)!=(a) check (which certainly should be optimized away if a is an
>> integer type.)
> It's clearly a violation of the standard to optimize this
> comparison away for floating point numbers. Abandon all
> hope that such a compiler gets the other subtle issues
> of FP arithmetic right.
Which standard would that be? :) All I see in the C standard is that
the value of a comparison or equality operator is "1 if the specified
relation is true and 0 if it is false". Even C99 does not mandate the
use of IEEE-754 floating point, and it is not intrinsic to floating
point that either (1) there is such a thing as "not a number" or (2) if
there is, that it tests unequal to itself.
OK, to be fair, C99 does say that if the implementation purports to
implement IEEE-754 floating point (by defining __STDC_IEC_559__), then
it has to make == and != work that way. On the other hand, in that
case, it also has to define isunordered(x, y). And, curiously, gcc (at
least on x86) *does* inline isunordered(x, x) even though it does not
inline isnan(x) (which is semantically identical). Go figure.
(gcc does not seem to define __STDC_IEC_559__, though. Perhaps the
implementation isn't considered complete yet. So it's under no
obligation to honour the definition of ==. See below.)
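For anyone who wants to see what their own compiler claims, here is a minimal sketch. The function names are made up for illustration; NAN is the C99 macro from <math.h>, and the volatile is only there to discourage the compiler from folding the comparison at compile time. What it reports will depend on the compiler, its version and the flags used.

```c
#include <math.h>
#include <stdio.h>

/* Returns 1 if the implementation defines __STDC_IEC_559__, i.e. claims
   to follow the IEEE-754 (IEC 60559) rules of C99 Annex F; 0 otherwise. */
int claims_iec559(void) {
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if a NaN compares unequal to itself at run time. */
int nan_unequal_to_itself(void) {
    volatile double x = NAN;  /* volatile: discourage compile-time folding */
    return x != x;
}

/* Prints both facts so they can be compared across compilers and flags. */
void report(void) {
    printf("__STDC_IEC_559__ defined: %s\n", claims_iec559() ? "yes" : "no");
    printf("NaN != NaN:               %s\n",
           nan_unequal_to_itself() ? "true" : "false");
}
```

Calling report() from main() once with and once without -ffast-math is enough to see whether the two answers move together.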
So I timed the following three little snippets in a hard loop:
#include <math.h>  /* isunordered() and isnan() (C99) */
double uno(double x) {
    if (isunordered(x, x)) return 0;
    else return x + 1;
}
double isn(double x) {
    if (isnan(x)) return 0;
    else return x + 1;
}
double cmp(double x) {
    if (x != x) return 0;
    else return x + 1;
}
using 0.0 and some NaN as an argument. Results: (nanoseconds per
iteration, timed with 100,000,000 iterations). (Remember when you could
benchmark without counting zeros? :)
isnan(x)
NaN |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
107.8
0.0 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 101
x != x
NaN |xxxxxxxxxx 19.6
0.0 |xxxxxxxxxxxx 24.5
isunordered(x, x)
NaN |xxxxxxxxxxxx 24.2
0.0 |xxxxxxxxxx 19.0
Conclusion: on gcc/x86 (and possibly other platforms), the best test is
'isunordered(x,x)'; it's apparently faster in the common case, and it
is semantically correct.
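The timing loop itself is not shown here, so the following is only a sketch of how such a measurement could be set up, not the harness that produced the numbers above. ns_per_call is a hypothetical helper, clock() is an assumption about the timer, and the three candidates are repeated in compact form so the fragment compiles on its own.

```c
#include <math.h>
#include <time.h>

/* The three candidates, written compactly. */
double uno(double x) { return isunordered(x, x) ? 0 : x + 1; }
double isn(double x) { return isnan(x) ? 0 : x + 1; }
double cmp(double x) { return (x != x) ? 0 : x + 1; }

/* Hypothetical helper: average CPU time, in nanoseconds, of one call
   to f(arg). iterations must be greater than zero. */
double ns_per_call(double (*f)(double), double arg, long iterations) {
    volatile double sink = 0;  /* keeps the calls from being discarded */
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        sink = f(arg);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iterations;
}
```

Something like ns_per_call(uno, 0.0, 100000000L), repeated for each function and again with NAN as the argument, gives the shape of the experiment. Because the call goes through a function pointer, the compiler will usually not inline the candidate, so the absolute figures need not match the ones above.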
Now, gcc offers the interesting optimization flag -ffast-math. You
should never use this flag. We all know that, right? But I suppose
people do. Anyway, I tried it. Two interesting things arise:
First, with -ffast-math, gcc optimizes away the 'x != x' test. Of
course, it never claimed to be IEEE-754 compliant, so it's allowed to
do that.
Second, with -ffast-math, gcc also optimizes away the call to
isunordered() (!).
Finally, -ffast-math is anything but fast when presented with NaNs. I
ran the same tests as above, but the results won't fit on the width of
the email:
isnan(x)
NaN |-----------------------------------------------------> 4247
0.0 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 104
x != x
NaN |----> 3933* (wrong answer)
0.0 |xxxxxxx 13.2
isunordered(x, x)
NaN |----> 3933* (wrong answer)
0.0 |xxxxxxx 13.1
So optimizing away the tests does produce the right answer faster when
the argument is not a NaN (13.2 ns instead of 24.5 ns for x != x). But
why is it so slow to produce the wrong answer? And why does the
slowness affect a non-inlined call to isnan() as well?
The answer to the first question is, of course, the pathetic handling
of NaNs by Pentiums. Since the check for NaN has been removed from the
code, the addition NaN+1.0 takes place; this is disastrously slow on a
Pentium 4.
As for the second question, examination of the assembly shows that
much the same thing happens in the isnan(x) case. In a vain attempt to
squeeze every microcycle out of the
Pentium 4, gcc moves the addition to *prior* to the test of the result
of isnan(x). That is, it does the addition regardless of whether it
needs the value, because "it can't hurt". One presumes that it wouldn't
have done that had it been a division instead of an addition, and it
certainly does not do it without -ffast-math, presumably because in
that case it knows that the addition might change an fp exception flag.
But the joke's on gcc in the end; the "unnecessary but harmless"
addition ends up costing a 40x slowdown.
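In source terms, the reordering described above amounts to something like the second function below. This is a hand-written illustration of the effect, not gcc's actual output, and both function names are invented for the example.

```c
#include <math.h>

/* As written: the addition only happens when x is not a NaN. */
double isn_as_written(double x) {
    if (isnan(x)) return 0;
    return x + 1;
}

/* As effectively reordered: the addition is done first, whether or not
   its result is needed, and thrown away when x turns out to be a NaN. */
double isn_hoisted(double x) {
    double sum = x + 1;        /* executed even when x is a NaN */
    if (isnan(x)) return 0;
    return sum;
}
```

Both functions return the same values for every input; the only difference is that the second one always pays for the addition, which is the expensive part when x is a NaN on the hardware discussed here.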