I have a question about recent GCC versions (5 and later) and this code:
#include <math.h>
void test_nan (
    const float * const __restrict__ in,
    const int n,
    char * const __restrict__ out )
{
    for (int i = 0; i < n; ++i)
        out[i] = isnan(in[i]);
}
The assembly listing GCC produces with -Ofast:
test_nan:
        movq    %rdx, %rdi
        testl   %esi, %esi
        jle     .L1
        movslq  %esi, %rdx
        xorl    %esi, %esi
        jmp     memset
.L1:
        ret
This is effectively memset(out, 0, n), skipped when n <= 0.
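Read back into C, the listing amounts to roughly the sketch below. This is my reading of the assembly, not GCC output, and the name test_nan_as_emitted is hypothetical: out is moved into %rdi, n is sign-extended into %rdx, %esi is zeroed as the fill byte, and the function tail-calls memset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C rendering of the -Ofast listing (not GCC output).
 * With isnan() folded to 0, every out[i] becomes 0, so the loop
 * collapses into a single memset guarded by the n > 0 test. */
void test_nan_as_emitted(const float *const in, const int n, char *const out)
{
    (void)in;                     /* the input is never read */
    if (n <= 0)                   /* testl %esi,%esi ; jle .L1 */
        return;
    memset(out, 0, (size_t)n);    /* movslq + xorl + jmp memset */
}
```

Note that in is never loaded, which is the clearest sign that the compiler decided the result cannot depend on the data.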
Why does GCC assume that no entries can be NaN with -Ofast?
With the same compilation options, ICC does not show this issue.
With GCC, the issue goes away with "-O3".
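A small driver makes the difference visible at run time: build it once with -O3 and once with -Ofast and compare the output. The driver name demo_nan_flags and the sample values are my own; the block repeats the question's test_nan so it stands alone.

```c
#include <math.h>
#include <stdio.h>

/* The question's function, unchanged apart from layout. */
void test_nan(const float *const __restrict__ in, const int n,
              char *const __restrict__ out)
{
    for (int i = 0; i < n; ++i)
        out[i] = isnan(in[i]);
}

/* Hypothetical driver: two of the four inputs are NaN.
 * Prints one 0/1 flag per element. */
void demo_nan_flags(char out[4])
{
    const float in[4] = {1.0f, NAN, -2.5f, NAN};
    test_nan(in, 4, out);
    for (int i = 0; i < 4; ++i)
        printf("%d ", out[i] != 0);
    printf("\n");
}
```

Called from a main, the -O3 build should print 0 1 0 1, while the -Ofast listing in the question implies 0 0 0 0.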
Note that with -O3, the query
gcc -c -Q -O3 --help=optimizers | egrep -i nan
gives
-fsignaling-nans [disabled]
I verified this both locally and on Godbolt (Compiler Explorer), with the additional option -std=c99.
Edit: following the helpful answers below, I can confirm that -Ofast -std=c99 -fno-finite-math-only properly addresses this issue.
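To catch a missing -fno-finite-math-only at build time rather than through wrong results, a compile-time guard is one option. This sketch assumes GCC's predefined macro __FINITE_MATH_ONLY__, which GCC defines as 1 when -ffinite-math-only is in effect (for example via -Ofast or -ffast-math) and as 0 otherwise; gcc -dM -E - < /dev/null lists the predefined macros if you want to confirm. The helper count_nans is hypothetical.

```c
#include <math.h>

/* Assumption: GCC sets __FINITE_MATH_ONLY__ to 1 under -ffinite-math-only
 * (implied by -ffast-math and -Ofast) and to 0 otherwise. */
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
#  error "isnan() may be folded to 0 here; add -fno-finite-math-only"
#endif

/* Hypothetical helper whose result is only trustworthy because the
 * guard above rejects builds that assume NaNs never occur. */
int count_nans(const float *in, int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        count += isnan(in[i]) != 0;
    return count;
}
```

Using #error rather than a warning makes the flag mismatch impossible to overlook, at the cost of refusing to build the file under plain -Ofast.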
From the GCC documentation, "Options That Control Optimization":
-Ofast enables the following optimizations in addition to -O3:
It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens.
-ffast-math enables the following:
-fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.
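-ffinite-math-only does the following:
Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.
This allows the compiler to assume that isnan() always returns 0, so every store out[i] = isnan(in[i]) becomes out[i] = 0 and the whole loop collapses into the memset call in your listing.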
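If changing the flags is not an option, one workaround (my suggestion, not part of the documentation quoted above) is to test the IEEE 754 bit pattern with integer operations, which -ffinite-math-only does not cover. This assumes float is IEEE 754 binary32, as on x86-64, and the names isnan_bits and test_nan_bits are hypothetical. I would expect GCC to keep this check under -Ofast, but confirm it in the generated assembly rather than take that as guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-level NaN test. Assumes float is IEEE 754 binary32:
 * a value is NaN iff its exponent bits are all ones and its mantissa is
 * non-zero, i.e. with the sign bit cleared it is greater than the
 * pattern of +Inf (0x7f800000). */
static int isnan_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined type punning */
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* The question's loop, rewritten to use the integer test. */
void test_nan_bits(const float *const in, const int n, char *const out)
{
    for (int i = 0; i < n; ++i)
        out[i] = (char)isnan_bits(in[i]);
}
```

Infinities have an all-ones exponent but a zero mantissa, so they compare equal to 0x7f800000 and are correctly reported as not NaN.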
Barmar's answer explains why -Ofast causes the compiler to assume NaN never happens. I have two things to add to this.
First, you mentioned seeing -fsignaling-nans [disabled] in the --help=optimizers output. Signaling NaNs are a subcategory of all NaN bit patterns. The CPU raises a floating-point exception when they are used (consult the architecture manual for exactly what "when they are used" means). Normally people use only the other kind, quiet NaNs, because dealing with floating-point exceptions is a pain; so, by default, GCC generates code that handles quiet NaNs (and ±Inf) but not signaling NaNs. isnan is true for both quiet and signaling NaNs. In short, -fsignaling-nans is a red herring; the option that directly controls the behavior you didn't like is -ffinite-math-only.
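To check that isnan() does not distinguish the two kinds, the sketch below builds one quiet and one signaling NaN from bit patterns and tests both. It assumes IEEE 754 binary32 with the common convention (used on x86-64) that mantissa bit 22 set means quiet; the helper names and constants are hypothetical. The quiet bit is read from the integer pattern because loading a signaling NaN into a floating-point register can quiet it on some targets, for example the x87 unit on 32-bit x86.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumes IEEE 754 binary32; on x86-64 a set bit 22 marks a quiet NaN. */
#define QUIET_BIT 0x00400000u

static const uint32_t QNAN_BITS = 0x7fc00000u; /* exponent all ones, quiet bit set */
static const uint32_t SNAN_BITS = 0x7fa00000u; /* quiet bit clear, mantissa non-zero */

/* Reinterpret a 32-bit pattern as a float. */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* 1 for a quiet NaN pattern, 0 for a signaling one
 * (only meaningful when the pattern is already known to be a NaN). */
static int nan_pattern_is_quiet(uint32_t bits)
{
    return (bits & QUIET_BIT) != 0;
}
```

Without -ffinite-math-only, isnan(float_from_bits(QNAN_BITS)) and isnan(float_from_bits(SNAN_BITS)) should both be non-zero, while nan_pattern_is_quiet separates them.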
Second, if you were using -Ofast because you wanted this function to be vectorized, try -O3 -march=native instead. Loop vectorization is enabled at -O3, and -march=native directs GCC to optimize for the full capabilities of the CPU it's running on. Without any -march switch, GCC in its usual configuration assumes it can only use CPU features that the psABI guarantees to be available; for x86-64 (as it appears you have), that's SSE2 but nothing later, which leaves out most of the vector capabilities. On the computer I'm typing this on, -O3 -march=native produces code for your example function that's half the size and probably about four times as fast as -O3 alone.