
GCC issue with -Ofast?

Tags: c, gcc

I have a question about the latest GCC compilers (version >= 5) with this code:

#include <math.h>

void test_nan (
    const float * const __restrict__ in,
    const int n,
    char * const __restrict__ out )
{
    for (int i = 0; i < n; ++i)
        out[i] = isnan(in[i]);
}

The assembly listing from GCC with -Ofast:

test_nan:
        movq    %rdx, %rdi
        testl   %esi, %esi
        jle     .L1
        movslq  %esi, %rdx
        xorl    %esi, %esi
        jmp     memset
.L1:
        ret

This looks like memset(out, 0, n). Why does GCC assume that no entries can be NaN with -Ofast? With the same compilation options, ICC does not show this issue. With GCC, the issue goes away with -O3.

Note that with -O3, the query gcc -c -Q -O3 --help=optimizers | egrep -i nan reports -fsignaling-nans [disabled].

I verified this both locally and on godbolt, with the additional option "-std=c99".

Edit: by following the helpful answers below I can confirm that -Ofast -std=c99 -fno-finite-math-only properly addresses this issue.

diegor asked Jun 17 '20 14:06



2 Answers

From the GCC documentation, "Options That Control Optimization":

-Ofast enables the following optimizations in addition to -O3:

It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens.

-ffast-math enables the following:

-fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.

-ffinite-math-only does the following:

Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.

This allows GCC to assume that isnan() always returns 0, so the whole loop collapses into memset(out, 0, n).
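If NaN detection has to keep working even while -ffinite-math-only is in effect, one possible workaround is to test the bit pattern with integer operations, which the floating-point assumptions should not touch. The sketch below assumes float is a 32-bit IEEE-754 value. The names is_nan_bits and test_nan_bits are hypothetical; they are not from the question or from GCC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: NaN test on the raw bits instead of isnan().
   Assumption: float is IEEE-754 binary32 (1 sign, 8 exponent, 23 mantissa bits). */
static inline int is_nan_bits(float x)
{
    uint32_t u;
    assert(sizeof u == sizeof x);     /* the bit trick needs a 32-bit float */
    memcpy(&u, &x, sizeof u);         /* well-defined way to read the bits */
    /* Clear the sign bit. A NaN has an all-ones exponent and a nonzero
       mantissa, so its magnitude is strictly above +Inf (0x7f800000). */
    return (u & 0x7fffffffu) > 0x7f800000u;
}

/* Same interface as test_nan in the question, using the integer test. */
void test_nan_bits(const float * const __restrict__ in,
                   const int n,
                   char * const __restrict__ out)
{
    for (int i = 0; i < n; ++i)
        out[i] = (char)is_nan_bits(in[i]);
}
```

This keeps the rest of the -Ofast behavior for the translation unit, at the cost of relying on the binary32 layout. If that trade-off is not acceptable, compiling with -fno-finite-math-only is the more direct fix.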

Barmar answered Sep 28 '22 04:09



Barmar's answer explains why -Ofast causes the compiler to assume NaN never happens. I have two things to add to this.

First, you mentioned seeing -fsignaling-nans [disabled] in the --help=optimizers output. Signaling NaNs are a subset of the NaN bit patterns: the CPU raises a floating-point exception when they are used (consult the architecture manual for exactly what "when they are used" means). Normally people use only the other kind, quiet NaNs, because dealing with floating-point exceptions is a pain; so, by default, GCC generates code that handles quiet NaNs (and ±Inf) but not signaling NaNs. isnan is true for both quiet and signaling NaNs. In short, -fsignaling-nans is a red herring; the option that directly controls the behavior you didn't like is -ffinite-math-only.
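To make the quiet/signaling distinction concrete, here is a rough sketch that classifies a raw single-precision bit pattern. It assumes the IEEE-754 binary32 layout and the convention used on x86 and ARM, where a set top mantissa bit marks a quiet NaN; some older architectures use the opposite convention. The names nan_kind, classify_nan_bits and classify_nan_selfcheck are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classifier for binary32 bit patterns. It works on the
   integer bits only, so no floating-point operation can quiet a
   signaling NaN along the way. */
enum nan_kind { NOT_NAN = 0, QUIET_NAN = 1, SIGNALING_NAN = 2 };

static enum nan_kind classify_nan_bits(uint32_t bits)
{
    uint32_t exponent = bits & 0x7f800000u;   /* 8 exponent bits */
    uint32_t mantissa = bits & 0x007fffffu;   /* 23 mantissa bits */

    /* NaN: exponent all ones and mantissa nonzero (mantissa 0 is +-Inf). */
    if (exponent != 0x7f800000u || mantissa == 0)
        return NOT_NAN;

    /* Assumption: top mantissa bit set means "quiet" (x86, ARM). */
    return (mantissa & 0x00400000u) ? QUIET_NAN : SIGNALING_NAN;
}

/* Small self-check with well-known patterns. */
void classify_nan_selfcheck(void)
{
    assert(classify_nan_bits(0x7fc00000u) == QUIET_NAN);      /* default quiet NaN */
    assert(classify_nan_bits(0x7fa00000u) == SIGNALING_NAN);  /* quiet bit clear */
    assert(classify_nan_bits(0x7f800000u) == NOT_NAN);        /* +Inf */
}
```

Both kinds have the same all-ones exponent, which is why isnan reports true for either one.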

Second, if you were using -Ofast because you wanted this function to be vectorized, try -O3 -march=native instead. Loop vectorization is enabled at -O3, and -march=native directs GCC to optimize for the full capabilities of the CPU it's running on. Without any -march switches, GCC will assume it can only use CPU features that are guaranteed to be available by the psABI; for x86-64 (as it appears you have), that's SSE2 but nothing later, which leaves out most of the vector capabilities. On the computer I'm typing this on, -O3 -march=native produces code for your example function that's half the size and probably about four times as fast as -O3 alone.

zwol answered Sep 28 '22 04:09
