
What does clang's `-Ofast` option do in practical terms, especially regarding any differences from gcc?

Similar to the SO question "What does gcc's ffast-math actually do?" and related to the SO question "Clang optimization levels", I'm wondering what clang's -Ofast optimization does in practical terms, whether its effects differ at all from gcc's, and whether the results are more hardware dependent than compiler dependent.

According to the accepted answer for clang's optimization levels, -Ofast adds to the -O3 optimizations: -fno-signed-zeros -freciprocal-math -ffp-contract=fast -menable-unsafe-fp-math -menable-no-nans -menable-no-infs, which all seem to be floating-point math related. But what will these optimizations mean in practical terms for things like the C++ common mathematical functions for floating-point numbers on a CPU like an Intel Core i7, and how reliable are these differences?

For example, in practical terms:

The code std::isnan(std::numeric_limits<float>::infinity() * 0) returns true for me with -O3. I believe that this is what's expected of IEEE math compliant results.

With -Ofast however, I get a false return value. Additionally, the operation (std::numeric_limits<float>::infinity() * 0) == 0.0f returns true.
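
A minimal sketch for reproducing this observation at run time. The function names are made up for illustration. `__FAST_MATH__` is a macro that GCC and Clang predefine when fast-math is in effect, and the `volatile` variables are only there to keep the compiler from folding the expression at compile time.

```cpp
#include <cmath>
#include <limits>

// True when this translation unit was compiled with fast-math semantics
// (GCC and Clang predefine __FAST_MATH__ for -ffast-math and -Ofast).
bool built_with_fast_math() {
#ifdef __FAST_MATH__
    return true;
#else
    return false;
#endif
}

// Evaluates infinity * 0 at run time and reports whether the result is
// classified as NaN. volatile keeps the operands out of constant folding.
bool inf_times_zero_is_nan() {
    volatile float inf = std::numeric_limits<float>::infinity();
    volatile float zero = 0.0f;
    float product = inf * zero;
    return std::isnan(product);
}
```

With IEEE 754 semantics, inf_times_zero_is_nan() returns true. When built_with_fast_math() is true, the compiler is allowed to assume NaNs never occur, so the answer is not something to rely on.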

I don't know whether this is the same as what's seen with gcc. It's not clear to me how architecture dependent the results are, nor how compiler dependent they are, nor whether there's any applicable standard to -Ofast.

If anyone has perhaps produced something like a set of unit tests or code koans that answers this, that may be ideal. I've started to do something like this but would rather not reinvent the wheel.

Louis Langholtz asked Aug 15 '17 01:08


1 Answer

Describing how each of these flags affects each of the math functions would require too much work, so I'll try to give an example for each flag instead.
I leave to you the burden of working out how each one could affect a given function.


-fno-signed-zeros

Assumes that your code doesn't depend on the sign of zero.
In FP arithmetic, zero is not an absorbing element for multiplication in the bit-exact sense: x · 0 is not always +0, because zero has a sign. For example, -3 · 0 = -0, which compares equal to +0 but is a distinct value (here 0 denotes +0).

You can see this live on Godbolt, where a multiplication by zero is folded to a constant zero only with -Ofast:

float f(float a){
    return a*0;
}

;With -Ofast
f(float):                                  # @f(float)
        xorps   xmm0, xmm0

;With -O3
f(float): # @f(float)
  xorps xmm1, xmm1
  mulss xmm0, xmm1

As EOF noted in the comments, this also depends on finite arithmetic: if a is infinite or NaN, a · 0 is NaN rather than zero.
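
To observe the sign of zero from C++ rather than from the assembly, here is a small sketch. The helper names are hypothetical, and the behaviour described assumes a build without fast-math flags.

```cpp
#include <cmath>

// The multiplication from the example above, kept in its own function.
float times_zero(float a) {
    return a * 0.0f;
}

// True only for -0.0f: it compares equal to zero but has its sign bit set.
bool is_negative_zero(float x) {
    return x == 0.0f && std::signbit(x);
}
```

With IEEE semantics, is_negative_zero(times_zero(-3.0f)) is true even though times_zero(-3.0f) == 0.0f also holds. Code that never inspects the sign bit cannot tell the difference, which is the assumption this flag makes.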


-freciprocal-math

Use reciprocals instead of divisions: a/b = a · (1/b).
Because FP precision is finite, the two sides are not exactly equal in general.
Multiplication is faster than division, see Agner Fog's instruction tables.
See also why-is-freciprocal-math-unsafe-in-gcc?.

Live example on Godbolt:

float f(float a){
    return a/3;
}

;With -Ofast
        .long   1051372203              # float 0.333333343
f(float):                                  # @f(float)
        mulss   xmm0, dword ptr [rip + .LCPI0_0]

;With -O3
  .long 1077936128 # float 3
f(float): # @f(float)
  divss xmm0, dword ptr [rip + .LCPI0_0]
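
The rounding difference can be checked directly. This is a sketch with made-up function names, assuming 32-bit IEEE float and a build without fast-math flags, so that the compiler keeps the two forms distinct.

```cpp
// Exact form: one correctly rounded division.
float divide_by_three(float a) {
    return a / 3.0f;
}

// What -freciprocal-math permits: 1/3 is rounded to a float first,
// then the product is rounded a second time.
float times_reciprocal_of_three(float a) {
    const float one_third = 1.0f / 3.0f;
    return a * one_third;
}
```

Under those assumptions the two functions should agree for a = 3.0f but differ in the last bit for a = 5.0f, because the rounding error already present in one_third is multiplied along with it.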


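-ffp-contract=fast

Enables contraction of FP expressions.
In the loose sense used here, contraction is an umbrella term for any law you can apply in the field ℝ that results in a simplified expression.
For example, a * k / k = a.
Clang's own documentation describes this flag more narrowly, as permission to form fused FP operations such as fused multiply-add (FMA).

However, the set of FP numbers equipped with + and · is not a field in general, because precision is finite.
This flag allows the compiler to contract FP expressions at the cost of strict IEEE correctness.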

Live example on Godbolt:

float f(float a){
    return a/3*3;
}

;With -Ofast 
f(float):                                  # @f(float)

;With -O3
  .long 1077936128 # float 3
f(float): # @f(float)
  movss xmm1, dword ptr [rip + .LCPI0_0] # xmm1 = mem[0],zero,zero,zero
  divss xmm0, xmm1
  mulss xmm0, xmm1
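
One contraction that -ffp-contract=fast permits is fusing a multiply and an add into a single fused multiply-add, which rounds once instead of twice. The sketch below makes the two roundings visible; the names are hypothetical, std::fma stands in for the fused instruction, and the volatile store is an assumption-free way to force the intermediate rounding regardless of compiler defaults.

```cpp
#include <cfloat>
#include <cmath>

// Inputs chosen so that the exact product a*a has a tail below float precision.
const float kA = 1.0f + FLT_EPSILON;            // 1 + 2^-23
const float kC = -(1.0f + 2.0f * FLT_EPSILON);  // -(1 + 2^-22)

// a*a + c with the product rounded to float before the addition.
// The volatile store forces that intermediate rounding.
float multiply_then_add(float a, float c) {
    volatile float product = a * a;
    return product + c;
}

// a*a + c evaluated as one fused operation with a single final rounding.
float fused_multiply_add(float a, float c) {
    return std::fma(a, a, c);
}
```

With these inputs, multiply_then_add(kA, kC) yields exactly zero because the low bits of the product are rounded away, while fused_multiply_add(kA, kC) keeps them and returns a small positive value. Neither result is wrong as such; they are simply different, which is why the transformation needs a flag.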


-menable-unsafe-fp-math

Similar to the above, but in a broader sense.

Enable optimizations that make unsafe assumptions about IEEE math (e.g. that addition is associative) or may not work for all input ranges. These optimizations allow the code generator to make use of some instructions which would otherwise not be usable (such as fsin on X86).

See this about the precision error of the fsin instruction.

Live example on Godbolt where a⁴ is computed as (a²)²:

float f(float a){
    return a*a*a*a;
}

;With -Ofast
f(float):                                  # @f(float)
        mulss   xmm0, xmm0
        mulss   xmm0, xmm0

;With -O3
f(float): # @f(float)
  movaps xmm1, xmm0
  mulss xmm1, xmm1
  mulss xmm1, xmm0
  mulss xmm1, xmm0
  movaps xmm0, xmm1
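
The associativity assumption mentioned in the quoted description is easy to break with addition. A minimal sketch, with hypothetical function names, assuming a build without fast-math flags so that the parentheses are honoured:

```cpp
// Sums three floats, grouping the first two operands.
float sum_left_grouped(float a, float b, float c) {
    return (a + b) + c;
}

// Sums three floats, grouping the last two operands.
float sum_right_grouped(float a, float b, float c) {
    return a + (b + c);
}
```

For a = 1e20f, b = -1e20f and c = 1.0f, the left grouping gives 1 while the right grouping gives 0, because 1.0f is far below the spacing of floats near 1e20 and is absorbed by b. A compiler that treats addition as associative may pick either result.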


-menable-no-nans

Assumes the code generates no NaN values.
In a previous answer of mine I analysed how ICC dealt with complex number multiplication by assuming no NaNs.

Most FP instructions deal with NaNs automatically.
There are exceptions though, such as comparisons. This can be seen in this live example on Godbolt:

bool f(float a, float b){
    return a<b;
}

;With -Ofast
f(float, float):                                 # @f(float, float)
        ucomiss xmm0, xmm1
        setb    al

;With -O3
f(float, float): # @f(float, float)
  ucomiss xmm1, xmm0
  seta al

Note that the two versions are not equivalent: the -O3 one excludes the case where a and b are unordered, while the -Ofast one includes it in the true result.
While the performance is the same in this case, in complex expressions this asymmetry can lead to different folding and optimisations.
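
The same asymmetry can be shown at the source level. In this sketch (function names are made up, and a build without fast-math flags is assumed) the two predicates agree for ordinary numbers and disagree only when an operand is NaN:

```cpp
#include <limits>

// Convenience accessor for a quiet NaN.
float quiet_nan() {
    return std::numeric_limits<float>::quiet_NaN();
}

// False whenever either operand is NaN.
bool less_than(float a, float b) {
    return a < b;
}

// Equivalent to less_than only when neither operand is NaN:
// for unordered operands a >= b is false, so this returns true.
bool not_greater_equal(float a, float b) {
    return !(a >= b);
}
```

Under the no-NaNs assumption a compiler may treat these two functions as interchangeable, which is the extra freedom the flag buys.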


-menable-no-infs

Just like the above, but for infinities.

I was unable to produce a simple example in Godbolt, but the trigonometric functions need to deal with infinities carefully, especially for complex numbers.

If you browse a glibc implementation's math directory (e.g. sinc) you'll see a lot of checks that should be omitted on compilation with -Ofast.
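
As a rough illustration of why infinities force such checks, consider x - x. This is only a sketch with hypothetical helpers, not code taken from glibc, and it assumes a build without fast-math flags.

```cpp
#include <cmath>

// x - x is 0 for every finite x, but NaN when x is infinite,
// so a compiler honouring infinities cannot fold this to 0.
float self_difference(float x) {
    return x - x;
}

// The kind of explicit guard a careful routine might add.
// If infinities are assumed away, this branch becomes dead code.
float self_difference_guarded(float x) {
    if (std::isinf(x)) {
        return 0.0f;
    }
    return x - x;
}
```

With IEEE semantics, self_difference of an infinite argument is NaN while the guarded variant returns 0. Under -menable-no-infs the compiler may drop the guard and reduce both functions to a constant.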

Margaret Bloom answered Nov 09 '22 20:11