Similar to the SO question What does gcc's ffast-math actually do? and related to the SO question Clang optimization levels, I'm wondering what clang's -Ofast optimization does in practical terms, whether it differs at all from gcc's, and whether the behaviour is more hardware dependent than compiler dependent.
According to the accepted answer for clang's optimization levels, -Ofast adds the following to the -O3 optimizations: -fno-signed-zeros -freciprocal-math -ffp-contract=fast -menable-unsafe-fp-math -menable-no-nans -menable-no-infs, which seem to be entirely floating-point related. But what will these optimizations mean in practical terms for things like the C++ common mathematical functions on floating-point numbers, on a CPU like an Intel Core i7, and how reliable are these differences?
For example, in practical terms:
The code std::isnan(std::numeric_limits<float>::infinity() * 0) returns true for me with -O3, which I believe is what's expected of IEEE-compliant math. With -Ofast, however, I get a false return value. Additionally, (std::numeric_limits<float>::infinity() * 0) == 0.0f returns true.
I don't know whether this matches what's seen with gcc. It's not clear to me how architecture dependent the results are, how compiler dependent they are, or whether any applicable standard covers -Ofast.
If anyone has perhaps produced something like a set of unit tests or code koans that answers this, that may be ideal. I've started to do something like this but would rather not reinvent the wheel.
Describing how each of these flags affects each of the math functions would require too much work, so I'll give an example for each flag instead, leaving you the burden of seeing how each could affect a given function.
-fno-signed-zeros
Assumes that your code doesn't depend on the sign of zero.
In FP arithmetic zero is not an absorbing element with respect to multiplication: 0 · x = x · 0 ≠ 0 in general, because zero is signed and thus, for example, -3 · 0 = -0 ≠ 0 (where 0 usually denotes +0).
You can see this live on Godbolt, where the multiplication by zero is folded into a constant zero only with -Ofast:
float f(float a)
{
return a*0;
}
;With -Ofast
f(float): # @f(float)
xorps xmm0, xmm0
ret
;With -O3
f(float): # @f(float)
xorps xmm1, xmm1
mulss xmm0, xmm1
ret
As EOF noted in the comments, this also depends on finite arithmetic.
-freciprocal-math
Use reciprocals instead of divisors: a/b = a · (1/b).
Due to the limited FP precision, the equality does not actually hold.
Multiplication is faster than division; see Agner Fog's instruction tables.
See also why-is-freciprocal-math-unsafe-in-gcc?.
Live example on Godbolt:
float f(float a){
return a/3;
}
;With -Ofast
.LCPI0_0:
.long 1051372203 # float 0.333333343
f(float): # @f(float)
mulss xmm0, dword ptr [rip + .LCPI0_0]
ret
;With -O3
.LCPI0_0:
.long 1077936128 # float 3
f(float): # @f(float)
divss xmm0, dword ptr [rip + .LCPI0_0]
ret
-ffp-contract=fast
Enable contraction of FP expressions.
Contraction is an umbrella term for any law you can apply in the field ℝ that results in a simplified expression.
For example, a * k / k = a.
However, the set of FP numbers equipped with + and · is not a field in general, due to finite precision.
This flag allows the compiler to contract FP expressions at the cost of correctness.
Live example on Godbolt:
float f(float a){
return a/3*3;
}
;With -Ofast
f(float): # @f(float)
ret
;With -O3
.LCPI0_0:
.long 1077936128 # float 3
f(float): # @f(float)
movss xmm1, dword ptr [rip + .LCPI0_0] # xmm1 = mem[0],zero,zero,zero
divss xmm0, xmm1
mulss xmm0, xmm1
ret
-menable-unsafe-fp-math
Kind of like the above, but in a broader sense.
Enable optimizations that make unsafe assumptions about IEEE math (e.g. that addition is associative) or may not work for all input ranges. These optimizations allow the code generator to make use of some instructions which would otherwise not be usable (such as fsin on x86).
See this about the error precision of the fsin instruction.
Live example at Godbolt, where a⁴ is expanded into (a²)²:
float f(float a){
return a*a*a*a;
}
;With -Ofast
f(float): # @f(float)
mulss xmm0, xmm0
mulss xmm0, xmm0
ret
;With -O3
f(float): # @f(float)
movaps xmm1, xmm0
mulss xmm1, xmm1
mulss xmm1, xmm0
mulss xmm1, xmm0
movaps xmm0, xmm1
ret
-menable-no-nans
Assumes the code generates no NaN values.
In a previous answer of mine I analysed how ICC dealt with complex number multiplication by assuming no NaNs.
Most FP instructions deal with NaNs automatically.
There are exceptions though, such as comparisons; this can be seen live at Godbolt:
bool f(float a, float b){
return a<b;
}
;With -Ofast
f(float, float): # @f(float, float)
ucomiss xmm0, xmm1
setb al
ret
;With -O3
f(float, float): # @f(float, float)
ucomiss xmm1, xmm0
seta al
ret
Note that the two versions are not equivalent: the -O3 one excludes the case where a and b are unordered from the true result, while the -Ofast one includes it.
While the performance is the same in this case, in complex expressions this asymmetry can lead to different unfoldings/optimisations.
-menable-no-infs
Just like the above but for infinities.
I was unable to reproduce a simple example in Godbolt, but the trigonometric functions need to deal with infinities carefully, especially for complex numbers.
If you browse a glibc implementation's math directory (e.g. sinc) you'll see a lot of checks that should be omitted when compiling with -Ofast.