
What does clang's `-Ofast` option do in practical terms, especially any differences from gcc?

Similar to the SO question What does gcc's ffast-math actually do? and related to the SO question Clang optimization levels, I'm wondering what clang's -Ofast optimization does in practical terms, whether these effects differ at all from gcc's, and whether this is more hardware dependent than compiler dependent.

According to the accepted answer for clang's optimization levels, -Ofast adds the following to the -O3 optimizations: -fno-signed-zeros -freciprocal-math -ffp-contract=fast -menable-unsafe-fp-math -menable-no-nans -menable-no-infs. These all appear to be floating-point-math related. But what do these optimizations mean in practical terms for things like the C++ common mathematical functions for floating point numbers on a CPU like an Intel Core i7, and how reliable are these differences?

For example, in practical terms:

The code std::isnan(std::numeric_limits<float>::infinity() * 0) returns true for me with -O3. I believe that's the IEEE-compliant result.

With -Ofast however, I get a false return value. Additionally, the expression (std::numeric_limits<float>::infinity() * 0) == 0.0f returns true.

I don't know whether this is the same as what's seen with gcc. It's not clear to me how architecture dependent the results are, nor how compiler dependent they are, nor whether there's any applicable standard to -Ofast.

If anyone has perhaps produced something like a set of unit tests or code koans that answers this, that may be ideal. I've started to do something like this but would rather not reinvent the wheel.

Louis Langholtz asked Aug 15 '17 01:08



1 Answer

Describing how each of these flags affects every math function would take too much work, so I'll give an example for each flag instead, leaving it to you to work out how each could affect a given function.


-fno-signed-zeros

Assumes that your code doesn't depend on the sign of zero.
In FP arithmetic, zero is not an absorbing element with respect to multiplication: x · 0 = 0 · x ≠ 0 in general, because zero has a sign; for example -3 · 0 = -0 ≠ +0 (where an unadorned 0 usually denotes +0).

You can see this live on Godbolt, where a multiplication by zero is folded to a constant zero only with -Ofast:

float f(float a)
{
    return a*0;
}

;With -Ofast
f(float):                                  # @f(float)
        xorps   xmm0, xmm0
        ret

;With -O3
f(float): # @f(float)
  xorps xmm1, xmm1
  mulss xmm0, xmm1
  ret

As EOF noted in the comments, this also depends on assuming finite arithmetic: inf · 0 and NaN · 0 are not zero either.

-freciprocal-math

Replaces divisions with multiplications by the reciprocal: a/b = a · (1/b).
Due to the finite precision of FP, the equality does not really hold.
Multiplication is faster than division; see Agner Fog's instruction tables.
See also why-is-freciprocal-math-unsafe-in-gcc?.

Live example on Godbolt:

float f(float a){
    return a/3;
}

;With -Ofast
.LCPI0_0:
        .long   1051372203              # float 0.333333343
f(float):                                  # @f(float)
        mulss   xmm0, dword ptr [rip + .LCPI0_0]
        ret

;With -O3
.LCPI0_0:
  .long 1077936128 # float 3
f(float): # @f(float)
  divss xmm0, dword ptr [rip + .LCPI0_0]
  ret

-ffp-contract=fast

Enables contraction of FP expressions.
Contraction here is an umbrella term for rewriting an expression using laws that hold in the field ℝ; the canonical case is fusing a · b + c into a single fused multiply-add, which skips the intermediate rounding of a · b. Another example: a · k / k = a.

However, the set of FP numbers equipped with + and · is not a field in general, due to finite precision.
This flag allows the compiler to contract FP expressions at the cost of bit-exact correctness.

Live example on Godbolt:

float f(float a){
    return a/3*3;
}

;With -Ofast 
f(float):                                  # @f(float)
        ret

;With -O3
.LCPI0_0:
  .long 1077936128 # float 3
f(float): # @f(float)
  movss xmm1, dword ptr [rip + .LCPI0_0] # xmm1 = mem[0],zero,zero,zero
  divss xmm0, xmm1
  mulss xmm0, xmm1
  ret

-menable-unsafe-fp-math

Kind of like the above, but in a broader sense.

Enable optimizations that make unsafe assumptions about IEEE math (e.g. that addition is associative) or may not work for all input ranges. These optimizations allow the code generator to make use of some instructions which would otherwise not be usable (such as fsin on X86).

See this about the error precision of the fsin instruction.

Live example at Godbolt, where a⁴ is expanded into (a²)²:

float f(float a){
    return a*a*a*a;
}

;With -Ofast
f(float):                                  # @f(float)
        mulss   xmm0, xmm0
        mulss   xmm0, xmm0
        ret

;With -O3
f(float): # @f(float)
  movaps xmm1, xmm0
  mulss xmm1, xmm1
  mulss xmm1, xmm0
  mulss xmm1, xmm0
  movaps xmm0, xmm1
  ret

-menable-no-nans

Assumes the code generates no NaN values.
In a previous answer of mine I analysed how ICC dealt with complex number multiplication by assuming no NaNs.

Most FP instructions deal with NaNs automatically.
There are exceptions though, such as comparisons; this can be seen live at Godbolt:

bool f(float a, float b){
    return a<b;
}

;With -Ofast
f(float, float):                                 # @f(float, float)
        ucomiss xmm0, xmm1
        setb    al
        ret

;With -O3
f(float, float): # @f(float, float)
  ucomiss xmm1, xmm0
  seta al
  ret

Note that the two versions are not equivalent: the -O3 one excludes the case where a and b are unordered from the true result, while the -Ofast one includes it.
While the performance is the same in this case, in complex expressions this asymmetry can lead to different foldings/optimisations.

-menable-no-infs

Just like the above but for infinities.

I was unable to reproduce a simple example in Godbolt but the trigonometric functions need to deal with infinities carefully, especially for complex numbers.

If you browse a glibc implementation's math dir (e.g. sinc) you'll see a lot of checks that could be omitted when compiling with -Ofast.

Margaret Bloom answered Nov 09 '22 20:11