Why is it mandatory to use -ffast-math with g++ to get loops over doubles vectorized? I don't like -ffast-math because I don't want to lose precision.

Auto vectorization on double and ffast-math

3 Answers

You don’t necessarily lose precision with -ffast-math. It mostly affects the handling of NaN, Inf, signed zeros etc. and the order in which operations are performed (plus a few rewrites, such as replacing x / y with x * (1 / y)).

If you have a specific piece of code where you do not want GCC to reorder or simplify computations, you can hide a variable's value from the optimizer with an empty asm statement.

For instance, the following code performs a rounding operation on f. However, with -ffast-math, GCC is likely to optimise away the f += g and f -= g pair:

static double moo(double f, double g)
{
    g *= 4503599627370496.0; // 2 ** 52
    f += g;
    f -= g;
    return f;
}

On x86_64, you can use this asm statement to instruct GCC not to perform that optimisation:

static double moo(double f, double g)
{
    g *= 4503599627370496.0; // 2 ** 52
    f += g;
    __asm__("" : "+x" (f));
    f -= g;
    return f;
}

You will need to adapt this for each architecture, unfortunately. On PowerPC, use +f instead of +x.
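For reference, here is a self-contained sketch of the same trick with the per-architecture choice made by the preprocessor. The FP_BARRIER macro name, the dispatch on __x86_64__ / __powerpc__, and the volatile fallback for other targets are illustrative assumptions, not part of the original snippets. With g = 1.0 the function rounds f to the nearest integer (ties to even), assuming the default round-to-nearest mode and |f| < 2**51.

```c
/* Empty asm that claims to read and modify x, so GCC can no longer
   assume anything about its value and cannot fold (f + g) - g into f.
   The constraint letter is the register class holding a double. */
#if defined(__x86_64__)
#define FP_BARRIER(x) __asm__("" : "+x"(x))   /* SSE register */
#elif defined(__powerpc__) || defined(__powerpc64__)
#define FP_BARRIER(x) __asm__("" : "+f"(x))   /* FP register */
#else
/* Assumed fallback: a round trip through a volatile double, which the
   compiler must actually store and reload. */
#define FP_BARRIER(x) do { volatile double fp_tmp_ = (x); (x) = fp_tmp_; } while (0)
#endif

/* With g == 1.0: round f to the nearest integer, ties to even. */
static double moo(double f, double g)
{
    g *= 4503599627370496.0; /* 2 ** 52 */
    f += g;                  /* low-order bits of f are rounded off here */
    FP_BARRIER(f);
    f -= g;                  /* remove the 2 ** 52 offset again */
    return f;
}
```

For example, moo(3.7, 1.0) gives 4.0 and moo(2.5, 1.0) gives 2.0, because 2.5 is a tie and rounds to the even neighbour.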


answered Sep 27 '22 23:09

sam hocevar

Very likely because vectorization may give you different results, or may mean that you miss floating-point signals/exceptions.

If you're compiling for 32-bit x86, gcc and g++ default to the x87 for floating-point math; on 64-bit they default to SSE. The x87 can and will produce different values for the same computation, so g++ is unlikely to vectorize when it can't guarantee identical results, unless you use -ffast-math or some of the flags it turns on.

Basically, the floating-point environment for vectorized code may not be the same as the one for non-vectorized code, sometimes in ways that matter. If the differences don't matter to you, try something like

-fno-math-errno -fno-trapping-math -fno-signaling-nans -fno-rounding-math

but first look up those options and make sure they won't affect your program's correctness (-fno-signaling-nans and -fno-rounding-math are already GCC's defaults). -ffinite-math-only may also help.
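On the x87 vs SSE point: one way to check which evaluation model a given compiler and target use is the C99 FLT_EVAL_METHOD macro from <float.h>. A small sketch; the helper name eval_method_description is hypothetical, and the "typical" targets in the strings are common defaults, not guarantees.

```c
#include <float.h>

/* Describe how this compiler/target evaluates floating-point
   expressions, according to the C99 FLT_EVAL_METHOD macro. */
static const char *eval_method_description(int method)
{
    switch (method) {
    case 0:
        return "operations evaluated in their own type (typical of SSE2 on x86-64)";
    case 1:
        return "float and double evaluated as double, long double as long double";
    case 2:
        return "everything evaluated as long double (typical of x87 on 32-bit x86)";
    default:
        return "indeterminable or implementation-defined";
    }
}
```

Usage: printf("%d: %s\n", (int)FLT_EVAL_METHOD, eval_method_description(FLT_EVAL_METHOD)); after including <stdio.h>. A value of 2 means intermediate results carry extra precision, which is exactly what makes scalar x87 code and SSE vector code disagree.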

answered Sep 27 '22 23:09

Spudd86

Because `-ffast-math` enables operand reordering, which allows much more code to be vectorized.

For example, to calculate this

sum = a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + … a[99]

the compiler is required to do the additions sequentially without -ffast-math, because floating-point addition is not associative: (a + b) + c may round differently from a + (b + c). (It is commutative, so swapping two operands is harmless; regrouping is what changes the result.)

Is floating point addition commutative and associative?
Is floating point addition commutative in C++?
Are floating point operations in C associative?
Is Floating point addition and multiplication associative?
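To see the non-associativity concretely, here is a minimal check, assuming IEEE 754 doubles evaluated in double precision (FLT_EVAL_METHOD == 0, as on x86-64). The function name is hypothetical.

```c
#include <stdbool.h>

/* True when regrouping a three-term sum changes the double result,
   i.e. (a + b) + c != a + (b + c). */
static bool regrouping_changes_sum(double a, double b, double c)
{
    double left  = (a + b) + c;
    double right = a + (b + c);
    return left != right;
}
```

With (0.1, 0.2, 0.3) it returns true: the left grouping gives 0.6000000000000001 and the right one gives 0.6. Swapping operands, on the other hand, never matters: 0.1 + 0.2 == 0.2 + 0.1.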

Non-associativity is also the reason compilers can't optimize a*a*a*a*a*a to (a*a*a)*(a*a*a) without -ffast-math.
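A sketch that counts how often the two groupings of a^6 disagree over a range of sample values. Compile it without -ffast-math, otherwise GCC may regroup both expressions itself; the function name and the sampling scheme are arbitrary choices.

```c
/* Count sample points a = start + step * i (0 <= i < n) for which
   left-to-right evaluation of a^6 differs from (a*a*a) * (a*a*a). */
static int count_pow6_mismatches(double start, double step, int n)
{
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        double a = start + step * i;
        double left_to_right = a * a * a * a * a * a; /* five roundings */
        double regrouped = (a * a * a) * (a * a * a); /* three roundings */
        if (left_to_right != regrouped)
            ++mismatches;
    }
    return mismatches;
}
```

Sampling 1000 points in [1, 2) finds mismatches, while exactly representable cases such as a = 2.0 agree.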

Without reordering, then, a sum like the one above can't be vectorized unless you have very efficient horizontal vector adds.

However, if -ffast-math is enabled, the expression can be calculated like this (see "A7. Auto-Vectorization" in the article quoted below):

sum0 = a[0] + a[4] + a[ 8] + … a[96]
sum1 = a[1] + a[5] + a[ 9] + … a[97]
sum2 = a[2] + a[6] + a[10] + … a[98]
sum3 = a[3] + a[7] + a[11] + … a[99]
sum’ = sum0 + sum1 + sum2 + sum3

Now the compiler can easily vectorize it by adding the four columns in parallel and then doing a horizontal add at the end.
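The column-wise evaluation can be written out by hand to compare it with the strict left-to-right order. A sketch with hypothetical function names; it only demonstrates the reordering and does not by itself guarantee that GCC emits SIMD code.

```c
#include <stddef.h>

/* Strict left-to-right sum: the order GCC must keep without
   -ffast-math / -fassociative-math. */
static double sum_sequential(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* Reordered sum: four partial sums over the "columns" i % 4,
   combined at the end as sum0 + sum1 + sum2 + sum3. */
static double sum_four_columns(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

The two can disagree. On a target that evaluates doubles in double precision (such as x86-64), the array {1e16, 1.0, 0, 0, -1e16, 1.0, 0, 0} gives 1.0 from sum_sequential, because the first 1.0 is absorbed by 1e16. It gives 2.0 from sum_four_columns, because 1e16 and -1e16 land in the same column and cancel first.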

Does sum’ == sum? Only if (a[0]+a[4]+…) + (a[1]+a[5]+…) + (a[2]+a[6]+…) + (a[3]+a[7]+…) == a[0] + a[1] + a[2] + … This holds under associativity, which floats don’t adhere to, all of the time. Specifying /fp:fast lets the compiler transform your code to run faster – up to 4 times faster, for this simple calculation.

Do You Prefer Fast or Precise? - A7. Auto-Vectorization

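In GCC, this particular reordering can be enabled on its own with -fassociative-math, which only takes effect when -fno-signed-zeros and -fno-trapping-math are also in effect. (/fp:fast in the quote above is the MSVC option, roughly corresponding to GCC's -ffast-math.)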
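A sketch of opting into just the reordering for one file instead of all of -ffast-math. The dot function and the file name dot.c are hypothetical. Whether the loop is actually vectorized depends on the GCC version and target, so check the -fopt-info-vec-optimized report rather than assuming.

```c
#include <stddef.h>

/* Dot product: a reduction over i, so splitting it into several
   vector partial sums requires permission to reassociate the adds.
   Example build:
     gcc -O3 -fassociative-math -fno-signed-zeros -fno-trapping-math \
         -fopt-info-vec-optimized -c dot.c
   -fassociative-math only takes effect together with the other two. */
static double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}
```

Compared with -ffast-math, this combination leaves math-errno handling and the NaN/Inf assumptions of -ffinite-math-only alone, although the result can still differ in the last bits from the sequential order.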

Further readings

Semantics of Floating Point Math in GCC
What does gcc's ffast-math actually do?

answered Sep 28 '22 00:09

phuclv

Tags: vectorization, double, gcc, fast-math, g++

Ruggero Turra
