In the code below, why is the second loop able to be auto vectorized but the first cannot? How can I modify the code so it does auto vectorize? gcc says:
note: not vectorized: control flow in loop.
I am using gcc 8.2, flags are -O3 -fopt-info-vec-all. I am compiling for x86-64 avx2.
#include <stdlib.h>
#include <math.h>
void foo(const float * x, const float * y, const int * v, float * vec, float * novec, size_t size) {
size_t i;
float bar;
for (i=0 ; i<size ; ++i){
bar = x[i] - y[i];
novec[i] = v[i] ? bar : NAN;
}
for (i=0 ; i<size ; ++i){
bar = x[i];
vec[i] = v[i] ? bar : NAN;
}
}
Update: This does autovectorize:
for (i=0 ; i<size ; ++i){
bar = x[i];
novec[i] = v[i] ? bar : NAN;
novec[i] -= y[i];
}
I would still like to know why gcc says control flow for the first loop.
clang auto-vectorizes even the first loop, but gcc8.2 doesn't. (https://godbolt.org/z/cnlwuO)
gcc vectorizes with -ffast-math
. Perhaps it's worried about preserving FP exception flag status from the subtraction?
-fno-trapping-math
is sufficient for gcc to auto-vectorize (without the rest of what -ffast-math
sets), so apparently it's worried about FP exceptions. (https://godbolt.org/z/804ykV). I think it's being over-cautious, because the C source does compute bar
every time, whether or not it's used.
gcc will auto-vectorize simple FP a[i] = b[i]+c[i]
loops without any FP math options.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With