In the code below, why is the second loop able to be auto vectorized but the first cannot? How can I modify the code so it does auto vectorize? gcc says:
note: not vectorized: control flow in loop.
I am using gcc 8.2, flags are -O3 -fopt-info-vec-all. I am compiling for x86-64 avx2.
#include <stdlib.h>
#include <math.h>
void foo(const float * x, const float * y, const int * v, float * vec, float * novec, size_t size) {
    size_t i;
    float bar;
    for (i=0 ; i<size ; ++i){
        bar = x[i] - y[i];
        novec[i] = v[i] ? bar : NAN;
    }
    for (i=0 ; i<size ; ++i){
        bar = x[i];
        vec[i] = v[i] ? bar : NAN;
    }
}
Update: This does autovectorize:
for (i=0 ; i<size ; ++i){
    bar = x[i];
    novec[i] = v[i] ? bar : NAN;
    novec[i] -= y[i];
}
I would still like to know why gcc says control flow for the first loop.
clang auto-vectorizes even the first loop, but gcc8.2 doesn't. (https://godbolt.org/z/cnlwuO)
gcc vectorizes with -ffast-math.  Perhaps it's worried about preserving FP exception flag status from the subtraction?
-fno-trapping-math is sufficient for gcc to auto-vectorize (without the rest of what -ffast-math sets), so apparently it's worried about FP exceptions.  (https://godbolt.org/z/804ykV).  I think it's being over-cautious, because the C source does compute bar every time, whether or not it's used.
gcc will auto-vectorize simple FP a[i] = b[i]+c[i] loops without any FP math options.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With