Here are free functions that do the same but in the first case the loop is not vectorized but in the other cases it is. Why is that?
#include <vector>
typedef std::vector<double> Vec;
void update(Vec& a, const Vec& b, double gamma) {
const size_t K = a.size();
for (size_t i = 0; i < K; ++i) { // not vectorized
a[i] = b[i] * gamma - a[i];
}
}
void update2(Vec& a, const Vec& b, double gamma) {
for (size_t i = 0; i < a.size(); ++i) { // vectorized
a[i] = b[i] * gamma - a[i];
}
}
void update3(Vec& a, size_t K, const Vec& b, double gamma) {
for (size_t i = 0; i < K; ++i) { // vectorized
a[i] = b[i] * gamma - a[i];
}
}
int main(int argc, const char* argv[]) {
Vec a(argc), b;
update(a, b, 0.5);
update2(a, b, 0.5);
update3(a, a.size(), b, 0.5);
return 0;
}
Relevant messages from the compiler (VS2013):
1> c:\home\dima\trws\trw_s-v1.3\trws\test\vector.cpp(7) : info C5002: loop not vectorized due to reason '1200'
1> c:\home\dima\trws\trw_s-v1.3\trws\test\vector.cpp(13) : info C5001: loop vectorized
1> c:\home\dima\trws\trw_s-v1.3\trws\test\vector.cpp(19) : info C5001: loop vectorized
From comment by @tony
Reason 1200: "Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences." source
I guess it's some deeply internal compiler implementation issue, like at what stage did auto-vectorizer "kick in" and what's the state of the internal representation of the code at that time. When I tried on MSVC2017, it worked more in line with what one would expect. It auto-vectorized update()
and update3()
, but not update2()
, with the reason given 501 for line 14, which is documented as:
Induction variable is not local; or upper bound is not loop-invariant.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With