I am currently trying to vectorize a program and I have observed some odd behaviour.
It seems that a for loop is vectorized when using
#pragma simd
(262): (col. 3) remark: SIMD LOOP WAS VECTORIZED.
but it isn't when I use
#pragma vector always
#pragma ivdep
(262): (col. 3) remark: loop was not vectorized: existence of vector dependence.
I always thought that both sets of directives performed the same vectorization.
There are two ways to vectorize a loop computation in a C/C++ program. Programmers can use intrinsics inside the C/C++ source code to tell the compiler to generate specific SIMD instructions for the loop computation. Alternatively, the compiler can be set up to vectorize the loop computation automatically.
#pragma simd forces vectorization of the loop, regardless of cost or safety.
#pragma vector always tells the compiler to ignore its efficiency heuristics when deciding whether to vectorize. Code that vectorizes only when this pragma is added might end up slower.
#pragma ivdep tells the compiler to ignore assumed data dependences that inhibit vectorization (for example, assumed loop-carried dependences), but not proven ones. For example, it might assume that two pointers do not point to the same memory and vectorize. However, it won't ignore a proven loop-carried dependence (a[i] = a[i - 1] * c), whereas #pragma simd might.
One reason your code might have vectorized only with #pragma simd is that a proven dependence was ignored. You might want to verify that your program's output is still correct.
Source: Intel specific pragmas documentation(http://software.intel.com/en-us/node/462880)
#pragma simd is an explicit vectorization tool that lets the developer enforce vectorization, as described at https://software.intel.com/en-us/node/514582, while #pragma vector is a hint that asks the compiler to vectorize the loop according to its argument(s). Here the argument is always, which means "ignore the compiler's cost/efficiency heuristics and go ahead with vectorization". More information on #pragma vector is available at https://software.intel.com/en-us/node/514586. The fact that #pragma simd vectorizes a loop where #pragma vector always fails does not mean that #pragma simd produces wrong results. When #pragma simd is used with the right set of clauses, it can vectorize the loop and still produce a correct result.
Below is a small code snippet which demonstrates that:
void foo(float *a, float *b, float *c, int N) {
#pragma vector always
#pragma ivdep
//#pragma simd vectorlength(2)
for(int i = 2; i < N; i++)
a[i] = a[i-2] + b[i] + c[i];
return;
}
Compiling this code using ICC will produce the following vectorization report:
$ icc -c -vec-report2 test11.cc
test11.cc(5): (col. 1) remark: loop was not vectorized: existence of vector dependence
By default ICC targets SSE2, which uses 128-bit XMM registers. One XMM register holds 4 floats, but because a[i] depends on a[i-2], processing 4 floats at once creates a vector dependence: the third and fourth elements of each vector would need results that the same vector operation has not yet stored. So the diagnostic emitted under #pragma vector always is right. But if we consider just 2 floats at a time instead of 4, we can vectorize this loop without corrupting the results. The code and its vectorization report are shown below:
void foo(float *a, float *b, float *c, int N){
//#pragma vector always
//#pragma ivdep
#pragma simd vectorlength(2)
for(int i = 2; i < N; i++)
a[i] = a[i-2] + b[i] + c[i];
return;
}
$ icc -c -vec-report2 test11.cc
test11.cc(5): (col. 1) remark: SIMD LOOP WAS VECTORIZED
But #pragma vector doesn't have a clause that explicitly specifies the vector length to use when vectorizing the loop. This is where #pragma simd can really come in handy.
When #pragma simd is used with the clauses that best describe the computation in vector form, the compiler generates the requested vector code without producing wrong results. The Intel(R) Cilk(TM) Plus White Paper published at https://software.intel.com/sites/default/files/article/402486/intel-cilk-plus-white-paper.pdf has sections titled "Usage of #pragma simd vectorlength clause" and "Usage of #pragma simd reduction and private clause", which explain how to use #pragma simd with the right clauses. The clauses help the developer express to the compiler what they want to achieve, and the compiler generates the vector code accordingly. It is highly recommended to use #pragma simd with the relevant clauses wherever needed to best express the loop logic to the compiler.
Also, inner loops are traditionally the target for vectorization, but #pragma simd can be used to vectorize outer loops too. More information on this is available at https://software.intel.com/en-us/articles/outer-loop-vectorization.