I am doing some image processing, for which I benefit from vectorization.
I have a function that vectorizes ok, but for which I am not able to convince the compiler that the input and output buffer have no overlap, and so no alias checking is necessary.
I should be able to do so using __restrict__
, but if the buffers are not defined as __restrict__
when arriving as function argument, there is no way to convince the compiler that I am absolutely sure that 2 buffers will never overlap.
This is the function:
__attribute__((optimize("tree-vectorize","tree-vectorizer-verbose=6")))
void threshold(const cv::Mat& inputRoi, cv::Mat& outputRoi, const unsigned char th) {
const int height = inputRoi.rows;
const int width = inputRoi.cols;
for (int j = 0; j < height; j++) {
const uint8_t* __restrict in = (const uint8_t* __restrict) inputRoi.ptr(j);
uint8_t* __restrict out = (uint8_t* __restrict) outputRoi.ptr(j);
for (int i = 0; i < width; i++) {
out[i] = (in[i] < valueTh) ? 255 : 0;
}
}
}
The only way I can convince the compiler to not perform the alias checking is if I put the inner loop in a separate function, in which the pointers are defined as __restrict__
arguments. If I declare this inner function as inlined, again the alias checking is activated.
You can see the effect also with this example, which I think is consistent: http://goo.gl/7HK5p7
(Note: I know there might be better ways of writing the same function, but in this case I am just trying to understand how to avoid alias check)
Edit:
Problem is solved!! (See answer below)
Using gcc 4.9.2, here is the complete example. Note the use of the compiler flag -fopt-info-vec-optimized
in place of the superseded -ftree-vectorizer-verbose=N
.
So, for gcc, use #pragma GCC ivdep
and enjoy! :)
There are two ways to vectorize a loop computation in a C/C++ program. Programmers can use intrinsics inside the C/C++ source code to tell compilers to generate specific SIMD instructions so as to vectorize the loop computation. Or, compilers may be setup to vectorize the loop computation automatically.
GCC Autovectorization flagsGCC is an advanced compiler, and with the optimization flags -O3 or -ftree-vectorize the compiler will search for loop vectorizations (remember to specify the -mavx flag too). The source code remains the same, but the compiled code by GCC is completely different.
Vectorization is the use of vector instructions to speed up program execution. Vectorization can be done both by programmers by explicitly writing vector instructions and by a compiler. The latter case is called Auto Vectorization .
if you are using Intel compiler, you can try to include the line:
#pragma ivdep
The following paragraph is quoted from Intel compiler user manual:
The ivdep pragma instructs the compiler to ignore assumed vector dependencies. To ensure correct code, the compiler treats an assumed dependence as a proven dependence, which prevents vectorization. This pragma overrides that decision. Use this pragma only when you know that the assumed loop dependencies are safe to ignore.
In gcc, one should add the line:
#pragma GCC ivdep
inside the function and right before the loop you want to vectorize (see documentation). This is only supported starting from gcc 4.9 and, by the way, makes the use of __restrict__
redundant.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With