Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is vectorization not beneficial in this for loop?

I am trying to vectorize this for loop. After using the Rpass flag, I am getting the following remark for it:

int someOuterVariable = 0;

for (unsigned int i = 7; i != -1; i--)
{
  array[someOuterVariable + i] -= 0.3 * anotherArray[i];
}

Remark:
The cost-model indicates that vectorization is not beneficial
the cost-model indicates that interleaving is not beneficial

I want to understand what this means. Does "interleaving is not benificial" mean the array indexing is not proper?

like image 596
The Doctor Avatar asked Jan 12 '21 08:01

The Doctor


People also ask

Why vectorization is faster than for loops?

A major reason why vectorization is faster than its for loop counterpart is due to the underlying implementation of Numpy operations. As many of you know (if you're familiar with Python), Python is a dynamically typed language.

What is difference between vectorization and loops?

"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.

What does it mean to vectorize a loop?

• Loop vectorization transforms a program so that the. same operation is performed at the same time on several. vector elements. for (i=0; i<n; i++) c[i] = a[i] + b[i];

Why is vectorization considered a powerful method for optimizing numerical code?

Vectorization is a type of parallel processing. It enables more computer hardware to be devoted to performing the computation, so the computation is done faster.


1 Answers

It's hard to answer without more details about your types. But in general, starting a loop incurs some costs and vectorising also implies some costs (such as moving data to/from SIMD registers, ensuring proper alignment of data)

I'm guessing here that the compiler tells you that the vectorisation cost here is bigger than simply running the 8 iterations without it, so it's not doing it.

Try to increase the number of iterations, or help the compiler for computing alignement for example.

Typically, unless the type of array's item are exactly of the proper alignment for SIMD vector, accessing an array from a "unknown" offset (what you've called someOuterVariable) prevents the compiler to write an efficient vectorisation code.

EDIT: About the "interleaving" question, it's hard to guess without knowning your tool. But in general, interleaving usually means mixing 2 streams of computations so that the compute units of the CPU are all busy. For example, if you have 2 ALU in your CPU, and the program is doing:

c = a + b;
d = e * f;

The compiler can interleave the computation so that both the addition and multiplication happens at the same time (provided you have 2 ALU available). Typically, this means that the multiplication which is a bit longer to compute (for example 6 cycles) will be started before the addition (for example 3 cycles). You'll then get the result of both operation after only 6 cycles instead of 9 if the compiler serialized the computations. This is only possible if there is no dependencies between the computation (if d required c, it can not work). A compiler is very cautious about this, and, in your example, will not apply this optimization if it can't prove that array and anotherArray don't alias.

like image 56
xryl669 Avatar answered Oct 13 '22 00:10

xryl669