
Intel compiler cannot vectorize this simple loop?

So I have the following code which seems very simple to me:

#define MODS_COUNT 5

int start1 = <calc at runtime>;
int start2 = <calc at runtime>;

for (int j=0; j<MODS_COUNT; j++) // loop 5 times doing simple addition.
    logModifiers[start1 +  j] += logModsThis[start2 + j];

This loop is part of an outer loop (not sure if this makes a difference).

The compiler says: message : loop was not vectorized: vectorization possible but seems inefficient.

Why can't this loop be vectorised? It seems very simple to me. How can I force vectorisation and check the performance myself?

I have Intel C++ Compiler 2013 update 3.

Full code is here if anyone is interested: http://pastebin.com/Z6H5ZejW

Edit: I understand that the compiler decided that it's inefficient. I'm asking:

Why is it inefficient?

How can I force it so that I can benchmark myself?

Edit2: If I change it to 4 instead of 5 then it gets vectorised. What makes 5 inefficient? I thought it could be done in 2 instructions instead of 5: the first, a vector instruction, handles 4 elements and the second, a "normal" scalar instruction, handles the remaining 1.

SpaceMonkey asked Apr 30 '13 09:04


1 Answer

According to vectorization in Intel compilers:

The SIMD (single instruction, multiple data) registers used here are 128 bits (16 bytes) long. So if sizeof(int) is 4, then 4 ints fit in one register and a single instruction can operate on all 4 of them at once. (This also requires that the same operation is applied to every int, which is true here. Moreover, each element of the array on the left-hand side depends only on a different element of a different array.)

If there are 8 ints, then two instructions are required (instead of 8 without vectorization).

But if there are 5 (or 6 or 7) ints, that too will require two instructions (one full vector operation plus extra work for the leftover elements), which might be no better than the non-vectorized code.
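The arithmetic above can be sketched in plain C++. This is only an illustration of the "4 + 1" split from the question's Edit2, assuming 4 ints per 128-bit register and a scalar loop for the leftovers; the names kLanes, AddCount, countAdds and addStripMined are made up for this example, and a real compiler may handle the remainder differently (for instance with a padded or masked vector operation).

```cpp
#include <cstddef>

// Assumption: 128-bit register / 32-bit int = 4 lanes.
constexpr std::size_t kLanes = 4;

// Hypothetical cost model: one add per full group of 4, one add per leftover element.
struct AddCount {
    std::size_t vectorAdds;
    std::size_t scalarAdds;
};

constexpr AddCount countAdds(std::size_t tripCount) {
    return AddCount{tripCount / kLanes, tripCount % kLanes};
}

// Same result as the original loop, written the way a vectorizer would split it.
void addStripMined(int* dst, const int* src, std::size_t n) {
    std::size_t j = 0;
    // Main part: each pass over 4 lanes stands in for ONE SIMD add.
    for (; j + kLanes <= n; j += kLanes)
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            dst[j + lane] += src[j + lane];
    // Remainder part: leftover elements are added one at a time.
    for (; j < n; ++j)
        dst[j] += src[j];
}

// 5 elements = 1 vector add + 1 scalar add; 8 elements = 2 vector adds.
static_assert(countAdds(5).vectorAdds == 1 && countAdds(5).scalarAdds == 1, "5 = 4 + 1");
static_assert(countAdds(8).vectorAdds == 2 && countAdds(8).scalarAdds == 0, "8 = 4 + 4");
```

The count covers only the additions. The loads, stores, possibly unaligned accesses (start1 and start2 are only known at runtime) and the remainder loop itself add overhead, which is presumably what the compiler's cost estimate weighs for a trip count as small as 5.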

Further reading: LINK.
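As for forcing vectorization in order to benchmark it: a minimal sketch, assuming the Intel-specific pragmas `#pragma simd` (vectorize regardless of the compiler's cost estimate) and `#pragma novector` (keep the loop scalar). The element type int and the function names addModifiersForced and addModifiersScalar are assumptions for illustration, not taken from the linked code. The pragmas are guarded so that other compilers simply ignore them.

```cpp
const int MODS_COUNT = 5;

// Variant 1: ask the Intel compiler to vectorize even if it looks inefficient.
// Assumption: the two ranges do not overlap, since #pragma simd tells the
// compiler to vectorize without proving that this is safe.
void addModifiersForced(int* logModifiers, const int* logModsThis,
                        int start1, int start2) {
#if defined(__INTEL_COMPILER)
#pragma simd
#endif
    for (int j = 0; j < MODS_COUNT; j++)
        logModifiers[start1 + j] += logModsThis[start2 + j];
}

// Variant 2: scalar baseline for comparison.
void addModifiersScalar(int* logModifiers, const int* logModsThis,
                        int start1, int start2) {
#if defined(__INTEL_COMPILER)
#pragma novector
#endif
    for (int j = 0; j < MODS_COUNT; j++)
        logModifiers[start1 + j] += logModsThis[start2 + j];
}
```

Timing both variants inside the real outer loop should show whether the compiler's "seems inefficient" estimate holds for this data. `#pragma vector always` is a milder alternative that overrides only the efficiency heuristic and still leaves the safety checks to the compiler.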

Koushik Shetty answered Nov 15 '22 07:11