So I have the following code which seems very simple to me:
#define MODS_COUNT 5
int start1 = <calc at runtime>;
int start2 = <calc at runtime>;
for (int j = 0; j < MODS_COUNT; j++) // loop 5 times doing simple addition
    logModifiers[start1 + j] += logModsThis[start2 + j];
This loop is part of an outer loop (not sure if this makes a difference).
The compiler says:
message : loop was not vectorized: vectorization possible but seems inefficient.
Why can't this loop be vectorised? It seems very simple to me. How can I force vectorisation and check the performance myself?
I have Intel C++ Compiler 2013 update 3.
Full code is here if anyone is interested: http://pastebin.com/Z6H5ZejW
Edit: I understand that the compiler decided that it's inefficient. I'm asking:
Why is it inefficient?
How can I force it so that I can benchmark myself?
Edit2: If I change the count to 4 instead of 5, then the loop gets vectorised. What makes 5 inefficient? I thought it could be done in 2 instructions, one vector instruction that handles 4 elements and one "normal" scalar instruction that handles the remaining 1, instead of 5 scalar instructions.
According to vectorization in Intel compilers:
There are SIMD (single instruction, multiple data) registers which are 128 bits (16 bytes) long, so if sizeof(int) is 4, then 4 integers fit in one register and a single instruction can operate on all 4 ints at once. (This also depends on the same type of operation being applied to each of these ints, which is true here; moreover, each element of the array on the LHS depends on a different element of a different array.)
If there are 8 ints, then two instructions are required (instead of 8 without vectorization).
But if there are 5 (or 6 or 7) ints, that will also require two instructions, which might be no better than the non-vectorized code.
further reading LINK.