I noticed that when vectorizing a loop in a C program, the speedup is much greater with float operands than with double operands.
Example:
for (int i = 0; i < N; i++) {
    a[i] += b[i] * c[i];
}
When a, b and c are arrays of 20,000 elements each and I repeat this loop 1,000,000 times:
- Without vectorization, it takes around 24 seconds with both floats and doubles.
- With auto-vectorization (compiling with -O1 -ftree-vectorize), it takes 7 seconds with floats and 21 seconds with doubles.
- With OpenMP (#pragma omp simd), the results are similar to the above bullet point.
What could be the reason for this?
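For reference, the loop under test can be written as a pair of functions, one per element type. This is only a minimal sketch: the names madd_float and madd_double are hypothetical, and the restrict qualifiers are an assumption (the question does not say whether the arrays may overlap). GCC and Clang honor the pragma when compiling with -fopenmp or -fopenmp-simd; without either flag it is ignored.

```c
#include <stddef.h>

/* Hypothetical helper names; the question only shows the bare loop.
   restrict is an assumption: it promises the arrays do not overlap. */
void madd_float(float *restrict a, const float *restrict b,
                const float *restrict c, size_t n)
{
    /* Asks the compiler to vectorize; ignored without -fopenmp(-simd). */
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        a[i] += b[i] * c[i];
    }
}

/* Same loop with 64-bit elements, so each vector holds half as many. */
void madd_double(double *restrict a, const double *restrict b,
                 const double *restrict c, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        a[i] += b[i] * c[i];
    }
}
```

Timing both functions on the same machine with the same flags keeps the element type as the only variable.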
Edit: Further information:
Many of these operations use SIMD (single instruction, multiple data) instructions. Floats are half the size of doubles, so twice as many of them fit in one vector register and can be processed by a single instruction; for example, a 128-bit SSE register holds four floats but only two doubles. I am, however, surprised that the float version is three times as fast rather than simply twice as fast. I suspect, but don't know for sure, that this is because floats are cheaper to operate on (mantissa extraction and so on).
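The "twice as many per instruction" argument is simple arithmetic on register width. As a rough sketch (simd_lanes is a hypothetical helper, and the register widths passed in are assumptions about the target CPU, not something stated in the question):

```c
#include <stddef.h>

/* Hypothetical helper: how many elements of element_bytes bytes
   fit in one SIMD register that is register_bits wide. */
size_t simd_lanes(size_t register_bits, size_t element_bytes)
{
    return register_bits / (element_bytes * 8);
}
```

On common platforms, where float is 4 bytes and double is 8, simd_lanes(128, sizeof(float)) gives 4 and simd_lanes(128, sizeof(double)) gives 2. This accounts for a factor of two at most, not the factor of three observed.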