Suppose I have a 2-element vector defined as follows (using the GCC syntax for packed vectors):
// packed vector of 2-elements
typedef double v2d __attribute__((vector_size(sizeof(double)*2)));
v2d x = ...;
double y = ...;
x[0] = pow(x[0], y);
x[1] = pow(x[1], y);
I'd like to know if there's a faster way to do the two power computations using vector operations. The compiler is GCC, the target is x86-64, and platform-specific code is OK.
Yes, this should be possible if you have no special cases (negative numbers, 0, 1, NaN, etc.), so that the code path is linear, with no branches.
Here is the generic code for the pow function for IEEE 754 doubles. It has no looping constructs, so if you flesh out all the special cases, vectorization seems straightforward. Have fun.
You can loop over the elements directly, and with the right options GCC and ICC will use a vectorized pow function:
#include <math.h>
typedef double vnd __attribute__((vector_size(sizeof(double)*2)));
vnd foo(vnd x, vnd y) {
    #pragma omp simd
    for (int i = 0; i < 2; i++) x[i] = pow(x[i], y[i]);
    return x;
}
With just -O2, ICC simply generates call __svml_pow2. SVML (Short Vector Math Library) is Intel's vectorized math library. With -Ofast -fopenmp, GCC simply generates call _ZGVbN2vv___pow_finite.
Clang does not vectorize it.
https://godbolt.org/g/pjpzFX