Is it possible to vectorize this loop (with g++)?
char x;  // one packed byte: four 2-bit indices into B
int k;   // running output index into A
for(int s = 0; s < 4; s++) {
A[k++] += B[x&3];
x >>= 2;
}
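For reference, the loop treats each byte x as four 2-bit indices into B, read from the least significant bits upward. The helper below (unpack2bit is a hypothetical name, not from the question) only spells out that decoding. For example, the byte 0xE4 (binary 11100100) decodes to the indices 0, 1, 2, 3. It takes an unsigned char so that the shifts never involve a sign bit.

```cpp
// Hypothetical helper: splits one packed byte into its four 2-bit indices,
// in the order the loop above consumes them (lowest bits first).
void unpack2bit(unsigned char x, int idx[4]) {
    for (int s = 0; s < 4; s++) {
        idx[s] = x & 3;  // current low 2 bits select one of B[0..3]
        x >>= 2;         // move the next 2-bit field into the low bits
    }
}
```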
A and B are pointers to non-overlapping float arrays; B has indices 0 to 3. I need to maximize portability, as this is for an R package, so ideally I would rewrite the loop in such a way that g++ can vectorize it on its own. I don't know how to make SSE code portable in this context (the package RcppEigen makes the library Eigen available, so using Eigen is also an option).
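One portable direction, offered as a sketch rather than a known answer to this question, is to move the bit decoding out of the hot loop with a lookup table: for each of the 256 possible byte values, precompute the four floats it expands to. The inner loop then only adds four contiguous floats, a pattern g++ can usually vectorize at -O3 without intrinsics, although this is not guaranteed. ByteTable, build_table and accumulate_table are hypothetical names, A is assumed to hold 4*J floats, and __restrict is a GCC/Clang extension rather than standard C++. The table takes 4 KB, so it pays off when J is large relative to how often B changes.

```cpp
#include <cstddef>

// Hypothetical table type: t[v][s] holds B[(v >> 2*s) & 3] for every byte value v.
struct ByteTable {
    float t[256][4];
};

// Precompute the four floats each possible byte expands to.
void build_table(const float* B, ByteTable& tab) {
    for (int v = 0; v < 256; v++)
        for (int s = 0; s < 4; s++)
            tab.t[v][s] = B[(v >> (2 * s)) & 3];
}

// Add the decoded values of each byte of data into A (A holds 4*J floats).
// The inner loop is a plain add of 4 contiguous floats from one table row.
void accumulate_table(float* __restrict A, const unsigned char* data,
                      std::size_t J, const ByteTable& tab) {
    for (std::size_t j = 0; j < J; j++) {
        const float* row = tab.t[data[j]];
        for (int s = 0; s < 4; s++)
            A[4 * j + s] += row[s];
    }
}
```

If data is a char*, it can be passed as reinterpret_cast<const unsigned char*>(data); reading bytes through an unsigned char pointer is permitted by the aliasing rules.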
Many thanks in advance for your thoughts.
P.S. The enclosing code looks like this:
int k = 0;
for(size_t j = 0; j < J; j++) {
char x = data[j];
for(int s = 0; s < 4; s++) {
A[k++] += B[x&3];
x >>= 2;
}
}
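A possibly more vectorizer-friendly form of the same nested loop, assuming A holds 4*J floats: compute each output index directly as 4*j + s instead of through the running counter k, read the byte as unsigned, and mark A and B as non-overlapping with __restrict (a GCC/Clang extension). accumulate_scalar is a hypothetical name. Whether g++ actually vectorizes it depends on the compiler version and flags, since the B[...] lookup is effectively a gather; compiling with -O3 -fopt-info-vec reports which loops were vectorized.

```cpp
#include <cstddef>

// Same computation as the nested loop above, but each output index is
// computed directly (4*j + s) instead of through a running k++.
void accumulate_scalar(float* __restrict A, const float* __restrict B,
                       const unsigned char* data, std::size_t J) {
    for (std::size_t j = 0; j < J; j++) {
        unsigned x = data[j];  // unsigned: shifts never see a sign bit
        for (int s = 0; s < 4; s++)
            A[4 * j + s] += B[(x >> (2 * s)) & 3];  // 2-bit field s selects B[0..3]
    }
}
```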
Here is a solution using AVX2 (it needs an AVX2-capable x86 CPU and compiling with -mavx2):
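#include <immintrin.h>  // AVX2 intrinsics; compile with -mavx2
#include <cstring>      // std::memcpy

// B in both 128-bit halves; the 2-bit indices are always 0..3
__m256 vB = _mm256_setr_ps(B[0], B[1], B[2], B[3], B[0], B[1], B[2], B[3]);
__m256i shift = _mm256_setr_epi32(0, 2, 4, 6, 8, 10, 12, 14); // bit offset of each field
__m256i mask = _mm256_set1_epi32(3);
for (size_t j = 0; j < J/2; j++)
{
    unsigned short x;  // two packed bytes; data[2*j] is the low byte on little-endian x86
    std::memcpy(&x, data + 2*j, sizeof x);  // avoids the aliasing/alignment issues of a short* cast
    __m256i index = _mm256_and_si256(_mm256_srlv_epi32(_mm256_set1_epi32(x), shift), mask);
    _mm256_storeu_ps(A, _mm256_add_ps(_mm256_loadu_ps(A), _mm256_permutevar8x32_ps(vB, index)));
    A += 8;
}
if (J % 2) {  // odd J: handle the last byte with the scalar loop
    unsigned char x = data[J - 1];
    for (int s = 0; s < 4; s++, x >>= 2) A[s] += B[x & 3];
}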
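Since the AVX2 version only runs on CPUs that support AVX2, one way to keep it in a portable R package is to compile only that function for AVX2 and choose the path at run time. The sketch below assumes GCC or Clang (it relies on __attribute__((target("avx2"))) and __builtin_cpu_supports), assumes A holds 4*J floats, and falls back to scalar code on non-x86 targets; the function names and the ACCUM_HAVE_X86 macro are hypothetical. Building the whole package with -mavx2 instead lets the compiler emit AVX2 instructions anywhere, which can crash with an illegal-instruction error on older CPUs.

```cpp
#include <cstddef>
#include <cstring>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define ACCUM_HAVE_X86 1  // hypothetical macro: x86 with GCC/Clang extensions
#endif

// Portable scalar fallback; same result as the original nested loop.
static void accumulate_generic(float* A, const float* B,
                               const unsigned char* data, std::size_t J) {
    for (std::size_t j = 0; j < J; j++) {
        unsigned x = data[j];
        for (int s = 0; s < 4; s++)
            A[4 * j + s] += B[(x >> (2 * s)) & 3];
    }
}

#ifdef ACCUM_HAVE_X86
// Only this function is compiled for AVX2 (GCC/Clang target attribute),
// so the rest of the package can be built without -mavx2.
__attribute__((target("avx2")))
static void accumulate_avx2(float* A, const float* B,
                            const unsigned char* data, std::size_t J) {
    __m256 vB = _mm256_setr_ps(B[0], B[1], B[2], B[3], B[0], B[1], B[2], B[3]);
    __m256i shift = _mm256_setr_epi32(0, 2, 4, 6, 8, 10, 12, 14);
    __m256i mask = _mm256_set1_epi32(3);
    std::size_t j = 0;
    for (; j + 2 <= J; j += 2) {
        unsigned short x;  // data[j] lands in the low byte (x86 is little-endian)
        std::memcpy(&x, data + j, sizeof x);
        __m256i index = _mm256_and_si256(
            _mm256_srlv_epi32(_mm256_set1_epi32(x), shift), mask);
        float* out = A + 4 * j;
        _mm256_storeu_ps(out, _mm256_add_ps(_mm256_loadu_ps(out),
                                            _mm256_permutevar8x32_ps(vB, index)));
    }
    accumulate_generic(A + 4 * j, B, data + j, J - j);  // odd tail byte, if any
}
#endif

// Dispatcher: uses the AVX2 path only if the CPU running the code supports it.
void accumulate(float* A, const float* B, const unsigned char* data, std::size_t J) {
#ifdef ACCUM_HAVE_X86
    if (__builtin_cpu_supports("avx2")) {
        accumulate_avx2(A, B, data, J);
        return;
    }
#endif
    accumulate_generic(A, B, data, J);
}
```

Both paths produce the same values, so a test on a machine without AVX2 still exercises the scalar fallback.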