
Automatic vectorization with g++ of a loop with bit operations

Is it possible to vectorize this loop (with g++)?

char x;
int k;
for(int s = 0; s < 4; s++) {
  A[k++] += B[x&3];
  x >>= 2;
}

A and B are pointers to non-overlapping float arrays; B has four elements (indices 0 to 3). This is for an R package, so I need to maximize portability. Ideally the loop would be rewritten so that g++ can vectorize it on its own, since I don't know how to make SSE code portable in this context (the RcppEigen package makes the Eigen library available, so using Eigen is an option).

Many thanks in advance for your thoughts.

P.S. The enclosing code looks like this:

int k = 0;
for(size_t j = 0; j < J; j++) {
  char x = data[j];
  for(int s = 0; s < 4; s++) {
    A[k++] += B[x&3];
    x >>= 2;
  }
}
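
For concreteness, the nested loop can be wrapped in a self-contained function, which also serves as a scalar reference when checking any vectorized variant. This is only a sketch: the name `accumulate_scalar` and the `unsigned char` element type for data are assumptions made for illustration, not taken from the package code. Each byte of data packs four 2-bit indices into B, lowest bits first.

```cpp
#include <cstddef>

// Hypothetical helper (name and signature are assumptions for illustration).
// For every byte in data, reads four 2-bit indices, lowest bits first,
// and adds B[index] to the next four elements of A.
void accumulate_scalar(float* A, const float* B,
                       const unsigned char* data, std::size_t J) {
    std::size_t k = 0;
    for (std::size_t j = 0; j < J; j++) {
        unsigned char x = data[j];
        for (int s = 0; s < 4; s++) {
            A[k++] += B[x & 3];  // low two bits select the element of B
            x >>= 2;             // move on to the next 2-bit index
        }
    }
}
```

For example, the byte 0xE4 (binary 11 10 01 00) decodes to the indices 0, 1, 2, 3, so B[0], B[1], B[2] and B[3] are added to four consecutive elements of A.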
Elvis asked Dec 14 '22 10:12



1 Answer

Here is a solution using AVX2:

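// Requires AVX2: #include <immintrin.h> and compile with -mavx2.
__m256 vB = _mm256_setr_ps(B[0], B[1], B[2], B[3], B[0], B[1], B[2], B[3]);
__m256i vShift = _mm256_setr_epi32(0, 2, 4, 6, 8, 10, 12, 14);
__m256i vMask = _mm256_set1_epi32(3);
// Each iteration consumes two bytes of data (eight 2-bit indices) and updates eight floats of A.
for (size_t j = 0; j < J/2; j++)
{
    short x = ((short*)data)[j];  // 16-bit load; assumes little-endian byte order (true on x86)
    // Shift the broadcast value right by 0, 2, ..., 14 bits per lane and keep the low two bits.
    __m256i vIndex = _mm256_and_si256(_mm256_srlv_epi32(_mm256_set1_epi32(x), vShift), vMask);
    // Select B[index] in each lane with a permute, then add into A.
    _mm256_storeu_ps(A, _mm256_add_ps(_mm256_loadu_ps(A), _mm256_permutevar8x32_ps(vB, vIndex)));
    A += 8;
}
// If J is odd, the last byte still has to be processed with the scalar loop.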
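
If AVX2 cannot be assumed, one possible portable alternative is to precompute, for each of the 256 byte values, the four floats it selects from B. The inner loop then becomes a plain add of four contiguous floats, a pattern compilers can often vectorize on their own. This is a sketch under assumptions, not a tested drop-in: the name `accumulate_table` and the `unsigned char` element type for data are hypothetical, and whether g++ actually vectorizes the loop depends on its version and flags (for instance -O3), which can be checked with -fopt-info-vec.

```cpp
#include <cstddef>

// Hypothetical helper (name and signature are assumptions for illustration).
void accumulate_table(float* A, const float* B,
                      const unsigned char* data, std::size_t J) {
    // table[v][s] = B[(v >> 2*s) & 3]: the four values selected by byte v.
    float table[256][4];
    for (int v = 0; v < 256; v++)
        for (int s = 0; s < 4; s++)
            table[v][s] = B[(v >> (2 * s)) & 3];

    for (std::size_t j = 0; j < J; j++) {
        const float* t = table[data[j]];
        float* a = A + 4 * j;
        // Four independent adds on contiguous memory, no bit operations.
        for (int s = 0; s < 4; s++)
            a[s] += t[s];
    }
}
```

Building the 4 KB table costs 1024 lookups per call, so this only pays off when J is large compared with 256.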
ErmIg answered Feb 17 '23 13:02
