
How to do an indirect load (gather-scatter) in AVX or SSE instructions?

Tags: c, vector, avx, intel, sse

I've been searching for a while now, but can't seem to find anything useful in the documentation or on SO. This question didn't really help me out, since it makes references to modifying the assembly and I am writing in C.

I have some code making indirect accesses that I want to vectorize.

for (i = 0; i < LENGTH; ++i) {
   foo[bar[i]] *= 2;
}

Since I have the indices I want to double inside bar, I was wondering if there was a way to load those indices of foo into a vector register and then I could apply my math and store it back to the same indices.

Something like the following. The load and store instructions I just made up because I couldn't find anything like them in the AVX or SSE documentation. I think I read somewhere that AVX2 has similar functions, but the processor I'm working with doesn't support AVX2.

for (i = 0; i < LENGTH; i += 8) {
   // For simplicity, I'm leaving out any pointer type casting
   __m256 ymm0 = _mm256_load_indirect(bar+i);
   __m256 ymm1 = _mm256_set1_ps(2.0f); // Set up vector of just 2's
   __m256 ymm2 = _mm256_mul_ps(ymm0, ymm1);
   _mm256_store_indirect(ymm2, bar+i);
}

Are there any instructions in AVX or SSE that will allow me to load a vector register with an array of indices from a different array? Or any "hacky" ways around it if there isn't an explicit function?

The Unknown Dev asked May 01 '16 20:05



1 Answer

(I'm writing an answer to this old question as I think it may help others.)

Short answer

No. There are no scatter/gather instructions in the SSE and AVX instruction sets.

Longer answer

Scatter/gather instructions are expensive to implement (in terms of complexity and silicon real estate) because the scatter/gather mechanism needs to be deeply intertwined with the cache memory controller. I believe this is why the functionality is missing from SSE and AVX.
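On an AVX-only processor, the usual "hacky" workaround is to do the gather and the scatter with scalar code and vectorize only the arithmetic in between. The following is a minimal sketch, not a tuned implementation. It assumes that foo holds float values, that bar holds 32-bit indices, and that the 8 indices in each block are distinct (duplicates would be doubled only once instead of once per occurrence). The function names are made up for illustration, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

/* Scalar reference: the loop from the question. */
void double_indexed_scalar(float *foo, const int32_t *bar, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        foo[bar[i]] *= 2.0f;
}

#if defined(__x86_64__) || defined(__i386__)
/* AVX without AVX2: only the multiply runs in a vector register.
   Assumption: the 8 indices within each block are distinct. */
__attribute__((target("avx")))
void double_indexed_avx(float *foo, const int32_t *bar, size_t n)
{
    const __m256 two = _mm256_set1_ps(2.0f);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        float tmp[8];
        for (int k = 0; k < 8; ++k)      /* emulated gather */
            tmp[k] = foo[bar[i + k]];
        __m256 v = _mm256_loadu_ps(tmp);
        v = _mm256_mul_ps(v, two);
        _mm256_storeu_ps(tmp, v);
        for (int k = 0; k < 8; ++k)      /* emulated scatter */
            foo[bar[i + k]] = tmp[k];
    }
    for (; i < n; ++i)                   /* scalar tail */
        foo[bar[i]] *= 2.0f;
}
#endif

/* Picks the AVX path when the CPU supports it, else the scalar loop. */
void double_indexed(float *foo, const int32_t *bar, size_t n)
{
    assert(n == 0 || (foo != NULL && bar != NULL));
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        double_indexed_avx(foo, bar, n);
        return;
    }
#endif
    double_indexed_scalar(foo, bar, n);
}
```

Because the loads and stores stay scalar, this is unlikely to beat the plain loop for a single multiply; it tends to make sense only when more arithmetic is done per gathered element.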

For newer instruction sets the situation is different. AVX2 provides gather instructions:

  • VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS for floating point gather (intrinsics such as _mm256_i32gather_ps and _mm256_i64gather_pd)
  • VPGATHERDD, VPGATHERQD, VPGATHERDQ, VPGATHERQQ for integer gather (intrinsics such as _mm256_i32gather_epi32 and _mm256_i64gather_epi64)
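As a rough illustration of the AVX2 gather, the loop from the question could be sketched as below. This assumes float data in foo, 32-bit indices in bar, and no repeated index within a block of 8. AVX2 has no scatter, so the write-back remains scalar. The function names are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX2: hardware gather, vector multiply, scalar scatter.
   Assumption: the 8 indices within each block are distinct. */
__attribute__((target("avx2")))
void double_indexed_avx2(float *foo, const int32_t *bar, size_t n)
{
    const __m256 two = _mm256_set1_ps(2.0f);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Load 8 indices, then gather foo[idx]; scale 4 = sizeof(float). */
        __m256i idx = _mm256_loadu_si256((const __m256i *)(bar + i));
        __m256 v = _mm256_i32gather_ps(foo, idx, 4);
        v = _mm256_mul_ps(v, two);

        float tmp[8];
        _mm256_storeu_ps(tmp, v);
        for (int k = 0; k < 8; ++k)      /* no scatter in AVX2 */
            foo[bar[i + k]] = tmp[k];
    }
    for (; i < n; ++i)                   /* scalar tail */
        foo[bar[i]] *= 2.0f;
}
#endif

/* Uses the AVX2 path when available, else the original scalar loop. */
void double_indexed(float *foo, const int32_t *bar, size_t n)
{
    assert(n == 0 || (foo != NULL && bar != NULL));
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        double_indexed_avx2(foo, bar, n);
        return;
    }
#endif
    for (size_t i = 0; i < n; ++i)
        foo[bar[i]] *= 2.0f;
}
```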

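AVX-512 added scatter instructions:

  • VSCATTERDPD, VSCATTERDPS, VSCATTERQPD, VSCATTERQPS for floating point scatter (intrinsics such as _mm512_i32scatter_ps and _mm512_i64scatter_pd)
  • VPSCATTERDD, VPSCATTERQD, VPSCATTERDQ, VPSCATTERQQ for integer scatter (intrinsics such as _mm512_i32scatter_epi32 and _mm512_i64scatter_epi64)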
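With AVX-512F both directions can stay in vector registers. The sketch below is only an illustration under the same assumptions as before: float data in foo, 32-bit indices in bar, and no repeated index within a block (here 16 lanes). The function names are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX-512F: hardware gather and hardware scatter, 16 floats at a time.
   Assumption: the 16 indices within each block are distinct. */
__attribute__((target("avx512f")))
void double_indexed_avx512(float *foo, const int32_t *bar, size_t n)
{
    const __m512 two = _mm512_set1_ps(2.0f);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512i idx = _mm512_loadu_si512(bar + i);
        /* Note the argument order: index vector first, then base pointer. */
        __m512 v = _mm512_i32gather_ps(idx, foo, 4);
        v = _mm512_mul_ps(v, two);
        _mm512_i32scatter_ps(foo, idx, v, 4);
    }
    for (; i < n; ++i)                   /* scalar tail */
        foo[bar[i]] *= 2.0f;
}
#endif

/* Uses the AVX-512 path when available, else the original scalar loop. */
void double_indexed(float *foo, const int32_t *bar, size_t n)
{
    assert(n == 0 || (foo != NULL && bar != NULL));
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        double_indexed_avx512(foo, bar, n);
        return;
    }
#endif
    for (size_t i = 0; i < n; ++i)
        foo[bar[i]] *= 2.0f;
}
```

If bar may contain repeated indices, none of the blocked variants matches the scalar loop exactly, so the duplicates would need to be handled separately.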

However, it is still a question whether using scatter/gather for such a simple operation would actually pay off.

Pibben answered Sep 19 '22 20:09