
How to rotate an SSE/AVX vector

I need to perform a rotate operation in as few clock cycles as possible. In the first case, let's assume __m128i as the source and dest type:

source: || A0 || A1 || A2 || A3 ||
  dest: || A1 || A2 || A3 || A0 ||
dest = (__m128i)_mm_shuffle_epi32((__m128i)source, _MM_SHUFFLE(0,3,2,1));
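
For reference, the SSE case can be wrapped in a small self-contained function. This is only a sketch: the helper name rotate4_epi32 and the array-based interface are assumptions made for illustration and are not part of the original post.

```c
#include <emmintrin.h>  /* SSE2: _mm_shuffle_epi32, unaligned load/store */
#include <stdint.h>

/* Hypothetical helper: rotate four 32-bit elements by one position,
   so that out[i] = in[(i + 1) % 4]. */
static void rotate4_epi32(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* _MM_SHUFFLE(0,3,2,1) selects source elements 1,2,3,0
       for destination elements 0,1,2,3. */
    __m128i r = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 3, 2, 1));
    _mm_storeu_si128((__m128i *)out, r);
}
```

Calling it on {A0, A1, A2, A3} should store {A1, A2, A3, A0}, matching the diagram above.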

Now I want to do the same with AVX intrinsics. So let's assume this time __m256i as source and dest type:

source: || A0 || A1 || A2 || A3 || A4 || A5 || A6 || A7 ||
  dest: || A1 || A2 || A3 || A4 || A5 || A6 || A7 || A0 ||

The AVX intrinsics are missing most of the corresponding SSE integer operations. Maybe there is some way to get the desired output by working with the floating-point versions.

I've tried with:

dest = (__m256i)_mm256_shuffle_ps((__m256)source, (__m256)source, _MM_SHUFFLE(0,3,2,1));

but what I get is:

|| A0 || A2 || A3 || A4 || A5 || A6 || A7 || A1 ||

Any idea how to solve this in an efficient way (without mixing SSE and AVX operations and without "manually" swapping A0 and A1)?

Thanks in advance!

user1584773 asked Aug 10 '12 17:08


2 Answers

My solution:

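// Rotate within each 128-bit lane: [A1 A2 A3 A0 | A5 A6 A7 A4]
__m256 tmp = _mm256_permute_ps((__m256)_source, _MM_SHUFFLE(0,3,2,1));
// Swap the two lanes, then take elements 3 and 7 (mask 136 = 0x88) from the swapped copy
*(_dest) = (__m256i)_mm256_blend_ps(tmp, _mm256_permute2f128_ps(tmp, tmp, 1), 0x88);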
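
The same three steps can be written as a self-contained function to make the lane handling easier to follow. This is a sketch under some assumptions: the helper name rotate8_epi32 and the array interface are hypothetical, the C-style vector casts are replaced by the cast intrinsics (which compile on more toolchains), and the target attribute is a GCC/Clang extension used here in place of building with -mavx.

```c
#include <immintrin.h>  /* AVX intrinsics */
#include <stdint.h>

/* Hypothetical helper: out[i] = in[(i + 1) % 8] for eight 32-bit elements.
   The target attribute (GCC/Clang) enables AVX for this function only. */
__attribute__((target("avx")))
static void rotate8_epi32(const int32_t in[8], int32_t out[8])
{
    __m256 v = _mm256_castsi256_ps(_mm256_loadu_si256((const __m256i *)in));
    /* Step 1: rotate inside each 128-bit lane: [A1 A2 A3 A0 | A5 A6 A7 A4] */
    __m256 tmp = _mm256_permute_ps(v, _MM_SHUFFLE(0, 3, 2, 1));
    /* Step 2: swap the two lanes: [A5 A6 A7 A4 | A1 A2 A3 A0] */
    __m256 swapped = _mm256_permute2f128_ps(tmp, tmp, 1);
    /* Step 3: mask 0x88 takes elements 3 and 7 from the swapped copy,
       giving [A1 A2 A3 A4 | A5 A6 A7 A0] */
    __m256 r = _mm256_blend_ps(tmp, swapped, 0x88);
    _mm256_storeu_si256((__m256i *)out, _mm256_castps_si256(r));
}
```

The lane swap is needed because the AVX in-lane permute never moves data across the 128-bit boundary; only elements 3 and 7 end up in the wrong lane after step 1, so the blend fixes exactly those two.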
user1584773 answered Sep 21 '22 01:09


I have not yet checked how things are with AVX, but at least for SSE, did you consider _mm_align*?

For instance, this rotates a byte vector by 2 bytes:

__m128i v;
v = _mm_alignr_epi8(v, v, 2); // v = v[2,3,4,5,6,7,8,9,10,11,12,13,14,15,0,1]

This can be a single instruction. Such operations also have a latency of 1 cycle and a throughput of 1 per cycle, i.e. they are fast.
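
A self-contained version of the byte rotation might look like the following. It is a sketch only: the helper name rotate16_bytes_by2 and the array interface are assumptions for illustration, and the target attribute is a GCC/Clang extension standing in for building with -mssse3 (_mm_alignr_epi8 is an SSSE3 intrinsic).

```c
#include <tmmintrin.h>  /* SSSE3: _mm_alignr_epi8 */
#include <stdint.h>

/* Hypothetical helper: out[i] = in[(i + 2) % 16] for sixteen bytes. */
__attribute__((target("ssse3")))
static void rotate16_bytes_by2(const uint8_t in[16], uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* Concatenate v with itself (32 bytes), shift right by 2 bytes,
       and keep the low 16 bytes. */
    __m128i r = _mm_alignr_epi8(v, v, 2);
    _mm_storeu_si128((__m128i *)out, r);
}
```

The shift count is in bytes, so rotating 32-bit elements by one position, as in the question's SSE case, would use a count of 4 instead of 2.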

AVX is likely a bit of a hassle with this approach, so an adaptation may not be useful.

mafu answered Sep 21 '22 01:09