In SSE, if I have a 128-bit register containing 4 floats i.e.
A = a b c d ('a','b','c','d' are floats and 'A' is a 128-bit SSE register)
and
B = e f g h
then if I want
C = a e b f
I can simply do:
C = _mm_unpacklo_ps(A,B);
Similarly if I want
D = c g d h
I can do:
D = _mm_unpackhi_ps(A,B);
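For reference, here is a minimal sketch of the SSE case in C. The function name interleave_ps and the array-based interface are only for illustration (they are not part of any library); element 0 of each array corresponds to 'a' and 'e' above.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: interleave two 4-float arrays.
 * out_lo = a0 b0 a1 b1, out_hi = a2 b2 a3 b3 */
void interleave_ps(const float a[4], const float b[4],
                   float out_lo[4], float out_hi[4])
{
    __m128 A = _mm_loadu_ps(a);   /* unaligned load of a0..a3 */
    __m128 B = _mm_loadu_ps(b);   /* unaligned load of b0..b3 */

    _mm_storeu_ps(out_lo, _mm_unpacklo_ps(A, B));  /* a0 b0 a1 b1 */
    _mm_storeu_ps(out_hi, _mm_unpackhi_ps(A, B));  /* a2 b2 a3 b3 */
}
```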
If I have an AVX register containing doubles, is it possible to do the same with a single instruction?
Based on how these intrinsics work, I know that I can't use _mm256_unpacklo_pd(), _mm256_shuffle_pd(), _mm256_permute2f128_pd() or _mm256_blend_pd(). Is there any instruction apart from these that I can use, or do I have to use a combination of the above instructions?
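To illustrate why the 256-bit unpack does not behave like its SSE counterpart: it operates on each 128-bit lane separately, so the low and high halves are interleaved independently. A small sketch, where unpack_pd_per_lane and the AVX_FN macro are names made up for this example (the macro assumes GCC or Clang and only enables AVX per function when -mavx was not passed):

```c
#include <immintrin.h>  /* AVX intrinsics */

/* Assumption: GCC/Clang. Enables AVX for one function if -mavx is not set. */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__AVX__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* Hypothetical helper showing the per-lane behaviour of the 256-bit unpacks. */
AVX_FN void unpack_pd_per_lane(const double a[4], const double b[4],
                               double out_lo[4], double out_hi[4])
{
    __m256d A = _mm256_loadu_pd(a);
    __m256d B = _mm256_loadu_pd(b);

    /* Each 128-bit lane is handled on its own, so the result is NOT a0 b0 a1 b1. */
    _mm256_storeu_pd(out_lo, _mm256_unpacklo_pd(A, B));  /* a0 b0 a2 b2 */
    _mm256_storeu_pd(out_hi, _mm256_unpackhi_pd(A, B));  /* a1 b1 a3 b3 */
}
```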
One way that I can think of is the following:
A1 = _mm256_unpacklo_pd(A,B);
A2 = _mm256_unpackhi_pd(A,B);
C = _mm256_permute2f128_pd(A1,A2,0x20);
D = _mm256_permute2f128_pd(A1,A2,0x31);
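Written out as a complete function, the four-instruction approach might look like the sketch below. The name interleave_pd, the array-based interface and the AVX_FN macro are my own choices for the example (the macro assumes GCC or Clang); in the immediate for _mm256_permute2f128_pd, the low nibble picks the 128-bit half for the low lane of the result and the high nibble picks the half for the high lane.

```c
#include <immintrin.h>  /* AVX intrinsics */

/* Assumption: GCC/Clang. Enables AVX for one function if -mavx is not set. */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__AVX__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* Hypothetical helper: out_c = a0 b0 a1 b1, out_d = a2 b2 a3 b3 */
AVX_FN void interleave_pd(const double a[4], const double b[4],
                          double out_c[4], double out_d[4])
{
    __m256d A  = _mm256_loadu_pd(a);
    __m256d B  = _mm256_loadu_pd(b);

    __m256d A1 = _mm256_unpacklo_pd(A, B);  /* a0 b0 a2 b2 */
    __m256d A2 = _mm256_unpackhi_pd(A, B);  /* a1 b1 a3 b3 */

    /* 0x20: low lane = low half of A1, high lane = low half of A2 */
    _mm256_storeu_pd(out_c, _mm256_permute2f128_pd(A1, A2, 0x20));
    /* 0x31: low lane = high half of A1, high lane = high half of A2 */
    _mm256_storeu_pd(out_d, _mm256_permute2f128_pd(A1, A2, 0x31));
}
```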
If anyone has a better solution, please do post below.