I understand how _mm_shuffle_ps works. For example, in the following:
__m128 r = _mm_shuffle_ps(x,y, _MM_SHUFFLE(2,0,2,0));
r will contain x[0], x[2], y[0], y[2].
But I see that _MM_SHUFFLE also takes 4 parameters for _mm256_shuffle_ps, where the vectors have 8 elements each. So, logically, _MM_SHUFFLE should have taken 8 parameters. Can someone please explain how this works?
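_mm256_shuffle_ps shuffles each of the two 128-bit lanes independently, applying the same four selectors to both lanes, as if _mm_shuffle_ps were called once on the low XMM halves and once on the high XMM halves. That is why _MM_SHUFFLE still takes only 4 parameters. If you want to shuffle across all 8 32-bit elements, you need _mm256_permutevar8x32_ps (AVX2), which permutes a single source vector using a vector of indices.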
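To make the per-lane behaviour concrete, here is a minimal scalar sketch in plain C. It does not call the intrinsics; it only models their documented element selection on ordinary float arrays, so it compiles without AVX support. The names MODEL_SHUFFLE, model_shuffle_lane, model_shuffle_256 and model_permutevar8x32 are hypothetical helpers introduced for illustration, not part of any Intel header.

```c
#include <stddef.h>

/* Hypothetical stand-in for _MM_SHUFFLE(z, y, x, w): packs four 2-bit
   selectors into one byte, with w in the lowest two bits. */
#define MODEL_SHUFFLE(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

/* Models one 128-bit lane, i.e. what _mm_shuffle_ps does:
   the two low results come from a, the two high results from b. */
void model_shuffle_lane(const float *a, const float *b,
                        unsigned imm, float *dst)
{
    dst[0] = a[imm & 3];
    dst[1] = a[(imm >> 2) & 3];
    dst[2] = b[(imm >> 4) & 3];
    dst[3] = b[(imm >> 6) & 3];
}

/* Models _mm256_shuffle_ps: the same 8-bit control is applied to the
   low lane (elements 0..3) and to the high lane (elements 4..7).
   No element ever crosses from one lane to the other. */
void model_shuffle_256(const float a[8], const float b[8],
                       unsigned imm, float dst[8])
{
    for (size_t lane = 0; lane < 2; ++lane)
        model_shuffle_lane(a + 4 * lane, b + 4 * lane, imm, dst + 4 * lane);
}

/* Models _mm256_permutevar8x32_ps: each output element picks any of the
   8 input elements, using the low 3 bits of the matching index. */
void model_permutevar8x32(const float a[8], const int idx[8], float dst[8])
{
    for (size_t i = 0; i < 8; ++i)
        dst[i] = a[idx[i] & 7];
}
```

With x = {0,...,7} and y = {10,...,17}, calling model_shuffle_256(x, y, MODEL_SHUFFLE(2,0,2,0), r) fills r with x[0], x[2], y[0], y[2] in the low lane and x[4], x[6], y[4], y[6] in the high lane, which is the same pattern as the 128-bit example repeated once per lane.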