I know this should be a Googling question but I just cannot find the answer.
Say I have an __m128 variable a, whose content is a[0], a[1], a[2], a[3]. Is there a single function that can reverse it to be a[3], a[2], a[1], a[0]?
Use _mm_shuffle_ps(). The underlying SHUFPS instruction has been available since SSE. It builds a vector of four 32-bit components by selecting two arbitrary 32-bit components from each of its two input vectors.
How to create the mask using the macro _MM_SHUFFLE()
The macro is defined as follows:
/* Create a selector for use with the SHUFPS instruction. */
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
(((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
Source and destination indices run from right to left in ascending order. In the description below, m1 and m2 are the two input vectors and m3 is the result. The first two selector values (fp0 and fp1) designate source components in m1, and the last two (fp2 and fp3) designate source components in m2. Each selected source component is assigned to m3[index], where index corresponds to its selector parameter fp<index>.
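To make the selector rules concrete, here is a minimal scalar sketch of what the shuffle does. It is not the real intrinsic: SHUFFLE_MASK and shuffle_model are hypothetical names introduced only for illustration, and plain float arrays stand in for the vectors m1, m2 and m3.

```c
#include <assert.h>

/* Hypothetical stand-in for _MM_SHUFFLE, using the same bit layout. */
#define SHUFFLE_MASK(fp3, fp2, fp1, fp0) \
    (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))

/* Hypothetical scalar model of the shuffle: each 2-bit field of the
   mask selects one source component. */
static void shuffle_model(const float m1[4], const float m2[4],
                          unsigned mask, float m3[4])
{
    assert(mask <= 0xFF);            /* the selector is an 8-bit value  */
    m3[0] = m1[mask & 3];            /* fp0 selects from m1             */
    m3[1] = m1[(mask >> 2) & 3];     /* fp1 selects from m1             */
    m3[2] = m2[(mask >> 4) & 3];     /* fp2 selects from m2             */
    m3[3] = m2[(mask >> 6) & 3];     /* fp3 selects from m2             */
}
```

With this model, SHUFFLE_MASK(3, 2, 1, 0) copies m1[0], m1[1], m2[2], m2[3] into m3 unchanged, while SHUFFLE_MASK(0, 1, 2, 3) evaluates to 0x1B and, with the same array passed as both inputs, yields its components in reverse order.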
Reversing 32-bit components in a vector
__m128 input = ...;
__m128 reversed = _mm_shuffle_ps(input,input,_MM_SHUFFLE(0, 1, 2, 3));
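For completeness, here is a sketch of the same shuffle wrapped in a small function that works on plain float arrays. The helper name reverse4 is made up for this example, and the unaligned load/store intrinsics are used so the arrays need no particular alignment. It assumes an x86 target with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_storeu_ps, _mm_shuffle_ps */

/* Hypothetical helper: writes src[3], src[2], src[1], src[0] to dst. */
static void reverse4(const float src[4], float dst[4])
{
    __m128 input = _mm_loadu_ps(src);   /* unaligned load of 4 floats */
    /* Same vector for both inputs, selectors in reversed order. */
    __m128 reversed = _mm_shuffle_ps(input, input, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_ps(dst, reversed);       /* unaligned store of 4 floats */
}
```

Calling reverse4 on {1, 2, 3, 4} should leave {4, 3, 2, 1} in dst.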
Note: The mask is an immediate value. It cannot be dynamic, as it is part of the resulting machine instruction.
Intel Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/