I'm trying to optimize my code using SSE intrinsics but am running into a problem where I don't know of a good way to extract the integer values from a vector after I've done the SSE intrinsics operations to get what I want.
Does anyone know of a good way to do this? I'm programming in C and my compiler is gcc version 4.3.2.
Thanks for all your help.
It depends on what you can assume about the minimum level of SSE support that you have.
Going all the way back to SSE2 you have _mm_extract_epi16
(PEXTRW
) which can be used to extract any 16 bit element from a 128 bit vector. You would need to call this twice to get the two halves of a 32 bit element.
In more recent versions of SSE (SSE4.1 and later) you have _mm_extract_epi32
(PEXTRD
) which can extract a 32 bit element in one instruction.
Alternatively if this is not inside a performance-critical loop you can just use a union, e.g.
typedef union
{
__m128i v;
int32_t a[4];
} U32;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With