
SSE intrinsic over int16[8] to extract the sign of each element

Tags: c, x86, simd, sign, sse

I'm working with SSE intrinsic functions. I have an __m128i representing an array of 8 signed short (16 bit) values.

Is there a function to get the sign of each element?

EDIT1: something that can be used like this:

short tmpVec[8];
__m128i tmp, sgn;

for (int i = 0; i < 8; i++)
    tmp.m128i_i16[i] = tmpVec[i];   // m128i_i16 is MSVC-specific member access

sgn = _mm_sign_epi16(tmp);

of course "_mm_sign_epi16" doesn't exist, so that's what I'm looking for.

How slow is it to do it element by element?

EDIT2: desired behaviour: 1 for positive values, 0 for zero, and -1 for negative values.

thanks

Michele asked Apr 25 '14 12:04


2 Answers

You can use min/max operations to get the desired result, e.g.

static inline __m128i _mm_sgn_epi16(__m128i v)  // SSE2, needs <emmintrin.h>
{
    v = _mm_min_epi16(v, _mm_set1_epi16(1));
    v = _mm_max_epi16(v, _mm_set1_epi16(-1));
    return v;
}

This is probably a little more efficient than explicitly comparing with zero, shifting, and combining the results.
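As a rough sketch of how the min/max version could be used on a plain array, something like the following should work with SSE2 alone. The names sgn_epi16_minmax and sgn8_minmax are made up for this example, and the unaligned load/store is an assumption about how the data arrives; it avoids the MSVC-only member access from the question.

```c
#include <emmintrin.h>  /* SSE2: _mm_min_epi16, _mm_max_epi16, load/store */
#include <stdint.h>

/* Clamp every 16-bit lane to [-1, 1]:
   negative -> -1, zero -> 0, positive -> +1. */
static inline __m128i sgn_epi16_minmax(__m128i v)
{
    v = _mm_min_epi16(v, _mm_set1_epi16(1));   /* anything > 1 becomes 1   */
    v = _mm_max_epi16(v, _mm_set1_epi16(-1));  /* anything < -1 becomes -1 */
    return v;
}

/* Hypothetical wrapper: reads 8 values from in[], writes their signs to out[]. */
static void sgn8_minmax(const int16_t in[8], int16_t out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);       /* unaligned load  */
    _mm_storeu_si128((__m128i *)out, sgn_epi16_minmax(v));  /* unaligned store */
}
```

For the input {-32768, -5, -1, 0, 1, 7, 32767, 0}, out should end up as {-1, -1, -1, 0, 1, 1, 1, 0}.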

Note that SSSE3 already has an _mm_sign_epi16 intrinsic (PSIGNW, see tmmintrin.h), which behaves somewhat differently: it takes two operands, a and b, and for each element returns a negated where b is negative, zero where b is zero, and a unchanged where b is positive. That is why I named the required function _mm_sgn_epi16 instead. When SSSE3 is available, though, _mm_sign_epi16 might be more efficient, so you could do something like this:

static inline __m128i _mm_sgn_epi16(__m128i v)
{
#ifdef __SSSE3__
    v = _mm_sign_epi16(_mm_set1_epi16(1), v); // use PSIGNW on SSSE3 and later
#else
    v = _mm_min_epi16(v, _mm_set1_epi16(1));  // use PMINSW/PMAXSW on SSE2/SSE3.
    v = _mm_max_epi16(v, _mm_set1_epi16(-1));
#endif
    return v;
}
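To see why passing a vector of ones as the first operand gives the desired sign, here is a scalar model of a single 16-bit lane of PSIGNW as I understand the instruction. It is only an illustration, not the intrinsic itself; psignw_lane_model and sgn8_model are hypothetical names.

```c
#include <stdint.h>

/* Scalar model of one lane of _mm_sign_epi16(a, b):
   a is negated where b < 0, zeroed where b == 0, kept where b > 0.
   (The real instruction wraps for a == -32768; that case does not
   arise below because a is always 1.) */
static int16_t psignw_lane_model(int16_t a, int16_t b)
{
    if (b < 0)
        return (int16_t)-a;
    if (b == 0)
        return 0;
    return a;
}

/* With a fixed at 1, the lane result is exactly sgn(b): -1, 0 or +1. */
static void sgn8_model(const int16_t v[8], int16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = psignw_lane_model(1, v[i]);
}
```

With any other first operand the result is not a sign at all: in this model, a = 5 and b = -2 gives -5, which is the "somewhat different" behaviour.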
Paul R answered Nov 04 '22 13:11


Fill a register with zeros and compare it with your register, first with "greater than", then with "less than" (or swap the operands of the "greater than" instruction).
http://msdn.microsoft.com/en-us/library/xd43yfsa%28v=vs.90%29.aspx
http://msdn.microsoft.com/en-us/library/t863edb2%28v=vs.90%29.aspx

The problem at this point is that "true" is represented as 0xffff, which is -1 as a signed 16-bit value: the correct result for negative numbers, but not for positive ones. However, as Raymond Chen pointed out in the comments, 0x0000 - 0xffff = 0x0001, so it is enough to subtract the result of "greater than" from the result of "less than". http://msdn.microsoft.com/en-us/library/y25yya27%28v=vs.90%29.aspx
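Put together, the compare-and-subtract idea might look like the sketch below. It uses only SSE2 intrinsics; the function names sgn_epi16_cmp and sgn8_cmp are invented for this example.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpgt_epi16, _mm_cmplt_epi16, _mm_sub_epi16 */
#include <stdint.h>

static inline __m128i sgn_epi16_cmp(__m128i v)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i gt = _mm_cmpgt_epi16(v, zero);  /* 0xffff (-1) where v > 0, else 0 */
    __m128i lt = _mm_cmplt_epi16(v, zero);  /* 0xffff (-1) where v < 0, else 0 */
    /* lt - gt:  v < 0 -> -1 - 0 = -1;  v > 0 -> 0 - (-1) = +1;  v == 0 -> 0 */
    return _mm_sub_epi16(lt, gt);
}

/* Hypothetical wrapper for testing on a plain array of 8 values. */
static void sgn8_cmp(const int16_t in[8], int16_t out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, sgn_epi16_cmp(v));
}
```

That is two compares and a subtract, plus zeroing a register.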

Of course, Paul R's answer is preferable, as it uses only 2 instructions.

Antonio answered Nov 04 '22 15:11