I want to convert an array of unsigned short values to float using SSE. Say I have:
__m128i xVal; // Has 8 16-bit unsigned integers
__m128 y1, y2; // 2 xmm registers for 8 float values
I want the first 4 uint16 values in y1 and the next 4 in y2. Which SSE intrinsic should I use?
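You first need to unpack your vector of 8 x 16-bit unsigned shorts into two vectors of 32-bit ints by interleaving with zero (which zero-extends each value), then convert each of these vectors to float. _mm_cvtepi32_ps treats its input as signed, but that is harmless here because every zero-extended value is at most 65535. Here x is the input vector (xVal in the question):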
__m128i xlo = _mm_unpacklo_epi16(x, _mm_set1_epi16(0));
__m128i xhi = _mm_unpackhi_epi16(x, _mm_set1_epi16(0));
__m128 ylo = _mm_cvtepi32_ps(xlo);
__m128 yhi = _mm_cvtepi32_ps(xhi);
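For reference, the snippet above can be wrapped into a self-contained function. This is only a minimal sketch: the helper name convert_u16x8_to_f32 and the use of unaligned loads and stores are assumptions made for illustration, not part of the answer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: converts 8 uint16_t values at src to 8 floats at dst.
   Unaligned loads/stores are used so the caller need not align the buffers. */
static void convert_u16x8_to_f32(const uint16_t *src, float *dst)
{
    __m128i x    = _mm_loadu_si128((const __m128i *)src);
    __m128i zero = _mm_setzero_si128();

    /* Interleave with zero: each 16-bit value becomes a zero-extended 32-bit lane. */
    __m128i xlo = _mm_unpacklo_epi16(x, zero);  /* elements 0..3 */
    __m128i xhi = _mm_unpackhi_epi16(x, zero);  /* elements 4..7 */

    /* Signed 32-bit int to float conversion; safe because values are <= 65535. */
    _mm_storeu_ps(dst,     _mm_cvtepi32_ps(xlo));
    _mm_storeu_ps(dst + 4, _mm_cvtepi32_ps(xhi));
}
```

To convert a longer array, one would call this in a loop that advances src and dst by 8 elements per iteration and handle any leftover tail with scalar code.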
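I would suggest using a slightly different version. Interleaving each 16-bit value with 0x4B00 builds the bit pattern 0x4B00xxxx, which, read as a float, is exactly 2^23 + x (8388608 + x); subtracting 8388608.0f then leaves x: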
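// 0x4B00 becomes the upper half of each 32-bit lane; 0x4B000000 is 2^23 as a float.
// Plain const (not static) keeps this valid in C and avoids a static-init guard in C++.
const __m128i magicInt = _mm_set1_epi16(0x4B00);
const __m128 magicFloat = _mm_set1_ps(8388608.0f); // 2^23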
__m128i xlo = _mm_unpacklo_epi16(x, magicInt);
__m128i xhi = _mm_unpackhi_epi16(x, magicInt);
__m128 ylo = _mm_sub_ps(_mm_castsi128_ps(xlo), magicFloat);
__m128 yhi = _mm_sub_ps(_mm_castsi128_ps(xhi), magicFloat);
At the assembly level, the only difference from Paul R's version is the use of _mm_sub_ps (the SUBPS instruction) instead of _mm_cvtepi32_ps (the CVTDQ2PS instruction). _mm_sub_ps is never slower than _mm_cvtepi32_ps, and it is actually faster on old CPUs and on low-power CPUs (read: Intel Atom and AMD Bobcat).
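The magic-number variant can be sketched as a complete function in the same way. The helper name convert_u16x8_to_f32_magic and the unaligned loads and stores are assumptions for illustration. The trick relies on the IEEE 754 single-precision layout: 0x4B000000 encodes 2^23, where one unit in the last place equals 1, so the low 16 mantissa bits add the integer value exactly and the subtraction is exact.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: converts 8 uint16_t values to floats without CVTDQ2PS. */
static void convert_u16x8_to_f32_magic(const uint16_t *src, float *dst)
{
    const __m128i magic_int   = _mm_set1_epi16(0x4B00);   /* upper half of 0x4B00xxxx */
    const __m128  magic_float = _mm_set1_ps(8388608.0f);  /* 2^23 */

    __m128i x = _mm_loadu_si128((const __m128i *)src);

    /* Each 32-bit lane becomes 0x4B00xxxx, i.e. the float 2^23 + x. */
    __m128i xlo = _mm_unpacklo_epi16(x, magic_int);  /* elements 0..3 */
    __m128i xhi = _mm_unpackhi_epi16(x, magic_int);  /* elements 4..7 */

    /* Reinterpret the bits as floats and subtract 2^23 to recover x. */
    _mm_storeu_ps(dst,     _mm_sub_ps(_mm_castsi128_ps(xlo), magic_float));
    _mm_storeu_ps(dst + 4, _mm_sub_ps(_mm_castsi128_ps(xhi), magic_float));
}
```

Whether this is actually faster than the CVTDQ2PS version depends on the target CPU, so it is worth measuring both on the hardware you care about.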