SSE: convert short integer to float

Tags: x86, simd, sse

I want to convert an array of unsigned short numbers to float using SSE. Let's say

__m128i xVal;     // Has 8 16-bit unsigned integers
__m128 y1, y2;    // 2 xmm registers for 8 float values

I want the first 4 uint16 values in y1 and the next 4 in y2. Which SSE intrinsics should I use?

asked Feb 06 '12 14:02 by Krishnaraj


2 Answers

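You first need to unpack your vector of 8 x 16-bit unsigned shorts into two vectors of 32-bit integers (interleaving with zero zero-extends each value), then convert each of these vectors to float. _mm_cvtepi32_ps treats its input as signed 32-bit integers, which is safe here because a zero-extended 16-bit value never exceeds 65535: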

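#include <emmintrin.h> // SSE2; x holds the 8 x uint16 values (xVal in the question)
__m128i xlo = _mm_unpacklo_epi16(x, _mm_set1_epi16(0)); // elements 0..3 zero-extended to 32 bits
__m128i xhi = _mm_unpackhi_epi16(x, _mm_set1_epi16(0)); // elements 4..7 zero-extended to 32 bits
__m128 ylo = _mm_cvtepi32_ps(xlo);                      // int32 -> float (CVTDQ2PS)
__m128 yhi = _mm_cvtepi32_ps(xhi);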
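A minimal sketch of the same unpack-then-convert approach applied to a whole array, which is what the question ultimately asks for. The helper name u16_to_f32 and the use of unaligned loads/stores (_mm_loadu_si128, _mm_storeu_ps) are assumptions for illustration; any elements left over after the last full group of 8 are converted with a plain scalar cast.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n uint16 values to float, 8 at a time,
   by zero-extending to 32 bits and then using _mm_cvtepi32_ps. */
void u16_to_f32(const uint16_t *src, float *dst, size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i x   = _mm_loadu_si128((const __m128i *)(src + i)); /* 8 x uint16 */
        __m128i xlo = _mm_unpacklo_epi16(x, zero); /* elements 0..3 as 32-bit ints */
        __m128i xhi = _mm_unpackhi_epi16(x, zero); /* elements 4..7 as 32-bit ints */
        _mm_storeu_ps(dst + i,     _mm_cvtepi32_ps(xlo));
        _mm_storeu_ps(dst + i + 4, _mm_cvtepi32_ps(xhi));
    }
    for (; i < n; ++i) /* scalar tail for the last n % 8 elements */
        dst[i] = (float)src[i];
}
```

Calling u16_to_f32(src, dst, n) fills dst[0..n-1] with the same values a scalar (float)src[i] cast would produce; every uint16 value is exactly representable as a float, so both paths agree exactly.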
answered Nov 19 '22 03:11 by Paul R


I would suggest a slightly different version:

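// Local consts: in C, a static initializer must be a constant expression, so an intrinsic call is not allowed there
const __m128i magicInt = _mm_set1_epi16(0x4B00);   // fills the high half of each 32-bit lane with 0x4B00
const __m128 magicFloat = _mm_set1_ps(8388608.0f); // 2^23, bit pattern 0x4B000000

__m128i xlo = _mm_unpacklo_epi16(x, magicInt);              // lanes 0..3 become 0x4B00xxxx, i.e. float(2^23 + x)
__m128i xhi = _mm_unpackhi_epi16(x, magicInt);              // lanes 4..7
__m128 ylo = _mm_sub_ps(_mm_castsi128_ps(xlo), magicFloat); // (2^23 + x) - 2^23 == x, exactly
__m128 yhi = _mm_sub_ps(_mm_castsi128_ps(xhi), magicFloat);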

At the assembly level, the only difference from Paul R's version is the use of _mm_sub_ps (the SUBPS instruction) instead of _mm_cvtepi32_ps (the CVTDQ2PS instruction); the cast is free. _mm_sub_ps is never slower than _mm_cvtepi32_ps, and is actually faster on older CPUs and on low-power CPUs such as Intel Atom and AMD Bobcat.
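The magic-number trick can be checked in isolation. 0x4B000000 is the IEEE-754 bit pattern of 8388608.0f (2^23); because a float has 23 mantissa bits, putting a 16-bit value v into the low bits yields exactly 2^23 + v, and subtracting 2^23 gives v with no rounding. The sketch below shows the scalar identity and wraps the answer's intrinsics in a function that converts one group of 8 values; both helper names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Scalar view of the trick: build the bits 0x4B00xxxx and reinterpret them. */
float u16_to_f32_magic_scalar(uint16_t v)
{
    uint32_t bits = 0x4B000000u | v; /* float 2^23 with v in the low mantissa bits */
    float f;
    memcpy(&f, &bits, sizeof f);     /* bit reinterpretation, like _mm_castsi128_ps */
    return f - 8388608.0f;           /* (2^23 + v) - 2^23 == v, exact */
}

/* SIMD version: the answer's code applied to 8 values at src, written to dst. */
void u16x8_to_f32_magic(const uint16_t *src, float *dst)
{
    const __m128i magicInt   = _mm_set1_epi16(0x4B00);
    const __m128  magicFloat = _mm_set1_ps(8388608.0f);
    __m128i x   = _mm_loadu_si128((const __m128i *)src);
    __m128i xlo = _mm_unpacklo_epi16(x, magicInt); /* lanes 0..3: 0x4B00xxxx */
    __m128i xhi = _mm_unpackhi_epi16(x, magicInt); /* lanes 4..7: 0x4B00xxxx */
    _mm_storeu_ps(dst,     _mm_sub_ps(_mm_castsi128_ps(xlo), magicFloat));
    _mm_storeu_ps(dst + 4, _mm_sub_ps(_mm_castsi128_ps(xhi), magicFloat));
}
```

Because every step is exact for inputs below 2^16, this produces the same floats as the _mm_cvtepi32_ps route for all 65536 possible uint16 values.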

answered Nov 19 '22 02:11 by Marat Dukhan