
Converting float vector to 16-bit int without saturating

I want to convert a floating point value to a 16-bit unsigned integer without saturating (wraparound/overflow instead).

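#include <cstdint>      // uint16_t
#include <cstdlib>      // posix_memalign (POSIX), NULL
#include <iostream>
#include <xmmintrin.h>
#include <emmintrin.h>  // __m128i and the SSE2 integer intrinsics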

void satur_wrap()
{
    const float bigVal = 99000.f;
    const __m128 bigValVec = _mm_set1_ps(bigVal);

    const __m64 outVec64 =_mm_cvtps_pi16(bigValVec);

#if 0
    const __m128i outVec = _mm_movpi64_epi64(outVec64);
#else

    #if 1
        const __m128i outVec  = _mm_packs_epi32(_mm_cvttps_epi32(bigValVec), _mm_cvttps_epi32(bigValVec));
    #else
        const __m128i outVec  = _mm_cvttps_epi32(bigValVec);
    #endif

#endif

    uint16_t *outVals = NULL;
    posix_memalign((void **) &outVals, sizeof(__m128i), sizeof(__m128i));

    _mm_store_si128(reinterpret_cast<__m128i *>(outVals), outVec);

    for (size_t i = 0; i < sizeof(outVec) / sizeof(*outVals); i++)
    {
        std::cout << "outVals[" << i << "]: " << outVals[i] << std::endl;
    }

    std::cout << std::endl
        << "\tbigVal: " << bigVal << std::endl
        << "\t(unsigned short) bigVal: " << ((unsigned short) bigVal)  << std::endl
        << "\t((unsigned short)((int) bigVal)): " << ((unsigned short)((int) bigVal)) << std::endl
        << std::endl;
}

Sample execution:

$ ./row
outVals[0]: 32767
outVals[1]: 32767
outVals[2]: 32767
outVals[3]: 32767
outVals[4]: 32767
outVals[5]: 32767
outVals[6]: 32767
outVals[7]: 32767

        bigVal: 99000
        (unsigned short) bigVal: 65535
        ((unsigned short)((int) bigVal)): 33464

The ((unsigned short)((int) bigVal)) expression works as desired (but it's probably UB, right?). However, I can't find anything similar in SSE. I must be missing something, but I couldn't find a primitive to convert four 32-bit floats to four 32-bit ints.
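
For reference, the scalar behaviour being asked for can be sketched as a small helper. As far as the C rules go, converting an out-of-range float directly to unsigned short is undefined, while the two-step cast is well-defined as long as the truncated value fits in an int. The name wrap_u16 is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: truncate toward zero, then keep the low 16 bits.
   Only meaningful while the truncated value fits in int32_t. */
static uint16_t wrap_u16(float f)
{
    int32_t i = (int32_t)f;   /* float -> int: defined for in-range values */
    return (uint16_t)i;       /* int -> unsigned: reduced modulo 65536 */
}
```

With this helper, wrap_u16(99000.0f) gives 33464, matching the last line of the sample output.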


EDIT: Oops, I figured it would be "normal" for 32-bit integer -> 16-bit unsigned integer conversion to use wraparound. But I've since learned that _mm_packs_epi32 uses signed saturation (and there doesn't appear to be a _mm_packus_epi32). Is there a way to set the mode, or another primitive besides _mm_packs_epi32?

Brian Cain asked Aug 25 '12 03:08


2 Answers

I think you're probably looking for the CVTTPS2DQ instruction, the intrinsic for which is _mm_cvttps_epi32. See: http://msdn.microsoft.com/en-us/library/c8c5hx3b(v=vs.71).aspx#vcref_mm_cvttps_epi32
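
A minimal sketch of that conversion on its own, assuming only SSE2. The helper name floats_to_int32x4 is hypothetical, and unaligned loads/stores are used so the caller's arrays need no special alignment.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvttps_epi32, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical helper: convert four floats to four int32 values,
   truncating toward zero (CVTTPS2DQ). */
static void floats_to_int32x4(const float in[4], int32_t out[4])
{
    __m128  v = _mm_loadu_ps(in);          /* load 4 floats */
    __m128i i = _mm_cvttps_epi32(v);       /* truncating conversion */
    _mm_storeu_si128((__m128i *)out, i);   /* store 4 x int32 */
}
```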


Here is a complete implementation which takes 2 x SSE float vectors and converts them to a single packed 8 x 16 bit unsigned vector with wraparound:

#include <stdio.h>
#include <tmmintrin.h>

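__m128i vec_float_to_short(const __m128 v1, const __m128 v2)
{
    /* Truncating float -> int32 conversion (CVTTPS2DQ). */
    __m128i v1i = _mm_cvttps_epi32(v1);
    __m128i v2i = _mm_cvttps_epi32(v2);
    /* Keep the low 16 bits of each 32-bit lane: v1 fills the low 8 bytes,
       v2 the high 8 bytes. An index of -1 (top bit set) makes PSHUFB write zero. */
    v1i = _mm_shuffle_epi8(v1i, _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1));
    v2i = _mm_shuffle_epi8(v2i, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 4, 5, 8, 9, 12, 13));
    /* The two halves do not overlap, so OR merges them. */
    return _mm_or_si128(v1i, v2i);
}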

int main(void)
{
    __m128 v1 = _mm_setr_ps(0.0f, 1.0f, -1.0f, 32767.0f);
    __m128 v2 = _mm_setr_ps(-32768.0f, 32768.0f, 99999.0f, -99999.0f);
    __m128i v3 = vec_float_to_short(v1, v2);

    printf("v1 = %vf\n", v1);
    printf("v2 = %vf\n", v2);
    printf("v3 = %vhu\n", v3);

    return 0;
}

Note that this uses PSHUFB (_mm_shuffle_epi8), which requires SSSE3 aka SSE3.5 aka MNI (see tmmintrin.h), so it will only work on a CPU that supports SSSE3 (on the Intel side, Core 2 from 2006 and later).

$ gcc -Wall -mssse3 vec_float_to_short.c -o vec_float_to_short
$ ./vec_float_to_short 
v1 = 0.000000 1.000000 -1.000000 32767.000000
v2 = -32768.000000 32768.000000 99999.000000 -99999.000000
v3 = 0 1 65535 32767 32768 32768 34463 31073
$ 

Note that not all versions of gcc support the printf v format specifier for SIMD vectors (I'm using Apple's gcc on OS X in this instance).
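
Where the v format specifier is not available, one option is to store the vector to a plain array and print the lanes individually. This is a sketch only; store_u16x8 and print_u16x8 are hypothetical helper names, and only SSE2 is assumed.

```c
#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: copy the eight 16-bit lanes of v into out[0..7]. */
static void store_u16x8(__m128i v, uint16_t out[8])
{
    _mm_storeu_si128((__m128i *)out, v);   /* unaligned store, no alignment needed */
}

/* Hypothetical helper: print the lanes as unsigned 16-bit values. */
static void print_u16x8(const char *name, __m128i v)
{
    uint16_t lanes[8];
    store_u16x8(v, lanes);
    printf("%s =", name);
    for (int i = 0; i < 8; i++)
        printf(" %u", (unsigned)lanes[i]);
    printf("\n");
}
```

In the program above, print_u16x8("v3", v3) would stand in for the printf call that uses %vhu.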

Paul R answered Nov 14 '22 08:11


I'm answering only part of the question concerning 32-bit integer -> 16-bit unsigned integer conversion.

Since you need wraparound, just take the low-order word of each doubleword that holds a 32-bit integer. These 16-bit integers are interleaved with 16-bit pieces of unused data, so it may be convenient to pack them into a contiguous array. The easiest way to do this is with the _mm_shuffle_epi8 intrinsic (SSSE3).

If you want your program to be more portable and require only the SSE2 instruction set, you can pack the values with _mm_packs_epi32, but disable its saturating behavior with the following trick:

x = _mm_slli_epi32(x, 16);
y = _mm_slli_epi32(y, 16);

x = _mm_srai_epi32(x, 16);
y = _mm_srai_epi32(y, 16);

x = _mm_packs_epi32(x, y);

This trick works because the shift pair sign-extends the low 16 bits of each 32-bit lane, so every value already lies in [-32768, 32767] and the signed saturation in _mm_packs_epi32 becomes a no-op.
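
Put together with the float conversion, the SSE2-only path might look like the sketch below. The function name floats_to_u16_wrap_sse2 and the array-based interface are assumptions made for illustration; the intrinsics are the ones named above.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: convert eight floats to eight uint16_t values with
   wraparound (modulo 65536), using only SSE2. */
static void floats_to_u16_wrap_sse2(const float in[8], uint16_t out[8])
{
    /* Truncating float -> int32 conversion, four lanes at a time. */
    __m128i x = _mm_cvttps_epi32(_mm_loadu_ps(in));
    __m128i y = _mm_cvttps_epi32(_mm_loadu_ps(in + 4));

    /* Sign-extend the low 16 bits of each lane so the signed
       saturation in the pack below never triggers. */
    x = _mm_srai_epi32(_mm_slli_epi32(x, 16), 16);
    y = _mm_srai_epi32(_mm_slli_epi32(y, 16), 16);

    /* Pack 2 x 4 int32 into 8 x int16; the bit patterns are the
       wrapped unsigned results. */
    _mm_storeu_si128((__m128i *)out, _mm_packs_epi32(x, y));
}
```

Fed the same eight inputs as the SSSE3 example in the other answer, this should produce the same eight values.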

The same trick works with _mm_packus_epi32:

x = _mm_and_si128(x, _mm_set1_epi32(65535));
y = _mm_and_si128(y, _mm_set1_epi32(65535));
x = _mm_packus_epi32(x, y);

This trick works because the mask zero-extends the low 16 bits of each 32-bit lane, so every value already lies in [0, 65535] and unsigned saturation becomes a no-op. Zero extension is simpler than sign extension, but _mm_packus_epi32 requires the SSE4.1 instruction set.
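
To see why both tricks give plain wraparound, here is a scalar model of a single lane. It is only an illustration: sat_s16, sat_u16, sign_extend16 and zero_extend16 are made-up names that mimic what the pack instructions and the shift/mask steps do to one 32-bit value.

```c
#include <stdint.h>

/* Model of one lane of _mm_packs_epi32: signed saturation to 16 bits. */
static int16_t sat_s16(int32_t v)
{
    return (int16_t)(v < -32768 ? -32768 : v > 32767 ? 32767 : v);
}

/* Model of one lane of _mm_packus_epi32: unsigned saturation to 16 bits. */
static uint16_t sat_u16(int32_t v)
{
    return (uint16_t)(v < 0 ? 0 : v > 65535 ? 65535 : v);
}

/* Model of slli 16 followed by srai 16: sign-extend the low 16 bits. */
static int32_t sign_extend16(int32_t v)
{
    int32_t low = (int32_t)((uint32_t)v & 0xFFFFu);
    return (low ^ 0x8000) - 0x8000;
}

/* Model of the AND with 65535: zero-extend the low 16 bits. */
static int32_t zero_extend16(int32_t v)
{
    return (int32_t)((uint32_t)v & 0xFFFFu);
}
```

For any 32-bit v, both (uint16_t)sat_s16(sign_extend16(v)) and sat_u16(zero_extend16(v)) equal (uint16_t)v, because the extended value is always inside the range where saturation leaves it alone.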

It is also possible to pack eight 16-bit integers with a single instruction, _mm_perm_epi8, but this requires the fairly rare XOP instruction set (AMD only).


A few words about saturating conversion.

In fact, the _mm_packus_epi32 intrinsic is available if you change #include <xmmintrin.h> to #include <smmintrin.h> or #include <x86intrin.h>. Both your CPU and your compiler need to support the SSE4.1 extensions.

If you have no SSE4.1-compatible CPU or compiler, or want your program to be more portable, replace the saturation done by _mm_packus_epi32 with code like this, then pack the low words with one of the wraparound methods above:

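__m128i m1 = _mm_cmpgt_epi32(x, _mm_set1_epi32(0));      /* all ones where x > 0 */
__m128i m2 = _mm_cmpgt_epi32(x, _mm_set1_epi32(65535));  /* all ones where x > 65535 */
x = _mm_and_si128(x, m1);                                /* negative lanes become 0 */
x = _mm_or_si128(x, m2);                                 /* too-large lanes get low word 0xFFFF */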
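
Combining this clamp with the SSE2 wraparound pack gives a rough stand-in for _mm_packus_epi32. The sketch below assumes only SSE2; pack_u16_saturate_sse2 is a hypothetical name, not an existing intrinsic.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: pack two vectors of four int32 into eight uint16_t
   with unsigned saturation, using only SSE2. */
static __m128i pack_u16_saturate_sse2(__m128i x, __m128i y)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i max  = _mm_set1_epi32(65535);

    /* Clamp: lanes <= 0 become 0, lanes > 65535 become all ones
       (so their low word is 0xFFFF). */
    x = _mm_or_si128(_mm_and_si128(x, _mm_cmpgt_epi32(x, zero)),
                     _mm_cmpgt_epi32(x, max));
    y = _mm_or_si128(_mm_and_si128(y, _mm_cmpgt_epi32(y, zero)),
                     _mm_cmpgt_epi32(y, max));

    /* Wraparound pack: sign-extend the low words so that the signed
       saturation in _mm_packs_epi32 has nothing left to do. */
    x = _mm_srai_epi32(_mm_slli_epi32(x, 16), 16);
    y = _mm_srai_epi32(_mm_slli_epi32(y, 16), 16);
    return _mm_packs_epi32(x, y);
}
```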

Evgeny Kluev answered Nov 14 '22 08:11