
How can I convert a vector of float to short int using avx instructions?

Tags: c++, c, gcc, avx, avx2

Basically how can I write the equivalent of this with AVX2 intrinsics? We assume here that result_in_float is of type __m256, while result is of type short int* or short int[8].

for(i = 0; i < 8; i++)
    result[i] = (short int)result_in_float[i];

I know that floats can be converted to 32 bit integers using the __m256i _mm256_cvtps_epi32(__m256 m1) intrinsic, but have no idea how to convert these 32 bit integers further to 16 bit integers. And I don't want just that but also to store those values (in the form of 16 bit integers) to the memory, and I want to do that all using vector instructions.

Searching around the internet, I found an intrinsic by the name of _mm256_mask_storeu_epi16, but I'm not really sure if that would do the trick, as I couldn't find an example of its usage.

asked Dec 19 '16 17:12 by pythonic



1 Answer

_mm256_cvtps_epi32 is a good first step. The conversion to a packed vector of shorts is a bit annoying, since it requires a cross-lane shuffle (so it's good that it's not in a dependency chain here).

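Since the values can be assumed to be in the right range (as per the comment), we can use _mm256_packs_epi32 instead of _mm256_shuffle_epi8 to do the conversion. Either way it's a 1-cycle instruction on port 5, but _mm256_packs_epi32 avoids having to get a shuffle mask from somewhere. Note that _mm256_packs_epi32 saturates to the int16 limits rather than discarding the upper bits, which makes no difference when the inputs are already in range.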
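For comparison, here is a rough sketch of what the _mm256_shuffle_epi8 route could look like. The function name and the mask constant are illustrative assumptions, not part of the answer, and the GCC/Clang target attribute is only there so the function builds without passing -mavx2 for the whole file. It shows the mask that the pack instruction lets you avoid, and that out-of-range values wrap around here instead of saturating.

```c
#include <immintrin.h>

/* Hypothetical helper (not from the answer): converts 8 floats to 8 shorts
   with a byte shuffle. Only the low 16 bits of each 32-bit integer are kept,
   so out-of-range values wrap instead of saturating. */
__attribute__((target("avx2")))
static void float8_to_short8_shuffle(const float *src, short *dst)
{
    /* Within each 128-bit lane, gather the low two bytes of every 32-bit
       element into the low 8 bytes. An index with the high bit set (-1)
       writes a zero byte. */
    const __m256i mask = _mm256_setr_epi8(
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1,
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1);

    __m256i tmp = _mm256_cvtps_epi32(_mm256_loadu_ps(src)); /* float -> int32 */
    tmp = _mm256_shuffle_epi8(tmp, mask);                   /* int32 -> int16, per lane */
    tmp = _mm256_permute4x64_epi64(tmp, 0xD8);              /* bring both halves into the low 128 bits */
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(tmp));
}
```

With an input of 70000.0f this variant stores 4464 (the low 16 bits of 70000), whereas the _mm256_packs_epi32 variant would store 32767.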

So to put it together (not tested)

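#include <immintrin.h>

__m256i tmp = _mm256_cvtps_epi32(result_in_float);      // 8 floats -> 8 int32 (rounds per the current rounding mode)
tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());  // saturating pack to int16, done separately in each 128-bit lane
tmp = _mm256_permute4x64_epi64(tmp, 0xD8);              // reorder qwords to 0,2,1,3 so all 8 shorts sit in the low half
__m128i res = _mm256_castsi256_si128(tmp);              // take the low 128 bits
// then store res, e.g. _mm_storeu_si128((__m128i *)result, res), or _mm_store_si128 if result is 16-byte aligned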

The last step (cast) is free: it just changes the type and emits no instruction.
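As a self-contained sketch, the sequence can be wrapped in a function that loads from and stores to plain arrays. The function name is hypothetical, and the unaligned load and store are an assumption made because nothing is known about the alignment of the buffers. One deliberate substitution: it uses _mm256_cvttps_epi32 (truncate toward zero) instead of _mm256_cvtps_epi32, on the assumption that the result should match the (short int) cast in the scalar loop, which discards the fractional part. The target attribute is GCC/Clang-specific and can be dropped when compiling with -mavx2.

```c
#include <immintrin.h>

/* Hypothetical wrapper: converts 8 floats at src to 8 shorts at dst. */
__attribute__((target("avx2")))
static void float8_to_short8(const float *src, short *dst)
{
    __m256 v = _mm256_loadu_ps(src);

    /* Truncating conversion, chosen to mirror the C cast; the answer's
       _mm256_cvtps_epi32 would round according to the current rounding mode. */
    __m256i tmp = _mm256_cvttps_epi32(v);

    /* Saturating int32 -> int16 pack, performed within each 128-bit lane. */
    tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());

    /* Move the high lane's four shorts next to the low lane's four. */
    tmp = _mm256_permute4x64_epi64(tmp, 0xD8);

    /* The low 128 bits now hold all 8 results in order. */
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(tmp));
}
```

For inputs that fit in a short, each dst[i] should equal (short)src[i].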

If you had two vectors of floats to convert, you could re-use most of the instructions, e.g. (not tested either):

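__m256i tmp1 = _mm256_cvtps_epi32(result_in_float1);  // first 8 floats -> int32 (call these a0..a7)
__m256i tmp2 = _mm256_cvtps_epi32(result_in_float2);  // next 8 floats -> int32 (call these b0..b7)
tmp1 = _mm256_packs_epi32(tmp1, tmp2);                // per-lane pack gives a0-3, b0-3 | a4-7, b4-7
tmp1 = _mm256_permute4x64_epi64(tmp1, 0xD8);          // reorder qwords to get a0-7 | b0-7
// store tmp1 with _mm256_store_si256 (or _mm256_storeu_si256 if the destination is not 32-byte aligned)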
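The two-vector version extends naturally to a whole array. The following is a minimal sketch under a few assumptions: the function name and signature are made up for illustration, loads and stores are unaligned because the alignment of the buffers is unknown, and _mm256_cvttps_epi32 is swapped in for _mm256_cvtps_epi32 so that the vector body and the scalar tail both truncate toward zero. The target attribute is GCC/Clang-specific and can be dropped when compiling with -mavx2.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: converts n floats to n shorts, 16 at a time. */
__attribute__((target("avx2")))
static void floats_to_shorts(const float *src, short *dst, size_t n)
{
    size_t i = 0;

    /* Main loop: two float vectors in, one vector of 16 shorts out. */
    for (; i + 16 <= n; i += 16) {
        __m256i lo = _mm256_cvttps_epi32(_mm256_loadu_ps(src + i));
        __m256i hi = _mm256_cvttps_epi32(_mm256_loadu_ps(src + i + 8));
        __m256i packed = _mm256_packs_epi32(lo, hi);      /* per-lane, saturating */
        packed = _mm256_permute4x64_epi64(packed, 0xD8);  /* restore element order */
        _mm256_storeu_si256((__m256i *)(dst + i), packed);
    }

    /* Scalar tail for the remaining 0..15 elements. */
    for (; i < n; i++)
        dst[i] = (short)src[i];
}
```

The pack and permute are shared between the two input vectors, so the cost per converted element is lower than calling the single-vector version twice.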

answered Oct 14 '22 15:10 by harold