How can I convert a vector of float to short int using avx instructions?

Basically, how can I write the equivalent of this with AVX2 intrinsics? Assume that result_in_float is of type __m256 and that result is of type short int* or short int[8].

for(i = 0; i < 8; i++)
    result[i] = (short int)result_in_float[i];

I know that floats can be converted to 32-bit integers using the __m256i _mm256_cvtps_epi32(__m256 m1) intrinsic, but I have no idea how to convert these 32-bit integers further to 16-bit integers. I also want to store those values (as 16-bit integers) to memory, and to do all of it using vector instructions.

Searching around the internet, I found an intrinsic by the name of _mm256_mask_storeu_epi16, but I'm not really sure if that would do the trick, as I couldn't find an example of its usage.

_mm256_cvtps_epi32 is a good first step. The conversion to a packed vector of shorts is a bit annoying, requiring a cross-lane shuffle (so it's good that it's not in a dependency chain here).

Since the values can be assumed to be in the right range (as per the comment), we can use _mm256_packs_epi32 instead of _mm256_shuffle_epi8 to do the conversion. Either way it's a 1-cycle instruction on port 5, but using _mm256_packs_epi32 avoids having to get a shuffle mask from somewhere.
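For comparison, here is a rough sketch of what the _mm256_shuffle_epi8 route would involve, mainly to show the extra mask constant it needs. The function name floats8_to_shorts8_shuffle, the unaligned load and store, and the choice of _mm256_cvttps_epi32 (which truncates like a C cast) are assumptions made for illustration, not part of the original answer; the target attribute is GCC/Clang specific. Unlike the pack instruction, the shuffle keeps only the low 16 bits of each value, so out-of-range inputs wrap instead of saturating.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: convert 8 floats to 8 shorts via a byte shuffle.
   target("avx2") lets this one function use AVX2 without -mavx2; the
   caller must still only run it on a CPU that supports AVX2. */
__attribute__((target("avx2")))
static void floats8_to_shorts8_shuffle(const float *src, short *dst)
{
    assert(src != NULL && dst != NULL);

    /* Within each 128-bit lane, pick bytes 0,1 of every 32-bit element
       (its low 16 bits). An index of -1 has the high bit set, which
       makes the shuffle write a zero byte. */
    const __m256i mask = _mm256_setr_epi8(
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1,
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1);

    /* Truncating conversion (assumption: C-cast semantics are wanted). */
    __m256i tmp = _mm256_cvttps_epi32(_mm256_loadu_ps(src));

    /* Each lane now holds 4 shorts in its low 64 bits, zeros above. */
    tmp = _mm256_shuffle_epi8(tmp, mask);

    /* Bring both lanes' results together in the low 128 bits. */
    tmp = _mm256_permute4x64_epi64(tmp, 0xD8);

    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(tmp));
}
```

The mask is the "shuffle mask from somewhere": it has to be loaded or materialized in a register, which the pack-based version does not need.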

Putting it together (not tested):

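__m256i tmp = _mm256_cvtps_epi32(result_in_float);      // float -> int32; rounds, use _mm256_cvttps_epi32 to truncate like a C cast
tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());  // int32 -> int16 with signed saturation, per 128-bit lane
tmp = _mm256_permute4x64_epi64(tmp, 0xD8);              // move both lanes' results into the low 128 bits
__m128i res = _mm256_castsi256_si128(tmp);
_mm_storeu_si128((__m128i *)result, res);               // or _mm_store_si128 if result is 16-byte aligned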

The last step (the cast) is free; it just changes the type.
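As a self-contained sketch of the same sequence, the snippet above could be wrapped in a function like the one below. The name floats8_to_shorts8 and the unaligned load/store are assumptions for illustration, and the target attribute is GCC/Clang specific. One deliberate difference from the snippet: it uses _mm256_cvttps_epi32, because a C cast truncates toward zero while _mm256_cvtps_epi32 rounds according to the current rounding mode (round-to-nearest-even by default).

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: convert 8 floats at src to 8 shorts at dst.
   target("avx2") lets this one function use AVX2 without -mavx2; the
   caller must still only run it on a CPU that supports AVX2. */
__attribute__((target("avx2")))
static void floats8_to_shorts8(const float *src, short *dst)
{
    assert(src != NULL && dst != NULL);

    __m256 f = _mm256_loadu_ps(src);

    /* Truncate toward zero, like (short int)x for in-range values. */
    __m256i tmp = _mm256_cvttps_epi32(f);

    /* Per 128-bit lane: low 64 bits = 4 saturated shorts from tmp,
       high 64 bits = zeros from the second operand. */
    tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());

    /* 0xD8 selects 64-bit elements 0,2,1,3, so both useful halves end
       up next to each other in the low 128 bits. */
    tmp = _mm256_permute4x64_epi64(tmp, 0xD8);

    /* Unaligned store, so dst needs no particular alignment. */
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(tmp));
}
```

With GCC or Clang, a caller could guard the call with `__builtin_cpu_supports("avx2")` and fall back to the scalar loop from the question otherwise.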

If you had two vectors of floats to convert, you could reuse most of the instructions, e.g. (not tested either):

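__m256i tmp1 = _mm256_cvtps_epi32(result_in_float1);
__m256i tmp2 = _mm256_cvtps_epi32(result_in_float2);
tmp1 = _mm256_packs_epi32(tmp1, tmp2);           // per lane: 4 shorts from tmp1, then 4 from tmp2
tmp1 = _mm256_permute4x64_epi64(tmp1, 0xD8);     // restore source order across the two lanes
_mm256_storeu_si256((__m256i *)result, tmp1);    // result must hold 16 shorts; _mm256_store_si256 if 32-byte aligned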
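The two-vector variant can be sketched as a standalone function as well. The name floats16_to_shorts16 and the unaligned accesses are assumptions for illustration, the target attribute is GCC/Clang specific, and _mm256_cvttps_epi32 is used here on the assumption that C-cast style truncation is wanted. The comments trace where each group of four values sits, which is what makes the 0xD8 permute necessary.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: convert 16 floats at src to 16 shorts at dst.
   The caller must only run it on a CPU that supports AVX2. */
__attribute__((target("avx2")))
static void floats16_to_shorts16(const float *src, short *dst)
{
    assert(src != NULL && dst != NULL);

    __m256i a = _mm256_cvttps_epi32(_mm256_loadu_ps(src));      /* src[0..7]  */
    __m256i b = _mm256_cvttps_epi32(_mm256_loadu_ps(src + 8));  /* src[8..15] */

    /* The pack works per 128-bit lane, so the 64-bit chunks come out as
       a[0..3], b[0..3], a[4..7], b[4..7]. Values saturate to int16. */
    __m256i packed = _mm256_packs_epi32(a, b);

    /* 0xD8 picks chunks 0,2,1,3: a[0..3], a[4..7], b[0..3], b[4..7]. */
    packed = _mm256_permute4x64_epi64(packed, 0xD8);

    _mm256_storeu_si256((__m256i *)dst, packed);
}
```

Because one pack and one permute now produce 16 results instead of 8, this form should be the better fit when converting a whole array in a loop.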

Tags: c++, c, gcc, avx, avx2, pythonic

1 Answer

harold
