Basically, how can I write the equivalent of this with AVX2 intrinsics? We assume here that result_in_float is of type __m256, while result is of type short int* or short int[8].
for (i = 0; i < 8; i++)
    result[i] = (short int)result_in_float[i];
I know that floats can be converted to 32-bit integers using the __m256i _mm256_cvtps_epi32(__m256 m1) intrinsic, but I have no idea how to convert these 32-bit integers further to 16-bit integers. I also want to store those values (as 16-bit integers) to memory, and I want to do all of that using vector instructions.
Searching around the internet, I found an intrinsic named _mm256_mask_storeu_epi16, but I'm not really sure if that would do the trick, as I couldn't find an example of its usage.
AVX2 (also known as Haswell New Instructions) extends most integer instructions to 256 bits and introduces new instructions.
__m256d: a vector of four double-precision numbers (4 x 64 = 256 bits).
_mm256_cvtps_epi32 is a good first step. The conversion to a packed vector of shorts is a bit annoying, since it requires a cross-lane shuffle (so it's good that it's not in a dependency chain here).
Since the values can be assumed to be in the right range (as per the comment), we can use _mm256_packs_epi32 instead of _mm256_shuffle_epi8 to do the conversion. Either way it's a 1-cycle instruction on port 5, but using _mm256_packs_epi32 avoids having to get a shuffle mask from somewhere.
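For comparison, here is a rough sketch of what the _mm256_shuffle_epi8 route could look like, mainly to show the mask constant it needs. The function name convert8_shuffle, the pointer-based interface and the target attribute (a GCC/Clang extension) are my own assumptions, not something from the question. Note that the byte shuffle simply keeps the low 16 bits of each element, so out-of-range values wrap around rather than saturate.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: 8 floats -> 8 int16_t using a byte shuffle.
   The target attribute (GCC/Clang) enables AVX2 for this function only. */
__attribute__((target("avx2")))
static void convert8_shuffle(const float *src, int16_t *dst)
{
    /* Per 128-bit lane: gather the low two bytes of each 32-bit element
       into the low 8 bytes; an index with the high bit set writes zero. */
    const __m256i mask = _mm256_setr_epi8(
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1,
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1);

    __m256i v = _mm256_cvtps_epi32(_mm256_loadu_ps(src)); /* 8 x int32 */
    v = _mm256_shuffle_epi8(v, mask);          /* in-lane: 4 x int16, then zeros */
    v = _mm256_permute4x64_epi64(v, 0xD8);     /* quadwords 0,2,1,3: data in low 128 bits */
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(v));
}
```

With in-range inputs this gives the same shorts as the pack instruction, so the extra 32-byte constant buys nothing here.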
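So to put it together (not tested):
__m256i tmp = _mm256_cvtps_epi32(result_in_float);      // 8 floats -> 8 x int32
tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());  // per lane: 4 x int16, then 4 zeros
tmp = _mm256_permute4x64_epi64(tmp, 0xD8);              // bring both data quadwords into the low 128 bits
__m128i res = _mm256_castsi256_si128(tmp);              // keep the low 128 bits
// _mm_store_si128 that (or _mm_storeu_si128 if result may be unaligned)
The last step (the cast) is free; it just changes the type.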
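As a self-contained version of the single-vector case, something like the following should work. It is a sketch under a few assumptions of mine: the function name convert8_f32_to_i16 is made up, it takes pointers instead of an __m256 so it can be called from code compiled without AVX2, and the target attribute is a GCC/Clang extension. I have also swapped in _mm256_cvttps_epi32 (truncating), because a C cast truncates toward zero while _mm256_cvtps_epi32 rounds according to the current rounding mode (nearest by default).

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts 8 floats at src to 8 int16_t at dst.
   Neither pointer needs to be aligned. */
__attribute__((target("avx2")))
static void convert8_f32_to_i16(const float *src, int16_t *dst)
{
    assert(src != NULL && dst != NULL);

    /* 8 floats -> 8 x int32, truncating toward zero like a C cast */
    __m256i v = _mm256_cvttps_epi32(_mm256_loadu_ps(src));

    /* Signed saturating pack; per 128-bit lane: 4 x int16, then 4 zeros */
    v = _mm256_packs_epi32(v, _mm256_setzero_si256());

    /* 0xD8 selects quadwords 0,2,1,3, so the data ends up in the low 128 bits */
    v = _mm256_permute4x64_epi64(v, 0xD8);

    /* The cast is free; the unaligned store writes all 8 shorts at once */
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(v));
}
```

Because the pack saturates, a value such as 70000.0f comes out as 32767 instead of wrapping, which differs from the scalar loop only for inputs that are out of range for a short anyway.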
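If you had two vectors of floats to convert, you could reuse most of the instructions, e.g. (not tested either):
__m256i tmp1 = _mm256_cvtps_epi32(result_in_float1);    // first 8 floats -> 8 x int32
__m256i tmp2 = _mm256_cvtps_epi32(result_in_float2);    // next 8 floats -> 8 x int32
tmp1 = _mm256_packs_epi32(tmp1, tmp2);                  // per lane: 4 x int16 from tmp1, then 4 from tmp2
tmp1 = _mm256_permute4x64_epi64(tmp1, 0xD8);            // restore element order across lanes
// _mm256_store_si256 this (or _mm256_storeu_si256 if the destination may be unaligned)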
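If you are converting a whole array, the two-vector form fits naturally into a loop that handles 16 floats per iteration. The sketch below is only an illustration: the name convert_f32_to_i16, the scalar tail and the target attribute (GCC/Clang) are my assumptions, and it uses the truncating _mm256_cvttps_epi32 so that the vector part agrees with the plain cast in the tail. It also assumes every input fits in a short, as in the question.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts n floats to int16_t, 16 per iteration.
   Inputs are assumed to be within the range of int16_t. */
__attribute__((target("avx2")))
static void convert_f32_to_i16(const float *src, int16_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m256i a = _mm256_cvttps_epi32(_mm256_loadu_ps(src + i));
        __m256i b = _mm256_cvttps_epi32(_mm256_loadu_ps(src + i + 8));

        /* Per lane: 4 x int16 from a, then 4 x int16 from b */
        __m256i packed = _mm256_packs_epi32(a, b);

        /* Quadwords 0,2,1,3: a0..a7 followed by b0..b7 */
        packed = _mm256_permute4x64_epi64(packed, 0xD8);

        _mm256_storeu_si256((__m256i *)(dst + i), packed);
    }
    /* Scalar tail for the remaining 0..15 elements */
    for (; i < n; i++) {
        dst[i] = (int16_t)src[i];
    }
}
```

Both halves of the pack carry useful data here, so the cost per converted element is roughly half that of the single-vector version.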