__m256 dst = _mm256_cmp_ps(value1, value2, _CMP_LE_OQ);
If dst is [0,0,0,-nan, 0,0,0,-nan], I want to be able to know the index of the first -nan, in this case 3, without doing a for loop with 8 iterations.
Is this possible?
The __m128 data type is used to represent the contents of an Intel® SSE register used by Intel® SSE intrinsics. The __m128 data type can hold four 32-bit floating-point values. The __m128d data type can hold two 64-bit floating-point values.
__m256 Data Types
The __m256 data type can hold eight 32-bit floating-point values, while the __m256d data type can hold four 64-bit double-precision floating-point values, and the __m256i data type can hold thirty-two 8-bit, sixteen 16-bit, eight 32-bit, or four 64-bit integer values.
I would apply movmskps (_mm256_movemask_ps) to the result of the comparison and then do a bit scan forward (bsf) on the resulting mask.
Using intrinsics (this works with gcc/clang, see here for alternatives):
int pos = __builtin_ctz(_mm256_movemask_ps(dst));
Note that the result of bsf is undefined if no bit is set, and __builtin_ctz(0) is likewise undefined. To work around this you can, e.g., also set bit 8, so that the result is 8 if no other bit is set:
int pos = __builtin_ctz(_mm256_movemask_ps(dst) | 0x100);