#include <stdio.h>
#include <stdint.h>
#include <xmmintrin.h>

float a[4] = {1,2,3,4}, b[4] = {4,3,2,1};
uint32_t c[4];
int main() {
    __m128 pa = _mm_loadu_ps(a);
    __m128 pb = _mm_loadu_ps(b);
    __m128 pc = _mm_cmpgt_ps(pa, pb);
    _mm_storeu_ps((float*)c, pc);
    for (int i = 0; i < 4; ++i) printf("%u\n", c[i]);
    return 0;
}
What is the correct instruction to use in place of _mm_storeu_ps((float*)c, pc)?
Here, c is an integer array, so I don't think casting it to float* is a good approach. Is there a better way?
SSE2 has two instructions that convert a __m128 (float vector) into a __m128i (int32_t vector): _mm_cvtps_epi32 (with rounding) and _mm_cvttps_epi32 (with truncation).
__m128i vi = _mm_cvttps_epi32(pc);
_mm_storeu_si128((__m128i *)c, vi);
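One caveat worth checking: _mm_cmpgt_ps does not produce numeric floats. Each lane is a bit mask, all ones where the comparison holds and all zeros otherwise, so a value conversion of pc may not give you those bits back. If the mask itself is what you want in c, a minimal sketch (assuming SSE2 is available) is to reinterpret the register with _mm_castps_si128, which changes only the type, and then use the integer store. The helper name cmpgt_mask is hypothetical and only there for illustration.

```c
#include <emmintrin.h>  /* SSE2: _mm_castps_si128, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical helper: out[i] = 0xFFFFFFFF where a[i] > b[i], else 0. */
static void cmpgt_mask(const float *a, const float *b, uint32_t *out) {
    __m128 pa = _mm_loadu_ps(a);
    __m128 pb = _mm_loadu_ps(b);
    __m128 pc = _mm_cmpgt_ps(pa, pb);   /* per-lane all-ones or all-zeros mask */
    __m128i vi = _mm_castps_si128(pc);  /* reinterpret the bits, no value conversion */
    _mm_storeu_si128((__m128i *)out, vi);
}
```

With the a and b arrays from the question, calling cmpgt_mask(a, b, c) should leave the first two elements of c at 0 and set the last two to 0xFFFFFFFF.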
If you can't use SSE2, store pc into a float array first and then convert that float array to an int array.
float d[4];
_mm_storeu_ps(d, pc);
c[0] = (int)d[0]; c[1] = (int)d[1]; c[2] = (int)d[2]; c[3] = (int)d[3];
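If SSE2 is unavailable and you want to keep the comparison mask bits rather than convert values, one possible alternative is to store into a temporary float array and copy the raw bytes with memcpy. This is only a sketch under that assumption; the helper name cmpgt_mask_sse1 is hypothetical.

```c
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_cmpgt_ps, _mm_storeu_ps */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: same result as an integer store, using SSE1 only. */
static void cmpgt_mask_sse1(const float *a, const float *b, uint32_t *out) {
    float tmp[4];
    __m128 pc = _mm_cmpgt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(tmp, pc);        /* store the mask lanes as raw 32-bit patterns */
    memcpy(out, tmp, sizeof tmp);  /* copy bytes unchanged into the integer array */
}
```

Copying bytes avoids both the pointer cast from the question and any float-to-int value conversion of the mask lanes.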