Is there any single instruction or function that can invert the sign of every float inside a __m128? I.e. a = r0:r1:r2:r3 ===> a = -r0:-r1:-r2:-r3?
I know this can be done with _mm_sub_ps(_mm_set1_ps(0.0), a), but isn't that potentially slow, since _mm_set1_ps(0.0) is a multi-instruction function?
In practice your compiler should do a good job of generating the constant vector for 0.0. It will probably just XOR a register with itself (the equivalent of _mm_xor_ps), and if your code is in a loop it should hoist the constant generation out of the loop anyway. So, bottom line, use your original idea of:
v = _mm_sub_ps(_mm_set1_ps(0.0), v);
or another common trick, which is:
v = _mm_xor_ps(v, _mm_set1_ps(-0.0));
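which just flips the sign bit of each element instead of doing a subtraction (not quite as safe as the first method, since as a purely bitwise operation it doesn't do the right thing with NaNs: it flips a NaN's sign bit and never signals on one, but it may be more efficient in some cases).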
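For comparison, here is a minimal sketch of the two approaches wrapped as helpers. The names negate_sub and negate_xor are made up for illustration, and _mm_setzero_ps() is swapped in for _mm_set1_ps(0.0) as the usual way to ask for an all-zero vector; treat it as one possible way to write this rather than the definitive one.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_sub_ps, _mm_xor_ps, ... */

/* Hypothetical helper: negate each lane by computing 0.0f - v. */
static inline __m128 negate_sub(__m128 v)
{
    return _mm_sub_ps(_mm_setzero_ps(), v);
}

/* Hypothetical helper: negate each lane by flipping its sign bit.
   -0.0f has only the sign bit set, so the XOR leaves every other bit alone. */
static inline __m128 negate_xor(__m128 v)
{
    return _mm_xor_ps(v, _mm_set1_ps(-0.0f));
}
```

One observable difference between the two, under the default rounding mode: for a lane holding +0.0f, negate_sub yields +0.0f (because 0.0 - 0.0 is +0.0), while negate_xor yields -0.0f. For ordinary nonzero values both give the same result.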