I was looking at how a naive implementation of abs(float) would compile and was quite surprised by the result:
float abs(float x) {
return x < 0 ? -x : x;
}
With clang 10.1 at -O3, this results in:
.LCPI0_0:
.long 2147483648 # float -0
.long 2147483648 # float -0
.long 2147483648 # float -0
.long 2147483648 # float -0
abs(float):
movaps xmm2, xmmword ptr [rip + .LCPI0_0]
xorps xmm2, xmm0
xorps xmm3, xmm3
movaps xmm1, xmm0
cmpltss xmm1, xmm3
andps xmm2, xmm1
andnps xmm1, xmm0
orps xmm1, xmm2
movaps xmm0, xmm1
ret
I find that quite surprising, because I honestly just expected the sign bit of the float to be cleared, which should take a single AND instruction. There's got to be something about IEEE-754 floating-point semantics that causes this complication, but I don't understand what makes it this complicated. Why would you need any more than a compare and a conditional move?
Maybe it's because the comparison with NaN would always fail, so the sign bit doesn't get cleared in such a case? But since the sign bit can be 0 or 1 for NaN, that shouldn't matter.
For comparison, when simply using std::fabs, the output is much simpler, which is exactly what one would expect:
abs(float):
andps xmm0, xmmword ptr [rip + .LCPI0_0]
ret
The same output is produced when enabling the -ffast-math flag.
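For reference, the single andps above amounts to masking off bit 31 of the float. Here is a minimal sketch of the same operation in portable C++, assuming a 32-bit IEEE-754 float; the names clear_sign_bit and check_clear_sign_bit are hypothetical and only serve the illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: clears the sign bit of a 32-bit IEEE-754 float,
// which is what the andps with a 0x7FFFFFFF mask does.
float clear_sign_bit(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // copy the bits out without aliasing issues
    bits &= 0x7FFFFFFFu;                  // keep everything except bit 31 (the sign)
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Hypothetical check: the masked result matches std::fabs on a few inputs.
void check_clear_sign_bit() {
    assert(clear_sign_bit(-2.5f) == std::fabs(-2.5f));
    assert(clear_sign_bit(2.5f) == std::fabs(2.5f));
    // Negative zero becomes positive zero, just as with std::fabs.
    assert(!std::signbit(clear_sign_bit(-0.0f)));
}
```

Calling check_clear_sign_bit() from main should pass without tripping an assertion, provided the build does not define NDEBUG.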
Update: gcc 10.2 at -O3 produces:
abs(float):
pxor xmm1, xmm1
comiss xmm1, xmm0
ja .L6
ret
.L6:
xorps xmm0, XMMWORD PTR .LC1[rip]
ret
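The IEEE floating-point space contains a number of special values, such as both positive and negative 0, positive and negative infinities, and two families of "Not a Number" (NaN). All of these values have well-defined semantics with respect to the < operator, and so the compiler must generate code that deals correctly with all the special cases. In particular, -0.0 < 0 is false, and so is any comparison involving NaN, so your function has to return those inputs unchanged with their sign bit intact, whereas simply clearing the sign bit would alter them.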
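A minimal sketch of the inputs on which the naive version and std::fabs disagree, assuming default floating-point semantics (that is, no -ffast-math). The function is renamed naive_abs here to avoid clashing with std::abs, and check_special_cases is a hypothetical helper:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The naive version from the question, under a different name.
float naive_abs(float x) {
    return x < 0 ? -x : x;
}

// Hypothetical helper: exercises the inputs that block the simplification.
void check_special_cases() {
    // -0.0f < 0 is false, so the naive version returns -0.0f unchanged.
    assert(std::signbit(naive_abs(-0.0f)));
    assert(!std::signbit(std::fabs(-0.0f)));

    // Every ordered comparison with NaN is false, so a NaN whose sign bit
    // is set also passes through unchanged.
    float neg_nan = std::copysign(std::numeric_limits<float>::quiet_NaN(), -1.0f);
    assert(std::signbit(naive_abs(neg_nan)));
    assert(!std::signbit(std::fabs(neg_nan)));

    // For ordinary values the two agree.
    assert(naive_abs(-1.5f) == std::fabs(-1.5f));
}
```

Because the results differ in the sign bit for these inputs, the compiler cannot replace the comparison and select with a single mask unless it is told those differences do not matter.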
The -ffast-math flag tells the compiler that it may assume the special values are not being used, that the distinction between positive and negative 0 is irrelevant, and that some other simplifications are valid (such as treating addition as associative). With this flag, clang generates what is probably optimal code for your abs function:
abs:
andps .LCPI0_0(%rip), %xmm0
retq
The choice of respecting the somewhat baroque IEEE semantics by default is somewhat controversial. Compilers other than gcc and clang tend to make the opposite choice: they generate fast and compact code by default and require an explicit command-line flag if full IEEE compliance is required (e.g. -mp in the case of the Intel compiler).