Why can a naive abs implementation not be optimized well in C++?

Question

I was looking at how a naive implementation of abs(float) would compile and was quite surprised by the result:

float abs(float x) {
    return x < 0 ? -x : x;
}

With clang 10.1 at -O3, this results in:

.LCPI0_0:
        .long   2147483648              # float -0
        .long   2147483648              # float -0
        .long   2147483648              # float -0
        .long   2147483648              # float -0
abs(float):
        movaps  xmm2, xmmword ptr [rip + .LCPI0_0]
        xorps   xmm2, xmm0
        xorps   xmm3, xmm3
        movaps  xmm1, xmm0
        cmpltss xmm1, xmm3
        andps   xmm2, xmm1
        andnps  xmm1, xmm0
        orps    xmm1, xmm2
        movaps  xmm0, xmm1
        ret

I find that quite surprising, because I honestly just expected the sign bit of the float to be cleared, which should just be a single AND instruction. There's got to be something about IEEE-754 floating point semantics that causes this complication, but I just don't understand what makes it this complicated. Why would you need any more than a compare and a conditional move?

Maybe it's because the comparison with NaN would always fail, so the sign bit doesn't get cleared in such a case? But since the sign bit can be 0 or 1 for NaN, that shouldn't matter.

For comparison, when simply using std::fabs the output is much simpler, which is exactly what one would expect:

abs(float):
        andps   xmm0, xmmword ptr [rip + .LCPI0_0]
        ret

The same output is produced when enabling the -ffast-math flag.

Update: gcc 10.2 at -O3 produces:

abs(float):
        pxor    xmm1, xmm1
        comiss  xmm1, xmm0
        ja      .L6
        ret
.L6:
        xorps   xmm0, XMMWORD PTR .LC1[rip]
        ret

jch · Accepted Answer

The IEEE floating point space contains a number of special values, such as both positive and negative 0, positive and negative infinities, and two families of "Not a Number" (NaN). All of these values have well-defined semantics with respect to the < operator, and so the compiler must generate code that deals correctly with all the special cases. For example, -0.0 < 0 is false, so your function must return -0.0 unchanged, whereas simply clearing the sign bit would return +0.0.
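The signed-zero case can be checked directly. The following is a minimal sketch, not part of the original question: naive_abs is a hypothetical name for the question's function (renamed so it does not clash with the library abs), and it assumes the code is compiled without -ffast-math.

```cpp
#include <cmath>

// Hypothetical name for the function from the question.
float naive_abs(float x) {
    // For x == -0.0f the comparison is false, so x is returned as-is,
    // sign bit included. The same holds for a NaN with its sign bit set,
    // because every ordered comparison with NaN is false.
    return x < 0 ? -x : x;
}

// Helpers that report the sign bit of each result for an input of -0.0f.
bool naive_keeps_sign_of_negative_zero() {
    return std::signbit(naive_abs(-0.0f));
}

bool fabs_keeps_sign_of_negative_zero() {
    return std::signbit(std::fabs(-0.0f));
}
```

Under these assumptions the first helper returns true and the second returns false, so the two functions are observably different and the compiler may not replace one with the other.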

The flag -ffast-math tells the compiler that it may assume the special values are not being used, that the distinction between positive and negative 0 is irrelevant, and that some other simplifying assumptions hold (such as addition being associative). With this flag, clang generates what is probably optimal code for your abs function:

abs:
        andps   .LCPI0_0(%rip), %xmm0
        retq

The choice of respecting the somewhat baroque IEEE semantics by default is controversial. Compilers other than gcc and clang tend to make the opposite choice: they compile fast and compact code by default, and require an explicit command-line flag if full IEEE compliance is required (e.g. -mp in the case of the Intel compiler).

Tags: c++, optimization, floating-point, compiler-optimization, ieee-754

Jan Schultke
