Why doesn't the same generated assembler code lead to the same output?

Question

Sample code (t0.c):

#include <stdio.h>

float f(float a, float b, float c) __attribute__((noinline));
float f(float a, float b, float c)
{
    return a * c + b * c;
}

int main(void)
{
    void* p = V;
    printf("%a\n", f(4476.0f, 20439.0f, 4915.0f));
    return 0;
}

Invocation & execution (via godbolt.org):

# icc 2021.1.2 on Linux on x86-64
$ icc t0.c -fp-model=fast -O3 -DV=f
0x1.d32322p+26
$ icc t0.c -fp-model=fast -O3 -DV=0
0x1.d32324p+26

Generated assembler code is the same: https://godbolt.org/z/osra5jfYY.

Why doesn't the same generated assembler code lead to the same output?

Why does void* p = f; matter?

Nate Eldredge · Accepted Answer

Godbolt shows you the assembly emitted by running the compiler with -S. But in this case, that's not the code that actually gets run, because further optimizations can be done at link time.

Try checking the "Compile to binary" box instead (https://godbolt.org/z/ETznv9qP4), which will actually compile and link the binary and then disassemble it. We see that in your -DV=f version, the code for f is:

 addss  xmm0,xmm1
 mulss  xmm0,xmm2
 ret

just as before. But with -DV=0, we have:

 movss  xmm0,DWORD PTR [rip+0x2d88]
 ret

So f has been converted to a function which simply returns a constant loaded from memory. At link time, the compiler was able to see that f was only ever called with a particular set of constant arguments, and so it could perform interprocedural constant propagation and have f merely return the precomputed result.
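As a rough illustration of that transformation, here is what f effectively becomes in the -DV=0 build, written back as C. The name f_folded is hypothetical, and the constant is simply the value printed by that build; the real binary loads it from memory instead of spelling it as a literal.

```c
/* Hypothetical C rendering of the constant-propagated f.
   The arguments are ignored: the only call site in the program
   always passes (4476.0f, 20439.0f, 4915.0f), so the result
   was computed ahead of time. */
float f_folded(float a, float b, float c)
{
    (void)a;
    (void)b;
    (void)c;
    return 0x1.d32324p+26f; /* value printed by the -DV=0 build */
}
```

Such a function is only valid while every caller passes those same arguments, which is why the optimizer has to be sure that no other caller exists.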

Having an additional reference to f evidently defeats this. Probably the compiler or linker sees that f has its address taken and does not notice that nothing is ever done with that address. So it assumes that f might be called from elsewhere in the program, and therefore it has to emit code that gives the correct result for arbitrary arguments.

As to why the results are different: The precomputation is done strictly, evaluating both a*c and b*c as float and then adding them. So its result of 122457232 is the "right" one by the rules of C, and it is also what you get when compiling with -O0 or -fp-model=strict. The runtime version has been optimized to (a+b)*c, which is actually more accurate because it avoids an extra rounding; it yields 122457224, which is closer to the exact value of 122457225.
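The two evaluation orders can be reproduced by hand in any compiler. The sketch below is not from the original program: the function names are made up, and the volatile temporaries are an assumption added only to force each intermediate result to be rounded to float and to keep the optimizer from reassociating or fusing the operations.

```c
/* Strict order, as written in the source: a*c + b*c.
   Three roundings: one per product, one for the sum. */
float sum_of_products(float a, float b, float c)
{
    volatile float ac = a * c;   /* 21999540, exactly representable */
    volatile float bc = b * c;   /* 100457685 rounds to 100457688   */
    volatile float r = ac + bc;  /* 122457228 is a tie, rounds to even */
    return r;
}

/* Reassociated order used by the -fp-model=fast code: (a + b) * c.
   The sum is exact here, so only the product is rounded. */
float product_of_sum(float a, float b, float c)
{
    volatile float s = a + b;    /* 24915, exactly representable */
    volatile float r = s * c;    /* 122457225 rounds to 122457224 */
    return r;
}
```

With the arguments from the question (4476.0f, 20439.0f, 4915.0f), sum_of_products returns 122457232 (0x1.d32324p+26) and product_of_sum returns 122457224 (0x1.d32322p+26). Floats in this range are spaced 8 apart, so the exact value 122457225 is not representable either way.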

Tags:

c

floating-point

floating-accuracy

x86-64

intel

icc

pmor
