Sample code (t0.c):
#include <stdio.h>

float f(float a, float b, float c) __attribute__((noinline));

float f(float a, float b, float c)
{
    return a * c + b * c;
}

int main(void)
{
    void* p = V;
    printf("%a\n", f(4476.0f, 20439.0f, 4915.0f));
    return 0;
}
Invocation & execution (via godbolt.org):
# icc 2021.1.2 on Linux on x86-64
$ icc t0.c -fp-model=fast -O3 -DV=f
0x1.d32322p+26
$ icc t0.c -fp-model=fast -O3 -DV=0
0x1.d32324p+26
Generated assembler code is the same: https://godbolt.org/z/osra5jfYY.
Why doesn't the same generated assembler code lead to the same output?
Why does void* p = f; matter?
Godbolt shows you the assembly emitted by running the compiler with -S. But in this case, that's not the code that actually gets run, because further optimizations can be done at link time.
Try checking the "Compile to binary" box instead (https://godbolt.org/z/ETznv9qP4), which will actually compile and link the binary and then disassemble it. We see that in your -DV=f version, the code for f is:
addss xmm0,xmm1
mulss xmm0,xmm2
ret
just as before. But with -DV=0, we have:
movss xmm0,DWORD PTR [rip+0x2d88]
ret
So f has been converted to a function which simply returns a constant loaded from memory. At link time, the compiler was able to see that f was only ever called with a particular set of constant arguments, and so it could perform interprocedural constant propagation and have f merely return the precomputed result.
Having an additional reference to f evidently defeats this optimization. Probably the compiler or linker sees that f has its address taken, and does not notice that nothing is ever done with the address. So it assumes that f might be called elsewhere in the program, and therefore it has to emit code that gives the correct result for arbitrary arguments.
As to why the results are different: the precomputation is done strictly, evaluating both a*c and b*c as float, rounding each, and then adding them. So its result of 122457232 (0x1.d32324p+26) is the "right" one by the rules of C, and it is also what you get when compiling with -O0 or -fp-model=strict. The runtime version has been optimized to (a+b)*c, which is actually more accurate because it avoids an extra rounding; it yields 122457224 (0x1.d32322p+26), which is closer to the exact value of 122457225.