Visual C++ can emit the C4738 warning:
storing 32-bit float result in memory, possible loss of performance
for cases when a 32-bit float is about to be stored in memory instead of being kept in a register.
The description further says that using double resolves the issue. I don't understand why that is true.
Why does storing a float in memory result in a performance loss while storing a double does not?
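The warning combines two issues: a possible performance loss, because the value has to be stored to memory instead of staying in a register, and a loss of precision, because the value is rounded to 32 bits when it is stored.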
Using double resolves the second issue (at least partially; 64 bits are still less precise than 80 bits), but has no impact on the possible performance loss. That is why the warning description mentions two remedies:
To resolve this warning and avoid rounding, compile with /fp:fast or use doubles instead of floats.
To resolve this warning and avoid running out of registers, change the order of computation and modify your use of inlining
While I'm not 100% sure of the cause, here's my guess.
When compiling for x86 without SSE2 enabled, the compiler must use the x87 FP stack for all floating-point computation. On MSVC, the FP mode is, by default, set to 53-bit precision rounding. (I think. I'm not 100% sure on this.)
Therefore, all operations done on the FP stack are performed at double precision.
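A portable way to check which evaluation regime a given build advertises is the standard FLT_EVAL_METHOD macro from <float.h> (C99 and later). The following is only a sketch: the helper name eval_method_description is made up for illustration, and the macro reports what the compiler claims about intermediate precision rather than reading the x87 control word itself.

```c
#include <float.h>

/* Hypothetical helper (not from the original answer): describes how
   this build evaluates floating-point expressions, according to the
   standard FLT_EVAL_METHOD macro. */
const char *eval_method_description(void)
{
#if !defined(FLT_EVAL_METHOD)
    return "FLT_EVAL_METHOD is not defined by this compiler";
#elif FLT_EVAL_METHOD == 0
    return "each operation is evaluated in the precision of its own type";
#elif FLT_EVAL_METHOD == 1
    return "float and double operations are evaluated as double";
#elif FLT_EVAL_METHOD == 2
    return "all operations are evaluated as long double";
#else
    return "evaluation method is indeterminable or implementation-defined";
#endif
}
```

Printing the returned string in builds with and without SSE2 is one way to see whether intermediate results are held at a wider precision than float.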
However, when something is cast down to a float, it needs to be rounded to single precision. The only way to do this is to store it to memory via the fstp instruction with a 4-byte memory operand, and then reload it.
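The effect of that store and reload can be mimicked in portable C. This is a minimal sketch, assuming a hypothetical helper named round_trip_through_float; the volatile temporary forces the value through a 4-byte memory slot, which is roughly what the fstp and reload pair does on x87. It demonstrates the loss of precision only; the instructions actually generated depend on the compiler and target.

```c
/* Hypothetical helper: narrow a double to float by forcing it
   through a 4-byte memory slot, then widen it again. */
double round_trip_through_float(double x)
{
    volatile float slot = (float)x; /* store: rounds to single precision */
    return (double)slot;            /* reload: widens, but the lost bits stay lost */
}
```

For a value that a float represents exactly, such as 0.5, the round trip returns the input unchanged. For 0.1 it returns a slightly different value, because 0.1 rounds differently at 24 and 53 bits of mantissa.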
Let's look at the example on the C4738 warning page you linked to:
#include <stdio.h>

float func(float f)
{
    return f;
}

int main()
{
    extern float f, f1, f2;
    double d = 0.0;
    f1 = func(d);        // d is narrowed to float for the call
    f2 = (float) d;      // explicit narrowing to float
    f = f1 + f2;         // C4738
    printf_s("%f\n", f);
}
When you call func(), d is probably stored in an x87 register. However, the call to func() requires that the precision be lowered to single precision. This causes d to be rounded and stored to memory, then reloaded and re-promoted to double precision on the line f = f1 + f2;.
However, if you use double the whole way, the compiler can keep d in a register, thus bypassing the overhead of going to and from memory.
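To make the comparison concrete, here is a sketch of the two variants side by side. The function names sum_via_float and sum_all_double are illustrative and do not come from the documentation page. The first mirrors the narrowing conversions in the example above; the second keeps everything in double, so no narrowing store is needed.

```c
/* Mirrors the documentation example: each narrowing to float is a
   point where an x87 build may have to store and reload the value. */
float sum_via_float(double d)
{
    float f1 = (float)d; /* narrowing #1 */
    float f2 = (float)d; /* narrowing #2 */
    return f1 + f2;
}

/* All-double variant: nothing is narrowed along the way. */
double sum_all_double(double d)
{
    double d1 = d;
    double d2 = d;
    return d1 + d2;
}
```

Both functions agree for an input like 0.5, which a float holds exactly, but they differ for 0.1. That difference is the rounding the warning refers to, separate from any cost of the memory round trip.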
As for why it could make you run out of registers... I have no idea. It's possible that the semantics of the program require keeping both a double-precision and a single-precision copy of the same value, which in this case would require an extra register.