I was examining some code which uses the /fp:precise and /fp:fast flags.
According to the MSDN documentation for /fp:precise:
With /fp:precise on x86 processors, the compiler will perform rounding on variables of type float to the proper precision for assignments and casts and when passing parameters to a function. This rounding guarantees that the data does not retain any significance greater than the capacity of its type. A program compiled with /fp:precise can be slower and larger than one compiled without /fp:precise. /fp:precise disables intrinsics; the standard run-time library routines are used instead. For more information, see /Oi (Generate Intrinsic Functions).
Looking at the disassembly of a call to sqrtf (compiled with /arch:SSE2, targeting the x86/Win32 platform):
0033185D cvtss2sd xmm0,xmm1
00331861 call __libm_sse2_sqrt_precise (0333370h)
00331866 cvtsd2ss xmm0,xmm0
From this question I believe modern x86/x64 processors don't use 80-bit registers (or at least discourage their use), so the compiler does what I would assume to be the next best thing and performs the calculation with 64-bit doubles. And because intrinsics are disabled, there's a call to a library routine instead of an inline square-root instruction.
OK, fair enough; this seems to comply with what the documentation says.
However, when I compile for the x64 arch, something strange happens:
000000013F2B199E movups xmm0,xmm1
000000013F2B19A1 sqrtps xmm1,xmm1
000000013F2B19A4 movups xmmword ptr [rcx+rax],xmm1
The calculations are not performed with 64-bit doubles, and intrinsics are being used. As far as I can tell, the results are exactly the same as if the /fp:fast flag was used.
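For context, a loop of roughly the following shape is the kind of code for which an optimizing compiler may emit a packed sqrtps. This is an assumed sketch with hypothetical names, not the exact source behind the listings, and whether a given compiler vectorizes it depends on its version and optimization settings.

```cpp
#include <math.h>
#include <cstddef>

// Hypothetical reproduction: element-wise single-precision square root.
// 'in', 'out' and 'n' are illustrative names, not taken from the original code.
void sqrt_array(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = sqrtf(in[i]);  // candidate for a packed sqrtps when vectorized
    }
}
```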
Why is there a discrepancy between the two? Does /fp:precise simply not work with the x64 platform?
Now, as a sanity check, I tested the same code in VS2010 x86 with /fp:precise and /arch:SSE2. Surprisingly, the sqrtsd intrinsic was being used!
00AF14C7 cvtps2pd xmm0,xmm0
00AF14CA sqrtsd xmm0,xmm0
00AF14CE cvtpd2ps xmm0,xmm0
What's going on here? Why does VS2010 use intrinsics while VS2012 calls a system library?
Testing VS2010 targeting the x64 platform gives results similar to VS2012 (/fp:precise appears to be ignored).
I don't have access to any older versions of VS, so I can't do any testing with them.
For reference I'm testing in Windows 7 64-bit with an Intel i5-m430 processor.
First of all, you should read this really good blog post about intermediate floating-point precision. The article covers only code generated by Visual Studio (but that's what your question is all about). And now to the examples:
0033185D cvtss2sd xmm0,xmm1
00331861 call __libm_sse2_sqrt_precise (0333370h)
00331866 cvtsd2ss xmm0,xmm0
This assembler code was generated with /fp:precise /arch:SSE2 for the x86 platform. According to the documentation, the precise floating-point model promotes all calculations to double internally on the x86 platform. It also prevents the use of intrinsics (I think you have read this already). Hence the code starts with a conversion from float to double, followed by a double-precision sqrt call, and finally the result is converted back to float.
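In C++ terms, the three instructions correspond roughly to the sketch below. The names `sqrt_via_double` and `sqrt_single` are hypothetical, and `std::sqrt` merely stands in for the runtime's `__libm_sse2_sqrt_precise`, whose internals are not shown here.

```cpp
#include <cmath>

// Hypothetical helper mirroring the x86 /fp:precise listing:
// widen to double, take a double-precision square root, narrow back.
float sqrt_via_double(float x) {
    double wide = static_cast<double>(x);  // corresponds to cvtss2sd
    double root = std::sqrt(wide);         // double-precision sqrt (library call in the listing)
    return static_cast<float>(root);       // corresponds to cvtsd2ss
}

// Hypothetical helper for comparison: square root taken directly in single precision.
float sqrt_single(float x) {
    return std::sqrt(x);  // float overload of std::sqrt
}
```

For a single square root the two paths should agree as long as both routines are correctly rounded, because double carries more than twice the precision of float. Differences are more likely to appear in longer expressions that keep several intermediate results.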
000000013F2B199E movups xmm0,xmm1
000000013F2B19A1 sqrtps xmm1,xmm1
000000013F2B19A4 movups xmmword ptr [rcx+rax],xmm1
The second example was compiled for the x64 (amd64) platform, and this platform behaves completely differently! According to the documentation:
For performance reasons, intermediate operations are computed at the widest precision of either operand instead of at the widest precision available.
Hence the calculations will be done in single precision internally. I think they also decided to use intrinsics whenever possible, so the difference between /fp:precise and /fp:fast is somewhat smaller on the x64 platform. The new behavior results in faster code and gives the programmer more control over what exactly happens (they were able to change the rules of the game because compatibility was of no concern for the new x64 platform). Unfortunately, these changes/differences are not explicitly stated in the documentation.
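To illustrate why the intermediate precision matters at all, here is a small sketch with two hypothetical helpers that evaluate (a + b) - a, once with a float intermediate and once with a double intermediate. The `volatile` store is only an assumption-driven precaution to force rounding to float on targets that might otherwise keep extra precision.

```cpp
// Hypothetical helper: intermediate sum rounded to single precision,
// as in the x64 behavior described in the quoted documentation.
float diff_single_intermediate(float a, float b) {
    volatile float sum = a + b;  // force the sum to be stored as a float
    return sum - a;
}

// Hypothetical helper: intermediate sum kept in double precision,
// as in the x86 /fp:precise behavior described earlier.
float diff_double_intermediate(float a, float b) {
    double sum = static_cast<double>(a) + static_cast<double>(b);
    return static_cast<float>(sum - static_cast<double>(a));
}
```

With a = 16777216.0f (2^24) and b = 1.0f, the single-precision version returns 0 because a + b is not representable as a float and rounds back to a, while the double-precision version returns 1.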
00AF14C7 cvtps2pd xmm0,xmm0
00AF14CA sqrtsd xmm0,xmm0
00AF14CE cvtpd2ps xmm0,xmm0
Finally, the last example was compiled with the Visual Studio 2010 compiler. I think an intrinsic for sqrt was used there by accident when it should not have been (at least in /fp:precise mode), and this behavior was changed/fixed again in Visual Studio 2012 (see here).