
Strange /fp Floating Point Model flag behavior

I was examining some code which uses the /fp:precise and /fp:fast flags.

According to the MSDN documentation for /fp:precise:

With /fp:precise on x86 processors, the compiler will perform rounding on variables of type float to the proper precision for assignments and casts and when passing parameters to a function. This rounding guarantees that the data does not retain any significance greater than the capacity of its type. A program compiled with /fp:precise can be slower and larger than one compiled without /fp:precise. /fp:precise disables intrinsics; the standard run-time library routines are used instead. For more information, see /Oi (Generate Intrinsic Functions).
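A minimal sketch of what "rounding to the proper precision" on a cast means, independent of any compiler flag: narrowing a double to float discards the extra significand bits, so a value such as 0.1 does not survive a round trip, while 0.5 (exactly representable in both types) does. The helper name is hypothetical, and the checks assume IEEE 754 float and double.

```cpp
// Hypothetical helper: narrows a double to float, then widens it again.
// The narrowing cast is where significance beyond float's 24-bit
// significand is discarded.
constexpr double round_trip_through_float(double value) {
    return static_cast<double>(static_cast<float>(value));
}

// 0.5 is exactly representable as a float, so nothing is lost.
static_assert(round_trip_through_float(0.5) == 0.5,
              "0.5 survives the float round trip");

// 0.1 needs more bits than a float holds, so the round trip changes it.
static_assert(round_trip_through_float(0.1) != 0.1,
              "0.1 is rounded when narrowed to float");
```

The checks are evaluated at compile time, so the file only has to compile for them to pass.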

Looking at the disassembly of a call to sqrtf (compiled with /arch:SSE2, targeting the x86/Win32 platform):

0033185D  cvtss2sd    xmm0,xmm1  
00331861  call        __libm_sse2_sqrt_precise (0333370h)  
00331866  cvtsd2ss    xmm0,xmm0  

From this question I believe modern x86/x64 processors don't use 80-bit registers (or at least discourage their use), so the compiler does what I would assume to be the next best thing and performs the calculation with 64-bit doubles. And because intrinsics are disabled, there's a call to a library square-root routine instead of an inline instruction.

OK, fair enough; this seems to comply with what the documentation says.

However, when I compile for the x64 arch, something strange happens:

000000013F2B199E  movups      xmm0,xmm1  
000000013F2B19A1  sqrtps      xmm1,xmm1  
000000013F2B19A4  movups      xmmword ptr [rcx+rax],xmm1  

The calculations are not performed with 64-bit doubles, and intrinsics are being used. As far as I can tell, the results are exactly the same as if the /fp:fast flag were used.

Why is there a discrepancy between the two? Does /fp:precise simply not work with the x64 platform?

Now, as a sanity check I tested the same code in VS2010 for x86 with /fp:precise and /arch:SSE2. Surprisingly, the sqrtsd intrinsic was being used!

00AF14C7  cvtps2pd    xmm0,xmm0  
00AF14CA  sqrtsd      xmm0,xmm0  
00AF14CE  cvtpd2ps    xmm0,xmm0 

What's going on here? Why does VS2010 use intrinsics while VS2012 calls a system library?

Testing VS2010 targeting the x64 platform gives results similar to VS2012 (/fp:precise appears to be ignored).

I don't have access to any older versions of VS, so I can't do any testing with them.

For reference I'm testing in Windows 7 64-bit with an Intel i5-m430 processor.
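The source being compiled is not shown in the question. As a rough sketch only, a function of the following shape (the name `sqrt_array` and its signature are hypothetical) contains a sqrtf call in a loop and could be built under each flag and platform combination to compare the generated listings.

```cpp
#include <cassert>
#include <cstddef>
#include <math.h>

// Hypothetical test function; not the asker's actual code.
// Writes the single-precision square root of each input element to out.
void sqrt_array(const float* in, float* out, std::size_t count) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = sqrtf(in[i]);  // the call whose lowering is being inspected
    }
}
```

With MSVC, a command along the lines of `cl /c /FAs /fp:precise sqrt_array.cpp` (file name assumed) writes an assembly listing next to the object file; swapping /fp:precise for /fp:fast, or the x86 toolchain for the x64 one, gives the listings to compare.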

helloworld922 asked Oct 22 '22 13:10


1 Answer

First of all, you should read this really good blog post about intermediate floating-point precision. The article covers only code generated by Visual Studio (but that is what your question is about). Now to the examples:

0033185D  cvtss2sd    xmm0,xmm1  
00331861  call        __libm_sse2_sqrt_precise (0333370h)  
00331866  cvtsd2ss    xmm0,xmm0  

This assembly code was generated with /fp:precise /arch:SSE2 for the x86 platform. According to the documentation, the precise floating-point model promotes all calculations to double internally on the x86 platform. It also prevents the use of intrinsics (I think you have read this already). Hence the code starts with a conversion from float to double, followed by a call to a double-precision sqrt routine, and finally the result is converted back to float.
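The widen, compute, narrow sequence can be written out in source form. The sketch below (all function names hypothetical) puts it next to a pure single-precision square root and counts how often the two disagree over a small sweep. It assumes IEEE 754 arithmetic with correctly rounded square roots in both precisions; under that assumption the count is expected to be zero, because double carries enough extra bits that rounding a square root twice gives the same float as rounding it once. It mirrors only the shape of the listing, not the library routine it calls.

```cpp
#include <cassert>
#include <cmath>

// Path 1: widen to double, take the square root, narrow back to float.
// Same shape as the cvtss2sd / sqrt call / cvtsd2ss listing.
float sqrt_via_double(float x) {
    assert(x >= 0.0f);
    return static_cast<float>(std::sqrt(static_cast<double>(x)));
}

// Path 2: stay in single precision throughout.
float sqrt_in_float(float x) {
    assert(x >= 0.0f);
    return std::sqrt(x);  // float overload of std::sqrt
}

// Counts the inputs in a small sweep for which the two paths disagree.
int count_mismatches(int samples) {
    int mismatches = 0;
    for (int i = 0; i < samples; ++i) {
        const float x = static_cast<float>(i) * 0.37f;
        if (sqrt_via_double(x) != sqrt_in_float(x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

For an isolated square root, then, the choice of intermediate precision affects speed and code shape rather than the value produced; longer expressions are a different matter.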

000000013F2B199E  movups      xmm0,xmm1  
000000013F2B19A1  sqrtps      xmm1,xmm1  
000000013F2B19A4  movups      xmmword ptr [rcx+rax],xmm1

The second example was compiled for the x64 (amd64) platform, and this platform behaves completely differently! According to the documentation:

For performance reasons, intermediate operations are computed at the widest precision of either operand instead of at the widest precision available.

Hence the calculations will be done in single precision internally. I think they also decided to use intrinsics whenever possible, so the difference between /fp:precise and /fp:fast is somewhat smaller on the x64 platform. The new behavior results in faster code and gives the programmer more control over what exactly happens (they were able to change the rules of the game because compatibility was of no concern for the new x64 platform). Unfortunately, these differences are not explicitly stated in the documentation.
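To illustrate why the intermediate precision can matter in general, here is a small sketch with explicit casts standing in for the two evaluation rules. The function names are hypothetical, and it assumes IEEE 754 types and a compiler that evaluates float expressions in float (FLT_EVAL_METHOD 0). With a = 2^24 and b = 1, the sum a + b is not representable as a float, so the single-precision path loses b while the double path keeps it.

```cpp
// Intermediate sum kept in single precision, as when both operands are
// float and the widest operand type decides the intermediate precision.
constexpr float add_sub_in_float(float a, float b) {
    return static_cast<float>(a + b) - a;
}

// Intermediate sum promoted to double; only the final result is narrowed.
constexpr float add_sub_in_double(float a, float b) {
    return static_cast<float>(
        (static_cast<double>(a) + static_cast<double>(b)) -
        static_cast<double>(a));
}

// 2^24 + 1 rounds back to 2^24 in float, so b vanishes on this path.
static_assert(add_sub_in_float(16777216.0f, 1.0f) == 0.0f,
              "float intermediate drops the added 1");

// The double intermediate holds 2^24 + 1 exactly, so b is recovered.
static_assert(add_sub_in_double(16777216.0f, 1.0f) == 1.0f,
              "double intermediate keeps the added 1");
```

Writing the casts explicitly is one way for a programmer to choose the intermediate precision per expression, whatever the compiler's default rule is.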

00AF14C7  cvtps2pd    xmm0,xmm0  
00AF14CA  sqrtsd      xmm0,xmm0  
00AF14CE  cvtpd2ps    xmm0,xmm0 

Finally, the last example was compiled with the Visual Studio 2010 compiler. I think they accidentally used an intrinsic for sqrt when they should not have (at least in /fp:precise mode), and they changed/fixed this behavior in Visual Studio 2012 (see here).

eel76 answered Oct 24 '22 11:10