Why would a Windows VM produce different floating point outputs than Linux?

Problem

We have multiple machines running Ubuntu with very similar specs, and each of them also runs a Windows VM. We wrote a simple program to verify an issue we are seeing in those VMs. The program was compiled with gcc 4.8.4 on 64-bit Linux and with the v140 toolset in Visual Studio on the 64-bit Windows VM.

#include <cmath>
#include <stdio.h>

int main()
{
  double num = 1.56497856262158219209;
  double numHalf = num / 2.0;

  double cosVal = cos(num);
  double cosValHalf = cos(numHalf);

  printf("num = %a\n", num);
  printf("numHalf = %a\n", numHalf);
  printf("cosVal(num) = %a\n", cosVal);
  printf("cosValHalf(numHalf) = %a\n", cosValHalf);

  //system("pause");
  return 0;
}

The issue arises when running the same binary file on host machines with certain CPUs.

Results

On Linux, all machines produce the same output. On the Windows VMs, different results are produced even though the VM versions and settings are identical. Additionally, binaries generated on each VM produce different results when moved to a different host machine; i.e. a binary generated in VM2 but executed on LM1 returned the same results as if VM1 had generated the binary. We even copied the VM to confirm this behavior, and sure enough it continues.

Given the efforts described above, I'm thinking it's not a library difference or a VM issue. As for the outputs, the following CPUs produce these results:

  • Intel® Xeon(R) CPU E5-2630 0
  • Intel® Xeon(R) CPU E5-2630 v2

These CPUs produce identical results on Linux and Windows. The results are shown in hex because readability mattered less than whether there was a discrepancy.

num = 0x1.90a26f616699cp+0
numHalf = 0x1.90a26f616699cp-1
cosVal(num) = 0x1.7d4555e817bdcp-8
cosValHalf(numHalf) = 0x1.6b171bb5e3434p-1

The following CPUs produce different results on a Windows VM than on their Linux counterpart:

  • Intel® Xeon(R) CPU E5-2630 v3
  • Intel® Xeon(R) CPU E3-1270 v5

I'm not sure how these results are being produced. The disassembly in VS2015 shows that both programs contain the same instructions regardless of which host machine they were compiled on.

num = 0x1.90a26f616699cp+0
numHalf = 0x1.90a26f616699cp-1
cosVal(num) = 0x1.7d4555e817bdcp-8
cosValHalf(numHalf) = 0x1.6b171bb5e3435p-1

Question

Why would Windows on a VM handle a binary differently when put on a machine with a specific CPU?

Looking at the differences between the CPUs E5-2630 v2 and E5-2630 v3, for example, it appears the CPUs producing different results support the AVX2, F16C, and FMA3 instruction sets, whereas the former CPUs do not. However, if that were the reason for the discrepancy, I would expect the results to differ between Linux and Windows on those machines as well. Also, the disassembly showed the registers used were the same on either chip. Stepping through each instruction in the debugger, you would expect the behavior to be identical.

All this summed up, it's probably a difference in architecture. Any thoughts on how I can be sure?

Resources

I've found the following questions somewhat useful regarding solutions promoting cross-platform consistency and making results more deterministic. I also took a long walk through floating-point comparison and cannot recommend it enough for anyone curious about the topic.

Juno asked Nov 08 '22 01:11
1 Answer

You can compile your program as an ELF binary on Linux and run it there. You can then copy that ELF binary onto your Windows system and run it under the Windows Subsystem for Linux (WSL). The FP initialization should be the same on both systems, so you would be running the same floating-point instructions on both, and the floating-point results should be the same. If they aren't (which is unlikely), it is because of differing initializations.

You can also run this ELF binary on different architectures and systems (FreeBSD, ...). The results should all be the same. At that point you can rule out architecture+microarchitecture and rule in Windows and Linux compiler+runtime differences.

You may also be able to use Visual Studio to compile to an ELF binary and repeat this for the different systems and architectures. Those results should be the same but possibly different from the Linux GCC/Clang ELF.
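The workflow above can be sketched as follows (file names here are placeholders; linking statically is an assumption on my part, to keep WSL from substituting a different libm at load time):

```shell
# On the Linux machine: build a statically linked ELF binary
gcc -static -O2 fptest.c -o fptest -lm
./fptest > linux-output.txt

# Copy fptest and linux-output.txt to the Windows machine,
# then inside a WSL shell on that machine:
./fptest > wsl-output.txt
diff linux-output.txt wsl-output.txt   # no output means the results match
```

If diff is silent across all the host CPUs, the hardware is exonerated and the discrepancy lies in the Windows compiler or runtime.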

Olsonist answered Nov 15 '22 06:11