
MSYS2 GCC zeros out doubles on floating point operations with SSE disabled

Consider the C program below.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    double x = 4.5;
    double x2 = atof("3.5");
    printf("%.6f\n", x);
    printf("%.6f\n", x2);
    return 0;
}

When compiling this with the version of GCC available through MSYS2, the output depends on whether SSE is enabled:

$ gcc test.c && ./a.exe
4.500000
3.500000

$ gcc -mno-sse test.c && ./a.exe
4.500000
0.000000

Does this behavior make any sense, and if not, is there any way to have GCC produce sensible results in this case (other than the trivial solution of just removing -mno-sse)? Here's some version information:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-msys/7.3.0/lto-wrapper.exe
Target: x86_64-pc-msys
Configured with: /msys_scripts/gcc/src/gcc-7.3.0/configure --build=x86_64-pc-msys --prefix=/usr --libexecdir=/usr/lib --enable-bootstrap --enable-shared --enable-shared-libgcc --enable-static --enable-version-specific-runtime-libs --with-arch=x86-64 --with-tune=generic --disable-multilib --enable-__cxa_atexit --with-dwarf2 --enable-languages=c,c++,fortran,lto --enable-graphite --enable-threads=posix --enable-libatomic --enable-libcilkrts --enable-libgomp --enable-libitm --enable-libquadmath --enable-libquadmath-support --disable-libssp --disable-win32-registry --disable-symvers --with-gnu-ld --with-gnu-as --disable-isl-version-check --enable-checking=release --without-libiconv-prefix --without-libintl-prefix --with-system-zlib --enable-linker-build-id --with-default-libstdcxx-abi=gcc4-compatible
Thread model: posix
gcc version 7.3.0 (GCC)

And here's the result of disassembling main in the -mno-sse build:

   0x0000000100401080 <+0>:     push   %rbp
   0x0000000100401081 <+1>:     mov    %rsp,%rbp
   0x0000000100401084 <+4>:     sub    $0x30,%rsp
   0x0000000100401088 <+8>:     mov    %ecx,0x10(%rbp)
   0x000000010040108b <+11>:    mov    %rdx,0x18(%rbp)
   0x000000010040108f <+15>:    callq  0x1004010f0 <__main>
   0x0000000100401094 <+20>:    fldl   0x1f76(%rip)        # 0x100403010
   0x000000010040109a <+26>:    fstpl  -0x8(%rbp)
   0x000000010040109d <+29>:    lea    0x1f5c(%rip),%rcx        # 0x100403000
   0x00000001004010a4 <+36>:    callq  0x100401100 <atof>
   0x00000001004010a9 <+41>:    mov    %rax,-0x10(%rbp)
   0x00000001004010ad <+45>:    mov    -0x8(%rbp),%rax
   0x00000001004010b1 <+49>:    mov    %rax,%rdx
   0x00000001004010b4 <+52>:    lea    0x1f49(%rip),%rcx        # 0x100403004
   0x00000001004010bb <+59>:    callq  0x100401110 <printf>
   0x00000001004010c0 <+64>:    mov    -0x10(%rbp),%rax
   0x00000001004010c4 <+68>:    mov    %rax,%rdx
   0x00000001004010c7 <+71>:    lea    0x1f36(%rip),%rcx        # 0x100403004
   0x00000001004010ce <+78>:    callq  0x100401110 <printf>
   0x00000001004010d3 <+83>:    mov    $0x0,%eax
   0x00000001004010d8 <+88>:    add    $0x30,%rsp
   0x00000001004010dc <+92>:    pop    %rbp
   0x00000001004010dd <+93>:    retq
   0x00000001004010de <+94>:    nop
   0x00000001004010df <+95>:    nop

Notably, compiling the same program with GCC on Linux produces an error instead (for reasons discussed in this question):

$ gcc -mno-sse test2.c
test2.c: In function ‘main’:
test2.c:6:12: error: SSE register return with SSE disabled
     double x2 = atof("3.5");
            ^~

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18+deb9u1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
fuglede asked Oct 17 '22 12:10


1 Answer

You should be getting the same error from MSYS2 gcc -mno-sse. The standard calling convention (x64 Windows __fastcall) uses XMM0-XMM3 (SSE vector registers) to pass float and double args, and XMM0 to return them.

From the asm for main that you show, it appears that -mno-sse changes gcc's idea of the calling convention to pass and return double in integer registers, like soft-float on ARM. So there's a calling-convention mismatch between your main and the prebuilt library functions, and what actually happens comes down to the asm details and chance.

The Windows x64 calling convention has an interesting design feature that makes variadic functions like printf simpler to implement: when calling a variadic function, the value of an FP arg must be in both the integer register and the XMM register for its slot (https://learn.microsoft.com/en-gb/cpp/build/varargs?view=vs-2017). Thus the callee can dump RCX, RDX, R8, and R9 into the shadow space and form an array of 8-byte args (contiguous with the stack args) before looking at the args to figure out which ones are FP and which are integer. (See How to set function arguments in assembly during runtime in a 64bit application on Windows? for an ugly example of doing that.) Unlike in the x86-64 System V ABI, XMM1 holds the 2nd arg overall (if it's FP), rather than the 2nd FP arg. So only 4 args total can be in registers, even with a mix of FP and integer.

Thus gcc's passing of a double bit pattern in %rdx actually works, because this library's printf only cares about the value in %rdx and ignores the value in %xmm1.

But atof returns its result in XMM0, leaving garbage in RAX. Your -mno-sse main saves RAX and passes it to the 2nd printf, so what gets printed is whatever bit pattern was left in RAX: probably either zero or a very small double.

If RAX held an address, its high 16 bits would be zero, so type-punning that bit pattern to an IEEE double (https://en.wikipedia.org/wiki/Double-precision_floating-point_format) gives exponent = 0, with some of the address bits landing in the significand. A small positive integer would be an even smaller double.

So you probably printed a very small subnormal double that rounds to 0.000000 with %.6f, and it came from whatever garbage atof left in RAX when it returned its value in XMM0.

Peter Cordes answered Oct 21 '22 08:10