Consider the C program below.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    double x = 4.5;
    double x2 = atof("3.5");
    printf("%.6f\n", x);
    printf("%.6f\n", x2);
    return 0;
}
When compiling this with the version of GCC available through MSYS2, the output ends up depending on the availability of SSE:
$ gcc test.c && ./a.exe
4.500000
3.500000
$ gcc -mno-sse test.c && ./a.exe
4.500000
0.000000
Does this behavior make any sense, and if not, is there any way to have GCC produce sensible results in this case (outside of the trivial solution of just removing -mno-sse)? Here's some version information:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-msys/7.3.0/lto-wrapper.exe
Target: x86_64-pc-msys
Configured with: /msys_scripts/gcc/src/gcc-7.3.0/configure --build=x86_64-pc-msys --prefix=/usr --libexecdir=/usr/lib --
enable-bootstrap --enable-shared --enable-shared-libgcc --enable-static --enable-version-specific-runtime-libs --with-ar
ch=x86-64 --with-tune=generic --disable-multilib --enable-__cxa_atexit --with-dwarf2 --enable-languages=c,c++,fortran,lt
o --enable-graphite --enable-threads=posix --enable-libatomic --enable-libcilkrts --enable-libgomp --enable-libitm --ena
ble-libquadmath --enable-libquadmath-support --disable-libssp --disable-win32-registry --disable-symvers --with-gnu-ld -
-with-gnu-as --disable-isl-version-check --enable-checking=release --without-libiconv-prefix --without-libintl-prefix --
with-system-zlib --enable-linker-build-id --with-default-libstdcxx-abi=gcc4-compatible
Thread model: posix
gcc version 7.3.0 (GCC)
And here's the result of disassembling main:
0x0000000100401080 <+0>: push %rbp
0x0000000100401081 <+1>: mov %rsp,%rbp
0x0000000100401084 <+4>: sub $0x30,%rsp
0x0000000100401088 <+8>: mov %ecx,0x10(%rbp)
0x000000010040108b <+11>: mov %rdx,0x18(%rbp)
0x000000010040108f <+15>: callq 0x1004010f0 <__main>
0x0000000100401094 <+20>: fldl 0x1f76(%rip) # 0x100403010
0x000000010040109a <+26>: fstpl -0x8(%rbp)
0x000000010040109d <+29>: lea 0x1f5c(%rip),%rcx # 0x100403000
0x00000001004010a4 <+36>: callq 0x100401100 <atof>
0x00000001004010a9 <+41>: mov %rax,-0x10(%rbp)
0x00000001004010ad <+45>: mov -0x8(%rbp),%rax
0x00000001004010b1 <+49>: mov %rax,%rdx
0x00000001004010b4 <+52>: lea 0x1f49(%rip),%rcx # 0x100403004
0x00000001004010bb <+59>: callq 0x100401110 <printf>
0x00000001004010c0 <+64>: mov -0x10(%rbp),%rax
0x00000001004010c4 <+68>: mov %rax,%rdx
0x00000001004010c7 <+71>: lea 0x1f36(%rip),%rcx # 0x100403004
0x00000001004010ce <+78>: callq 0x100401110 <printf>
0x00000001004010d3 <+83>: mov $0x0,%eax
0x00000001004010d8 <+88>: add $0x30,%rsp
0x00000001004010dc <+92>: pop %rbp
0x00000001004010dd <+93>: retq
0x00000001004010de <+94>: nop
0x00000001004010df <+95>: nop
Notably, attempting to compile the same program on a Linux version of GCC produces an error instead (for reasons discussed in this question):
$ gcc -mno-sse test2.c
test2.c: In function ‘main’:
test2.c:6:12: error: SSE register return with SSE disabled
double x2 = atof("3.5");
^~
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18+deb9u1' --with-bugurl=file:///usr/share/doc/gcc-
6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-pr
efix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enabl
e-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-l
ibstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --
enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-
gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/u
sr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --with-arch-32=i686 --w
ith-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x8
6_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
You should be getting the same error from MSYS gcc -mno-sse. The standard calling convention (x64 Windows __fastcall) uses XMM0..3 (SSE vector registers) to pass float and double args, and XMM0 to return them.
From the asm for main that you show, it appears that -mno-sse changes gcc's idea of the calling convention to pass/return double in integer registers, like soft-float on ARM. So there's a calling convention mismatch, and what actually happened is down to the asm details and chance.
The Windows x64 calling convention has an interesting design feature that makes implementing variadic functions like printf simpler: when calling a variadic function, a floating-point arg must be present in both the integer register and the XMM register for its slot (https://learn.microsoft.com/en-gb/cpp/build/varargs?view=vs-2017). Thus the function can dump rcx, rdx, r8, and r9 into the shadow space and form an array of 8-byte args (contiguous with the stack args), before looking at the args to figure out which ones are FP and which are integer. (See How to set function arguments in assembly during runtime in a 64bit application on Windows? for an ugly example of doing that.) Unlike in the x86-64 System V ABI, XMM1 is used for an FP value that is the 2nd arg overall, rather than for the 2nd FP arg. So only 4 total args can be in regs, even if there's a mix of FP and integer.
Thus, gcc's passing of a double bit-pattern in %rdx actually works, because this library printf only cares about the value in %rdx, ignoring the value in %xmm1.
But atof returns in XMM0, with RAX holding garbage. Your -mno-sse main saves RAX and passes it to the 2nd printf. It's either zero or a very small double.
If RAX held an address, the high 16 bits would be zero, so type-punning that bit-pattern to an IEEE double (https://en.wikipedia.org/wiki/Double-precision_floating-point_format) gives us exponent = 0, along with some of the bits of the significand. A small positive integer would be an even smaller double.
So you probably printed a very small subnormal double that rounds to 0 in that format, which came from whatever garbage atof left in RAX when it returned a value in XMM0.