SSE and iostream: wrong output for floating point types

test.cpp:

#include <iostream>
using namespace std;

int main()
{
    double pi = 3.14;
    cout << "pi:"<< pi << endl;
}

When compiled on 64-bit Cygwin with g++ -mno-sse test.cpp, the output is:

pi:0

However, it works properly if compiled with g++ test.cpp.

I have GCC version 5.4.0.

olegkhr asked Jun 22 '17 05:06



1 Answer

Yes, I can reproduce this. Well, mostly. I don't actually get an output of 0, but some other garbage output. Either way, the behavior is invalid, and I have pinpointed the cause.

You can see the code that GCC 5.4.0 generates with the -m64 -mno-sse flags on Godbolt's Compiler Explorer. In particular, these are the instructions we care about:

// double pi = 3.14;
fld     QWORD PTR .LC0[rip]
fstp    QWORD PTR [rbp-8]

// std::cout << "pi:";
mov     esi, OFFSET FLAT:.LC1
mov     edi, OFFSET FLAT:std::cout
call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)

// std::cout << pi;
sub     rsp, 8
push    QWORD PTR [rbp-8]
mov     rdi, rax
call    std::basic_ostream<char, std::char_traits<char> >::operator<<(double)
add     rsp, 16

What's happening here? First, we need to understand what the -mno-sse flag means. It prevents the compiler from generating any code that uses SSE instructions (or any later instruction-set extension that builds on SSE), so all floating-point operations must be done using the legacy x87 FPU. That works fine and is well-supported on 32-bit builds, but it is at odds with 64-bit builds. The AMD64 specification requires SSE2 support as a minimum, so every 64-bit-capable x86 CPU supports both SSE and SSE2. This assumption has made it into the ABI: on x86-64, float and double arithmetic is normally done using SSE2 instructions, and float and double arguments and return values are passed in XMM registers. Therefore, passing float or double values across function calls while forbidding the compiler from using SSE/SSE2 instructions puts the code generator in an impossible position.

How exactly does it fail? Let's walk through the code above. It's unoptimized (since you didn't pass an optimization flag, it defaulted to -O0), which makes it a little hard to read, but bear with me.

In the first block, it uses x87 FPU instructions to load your double-precision floating-point value (3.14) from memory (it is stored as a constant in the binary) into the register at the top of the x87 FPU stack. Then, it pops that value off the stack and stores it into memory (the program stack). This is totally just busy-work done in unoptimized code, and you can pretty much just ignore it. The upshot here is that your floating-point value is stored in memory at rbp-8 (an offset of 8 bytes from the base pointer).

The next block of instructions can be completely ignored. They just output the string "pi:".

The third block of instructions is supposed to output the floating-point value. First, 8 bytes of space are allocated on the stack. Then, the floating-point value that we had previously stored to memory is pushed onto the stack.

So far, so good. This is how you would normally pass a floating-point parameter to a function in a 32-bit build, following the 32-bit ABI, where x87 instructions are in use. In a 64-bit build, following the 64-bit ABI, floating-point parameters are supposed to be passed in XMM registers, and this is where the operator<<(double) function expects to receive its parameter. But you told the compiler it cannot generate SSE code, so it cannot make use of the XMM registers. Its hands are tied. It cannot properly call the library function, which follows the ABI, because your specific options break the ABI.

It's all downhill from here. The compiler copies the contents of the rax register (the stream reference returned by the previous operator<< call) into the rdi register, where it serves as the implicit this argument, and then calls the operator<<(double) function. This function tries to write the floating-point value it expects to find in the XMM0 register to stdout, but that register was never loaded with your value. Its contents are formally undefined (in your case, it seems to contain 0; in mine, some other garbage), so that garbage is written to stdout instead of the floating-point value you expected to see.
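To convince yourself that the value in memory is intact and only the argument passing is broken, one option is to look at the raw bits of the double as an integer. The sketch below is my own illustration, not something from the question: double_bits is a hypothetical helper name, and it takes its argument by reference so that only a pointer crosses the call boundary.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the question): returns the raw IEEE-754
// bit pattern of a double as a 64-bit integer. The parameter is a reference,
// so the call itself passes an address rather than a floating-point value.
std::uint64_t double_bits(const double& value)
{
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to reinterpret the bytes
    return bits;
}

// For 3.14 on an IEEE-754 platform, double_bits returns 0x40091EB851EB851F.
```

Printing the result with std::cout << std::hex << double_bits(pi) goes through an integer overload of operator<<, so the argument should travel in a general-purpose register rather than XMM0. I have not tested this under -mno-sse, so treat it as a diagnostic idea rather than a guaranteed workaround.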

Now that we understand the problem, what are the solutions?

  • If you don't want to use SSE instructions, force a 32-bit binary to be compiled using the -m32 flag. This combines safely with -mno-sse.
  • If you need a 64-bit binary, then don't pass the -mno-sse flag, because this is a violation of the 64-bit ABI, which assumes SSE2 support as a minimum.
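If you want the build to fail loudly instead of printing garbage, a compile-time guard is one possibility. This is a minimal sketch that assumes GCC or Clang, which predefine __x86_64__ on 64-bit x86 targets and __SSE2__ only when SSE2 code generation is enabled; float_abi_is_usable is a hypothetical name of my own.

```cpp
// Hypothetical guard (not from the question): reports whether float/double
// arguments can be passed the way the x86-64 ABI expects.
// Assumes the GCC/Clang predefined macros __x86_64__ and __SSE2__.
constexpr bool float_abi_is_usable()
{
#if defined(__x86_64__) && !defined(__SSE2__)
    return false;  // 64-bit x86 target with SSE2 disabled, e.g. via -mno-sse
#else
    return true;   // SSE2 is enabled, or this is not an x86-64 target
#endif
}

// Stops compilation when the flags make ABI-conforming calls impossible.
static_assert(float_abi_is_usable(),
              "x86-64 build without SSE2: float/double arguments cannot "
              "be passed in XMM registers as the ABI requires");
```

Placed in a common header, this should turn g++ -mno-sse test.cpp on a 64-bit target into a compile error, while a plain g++ test.cpp or a -m32 -mno-sse build compiles as before.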

(Although I'm ignoring it here, it is technically reasonable to pass the -mno-sse flag along with the -m64 flag. Indeed, this is explicitly supported by GCC because it is used to compile Linux kernel code, where the XMM registers' state is not persisted between calls. This works only because kernel code does not perform floating-point operations. The -mno-sse switch is used only to prevent the compiler from using SSE instructions as part of an advanced optimization that has nothing to do with floating-point operations.)

Cody Gray answered Oct 21 '22 14:10
