Is there a flag in gcc/clang that specifies the precision of the intermediate floating-point calculation?
Suppose I have a C code
double x = 3.1415926;
double y = 1.414;
double z = x * y;
Is there a compiler flag that allows 'x*y' to be calculated in the highest possible precision of the user's machine, say, long-double (64-bit mantissa), and then truncated back to double (53-bit mantissa, the precision of the declared variable type)?
For information only, I am using Ubuntu 14.04 on a 64-bit machine.
GCC
[Edit on observed behavior of gcc 4.8.4, where default behavior is the opposite to documentation]
You need to make use of the 80-bit registers in the x87 FPU. With with -mfpmath=387
you can override the default use of the SSE registers XMM0-XMM7. This default actually gives you the the IEEE behavior where 64-bit registers are used at every step.
See: https://gcc.gnu.org/wiki/x87note
Thus, by default x87 arithmetic is not true 64/32 bit IEEE, but gets extended precision from the x87 unit. However, anytime a value is moved from the registers to an IEEE 64 or 32 bit storage location, this 80 bit value must be rounded down to the appropriate number of bits.
If your operation is extremely complex, however, register spilling may occur; the FP register stack is only depth 8. So when the spillage copies out to a word-sized RAM location you'll get the rounding then. You'll either need to declare long double
yourself this case and round manually at the end, or check the assembler output for explicit spillage.
More information about registers here: https://software.intel.com/en-us/articles/introduction-to-x64-assembly
In particular, XMM0...7 registers, while 128 bits wide, are only so to accommodate two simultaneous 64-bit FP operations. So you want be seeing the stack-operated FPR registers with the FLD (load), FMUL (multiply), and FSTP (store-and-pop) instructions.
So I compiled this code:
double mult(double x, double y) {
return x * y;
}
with:
gcc -mfpmath=387 -Ofast -o precision.s -S precision.c
And got:
mult:
.LFB24:
.cfi_startproc
movsd %xmm1, -8(%rsp)
fldl -8(%rsp)
movsd %xmm0, -8(%rsp)
fldl -8(%rsp)
fmulp %st, %st(1)
fstpl -8(%rsp)
movsd -8(%rsp), %xmm0
ret
.cfi_endproc
Everything makes perfect sense now. Floating point values are passed via registers XMM0 and XMM1 (although they have to take a bizarre round-trip through memory before they can be put on the FPR stack), and the result is returned in XMM0 in accordance with above Intel reference. Not sure why there isn't a simple FLD instruction directly from XMM0/1 but apparently the instruction set doesn't do that.
If you compare to -mfpmath=sse
, there's a lot less to have to do in the latter case, because the operands are ready and waiting in the XMM0/1 registers and it's as simple as a single MULSD instruction.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With