I have a library that I'm converting to 64-bits. However, I can't get bit-exact results on 64-bit mode, so my tests are failing.
I reduced the problem to a simple test case:
#include <stdio.h>
int main(void) {
printf("%d bits: ", sizeof(void*) * 8);
volatile double d = 10.870191700000001;
volatile double x = 0.10090000000000002;
d += x * 30.07;
printf("%0.15f\n", d);
}
To avoid compiler differences, I'm using the same compiler and cross compiling. In this case, I'm using TDM-GCC 64-bit 5.1.0 on Windows 7 in a Core i5 CPU. Here is my command-line:
gcc double_test.c -o double_test.exe -m32 -O0 && double_test.exe && gcc double_test.c -o double_test.exe -m64 -O0 && double_test.exe
And the output is:
32 bits: 13.904254700000001
64 bits: 13.904254700000003
In this case the error is minimal, but in my full test cases the error can add up and be enough to double my output.
How can I get bit-exact operations to match the 32-bit output?
The nearest I got to something relevant was to use -ffloat-store, but in this snippet it got the 32-bit execution like the 64-bit one, while I need just the opposite. However, this didn't have any noticeable effect upon my library. I also tested the -fexcess-precision=standard and -mfp-math options to no avail.
Since you said you need the more precise ...01 result, as well as determinism, you unfortunately can't just use -msse2 -mfpmath=sse in your 32-bit build. Future readers looking for determinism should use that.
You can use -mfpmath=387 to ask gcc to use slow/obsolete x87 math in 64-bit mode, where it's not the default. The calling convention passes/returns FP args in xmm registers, so this is even worse than in 32-bit mode, sometimes requiring extra store/reload.
peter@volta:/tmp$ gcc -m64 -mfpmath=387 -O3 fp-prec.c -o fp-64-387
peter@volta:/tmp$ ./fp-64-387
64 bits: 13.904254700000001
I'm not sure if gcc strictly limits itself to x87 when auto-vectorization is possible. If so, you're missing out on performance.
And BTW, in your example the ...01 is the result of keeping extra precision in an 80-bit temporary for the x*30.07 before adding it to d. (d is volatile, but d += stuff is still equivalent to d = d + stuff so the x*30.07 doesn't get rounded to 64-bit double first).
You could use long double, e.g. d += x * (long double)30.07 to force an 80-bit temporary there. long double is 80 bits in the x86-64 System V ABI Linux/OS X/*BSD/etc, but on x64 Windows it's the same as 64-bit double. So that might not be an option for you.
In this case you can get the same result with an FMA which keeps infinite precision for the multiply before doing the add. This is slow on hardware without FMA support, but fma(d, 30.07, x) will reliably give the result you want.
If you need this, use it in the places where that precision is required.
If you compile with FMA enabled, it can inline to an FMA instruction. (e.g. -march=native on my Skylake CPU)
Even without using the fma() math.h function, gcc will contract mul+add expressions into FMA when optimizing. (Unlike Clang, which I think doesn't do FP_CONTRACT by default without -ffast-math). Note that I'm not using -march=387
# your original source code, using an FMA instruction (native=skylake in my case)
peter@volta:/tmp$ gcc -m64 -march=native -O3 fp-prec.c -o fp-64-native
peter@volta:/tmp$ ./fp-64-native
64 bits: 13.904254700000001
The relevant part of main is:
57e: c5 fb 10 44 24 08 vmovsd xmm0,QWORD PTR [rsp+0x8] # load x
584: c5 fb 10 0c 24 vmovsd xmm1,QWORD PTR [rsp] # load d
589: c4 e2 f1 99 05 d6 01 00 00 vfmadd132sd xmm0,xmm1,QWORD PTR [rip+0x1d6] # the 30.07 constant
592: c5 fb 11 04 24 vmovsd QWORD PTR [rsp],xmm0 # store d
597: c5 fb 10 04 24 vmovsd xmm0,QWORD PTR [rsp] # reload d
59c: e8 8f ff ff ff call 530 <printf@plt>
FP determinism is hard in general.
See also https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ and https://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/
I would not aim to reproduce the 32-bit output, since it's a consequence of excess precision in the 32-bit x86 (x87) ABI and possibly also compiler non-conformance. Instead try to match the 64-bit output which is what you should expect on good targets. As long as you're okay with requiring a machine with sse2+, -mfpmath=sse will make 32-bit x86 behave like 64-bit and other more reasonable targets.
If you really need the result from 32-bit x86, ideally you should write it portably. This might involve breaking things down into a pair of doubles, but for x86-only you could just use long double. In the particular example in your question, the fma function would work, too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With