Floating point operations 32 bit mode vs 64 bit mode

Question

I have the same number-crunching Delphi source code compiled as both a 32 bit and a 64 bit application. From the log files I can see that the numbers differ slightly (a relative error of about 1e-14). Is it possible that the same CPU performs floating point operations differently when running 32 bit and 64 bit code, or is the compiler responsible for the difference?

David Heffernan · Accepted Answer

I'm going to assume that the code does not explicitly use Extended. That data type differs between 32 and 64 bit (it's 10 bytes in 32 bit and 8 bytes in 64 bit), so any explicit use of Extended introduces an immediate difference. I'm also going to assume that you are using Double for all your variables, although the arguments below apply equally to Single.

Beyond that, the most common reason for this is a difference in behaviour between the two floating point units.

The x87 unit, used by 32 bit code, stores intermediate values to 80 bit extended precision. The SSE unit, used by 64 bit code, stores intermediate values to 64 bit double precision.
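To make the effect of intermediate precision concrete, here is a minimal sketch that simulates it with exact rational arithmetic. It is written in Python rather than Delphi, purely so that the rounding can be modelled exactly. The helper names `round_to_mantissa` and `add_then_subtract` are made up for this illustration, and the model ignores exponent range, so treat it as a demonstration of the idea, not a description of what either unit does internally. A 64 bit mantissa stands in for x87 extended precision and a 53 bit mantissa for SSE double precision.

```python
from fractions import Fraction


def round_to_mantissa(x: Fraction, bits: int) -> Fraction:
    """Round a positive exact value to `bits` significant binary digits,
    ties to even. Exponent limits are ignored in this simplified model."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Find a power of two so that 2**(bits-1) <= scaled < 2**bits.
    exponent = 0
    while x / Fraction(2) ** exponent >= 2 ** bits:
        exponent += 1
    while x / Fraction(2) ** exponent < 2 ** (bits - 1):
        exponent -= 1
    scaled = x / Fraction(2) ** exponent
    # round() on a Fraction rounds half to even and returns an int.
    return round(scaled) * Fraction(2) ** exponent


def add_then_subtract(a: float, b: float, mantissa_bits: int) -> float:
    """Compute (a + b) - a, keeping the intermediate sum at mantissa_bits.
    The final store to a double is modelled by float()."""
    total = round_to_mantissa(Fraction(a) + Fraction(b), mantissa_bits)
    return float(total - Fraction(a))


big = 2.0 ** 53  # the first point where adjacent doubles are 2 apart
print(add_then_subtract(big, 1.0, 53))  # 0.0: the 1.0 is lost in the sum
print(add_then_subtract(big, 1.0, 64))  # 1.0: the wider sum keeps it
```

The inputs and the stored result are doubles in both cases. Only the precision of the intermediate sum changes, and that alone is enough to change the answer.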

Now, the x87 unit can be configured, using its control word, to store intermediate values to 64 bit double precision. It makes no difference in terms of performance, but it will bring your 32 and 64 bit results closer together.

Even then you won't get exactly the same results on the different units. In fact you won't get exactly the same results on all x87 units. Even though these units are all IEEE 754 conformant, that standard allows a degree of leeway for calculations.

What's more, higher order calculations like trigonometry, logarithms, exponentiation etc. are performed quite differently between 32 and 64 bit. The 32 bit unit has more built in functionality than the 64 bit unit. You'll note in the Delphi source code that the trig functions, for example, are all implemented in the RTL for 64 bit. In 32 bit code they are implemented by calling x87 ops.

The bottom line is that you will never get your 32 and 64 bit programs to agree exactly when there are floating point calculations involved. You will have to accept differences to a small tolerance.
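Accepting a small tolerance usually means comparing the two logs with a relative error check, not exact equality. The sketch below uses Python only as an illustration, and the same comparison is a one-liner in Delphi. The function name `results_agree` and the default tolerance of 1e-12 are assumptions chosen for this example, so pick a tolerance that suits your own calculation.

```python
import math


def results_agree(a: float, b: float,
                  rel_tol: float = 1e-12, abs_tol: float = 0.0) -> bool:
    """Return True when a and b differ by at most rel_tol relative to the
    larger magnitude. Pass a non-zero abs_tol when values may be near zero,
    where a purely relative test is too strict."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


# Hypothetical log values differing by about 1e-14 relative error.
value_32 = 1.0
value_64 = 1.0 + 1e-14

print(value_32 == value_64)               # False: exact comparison fails
print(results_agree(value_32, value_64))  # True: within the tolerance
```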

Tags:

floating-accuracy

32bit-64bit

delphi

Max
