Why would the same code yield different numeric results on 32 vs 64-bit machines?

We are working on a library of numeric routines in C. We are not sure yet whether we will work with single precision (float) or double precision (double), so we've defined a type SP as an alias until we decide:

typedef float SP;
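
For illustration only (this is a made-up routine, not one of our actual functions), the library code is written entirely in terms of SP, along these lines:

/* hypothetical example routine, written against the SP alias */
SP sp_dot(const SP *a, const SP *b, int n)
{
    SP acc = 0;
    /* every intermediate has type SP, so the whole library switches
       precision when the typedef changes */
    for (int i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}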

When we run our unit tests, they all pass on my machine (a 64-bit Ubuntu) but they fail on my colleague's (a 32-bit Ubuntu that was mistakenly installed on a 64-bit machine).

Using Git's bisect command, we found the exact diff that began yielding different results between his machine and mine:

-typedef double SP;
+typedef float SP;

In other words, going from double precision to single precision yields numerically different results on our machines (about 1e-3 relative difference in the worst cases).

We are fairly certain that we are never comparing unsigned ints to negative signed ints anywhere.

Why would a library of numeric routines yield different results on a 32-bit operating system and on a 64-bit system?

CLARIFICATION

I'm afraid I might not have been clear enough: we have Git commit 2f3f671 that uses double precision, and where the unit tests pass equally well on both machines. Then we have Git commit 46f2ba, where we changed to single precision, and here the tests still pass on the 64-bit machine but not on the 32-bit machine.

asked Oct 21 '11 by lindelof


1 Answer

You are encountering what is often called the x87 "excess precision" bug.

In short: historically, (nearly) all floating-point computation on x86 processors was done using the x87 instruction set, which by default operates on an 80-bit floating-point type but can be set, by some bits in a control register, to (more or less) operate in single or double precision.

If single-precision operations are performed while the precision of the x87 control register is set to double or extended precision, the results will differ from those of the same operations performed purely in single precision (unless the compiler is extraordinarily careful and stores the result of every computation and reloads it to force rounding to occur in the correct place).

Your code running on 32-bit uses the x87 unit for floating-point computation (apparently with the control register set for double precision), and thus encounters the issue described above. Your code running on 64-bit uses the SSE[2,3,...] instructions for floating-point computation, which provide native single- and double-precision operations and therefore carry no excess precision. This is why your results differ.
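
Here is a minimal sketch (not taken from your code) of how excess precision can change a single-precision result. The exact output depends on your compiler, flags, and optimization level, but a typical outcome is that a 32-bit x87 build (e.g. gcc -m32 -mfpmath=387) prints 1.00000012 while an SSE build (gcc -m64, or gcc -m32 -msse2 -mfpmath=sse) prints 1:

#include <stdio.h>

int main(void)
{
    /* volatile keeps the compiler from folding the arithmetic away at
       compile time, so the additions actually happen on the FPU */
    volatile float x = 1.0f;
    volatile float y = 0x1.8p-25f;   /* 1.5 * 2^-25, less than half a ULP of 1.0f */

    /* With strict single-precision arithmetic, each addition rounds back to
       float, y is absorbed twice, and the result is exactly 1.0f.  If the
       intermediate x + y is kept in an 80-bit x87 register, y survives, the
       second addition pushes the sum past half a ULP, and the final rounding
       to float gives 1.00000012f. */
    float sum = (x + y) + y;

    printf("sum = %.9g\n", (double)sum);
    return 0;
}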

You can work around this (to a point) by telling your compiler to use SSE for floating-point computation even on 32-bit (-mfpmath=sse with GCC). Even then, bit-exact results are not guaranteed because the various libraries that you link against may use x87, or simply use different algorithms depending on the architecture.
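
If you want to check which evaluation method a given build is using, the C99 macro FLT_EVAL_METHOD from <float.h> reports it: 0 means operations are evaluated in the precision of their own type (the SSE case), while 2 means float and double operations are evaluated in long double (the classic x87 behaviour). A quick check, assuming a C99 compiler:

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* 0: each operation is evaluated in the precision of its type (SSE-style)
       2: float/double operations are evaluated in long double (x87-style)    */
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}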

answered Nov 14 '22 by Stephen Canon