
Do FP operations give EXACTLY the same result on various x86 CPUs?

Do different x86 CPUs (with built-in FPUs and reasonably recent, say launched this millennium) produce exactly the same result for their floating-point primitives, assuming the same instruction is available on the CPUs being compared, with the same input and the same operating parameters such as rounding mode? I'm not interested in differences in timing, nor in the Pentium FDIV bug (which does not qualify only because that incident is ancient).

I guess the answer is yes for addition, subtraction, negation, and round-to-integer, since these have precise definitions, and I can hardly imagine what a divergence in implementations could be (short perhaps of a bug in the detection of overflow/underflow, but that would be a disaster in some applications, so I imagine this would have been caught and fixed long ago).

Multiplication seems more likely to have diverging implementations: determining the (say) nearest representable Double-Precision Floating-Point Number (64 bits, including 52+1 of mantissa) to the product of two DPFPNs sometimes requires computing the product of their mantissas to (about) 104-bit accuracy, which, for the few least significant bits, is arguably a waste of effort. I wonder if this is even attempted, and done correctly. Or perhaps IEEE-754, or some de facto standard, prescribes something?

Division seems even more delicate.

And, short of a common design, I doubt all implementations of the much more complex operations (trig functions, logs, ...) could be exactly in sync, given the variety of mathematical methods that can be used.

I'm asking this out of a combination of pure nosiness, willingness to improve that answer of mine, and desire for a method that would (sometimes) allow a program running in a VM to detect a mismatch between the CPU it is told it is running on and the real one.

fgrieu asked Oct 27 '12 16:10



2 Answers

At the assembly level, the basic floating-point instructions (add, subtract, multiply, divide, square root, FMA, round) always produce the same result, as specified by the IEEE 754 standard. Two kinds of instructions may produce different results on different CPUs: the complex x87 FPU instructions that compute transcendental functions (FSIN, FCOS, F2XM1, and the like), and the approximate SSE instructions (RCPSS/RCPPS for the approximate reciprocal, and RSQRTSS/RSQRTPS for the approximate reciprocal square root).

Transcendental x87 FPU operations are implemented in microcode, and AFAIK all Intel and AMD CPUs except the AMD K5 use the same microcode, so you can't use them for detection. They might help only for detecting VIA, Cyrix, Transmeta, and other old CPUs, but those are too rare to consider.

The approximate SSE instructions are implemented differently on Intel and AMD, and AFAIK the implementation also differs between old (pre-K8) and newer AMD CPUs. You could use that difference to detect an AMD CPU pretending to be an Intel one and vice versa, but that is a limited use case.
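As a rough illustration of what "the same result, as specified by IEEE 754" means for multiplication, here is a small Python sketch (not part of the original answer). It compares the hardware product against the exact rational product rounded once to the nearest double. The helper names (`correctly_rounded_product`, `bits`, `count_mismatches`) are made up for this example. It assumes CPython on a platform where `float` is an IEEE 754 binary64 and the default round-to-nearest-even mode is in effect.

```python
import random
import struct
from fractions import Fraction


def correctly_rounded_product(a: float, b: float) -> float:
    """Exact rational product of a and b, rounded once to the nearest double."""
    # Fraction(float) is exact. float(Fraction) divides two integers,
    # which CPython rounds correctly (nearest, ties to even).
    return float(Fraction(a) * Fraction(b))


def bits(x: float) -> int:
    """Raw 64-bit pattern of a double, for bit-exact comparison."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def count_mismatches(trials: int = 10_000, seed: int = 0) -> int:
    """Count random inputs where a * b differs from the correctly rounded product."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        # Moderate magnitudes only: no overflow or underflow in this sketch.
        # Signed zeros are not handled, since Fraction has no negative zero.
        a = rng.uniform(-1e6, 1e6)
        b = rng.uniform(-1e6, 1e6)
        if bits(a * b) != bits(correctly_rounded_product(a, b)):
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatches:", count_mismatches())
```

On a conforming CPU the count should be zero, because the standard leaves exactly one permitted answer for each input pair. This only samples moderate-sized inputs, so it is a sanity check and not a proof.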

Marat Dukhan answered Oct 02 '22 01:10


Except for extreme cases that are very well documented in errata, ALL IA-32 instructions behave identically across processors.

The obvious exceptions are, of course, CPUID and MSR accesses.

The obvious non-exceptions are the various logic, integer and floating-point operations. As Maratyszcza wrote in his answer, many of the more complex operations are calculated by microcode. This microcode can be very different among processors with different microarchitectures, but the result is guaranteed to be the same. Intel, for one (I have no firsthand knowledge of other x86 developers), invests huge resources to ensure backwards compatibility between processors, even reproducing behavior that is "buggy" (which turns the bug into part of the spec).

Where the architecture behaves differently, such as with VMX (Virtualization) and SMM (System Management), the control structures include a revision ID. All processors that use the same revision ID are guaranteed to behave the same way with regard to these architectures.

To answer the original question, FP operations, be they x87, SSE or AVX, give the same result on all processors, according to IEEE 754.
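If you want to check this between two machines yourself, compare bit patterns and not printed decimals. The following Python sketch (an illustration, not something from this answer) hashes the raw 64-bit results of a few basic operations. The names `double_bits`, `sample_results` and `fingerprint` and the choice of inputs are arbitrary assumptions. Python does not issue x87 instructions such as FSIN directly, so this exercises only add, multiply, divide and square root.

```python
import hashlib
import math
import struct


def double_bits(x: float) -> str:
    """Hex string of the big-endian IEEE 754 binary64 encoding of x."""
    return struct.pack(">d", x).hex()


def sample_results() -> list:
    """Results of a few basic operations on arbitrarily chosen inputs."""
    inputs = [0.1, 1.0 / 3.0, math.sqrt(2.0), 1e-300, 123456789.125]
    out = []
    for x in inputs:
        out.extend([x + 0.1, x * x, 1.0 / x, math.sqrt(x)])
    return out


def fingerprint(results) -> str:
    """SHA-256 over the raw bit patterns of the results."""
    h = hashlib.sha256()
    for x in results:
        h.update(struct.pack(">d", x))
    return h.hexdigest()


if __name__ == "__main__":
    for x in sample_results():
        print(double_bits(x))
    print("fingerprint:", fingerprint(sample_results()))
```

Running the script on both machines and comparing the final line gives a quick equality check. If the fingerprints differ, the per-value hex lines show which operation diverged.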

Nathan Fellman answered Oct 02 '22 01:10