Do FP operations give EXACTLY the same result on various x86 CPUs?

Do different x86 CPUs (with built-in FPUs and reasonably recent, say launched this millennium) produce exactly the same result for their floating-point primitives, assuming the same instruction is available on the CPUs being compared, the same input, and the same operating parameters such as rounding mode? I'm not interested in differences in timing, nor in the Pentium FDIV bug (which does not qualify only because that incident is ancient).

I guess the answer is yes for addition, subtraction, negation, and round-to-integer, since these have precise definitions, and I can hardly imagine what a divergence in implementations could be (short perhaps of a bug in the detection of overflow/underflow, but that would be a disaster in some applications, so I imagine this would have been caught and fixed long ago).

Multiplication seems more likely to have diverging implementations: determining the (say) nearest representable Double-Precision Floating-Point Number (64 bits, including 52+1 bits of mantissa) to the product of two DPFPNs sometimes requires computing the product of their mantissas to (about) 106-bit accuracy, which, for the few least significant bits, is arguably a waste of effort. I wonder if this is even attempted, and done correctly. Or perhaps IEEE 754, or some de facto standard, prescribes something?

Division seems even more delicate.

And, short of a common design, I doubt all implementations of the much more complex things (trig functions, logs, ...) could be exactly in sync, given the variety of mathematical methods that can be used.

I'm asking this out of a combination of pure nosiness, willingness to improve that answer of mine, and a desire for a method that would (sometimes) allow a program running in a VM to detect a mismatch between the CPU the VM pretends to be running on and the real one.

asked Oct 27 '12 16:10

fgrieu

2 Answers

At the assembly level, the basic floating-point instructions (add, subtract, multiply, divide, square root, FMA, round) always produce the same result, as specified by the IEEE 754 standard. Two kinds of instructions may produce different results on different architectures: the complex x87 FPU instructions that compute transcendental functions (FSIN, FCOS, F2XM1, and the like), and the approximate SSE instructions (RCPSS/RCPPS for the approximate reciprocal, and RSQRTSS/RSQRTPS for the approximate reciprocal square root).

Transcendental x87 FPU operations are implemented in microcode, and AFAIK all Intel and AMD CPUs except the AMD K5 use the same microcode, so you can't use them for detection. They might help only to detect VIA, Cyrix, Transmeta, and other old CPUs, but those are too rare to consider.

The approximate SSE instructions are implemented differently on Intel and AMD, and AFAIK the implementation also differs between old (pre-K8) and newer AMD CPUs. You could use that difference to detect an AMD CPU pretending to be an Intel one and vice versa, but that is a limited use case.
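To illustrate why the basic operations leave no room for divergence, here is a minimal sketch (not part of the original answer) in Python. It assumes CPython on a platform where `float` is an IEEE 754 binary64 value, the default round-to-nearest mode is in effect, and arithmetic is done in double precision (as with SSE2 on x86-64). The helper names `correctly_rounded` and `check_basic_ops` are hypothetical. The idea is that IEEE 754 requires each basic operation to return the exact mathematical result rounded once, so exact rational arithmetic can predict the only permissible answer.

```python
from fractions import Fraction
import random


def correctly_rounded(exact: Fraction) -> float:
    """Round an exact rational to the nearest double (ties to even).

    Relies on CPython's int / int true division being correctly rounded.
    """
    return exact.numerator / exact.denominator


def check_basic_ops(a: float, b: float) -> bool:
    """Compare hardware +, -, *, / against exact rational arithmetic."""
    fa, fb = Fraction(a), Fraction(b)  # exact conversions, no rounding
    return (
        a + b == correctly_rounded(fa + fb)
        and a - b == correctly_rounded(fa - fb)
        and a * b == correctly_rounded(fa * fb)
        and a / b == correctly_rounded(fa / fb)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Inputs stay in [1, 2) to avoid overflow, underflow, and zero divisors.
    pairs = [(rng.uniform(1.0, 2.0), rng.uniform(1.0, 2.0)) for _ in range(1000)]
    print(all(check_basic_ops(a, b) for a, b in pairs))
```

Any conforming CPU has to pass this check for every input pair, which is why these instructions cannot serve as a fingerprint. The check says nothing about FSIN or RCPSS, which are not covered by the same correct-rounding requirement.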

answered Oct 02 '22 01:10

Marat Dukhan

Except for extreme cases that are very well documented in errata, ALL IA-32 instructions behave identically across processors.

The obvious exceptions are, of course, CPUID and MSR accesses.

The obvious non-exceptions are the various logic, integer and floating point operations. As Maratyszcza wrote in his answer, many of the more complex operations are calculated by microcode. This microcode can be very different among processors with different microarchitectures, but the result is guaranteed to be the same. Intel, for one (I have no firsthand knowledge of other x86 developers), invests huge resources to ensure backwards compatibility between processors, even reproducing behavior that is "buggy" (which changes the bugs into the new spec).

Where the architecture behaves differently, such as with VMX (Virtualization) and SMM (System Management), the control structures include a revision ID. All processors that use the same revision ID are guaranteed to behave the same way with regard to these architectures.

To answer the original question, FP operations, be they x87, SSE or AVX, give the same result on all processors, according to IEEE 754.
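Anyone who wants to test this claim across machines needs to compare raw bit patterns, not printed decimal values. The following is a minimal sketch in Python under stated assumptions: `bits` and `fingerprint` are hypothetical helper names, the inputs are arbitrary, and only basic operations plus square root are included. Python's `math.sin` and similar functions go through the C library and not through x87 instructions such as FSIN, so they would reflect the library, not the CPU, and are left out.

```python
import math
import struct


def bits(x: float) -> str:
    """Return the 64-bit IEEE 754 pattern of x as 16 hex digits."""
    return struct.pack(">d", x).hex()


def fingerprint() -> dict:
    """Collect bit patterns of a few basic operations on fixed inputs."""
    a, b = 0.1, 3.0  # arbitrary inputs chosen for illustration
    return {
        "add": bits(a + b),
        "mul": bits(a * b),
        "div": bits(a / b),
        "sqrt": bits(math.sqrt(b)),
    }


if __name__ == "__main__":
    # Run on each machine and diff the printed lines.
    for name, pattern in fingerprint().items():
        print(name, pattern)
```

Comparing bit patterns also catches differences that `==` hides, such as `0.0` versus `-0.0`. If the answers here are right, the output should be identical on every IEEE 754 conforming machine.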

answered Oct 02 '22 01:10

Nathan Fellman
