Do FMA (fused multiply-add) instructions always produce the same result as a mul then add instruction?

I have this assembly (AT&T syntax):

mulsd   %xmm0, %xmm1
addsd   %xmm1, %xmm2

I want to replace it with:

vfmadd231sd %xmm0, %xmm1, %xmm2

Will this transformation always leave equivalent state in all involved registers and flags? Or can the resulting floating-point values differ slightly in some way? (If they differ, why is that?)

(About the FMA instructions: http://en.wikipedia.org/wiki/FMA_instruction_set)

Daryl asked Mar 16 '15 20:03


1 Answer

No. In fact, a major part of the benefit of fused multiply-add is that it does not (necessarily) produce the same result as a separate multiply and add.

As a (somewhat contrived) example, suppose that we have:

double a = 1 + 0x1.0p-52; // 1 + 2**-52
double b = 1 - 0x1.0p-52; // 1 - 2**-52

and we want to compute a*b - 1. The "mathematically exact" value of a*b - 1 is:

(1 + 2**-52)(1 - 2**-52) - 1 = 1 + 2**-52 - 2**-52 - 2**-104 - 1 = -2**-104

but if we first compute a*b with a separate multiplication, the exact product 1 - 2**-104 is not representable as a double and rounds to 1.0, so the subsequent subtraction of 1.0 produces a result of zero.

If we use fma(a,b,-1) instead, we eliminate the intermediate rounding of the product, which allows us to get the "real" answer, -0x1.0p-104.

Please note that not only do we get a different result, but different flags are set as well: the separate multiply and subtract sets the inexact flag (because the product was rounded), whereas the fused multiply-add does not set any flags (because its result is exact).

Stephen Canon answered Sep 21 '22 17:09