Which algorithms benefit most from fused multiply add?

fma(a,b,c) is equivalent to a*b+c, except that it doesn't round the intermediate result.

Could you give me some examples of algorithms that non-trivially benefit from avoiding this rounding?

It's not obvious, since the rounding after the multiplication (which FMA avoids) tends to be less problematic than the rounding after the addition (which it doesn't).
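For concreteness, here is a small C demonstration of the difference (the constants are chosen specifically so that the intermediate product a*b is not representable in double precision):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* a*b = (1 + 2^-27)(1 - 2^-27) = 1 - 2^-54, which is not
     * representable in double and rounds (ties-to-even) up to 1.0. */
    double a = 1.0 + 0x1p-27;
    double b = 1.0 - 0x1p-27;
    double c = -1.0;

    printf("a*b + c    = %a\n", a * b + c);     /* 0x0p+0: product rounded to 1.0 */
    printf("fma(a,b,c) = %a\n", fma(a, b, c));  /* -0x1p-54: the exact answer */
    return 0;
}
```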

asked Aug 28 '10 by taw


2 Answers

taw hit on one important example; more generally, FMA allows library writers to efficiently implement many other floating-point operations with correct rounding.

For example, a platform that has an FMA can use it to implement correctly rounded divide and square root (PPC and Itanium took this approach), which lets the FPU be basically a single-purpose FMA machine. Peter Tang and John Harrison (Intel), and Peter Markstein (HP) have some papers that explain this use if you're curious.
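As a rough sketch of how that works (illustrative only; a real implementation starts from a hardware reciprocal-estimate instruction and iterates until the result is correctly rounded):

```c
#include <math.h>

/* Sketch of FMA-based division refinement in the PPC/Itanium style.
 * The plain 1.0/b here stands in for a cheap reciprocal-estimate
 * instruction; this is not a correctly-rounded implementation. */
double div_via_fma(double a, double b) {
    double y = 1.0 / b;        /* reciprocal estimate y ~ 1/b */
    double q = a * y;          /* initial quotient estimate */
    double r = fma(-q, b, a);  /* residual a - q*b, with only one rounding */
    return fma(r, y, q);       /* refined quotient q + r*y */
}
```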

The example taw gave is more broadly useful than just in tracking error bounds. It allows you to represent the product of two floating point numbers as a sum of two floating point numbers without any rounding error; this is quite useful in implementing correctly-rounded floating-point library functions. Jean-Michel Muller's book or the papers on crlibm would be good starting places to learn more about these uses.
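In code, the exact-product representation looks something like this (the standard 2Prod-with-FMA idiom; the names are mine):

```c
#include <math.h>

/* Splits a*b into hi + lo with no rounding error (barring
 * overflow/underflow): hi is the rounded product, lo is the
 * exact amount the rounding discarded. */
void two_prod(double a, double b, double *hi, double *lo) {
    *hi = a * b;
    *lo = fma(a, b, -*hi);  /* exact: a*b - round(a*b) */
}
```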

FMA is also broadly useful in argument reduction in math-library-style routines. There, the goal of the computation is often a term of the form (x - a*b), where (a*b) is very nearly equal to x itself; without an FMA, the result is often on the order of the rounding error in the (a*b) term. I believe Muller has also written about this in his book.
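A sketch of the pattern (this is just the first step of a Cody-Waite-style reduction; real libraries split the constant into several parts and handle large arguments differently):

```c
#include <math.h>

/* Reduce x by the nearest multiple of C ~ pi/2. The fma computes
 * x - k*C with a single rounding, preserving the low-order bits
 * that a separate multiply-then-subtract would destroy when k*C
 * nearly cancels x. */
double reduce(double x) {
    const double C = 1.5707963267948966;  /* double closest to pi/2 */
    double k = nearbyint(x / C);
    return fma(-k, C, x);
}
```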

answered Sep 23 '22 by Stephen Canon


The only thing I have found so far is "error-free transformations". For any floating-point numbers, the errors of a+b, a-b, and a*b are themselves exactly representable as floating-point numbers (in round-to-nearest mode, assuming no overflow/underflow, etc.).

The addition (and obviously subtraction) error is easy to compute: if abs(a) >= abs(b), the error is exactly b-((a+b)-a) (2 flops, or 4-5 if we don't know which operand is bigger). The multiplication error is trivial to compute with fma: it is simply fma(a,b,-a*b). Without fma it takes 16 flops of rather nasty code, and fully generic emulation of correctly rounded fma is even slower than that.
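Written out (these are the standard Fast2Sum/TwoSum idioms, plus the fma trick for the product):

```c
#include <math.h>

/* Fast2Sum: assumes |a| >= |b|. Returns s = round(a+b) and sets
 * *err so that a + b == s + *err exactly. 2 extra flops. */
double fast_two_sum(double a, double b, double *err) {
    double s = a + b;
    *err = b - (s - a);
    return s;
}

/* Knuth's TwoSum: no magnitude assumption, 5 extra flops. */
double two_sum(double a, double b, double *err) {
    double s = a + b;
    double bb = s - a;
    *err = (a - (s - bb)) + (b - bb);
    return s;
}

/* Product error with fma: one extra flop. */
double two_prod_err(double a, double b, double p) {  /* p = a*b, rounded */
    return fma(a, b, -p);
}
```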

An extra 16 flops of error tracking per flop of real computation would be huge overkill, but with just 1-5 pipeline-friendly flops it's quite reasonable. For many algorithms built on these transformations, the 50%-200% overhead of error tracking and compensation gives a result as accurate as if all the calculations had been done in twice the number of bits, avoiding ill-conditioning in many cases.

Interestingly, fma is never used in these algorithms to compute results, just to find errors, because finding the error of an fma is as slow as finding the error of a multiplication was without fma.

Relevant keywords to search for are "compensated Horner scheme" and "compensated dot product", with the Horner scheme benefiting a lot more.
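For example, a compensated Horner evaluation might look like this (a sketch in the style of the Graillat/Langlois/Louvet CompHorner algorithm; the names are mine):

```c
#include <math.h>

/* Rounding error of a product, via fma. */
static double prod_err(double a, double b, double p) {
    return fma(a, b, -p);
}

/* Rounding error of a sum (Knuth's TwoSum, no magnitude assumption). */
static double sum_err(double a, double b, double s) {
    double bb = s - a;
    return (a - (s - bb)) + (b - bb);
}

/* Compensated Horner evaluation of c[0] + c[1]*x + ... + c[n]*x^n.
 * Runs ordinary Horner, captures the rounding error of every multiply
 * and add, evaluates that error polynomial alongside with plain Horner,
 * and adds it back at the end. The result is about as accurate as if
 * the whole evaluation had been done in twice the precision. Note that
 * fma appears only inside prod_err, i.e. only to find errors. */
double horner_comp(const double *c, int n, double x) {
    double s = c[n];
    double e = 0.0;  /* running evaluation of the error polynomial */
    for (int i = n - 1; i >= 0; i--) {
        double p  = s * x;
        double pi = prod_err(s, x, p);
        s = p + c[i];
        double sg = sum_err(p, c[i], s);
        e = e * x + (pi + sg);
    }
    return s + e;
}
```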

answered Sep 21 '22 by taw