
How to chain multiple fma operations together for performance?

Assume that in some C or C++ code I have a function T fma( T a, T b, T c ) that performs one multiplication and one addition, like so: ( a * b ) + c. How am I supposed to optimize multiple multiply-and-add steps?

For example, my algorithm needs to be implemented with 3 or 4 fma operations chained and summed together. How can I write this in an efficient way, and which parts of the syntax or semantics should I pay particular attention to?

I would also like some hints on one critical point: I want to avoid changing the CPU's rounding mode, so that the CPU pipeline is not flushed. I'm quite sure that just using the + operator between multiple calls to fma shouldn't change it. I say "quite sure" because I don't have many CPUs to test this on; I'm just following some logical steps.

My algorithm is something like the sum of multiple fma calls:
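One way to check the rounding-mode concern on a particular machine is to read the mode before and after the sum. The sketch below assumes the standard std::fma from <cmath> stands in for the fma function in the question; the helper name rounding_mode_unchanged is hypothetical. It only compares the mode observed before and after the expression, so it cannot tell whether a library implementation touched the mode temporarily in between.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: evaluates a sum of three fma calls and reports
// whether the floating-point rounding mode is the same afterwards.
// std::fma is assumed here in place of the question's own fma function.
bool rounding_mode_unchanged(double& result) {
    const int before = std::fegetround();  // rounding mode before the sum

    result = std::fma(1.0, 2.0, 3.0)
           + std::fma(4.0, 5.0, 6.0)
           + std::fma(7.0, 8.0, 9.0);

    const int after = std::fegetround();   // rounding mode after the sum
    return before == after;
}
```

Calling it with a double to receive the sum returns true when the mode was left as it was found.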

fma ( triplet 1 ) + fma ( triplet 2 ) + fma ( triplet 3 )
user2485710 asked Mar 19 '23 14:03


1 Answer

Recently, at Build 2014, Eric Brumer gave a very nice talk on the topic (see here). The bottom line of the talk was that

Using Fused Multiply Accumulate (aka FMA) everywhere hurts performance.

On Intel CPUs an FMA instruction costs 5 cycles, whereas doing a multiplication (5 cycles) followed by an addition (3 cycles) costs 8 cycles. Using FMA, you are getting two operations for the price of one (see picture below).

enter image description here

However, FMA does not seem to be the holy grail of instructions. As you can see in the picture below, FMA can in certain situations hurt performance.

enter image description here

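In the same fashion, your case fma(triplet1) + fma(triplet2) + fma(triplet3) costs 21 cycles (three FMAs at 5 cycles each plus two additions at 3 cycles each), whereas doing the same operations without FMA would cost 30 cycles (three multiplications at 5 cycles each plus five additions at 3 cycles each). That's a 30% gain in performance.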
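As a rough illustration of what "chaining" could look like in C++, here is a minimal sketch that assumes std::fma from <cmath> in place of the question's fma. The Triplet struct and the names sum_flat and sum_chained are hypothetical. In the flat form the three FMAs are independent of each other; in the chained form each FMA takes the previous result as its addend. The two forms can round differently for general inputs, and which one is faster depends on the CPU and compiler, so it is worth measuring rather than assuming.

```cpp
#include <cmath>

// Hypothetical container for one (a, b, c) triplet, i.e. a * b + c.
struct Triplet {
    double a;
    double b;
    double c;
};

// Flat form from the question: three independent FMAs, then two additions.
double sum_flat(const Triplet& t1, const Triplet& t2, const Triplet& t3) {
    return std::fma(t1.a, t1.b, t1.c)
         + std::fma(t2.a, t2.b, t2.c)
         + std::fma(t3.a, t3.b, t3.c);
}

// Chained form: add the c terms first, then feed each FMA result into
// the next FMA as its addend. Each step depends on the previous one.
double sum_chained(const Triplet& t1, const Triplet& t2, const Triplet& t3) {
    double acc = t1.c + t2.c + t3.c;
    acc = std::fma(t1.a, t1.b, acc);
    acc = std::fma(t2.a, t2.b, acc);
    acc = std::fma(t3.a, t3.b, acc);
    return acc;
}
```

For small integer-valued inputs both functions return the same value; with inputs that are not exactly representable, the results may differ in the last bits because the intermediate roundings happen at different points.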

Using FMA explicitly in your code generally means using compiler intrinsics (the standard library also provides fma in <math.h> / std::fma in <cmath>). In my humble opinion, though, FMA and the like are not something you should worry about unless you are a C++ compiler programmer. If you are not, let the compiler's optimizer take care of these technicalities. Generally, in concerns of this kind lies the root of all evil (i.e., premature optimization), to paraphrase one of the great ones (i.e., Donald Knuth).

101010 answered Apr 26 '23 17:04