Steam's hardware survey is very helpful because it gives an overview of hardware support for SSE instruction sets. However, I can't find any resources on how abundant FMA support is. Is there any data on this somewhere? Or is there any other instruction set that FMA is more or less tied to (if you have one, you most likely have the other) that I can base an estimate on?
FMA3 was introduced by AMD in Piledriver (May 2012), which is found in Vishera FX CPUs and Trinity & Richland APUs. Piledriver has a serious performance bug with 256b (AVX ymm) store throughput (VMOVAPS/VMOVUPS: one per 17/20 cycles). (See Agner Fog's microarch doc, and other sources.) Either disable your 256b AVX routines on Piledriver, or write a Piledriver-specific version that uses 128b xmm FMA. (Or use FMA4, so it can run on Bulldozer, too.)
The successor, Steamroller, is found only in Kaveri APUs. (FX CPUs are still Piledriver.) Steamroller fixes the perf bug with 256b stores, but every 256b operation takes twice as many cycles as the 128b version, so you're not gaining anything (except a tiny reduction in loop overhead) from 256b AVX. In other words, you might as well have your code run the 128b FMA4 version if FMA4 is available.
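The AMD advice above boils down to a small dispatch decision, which could be sketched roughly as below. This is only an illustration: `choose_fma_kernel`, the `fma_kernel` enum and the `fast_256bit` flag are hypothetical names, and how you decide that 256b vectors are worthwhile on a given chip (e.g. by recognising Piledriver/Steamroller) is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical kernel variants an application might ship. */
enum fma_kernel {
    KERNEL_NO_FMA,    /* plain SSE/AVX fallback without fused multiply-add */
    KERNEL_FMA4_128,  /* 128b xmm FMA4: also runs on Bulldozer */
    KERNEL_FMA3_128,  /* 128b xmm FMA3 */
    KERNEL_FMA3_256   /* 256b ymm FMA3 */
};

/*
 * Pick a kernel from feature flags the caller has already detected.
 * fast_256bit is an assumption supplied by the caller: false on chips
 * where 256b operations are slow (Piledriver) or merely split in two
 * (Steamroller), true where they pay off (e.g. Haswell).
 */
enum fma_kernel choose_fma_kernel(bool has_fma3, bool has_fma4, bool fast_256bit)
{
    if (has_fma3 && fast_256bit)
        return KERNEL_FMA3_256;
    if (has_fma4)
        return KERNEL_FMA4_128;  /* one 128b version covers Bulldozer onwards */
    if (has_fma3)
        return KERNEL_FMA3_128;
    return KERNEL_NO_FMA;
}
```

With this ordering, a Piledriver-like chip (FMA3 and FMA4, slow 256b) ends up on the same 128b FMA4 kernel as Bulldozer, so you only have to maintain one AMD-specific version.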
FMA3 was introduced by Intel at the same time as AVX2, in Haswell (June 2013). Many people have not upgraded from Sandybridge/IvyBridge, because there's only a small performance diff, except in code that can use AVX2 / FMA to good advantage. (i.e. not most stuff.)
FMA3 is a separate CPUID feature flag from AVX2. The wrong answers saying it's part of AVX2 are due to Intel introducing both with Haswell.
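Since the flags are separate, each one can be tested on its own. Here is a minimal sketch of how that might look, assuming GCC or Clang on x86 (it uses `<cpuid.h>` and inline asm for `xgetbv`). The struct and function names are made up for illustration; the bit positions are the ones documented in the Intel and AMD manuals.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID feature bits (per the Intel/AMD manuals). */
#define CPUID1_ECX_FMA3     (1u << 12)  /* leaf 1, ECX */
#define CPUID1_ECX_OSXSAVE  (1u << 27)  /* leaf 1, ECX */
#define CPUID1_ECX_AVX      (1u << 28)  /* leaf 1, ECX */
#define CPUID7_EBX_AVX2     (1u << 5)   /* leaf 7, subleaf 0, EBX */
#define CPUID_EXT1_ECX_FMA4 (1u << 16)  /* leaf 0x80000001, ECX (AMD) */

/* Hypothetical result struct for this sketch. */
struct simd_features { bool avx, fma3, avx2, fma4; };

/*
 * Pure decoding step: turns raw register values into flags.
 * os_saves_ymm means XCR0 reports both XMM and YMM state enabled;
 * without that, none of the VEX-encoded extensions are usable.
 */
struct simd_features decode_simd_features(uint32_t leaf1_ecx, uint32_t leaf7_ebx,
                                          uint32_t ext1_ecx, bool os_saves_ymm)
{
    struct simd_features f = { false, false, false, false };
    bool usable = (leaf1_ecx & CPUID1_ECX_OSXSAVE) && os_saves_ymm;

    f.avx  = usable && (leaf1_ecx & CPUID1_ECX_AVX);
    f.fma3 = f.avx && (leaf1_ecx & CPUID1_ECX_FMA3);    /* independent of AVX2 */
    f.avx2 = f.avx && (leaf7_ebx & CPUID7_EBX_AVX2);
    f.fma4 = f.avx && (ext1_ecx & CPUID_EXT1_ECX_FMA4);
    return f;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Queries the CPU the program is running on (GCC/Clang, x86 only). */
struct simd_features detect_simd_features(void)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    uint32_t leaf1_ecx = 0, leaf7_ebx = 0, ext1_ecx = 0;
    bool os_saves_ymm = false;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        leaf1_ecx = ecx;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        leaf7_ebx = ebx;
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        ext1_ecx = ecx;

    /* XGETBV is only legal once OSXSAVE is set. */
    if (leaf1_ecx & CPUID1_ECX_OSXSAVE) {
        uint32_t xcr0_lo, xcr0_hi;
        __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
        (void)xcr0_hi;
        os_saves_ymm = (xcr0_lo & 0x6u) == 0x6u;  /* XMM (bit 1) + YMM (bit 2) */
    }
    return decode_simd_features(leaf1_ecx, leaf7_ebx, ext1_ecx, os_saves_ymm);
}
#endif
```

Keeping the decoding separate from the actual `cpuid` calls is just a design choice for this sketch: it lets you check the logic with made-up register values on any machine.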
So in summary, a lot of AMD users probably do have FMA support, even if it's Bulldozer FMA4-only. As for Intel, even Nehalem CPUs are fast enough for most people, so there hasn't been much reason to upgrade. I don't have any numbers, though.
FMA3 is part of AVX2, so any chip that has AVX2 should support FMA3. That said, you can and should check for the FMA3 support independently.
AVX2 is supported by Intel "Haswell", AMD Excavator, and later processors.
FMA4 was supported by AMD "Bulldozer", but they have moved back to supporting FMA3 with AMD "Piledriver".
Given all these chips are pretty recent, it's not widespread. The Valve Hardware Survey doesn't show AVX, FMA3, or AVX2 data yet, so it's definitely a guess at this point.
BTW, the AMD Jaguar CPU in the Xbox One and the PS4 does not support FMA3, although it does support AVX and F16C.
See DirectXMath: AVX2, DirectXMath: F16C and FMA