
How do I know if I can compile with FMA instruction sets?

I have seen questions about how to use the FMA instruction set, but before I start using it, I'd first like to know whether I can (that is, whether my processor supports it). I found a post saying that, on Linux, I need to look at the output of:

more /proc/cpuinfo

to find out. I get this:

processor       : 0                                                  
vendor_id       : GenuineIntel                                       
cpu family      : 6                                                  
model           : 30                                                 
model name      : Intel(R) Xeon(R) CPU           X3470  @ 2.93GHz    
stepping        : 5                                                  
cpu MHz         : 2933.235                                           
cache size      : 8192 KB
physical id     : 0                                                  
siblings        : 4                                                  
core id         : 0                                                  
cpu cores       : 4                                                  
apicid          : 0                                                  
initial apicid  : 0                                                  
fpu             : yes                                                
fpu_exception   : yes                                                
cpuid level     : 11                                                 
wp              : yes                                                
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5866.47                                                                                                                                                                                                                   
clflush size    : 64                                                                                                                                                                                                                        
cache_alignment : 64                                                                                                                                                                                                                        
address sizes   : 36 bits physical, 48 bits virtual     

The flags part seems the most interesting, but I am not sure how to tell from that list whether the processor supports these instructions.

Does anybody know how to find that out? Thank you.

user18490 asked May 02 '13 22:05


2 Answers

I assume you want to detect it in C/C++ at compile-time.

The FP_FAST_FMA macro is not a reliable way to detect the FMA instruction set. It is defined in <math.h>/<cmath> when fma()/std::fma is at least as fast as x*y+z, which is the case when the call compiles to a hardware FMA instruction. Otherwise the call falls back to a software implementation, which is very slow. As of 2016, GCC's default glibc/libstdc++ defines this macro, but most other standard library implementations (including LLVM libc++, ICC's and MSVC's) do not. That does not mean they never implement std::fma as an intrinsic where possible; they simply do not define the macro.

Reliable FMA detection

To reliably detect FMA (or any instruction set) at compile time you need to use instruction set specific macros. These macros are defined by the compiler based on the selected target architecture and/or instruction sets.

There is an __FMA__ macro for FMA/FMA3 support and an __FMA4__ macro for AMD FMA4 support. GCC, Clang and ICC all define them.

Unfortunately MSVC doesn't define any instruction set specific macros other than __AVX__ and __AVX2__.

Cross-compiler FMA detection

On Intel processors, FMA was introduced together with AVX2 in Haswell.

On AMD processors, the situation is a little messier. FMA4 was introduced with AVX and XOP in AMD Bulldozer. FMA3 (the equivalent of Intel's FMA) was introduced in AMD Piledriver. You can distinguish Piledriver from its predecessor Bulldozer at compile time by the presence of the FMA (__FMA__ macro) and BMI (__BMI__ macro) instruction sets. Unfortunately, MSVC defines neither.

Nevertheless, like Intel processors, all AMD processors support FMA/FMA3 if AVX2 is present.

If you want cross-compiler detection of whether the target architecture supports FMA/FMA3, test the __AVX2__ macro, since all major compilers (including MSVC) define it when AVX2 is enabled:

#if !defined(__FMA__) && defined(__AVX2__)
    #define __FMA__ 1
#endif

Unfortunately there is no reliable way to detect AMD FMA4 using only __AVX__ and __AVX2__ macros.

Notes

FMA instructions are only available to your program if the compiler has been told to enable them. With GCC and Clang you need to set a suitable target architecture (like -march=haswell) or enable the FMA instruction set manually with the -mfma flag. ICC enables FMA automatically with the -xavx2 flag. MSVC enables FMA with the /arch:AVX2 /fp:fast /O2 options.

AMD has announced that it will drop support for FMA4 in the future.

plasmacel answered Sep 24 '22 17:09


Yes, if you have it, it will appear under the flags part. On an Intel Haswell machine I get

flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm

and on an AMD Piledriver, I get

flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1

(note that it includes an fma4 flag, as well as the standard fma flag).

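So an easy way to check on Linux is to look at the return code of the following command (-w matches fma only as a whole word, so a CPU that has only fma4 does not count; -q suppresses the output):

grep -qw fma /proc/cpuinfo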

OS X doesn't have /proc/cpuinfo, but you can instead do:

sysctl -n hw.optional.fma

which will print 0 (no fma) or 1 (has fma).

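If you're using C/C++, you can also check the FP_FAST_FMA macro, which <math.h> defines when fma() is expected to be about as fast as a separate multiply and add on the compilation target.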

Simon Byrne answered Sep 20 '22 17:09