Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to detect SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI availability at compile-time?

I'm trying to optimize some matrix computations and I was wondering if it was possible to detect at compile-time if SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI[1] is enabled by the compiler ? Ideally for GCC and Clang, but I can manage with only one of them.

I'm not sure it is possible and perhaps I will use my own macro, but I'd prefer detecting it rather and asking the user to select it.


[1] "KCVI" stands for Knights Corner Vector Instruction optimizations. Libraries like FFTW detect/utilize these newer instruction optimizations.

like image 801
Baptiste Wicht Avatar asked Mar 09 '15 10:03

Baptiste Wicht


People also ask

How do I know if my AVX-512 is enabled?

The first is CPU microcode support which needs to be version 0x16 or earlier to enable AVX-512. If you have anything newer, AVX-512 won't work. You can check this by running hardware monitoring apps such as HWInfo64, which will tell you what microcode version your system is running at (HWInfo64 lists it as MCU).

How do I know if my CPU supports AVX2?

To know if your CPU supports AVX, hit the Windows key, search for windows system information and look for your CPU model from this pop-up window. Then, go to the manufacturer's website, and with the help of the model number, find out whether your CPU model supports AVX or not.

What is the difference between AVX and AVX2?

The only difference between AVX and AVX2 for floating point code is availability of new FMA instruction – both AVX and AVX2 have 256-bit FP registers. The main advantage of new ISA of AVX2 is for integer code/data types – there you can expect up to 2x speedup, but 8% for FP code is good speedup of AVX2 over AVX.

What is AVX2 and AVX-512?

Intel AVX-512 enables twice the number of floating point operations per second (FLOPS) per clock cycle compared to its predecessor, Intel AVX2. A single register under Intel AVX-512 can hold up to eight double-precision or 16 single-precision floating-point numbers.


2 Answers

Most compilers will automatically define:

__SSE__ __SSE2__ __SSE3__ __AVX__ __AVX2__ 

etc, according to whatever command line switches you are passing. You can easily check this with gcc (or gcc-compatible compilers such as clang), like this:

$ gcc -msse3 -dM -E - < /dev/null | egrep "SSE|AVX" | sort #define __SSE__ 1 #define __SSE2__ 1 #define __SSE2_MATH__ 1 #define __SSE3__ 1 #define __SSE_MATH__ 1 

or:

$ gcc -mavx2 -dM -E - < /dev/null | egrep "SSE|AVX" | sort #define __AVX__ 1 #define __AVX2__ 1 #define __SSE__ 1 #define __SSE2__ 1 #define __SSE2_MATH__ 1 #define __SSE3__ 1 #define __SSE4_1__ 1 #define __SSE4_2__ 1 #define __SSE_MATH__ 1 #define __SSSE3__ 1 

or to just check the pre-defined macros for a default build on your particular platform:

$ gcc -dM -E - < /dev/null | egrep "SSE|AVX" | sort #define __SSE2_MATH__ 1 #define __SSE2__ 1 #define __SSE3__ 1 #define __SSE_MATH__ 1 #define __SSE__ 1 #define __SSSE3__ 1 

More recent Intel processors support AVX-512, which is not a monolithic instruction set. One can see the support available from GCC (version 6.2) for two examples below.

Here is Knights Landing:

$ gcc -march=knl -dM -E - < /dev/null | egrep "SSE|AVX" | sort #define __AVX__ 1 #define __AVX2__ 1 #define __AVX512CD__ 1 #define __AVX512ER__ 1 #define __AVX512F__ 1 #define __AVX512PF__ 1 #define __SSE__ 1 #define __SSE2__ 1 #define __SSE2_MATH__ 1 #define __SSE3__ 1 #define __SSE4_1__ 1 #define __SSE4_2__ 1 #define __SSE_MATH__ 1 #define __SSSE3__ 1 

Here is Skylake AVX-512:

$ gcc -march=skylake-avx512 -dM -E - < /dev/null | egrep "SSE|AVX" | sort #define __AVX__ 1 #define __AVX2__ 1 #define __AVX512BW__ 1 #define __AVX512CD__ 1 #define __AVX512DQ__ 1 #define __AVX512F__ 1 #define __AVX512VL__ 1 #define __SSE__ 1 #define __SSE2__ 1 #define __SSE2_MATH__ 1 #define __SSE3__ 1 #define __SSE4_1__ 1 #define __SSE4_2__ 1 #define __SSE_MATH__ 1 #define __SSSE3__ 1 

Intel has disclosed additional AVX-512 subsets (see ISA extensions). GCC (version 7) supports compiler flags and preprocessor symbols associated with the 4FMAPS, 4VNNIW, IFMA, VBMI and VPOPCNTDQ subsets of AVX-512:

for i in 4fmaps 4vnniw ifma vbmi vpopcntdq ; do echo "==== $i ====" ; gcc -mavx512$i -dM -E - < /dev/null | egrep "AVX512" | sort ; done ==== 4fmaps ==== #define __AVX5124FMAPS__ 1 #define __AVX512F__ 1 ==== 4vnniw ==== #define __AVX5124VNNIW__ 1 #define __AVX512F__ 1 ==== ifma ==== #define __AVX512F__ 1 #define __AVX512IFMA__ 1 ==== vbmi ==== #define __AVX512BW__ 1 #define __AVX512F__ 1 #define __AVX512VBMI__ 1 ==== vpopcntdq ==== #define __AVX512F__ 1 #define __AVX512VPOPCNTDQ__ 1 

Note that the SSE macros won't work with Visual C++. You have to use _M_IX86_FP instead.

like image 145
Paul R Avatar answered Sep 18 '22 07:09

Paul R


Take a look at archspec, a library which was built exactly for this purpose: https://github.com/archspec/archspec

like image 20
Kenneth Hoste Avatar answered Sep 18 '22 07:09

Kenneth Hoste