Compile C++ code with AVX2/AVX512 intrinsics on AVX

I have production code that has kernels implemented for various SIMD instruction sets, including AVX, AVX2, and AVX512. The code can be compiled on the target machine for the target machine with something like ./configure --enable-proc=AVX CXXFLAGS="-mavx".

This also works well on Travis CI, which exposes AVX intrinsics. I would like to at least compile the AVX2 and AVX512 versions in order to see whether all files are checked in. But it seems that compiling for a different ISA is not that easy.

A simple AVX2 test program:

#include <immintrin.h>

int main(int argc, char **argv) {
    // Deliberately left uninitialized: this is only a compile-time smoke test.
    __m256d a;
    __m256d b;
    __m256d c;

    // Fused negated multiply-add, -(a*b) + c (an FMA intrinsic).
    _mm256_fnmadd_pd(a, b, c);
}

On my AVX machine (Intel Core i5-2520M), it does not compile:

$ g++ -Wall -Wpedantic --std=c++11 cpp.cpp -mavx2
In file included from /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/immintrin.h:79:0,
                 from cpp.cpp:3:
/usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/fmaintrin.h:143:1: error: inlining failed in call to always_inline '__m256d _mm256_fnmadd_pd(__m256d, __m256d, __m256d)': target specific option mismatch
 _mm256_fnmadd_pd (__m256d __A, __m256d __B, __m256d __C)
 ^~~~~~~~~~~~~~~~

Is there some way to compile the code? I do not care about running, I just want a smoke-test.

Asked Apr 25 '17 by Martin Ueding

People also ask

Is AVX2 faster than AVX?

Vector retrieval is consistently faster on the AVX-512 instruction set than on AVX2. This is because AVX-512 supports 512-bit computation, compared to just 256-bit computation on AVX2.

What is the difference between AVX and AVX2?

For floating-point code, the only practical difference between AVX and AVX2 is the availability of the new FMA instructions (technically a separate feature flag, introduced alongside AVX2 on Haswell); both AVX and AVX2 provide 256-bit FP registers. The main advantage of the AVX2 ISA is for integer code and data types, where you can expect up to a 2x speedup, whereas around 8% is already a good speedup of AVX2 over AVX for FP code.
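As a minimal sketch of that distinction (function names and the compile flags, e.g. g++ -c -mavx -mfma example.cpp, are illustrative), the same multiply-add can be written as a separate AVX multiply and add, or as a single fused instruction that requires the FMA feature:

#include <immintrin.h>

// Plain AVX: two instructions, multiply then add.
__m256d mul_add_avx(__m256d a, __m256d b, __m256d c) {
    return _mm256_add_pd(_mm256_mul_pd(a, b), c);
}

// Fused multiply-add: one instruction, needs the FMA feature (-mfma),
// which -mavx2 alone does not enable.
__m256d mul_add_fma(__m256d a, __m256d b, __m256d c) {
    return _mm256_fmadd_pd(a, b, c);
}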

What are AVX intrinsics?

AVX provides intrinsic functions that combine one or more scalar values into a 256-bit vector. There are similar intrinsics that initialize 128-bit vectors, but those are provided by SSE, not AVX.
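For illustration (assuming compilation with -mavx; the function name is made up), a few of these initializer intrinsics:

#include <immintrin.h>

__m256d init_examples() {
    __m256d zeros = _mm256_setzero_pd();                // {0, 0, 0, 0}
    __m256d ones  = _mm256_set1_pd(1.0);                // broadcast one value
    __m256d lanes = _mm256_set_pd(3.0, 2.0, 1.0, 0.0);  // element-wise values
    return _mm256_add_pd(_mm256_add_pd(zeros, ones), lanes);
}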

What is AVX and SSE?

SSE (Streaming SIMD Extensions) and AVX (Advanced Vector Extensions) are SIMD (single instruction, multiple data) instruction sets supported by recent CPUs manufactured by Intel and AMD. SIMD programming lets a single instruction process multiple data elements in parallel within one core.
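A small sketch of the register widths involved (assuming -mavx; function names are illustrative):

#include <immintrin.h>

// The same element-wise addition at both vector widths.
__m128 add_sse(__m128 a, __m128 b) { return _mm_add_ps(a, b); }     // SSE: 4 floats per instruction
__m256 add_avx(__m256 a, __m256 b) { return _mm256_add_ps(a, b); }  // AVX: 8 floats per instruction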


1 Answer

Supplying -march=sandybridge, -march=haswell, or -march=knl (for the AVX, AVX2/FMA and AVX-512 builds, respectively) enables all the features needed to compile the code. In particular, -mavx2 alone is not enough for the test program above because _mm256_fnmadd_pd is an FMA intrinsic, and -march=haswell turns on FMA as well. Since only compilation is required, the host CPU does not need to support these extensions.
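As a sketch of how such a compile-only smoke test can look (file and function names are illustrative; GCC's per-function target attribute, available since GCC 4.9, is assumed), the kernels can even sit in one translation unit and build without any global -m flags:

// g++ -Wall -Wpedantic --std=c++11 -c kernels.cpp   (runs on an AVX-only host)
#include <immintrin.h>

// AVX2/FMA kernel: -(a*b) + c on four doubles.
__attribute__((target("avx2,fma")))
void fnmadd_avx2(double *out, const double *a, const double *b, const double *c) {
    __m256d r = _mm256_fnmadd_pd(_mm256_loadu_pd(a), _mm256_loadu_pd(b),
                                 _mm256_loadu_pd(c));
    _mm256_storeu_pd(out, r);
}

// AVX-512F kernel: the same operation on eight doubles.
__attribute__((target("avx512f")))
void fnmadd_avx512(double *out, const double *a, const double *b, const double *c) {
    __m512d r = _mm512_fnmadd_pd(_mm512_loadu_pd(a), _mm512_loadu_pd(b),
                                 _mm512_loadu_pd(c));
    _mm512_storeu_pd(out, r);
}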

Answered Oct 05 '22 by Martin Ueding