I'd like your input on which GCC compiler flags to use when optimizing for Xeons. There's no 'xeon' value for -mtune or -march, so which is the closest match?


gcc optimization flags for Xeon?

2 Answers

An update for recent GCC / Xeon.

Sandy-Bridge-based Xeon (E3-12xx series, E5-14xx/24xx series, E5-16xx/26xx/46xx series).

-march=corei7-avx for GCC < 4.9.0 or -march=sandybridge for GCC >= 4.9.0.

This enables the Advanced Vector Extensions support as well as the AES and PCLMUL instruction sets for Sandy Bridge. Here's the overview from the GCC i386/x86_64 options page:

Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support.

Ivy-Bridge-based Xeon (E3-12xx v2-series, E5-14xx v2/24xx v2-series, E5-16xx v2/26xx v2/46xx v2-series, E7-28xx v2/48xx v2/88xx v2-series).

-march=core-avx-i for GCC < 4.9.0 or -march=ivybridge for GCC >= 4.9.0.

This includes the Sandy Bridge (corei7-avx) options and adds support for the instruction sets introduced with Ivy Bridge: FSGSBASE, RDRND and F16C. From the GCC options page:

Intel Core CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C instruction set support.

Haswell-based Xeon (E3-1xxx v3-series, E5-1xxx v3-series, E5-2xxx v3-series).

-march=core-avx2 for GCC 4.8.2/4.8.3 or -march=haswell for GCC >= 4.9.0.

From GCC options page:

Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2 and F16C instruction set support.

Broadwell-based Xeon (E3-12xx v4 series, E5-16xx v4 series).

-march=core-avx2 for GCC 4.8.x or -march=broadwell for GCC >= 4.9.0.

From GCC options page:

Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support.

Skylake-based Xeon (E3-12xx v5 series) and Kaby Lake-based Xeon (E3-12xx v6 series).

-march=core-avx2 for GCC 4.8.x, -march=broadwell for GCC 4.9.x/5.x, or -march=skylake for GCC >= 6.

These parts are derived from the client cores and do not implement AVX-512, so -march=skylake-avx512 is not appropriate for them.

Skylake-SP-based Xeon (Platinum 8100 series, Gold 5100/6100 series, Silver 4100 series, Bronze 3100 series).

-march=skylake-avx512 for GCC >= 6.

AVX-512 is a set of 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions.

From the GCC options page:

Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set support.

Coffee Lake-based Xeon (E-21xx): -march=skylake (these parts have no AVX-512 either).

Cascade Lake-based Xeon (Platinum 8200/9200 series, Gold 5200/6200 series, Silver 4200 series, Bronze 3200 series): -march=cascadelake (requires GCC 9.x).

From GCC options page:

enables MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F, CLWB, AVX512VL, AVX512BW, AVX512DQ, AVX512CD and AVX512VNNI.

AVX-512 Vector Neural Network Instructions (AVX512 VNNI) is an x86 extension, part of AVX-512, designed to accelerate algorithms based on convolutional neural networks.

Cooper Lake-based Xeon (Platinum, Gold, Silver, Bronze): -march=cooperlake (requires GCC 10.1).

The switch enables the AVX512BF16 ISA extensions.
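The split between the older and newer -march names in the list above can be sketched as a small shell helper. This is only an illustration: pick_march is a hypothetical name (not part of GCC), it covers just Sandy Bridge through Broadwell, and it assumes the GCC 4.9.0 cutoff stated in the list.

```shell
#!/bin/sh
# Hypothetical helper (not part of GCC): print the -march value for a given
# microarchitecture and GCC version, following the list above.
# Usage: pick_march <sandybridge|ivybridge|haswell|broadwell> <gcc-version>
pick_march() {
    uarch=$1
    ver=$2

    # Split "4.8.2" into major=4, minor=8; a bare "11" gets minor=0.
    major=${ver%%.*}
    rest=${ver#*.}
    if [ "$rest" = "$ver" ]; then
        minor=0
    else
        minor=${rest%%.*}
    fi

    # Assumption taken from the list: the microarchitecture names
    # (sandybridge, haswell, ...) are accepted from GCC 4.9.0 onwards.
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 9 ]; }; then
        new_names=1
    else
        new_names=0
    fi

    case "$uarch:$new_names" in
        sandybridge:1) echo sandybridge ;;
        sandybridge:0) echo corei7-avx ;;
        ivybridge:1)   echo ivybridge ;;
        ivybridge:0)   echo core-avx-i ;;
        haswell:1)     echo haswell ;;
        haswell:0)     echo core-avx2 ;;
        broadwell:1)   echo broadwell ;;
        broadwell:0)   echo core-avx2 ;;
        *) echo "unknown microarchitecture: $uarch" >&2; return 1 ;;
    esac
}
```

One possible use is gcc -O2 -march="$(pick_march haswell "$(gcc -dumpversion)")" file.c, where file.c stands in for your own source file.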

To find out what the compiler will do with the -march=native option you can use:

gcc -march=native -Q --help=target
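That command prints a long table, so it can help to filter it. The two helpers below are a minimal sketch: resolved_march and enabled_flags are hypothetical names, and they assume the usual layout of the -Q --help=target output, in which each line holds an option name followed by its value or by [enabled]/[disabled].

```shell
#!/bin/sh
# Hypothetical helpers: both read `gcc -Q --help=target` output on stdin.

# Print the value GCC resolved for -march (assumes a line whose first
# whitespace-separated field is "-march=" and whose second is the value).
resolved_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Print every option that is reported as [enabled], one per line.
enabled_flags() {
    awk '$2 == "[enabled]" { print $1 }'
}
```

For example, gcc -march=native -Q --help=target | resolved_march would print the architecture name chosen for the build machine, and piping the same output into enabled_flags would list the options that are switched on.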

answered Oct 03 '22 01:10

manlio

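Newer versions of GCC have -march=native, which makes the compiler detect the CPU of the build machine and select the matching -march settings automatically. Note that the resulting binary may not run on machines with an older CPU.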
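One way to see what -march=native actually turned on is to look at the predefined feature macros. The helper below is a rough sketch: isa_macros is a hypothetical name, and it simply filters the output of gcc -dM -E for macros whose names start with __SSE or __AVX.

```shell
#!/bin/sh
# Hypothetical helper: read `gcc -dM -E` output on stdin and list the
# SSE/AVX feature macros it defines, one per line, sorted.
isa_macros() {
    awk '$1 == "#define" && $2 ~ /^__(SSE|AVX)[0-9A-Z_]*__$/ { print $2 }' | sort
}
```

Running gcc -march=native -dM -E -x c /dev/null | isa_macros on the target machine, and then again with an explicit value such as -march=haswell, gives two lists that can be compared with diff.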

answered Oct 03 '22 01:10

user83255


Tags:

c++

c

optimization

gcc

compiler-flags

Eugene Bujak
