I'm trying to figure out how to set -march
option properly to see how much performance difference between the option enabled and disabled can occur on my PC with gcc 4.7.2.
Before trying compiling, I tried to find what is the best -march
option for my PC. My PC has Pentium G850, whose architecture is Sandy Bridge. So I referred to the gcc 4.7.2 manual and found that -march=corei7-avx
seems the best.
However, I remembered that Sandy Bridge based Pentium lacks AVX and AES-NI instruction set support, which is true for Pentium G850. So -march=corei7-avx
is not a proper option.
I come up with some potential options:
-march=corei7-avx -mno-avx -mno-aes
-march=corei7 -mtune=corei7-avx
-march=native
The first option looks reasonable considering information I have, but I'm anxious that there may be missing feature other than AVX and AES-NI. The second option looks safe, but it could miss some minor features on Sandy Bridge because of -march=corei7
. The third option will take care of all of my concerns, but I've heard this option sometimes misdetects features of CPU so I would like to know how to manually do that.
I've googled and searched StackOverflow and SuperUser, but I can't find any clear solutions...
What options should be set?
What about detecting via GCC, for me (gcc-5.3.0) on an i5-2450M CPU (Lenovo e520), the following shows:
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
/usr/libexec/gcc/x86_64-pc-linux-gnu/5.3.0/cc1 -E -quiet -v - -march=sandybridge
-mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16
-msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mno-abm -mno-lwp
-mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx
-mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd
-mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr
-mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd
-mno-vx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves
-mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma
-mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param
l1-cache-size=32 --param l1-cache-line-size=64 --param
l2-cache-size=3072 -mtune=sandybridge -fstack-protector-strong
I would suggest to use -march=corei7-avx -mtune=corei7-avx -mno-avx -mno-aes
. It is important to specify -mtune
because this option tells gcc which CPU model it should use for scheduling instructions in the generated code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With