Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to check if compiled code uses SSE and AVX instructions?

I wrote some code to do a bunch of math, and it needs to go fast, so I need it to use SSE and AVX instructions. I'm compiling it using g++ with the flags -O3 and -march=native, so I think it's using SSE and AVX instructions, but I'm not sure. Most of my code looks something like the following:

for(int i = 0;i<size;i++){
    a[i] = b[i] * c[i];
}

Is there any way I can tell if my code (after compilation) uses SSE and AVX instructions? I think I could look at the assembly to see, but I don't know assembly, and I don't know how to see the assembly that the compiler outputs.

like image 901
BadProgrammer99 Avatar asked Dec 19 '17 00:12

BadProgrammer99


People also ask

How do you check if AVX is supported?

To know if your CPU supports AVX, hit the Windows key, search for windows system information and look for your CPU model from this pop-up window. Then, go to the manufacturer's website, and with the help of the model number, find out whether your CPU model supports AVX or not.

How do I know if avx512 is enabled?

The first is CPU microcode support which needs to be version 0x16 or earlier to enable AVX-512. If you have anything newer, AVX-512 won't work. You can check this by running hardware monitoring apps such as HWInfo64, which will tell you what microcode version your system is running at (HWInfo64 lists it as MCU).

What is SSE and AVX?

SSE (streaming SIMD extensions) and AVX (advanced vector extensions) are SIMD (single instruction multiple data streams) instruction sets supported by recent CPUs manufactured in Intel and AMD. This SIMD programming allows parallel processing by multiple cores in a single CPU.

How do I enable avx2 in Windows 10?

To make sure that AVX is enabled, do the following: Open your Windows Power Shell in Admin mode. In the command line type: bcdedit /set xsavedisable 0 (do NOT set this value to a number other than zero!). You should get feedback that the operation was successfully completed.


3 Answers

Under Linux, you could also decompile your binary:

objdump -d YOURFILE > YOURFILE.asm

Then find all SSE instructions:

awk '/[ \t](addps|addss|andnps|andps|cmpps|cmpss|comiss|cvtpi2ps|cvtps2pi|cvtsi2ss|cvtss2s|cvttps2pi|cvttss2si|divps|divss|ldmxcsr|maxps|maxss|minps|minss|movaps|movhlps|movhps|movlhps|movlps|movmskps|movntps|movss|movups|mulps|mulss|orps|rcpps|rcpss|rsqrtps|rsqrtss|shufps|sqrtps|sqrtss|stmxcsr|subps|subss|ucomiss|unpckhps|unpcklps|xorps|pavgb|pavgw|pextrw|pinsrw|pmaxsw|pmaxub|pminsw|pminub|pmovmskb|psadbw|pshufw)[ \t]/' YOURFILE.asm

Find only packed SSE instructions (suggested by @Peter Cordes in comments):

awk '/[ \t](addps|andnps|andps|cmpps|cvtpi2ps|cvtps2pi|cvttps2pi|divps|maxps|minps|movaps|movhlps|movhps|movlhps|movlps|movmskps|movntps|movntq|movups|mulps|orps|pavgb|pavgw|pextrw|pinsrw|pmaxsw|pmaxub|pminsw|pminub|pmovmskb|pmulhuw|psadbw|pshufw|rcpps|rsqrtps|shufps|sqrtps|subps|unpckhps|unpcklps|xorps)[ \t]/' YOURFILE.asm

Find all SSE2 instructions (except MOVSD and CMPSD, which were first introduced in 80386):

awk '/[ \t](addpd|addsd|andnpd|andpd|cmppd|comisd|cvtdq2pd|cvtdq2ps|cvtpd2dq|cvtpd2pi|cvtpd2ps|cvtpi2pd|cvtps2dq|cvtps2pd|cvtsd2si|cvtsd2ss|cvtsi2sd|cvtss2sd|cvttpd2dq|cvttpd2pi|cvtps2dq|cvttsd2si|divpd|divsd|maxpd|maxsd|minpd|minsd|movapd|movhpd|movlpd|movmskpd|movupd|mulpd|mulsd|orpd|shufpd|sqrtpd|sqrtsd|subpd|subsd|ucomisd|unpckhpd|unpcklpd|xorpd|movdq2q|movdqa|movdqu|movq2dq|paddq|pmuludq|pshufhw|pshuflw|pshufd|pslldq|psrldq|punpckhqdq|punpcklqdq)[ \t]/' YOURFILE.asm

Find only packed SSE2 instructions:

awk '/[ \t](addpd|andnpd|andpd|cmppd|cvtdq2pd|cvtdq2ps|cvtpd2dq|cvtpd2pi|cvtpd2ps|cvtpi2pd|cvtps2dq|cvtps2pd|cvttpd2dq|cvttpd2pi|cvttps2dq|divpd|maxpd|minpd|movapd|movapd|movhpd|movhpd|movlpd|movlpd|movmskpd|movntdq|movntpd|movupd|movupd|mulpd|orpd|pshufd|pshufhw|pshuflw|pslldq|psrldq|punpckhqdq|shufpd|sqrtpd|subpd|unpckhpd|unpcklpd|xorpd)[ \t]/' YOURFILE.asm

Find all SSE3 instructions:

awk '/[ \t](addsubpd|addsubps|haddpd|haddps|hsubpd|hsubps|movddup|movshdup|movsldup|lddqu|fisttp)[ \t]/' YOURFILE.asm

Find all SSSE3 instructions:

awk '/[ \t](psignw|psignd|psignb|pshufb|pmulhrsw|pmaddubsw|phsubw|phsubsw|phsubd|phaddw|phaddsw|phaddd|palignr|pabsw|pabsd|pabsb)[ \t]/' YOURFILE.asm

Find all SSE4 instructions:

awk '/[ \t](mpsadbw|phminposuw|pmulld|pmuldq|dpps|dppd|blendps|blendpd|blendvps|blendvpd|pblendvb|pblenddw|pminsb|pmaxsb|pminuw|pmaxuw|pminud|pmaxud|pminsd|pmaxsd|roundps|roundss|roundpd|roundsd|insertps|pinsrb|pinsrd|pinsrq|extractps|pextrb|pextrd|pextrw|pextrq|pmovsxbw|pmovzxbw|pmovsxbd|pmovzxbd|pmovsxbq|pmovzxbq|pmovsxwd|pmovzxwd|pmovsxwq|pmovzxwq|pmovsxdq|pmovzxdq|ptest|pcmpeqq|pcmpgtq|packusdw|pcmpestri|pcmpestrm|pcmpistri|pcmpistrm|crc32|popcnt|movntdqa|extrq|insertq|movntsd|movntss|lzcnt)[ \t]/' YOURFILE.asm

Find most common AVX instructions (including scalar, including AVX2, AVX-512 family and some FMA like vfmadd132pd):

awk '/[ \t](vmovapd|vmulpd|vaddpd|vsubpd|vfmadd213pd|vfmadd231pd|vfmadd132pd|vmulsd|vaddsd|vmosd|vsubsd|vbroadcastss|vbroadcastsd|vblendpd|vshufpd|vroundpd|vroundsd|vxorpd|vfnmadd231pd|vfnmadd213pd|vfnmadd132pd|vandpd|vmaxpd|vmovmskpd|vcmppd|vpaddd|vbroadcastf128|vinsertf128|vextractf128|vfmsub231pd|vfmsub132pd|vfmsub213pd|vmaskmovps|vmaskmovpd|vpermilps|vpermilpd|vperm2f128|vzeroall|vzeroupper|vpbroadcastb|vpbroadcastw|vpbroadcastd|vpbroadcastq|vbroadcasti128|vinserti128|vextracti128|vpminud|vpmuludq|vgatherdpd|vgatherqpd|vgatherdps|vgatherqps|vpgatherdd|vpgatherdq|vpgatherqd|vpgatherqq|vpmaskmovd|vpmaskmovq|vpermps|vpermd|vpermpd|vpermq|vperm2i128|vpblendd|vpsllvd|vpsllvq|vpsrlvd|vpsrlvq|vpsravd|vblendmpd|vblendmps|vpblendmd|vpblendmq|vpblendmb|vpblendmw|vpcmpd|vpcmpud|vpcmpq|vpcmpuq|vpcmpb|vpcmpub|vpcmpw|vpcmpuw|vptestmd|vptestmq|vptestnmd|vptestnmq|vptestmb|vptestmw|vptestnmb|vptestnmw|vcompresspd|vcompressps|vpcompressd|vpcompressq|vexpandpd|vexpandps|vpexpandd|vpexpandq|vpermb|vpermw|vpermt2b|vpermt2w|vpermi2pd|vpermi2ps|vpermi2d|vpermi2q|vpermi2b|vpermi2w|vpermt2ps|vpermt2pd|vpermt2d|vpermt2q|vshuff32x4|vshuff64x2|vshuffi32x4|vshuffi64x2|vpmultishiftqb|vpternlogd|vpternlogq|vpmovqd|vpmovsqd|vpmovusqd|vpmovqw|vpmovsqw|vpmovusqw|vpmovqb|vpmovsqb|vpmovusqb|vpmovdw|vpmovsdw|vpmovusdw|vpmovdb|vpmovsdb|vpmovusdb|vpmovwb|vpmovswb|vpmovuswb|vcvtps2udq|vcvtpd2udq|vcvttps2udq|vcvttpd2udq|vcvtss2usi|vcvtsd2usi|vcvttss2usi|vcvttsd2usi|vcvtps2qq|vcvtpd2qq|vcvtps2uqq|vcvtpd2uqq|vcvttps2qq|vcvttpd2qq|vcvttps2uqq|vcvttpd2uqq|vcvtudq2ps|vcvtudq2pd|vcvtusi2ps|vcvtusi2pd|vcvtusi2sd|vcvtusi2ss|vcvtuqq2ps|vcvtuqq2pd|vcvtqq2pd|vcvtqq2ps|vgetexppd|vgetexpps|vgetexpsd|vgetexpss|vgetmantpd|vgetmantps|vgetmantsd|vgetmantss|vfixupimmpd|vfixupimmps|vfixupimmsd|vfixupimmss|vrcp14pd|vrcp14ps|vrcp14sd|vrcp14ss|vrndscaleps|vrndscalepd|vrndscaless|vrndscalesd|vrsqrt14pd|vrsqrt14ps|vrsqrt14sd|vrsqrt14ss|vscalefps|vscalefpd|vscalefss|vscalefsd|valignd|valignq|vdbpsadbw|vpabsq|vpmaxsq|vpmaxuq|vpminsq|vpminuq|vprold|vprolvd|vprolq|vprolvq|vprord|vprorvd|vprorq|vprorvq|vpscatterdd|vpscatterdq|vpscatterqd|vpscatterqq|vscatterdps|vscatterdpd|vscatterqps|vscatterqpd|vpconflictd|vpconflictq|vplzcntd|vplzcntq|vpbroadcastmb2q|vpbroadcastmw2d|vexp2pd|vexp2ps|vrcp28pd|vrcp28ps|vrcp28sd|vrcp28ss|vrsqrt28pd|vrsqrt28ps|vrsqrt28sd|vrsqrt28ss|vgatherpf0dps|vgatherpf0qps|vgatherpf0dpd|vgatherpf0qpd|vgatherpf1dps|vgatherpf1qps|vgatherpf1dpd|vgatherpf1qpd|vscatterpf0dps|vscatterpf0qps|vscatterpf0dpd|vscatterpf0qpd|vscatterpf1dps|vscatterpf1qps|vscatterpf1dpd|vscatterpf1qpd|vfpclassps|vfpclasspd|vfpclassss|vfpclasssd|vrangeps|vrangepd|vrangess|vrangesd|vreduceps|vreducepd|vreducess|vreducesd|vpmovm2d|vpmovm2q|vpmovm2b|vpmovm2w|vpmovd2m|vpmovq2m|vpmovb2m|vpmovw2m|vpmullq|vpmadd52luq|vpmadd52huq|v4fmaddps|v4fmaddss|v4fnmaddps|v4fnmaddss|vp4dpwssd|vp4dpwssds|vpdpbusd|vpdpbusds|vpdpwssd|vpdpwssds|vpcompressb|vpcompressw|vpexpandb|vpexpandw|vpshld|vpshldv|vpshrd|vpshrdv|vpopcntd|vpopcntq|vpopcntb|vpopcntw|vpshufbitqmb|gf2p8affineinvqb|gf2p8affineqb|gf2p8mulb|vpclmulqdq|vaesdec|vaesdeclast|vaesenc|vaesenclast)[ \t]/' YOURFILE.asm

NOTE: tested with gawk and nawk.

like image 150
Andriy Makukha Avatar answered Oct 19 '22 13:10

Andriy Makukha


There is no need to check the assembly. Most compilers provide optimisation reports that exactly tell you whether or not your loops were vectorised using SIMD instructions.

If you compile using GCC, set -O3 -march=native to make sure vectorisation is performed using whichever SIMD instruction set (SSE, AVX, ...) the CPU you are compiling on supports, and add -fopt-info to make the compiler verbose about optimisations:

g++ -O3 -march=native -fopt-info -o main.o main.cpp

This will give you output like:

main.cpp:12:20: note: loop vectorized
main.cpp:12:20: note: loop peeled for vectorization to enhance alignment

Hope that helps.

like image 22
noma Avatar answered Oct 19 '22 14:10

noma


Notice that most packed SSE instructions end with PS/PD we'll have a simpler way to check for packed SSEx instructions after dumping the binary content to asmfile

grep %xmm asmfile | grep -P '([[:xdigit:]]{2}\s)+\s*[[:alnum:]]+p[sd]\s+'

or the xmm check can be combined into the pattern

grep -P '([[:xdigit:]]{2}\s)+\s*[[:alnum:]]+p[sd]\s+.+xmm' asmfile

This will suffice for programs only use floating-point operations. However for better coverage you also need to check for instructions begin with P so you need to change the regex a bit

grep -P '([[:xdigit:]]{2}\s)+\s*([[:alnum:]]+p[sd]\s+|p[[:alnum:]]+).+%xmm' asmfile

To also include MMX instructions in 32-bit code change the %xmm part at the end to %x?mm

To check for AVX1/2 you just need to find ymm or %ymm usage instead of checking the instruction name, because AVX1/2 instructions only have the vector version

grep ymm asmfile

Similarly AVX-512 can be checked with

grep zmm asmfile
like image 2
phuclv Avatar answered Oct 19 '22 13:10

phuclv