I wrote some code to do a bunch of math, and it needs to go fast, so I need it to use SSE and AVX instructions. I'm compiling it using g++ with the flags -O3 and -march=native, so I think it's using SSE and AVX instructions, but I'm not sure. Most of my code looks something like the following:
for (int i = 0; i < size; i++) {
    a[i] = b[i] * c[i];
}
Is there any way I can tell if my code (after compilation) uses SSE and AVX instructions? I think I could look at the assembly to see, but I don't know assembly, and I don't know how to see the assembly that the compiler outputs.
To find out whether your CPU supports AVX at all, hit the Windows key, search for System Information, and note your CPU model in the window that opens. Then, with the help of the model number, check the manufacturer's website to see whether that CPU model supports AVX or not.
Another thing to check is the CPU microcode version, which needs to be 0x16 or earlier for AVX-512 to be enabled; if you have anything newer, AVX-512 won't work. You can check this by running a hardware monitoring app such as HWiNFO64, which will tell you what microcode version your system is running (HWiNFO64 lists it as MCU).
SSE (Streaming SIMD Extensions) and AVX (Advanced Vector Extensions) are SIMD (single instruction, multiple data) instruction sets supported by recent CPUs from Intel and AMD. SIMD lets a single instruction operate on several data elements at once within one core, which is distinct from the parallelism you get from multiple cores.
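To make that concrete, here is roughly what the loop from the question looks like when written by hand with 256-bit AVX intrinsics (an illustrative sketch only, not necessarily what the compiler emits; it assumes size is a multiple of 8 and must be built with AVX enabled, e.g. -mavx or -march=native):
#include <immintrin.h>

// multiply 8 floats per iteration using 256-bit AVX registers
void multiply_avx(float* a, const float* b, const float* c, int size) {
    for (int i = 0; i < size; i += 8) {                  // assumes size % 8 == 0
        __m256 vb = _mm256_loadu_ps(&b[i]);              // load 8 floats from b
        __m256 vc = _mm256_loadu_ps(&c[i]);              // load 8 floats from c
        _mm256_storeu_ps(&a[i], _mm256_mul_ps(vb, vc));  // a[i..i+7] = b[i..i+7] * c[i..i+7]
    }
}
With -O3 -march=native the compiler will usually generate very similar code from the plain loop on its own; the intrinsics version is only meant to show what the SIMD instructions are doing.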
To make sure that AVX is enabled at the operating-system level, do the following: open Windows PowerShell in admin mode and run bcdedit /set xsavedisable 0 (do NOT set this value to a number other than zero!). You should get feedback that the operation completed successfully.
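If you would rather check from code than from the OS, GCC and Clang provide __builtin_cpu_supports, which queries the CPU's feature bits at run time; a minimal sketch:
#include <cstdio>

int main() {
    // each call returns non-zero if the running CPU reports the feature
    if (__builtin_cpu_supports("avx"))     std::puts("AVX available");
    if (__builtin_cpu_supports("avx2"))    std::puts("AVX2 available");
    if (__builtin_cpu_supports("avx512f")) std::puts("AVX-512F available");
    return 0;
}
Note that this tells you what the CPU can execute, not whether your compiled binary actually contains SSE/AVX instructions; the approaches below answer that question.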
Under Linux, you could also disassemble your binary:
objdump -d YOURFILE > YOURFILE.asm
Then find all SSE instructions:
awk '/[ \t](addps|addss|andnps|andps|cmpps|cmpss|comiss|cvtpi2ps|cvtps2pi|cvtsi2ss|cvtss2s|cvttps2pi|cvttss2si|divps|divss|ldmxcsr|maxps|maxss|minps|minss|movaps|movhlps|movhps|movlhps|movlps|movmskps|movntps|movss|movups|mulps|mulss|orps|rcpps|rcpss|rsqrtps|rsqrtss|shufps|sqrtps|sqrtss|stmxcsr|subps|subss|ucomiss|unpckhps|unpcklps|xorps|pavgb|pavgw|pextrw|pinsrw|pmaxsw|pmaxub|pminsw|pminub|pmovmskb|psadbw|pshufw)[ \t]/' YOURFILE.asm
Find only packed SSE instructions (suggested by @Peter Cordes in comments):
awk '/[ \t](addps|andnps|andps|cmpps|cvtpi2ps|cvtps2pi|cvttps2pi|divps|maxps|minps|movaps|movhlps|movhps|movlhps|movlps|movmskps|movntps|movntq|movups|mulps|orps|pavgb|pavgw|pextrw|pinsrw|pmaxsw|pmaxub|pminsw|pminub|pmovmskb|pmulhuw|psadbw|pshufw|rcpps|rsqrtps|shufps|sqrtps|subps|unpckhps|unpcklps|xorps)[ \t]/' YOURFILE.asm
Find all SSE2 instructions (except MOVSD and CMPSD, which were first introduced in 80386):
awk '/[ \t](addpd|addsd|andnpd|andpd|cmppd|comisd|cvtdq2pd|cvtdq2ps|cvtpd2dq|cvtpd2pi|cvtpd2ps|cvtpi2pd|cvtps2dq|cvtps2pd|cvtsd2si|cvtsd2ss|cvtsi2sd|cvtss2sd|cvttpd2dq|cvttpd2pi|cvttps2dq|cvttsd2si|divpd|divsd|maxpd|maxsd|minpd|minsd|movapd|movhpd|movlpd|movmskpd|movupd|mulpd|mulsd|orpd|shufpd|sqrtpd|sqrtsd|subpd|subsd|ucomisd|unpckhpd|unpcklpd|xorpd|movdq2q|movdqa|movdqu|movq2dq|paddq|pmuludq|pshufhw|pshuflw|pshufd|pslldq|psrldq|punpckhqdq|punpcklqdq)[ \t]/' YOURFILE.asm
Find only packed SSE2 instructions:
awk '/[ \t](addpd|andnpd|andpd|cmppd|cvtdq2pd|cvtdq2ps|cvtpd2dq|cvtpd2pi|cvtpd2ps|cvtpi2pd|cvtps2dq|cvtps2pd|cvttpd2dq|cvttpd2pi|cvttps2dq|divpd|maxpd|minpd|movapd|movhpd|movlpd|movmskpd|movntdq|movntpd|movupd|mulpd|orpd|pshufd|pshufhw|pshuflw|pslldq|psrldq|punpckhqdq|shufpd|sqrtpd|subpd|unpckhpd|unpcklpd|xorpd)[ \t]/' YOURFILE.asm
Find all SSE3 instructions:
awk '/[ \t](addsubpd|addsubps|haddpd|haddps|hsubpd|hsubps|movddup|movshdup|movsldup|lddqu|fisttp)[ \t]/' YOURFILE.asm
Find all SSSE3 instructions:
awk '/[ \t](psignw|psignd|psignb|pshufb|pmulhrsw|pmaddubsw|phsubw|phsubsw|phsubd|phaddw|phaddsw|phaddd|palignr|pabsw|pabsd|pabsb)[ \t]/' YOURFILE.asm
Find all SSE4 instructions:
awk '/[ \t](mpsadbw|phminposuw|pmulld|pmuldq|dpps|dppd|blendps|blendpd|blendvps|blendvpd|pblendvb|pblenddw|pminsb|pmaxsb|pminuw|pmaxuw|pminud|pmaxud|pminsd|pmaxsd|roundps|roundss|roundpd|roundsd|insertps|pinsrb|pinsrd|pinsrq|extractps|pextrb|pextrd|pextrw|pextrq|pmovsxbw|pmovzxbw|pmovsxbd|pmovzxbd|pmovsxbq|pmovzxbq|pmovsxwd|pmovzxwd|pmovsxwq|pmovzxwq|pmovsxdq|pmovzxdq|ptest|pcmpeqq|pcmpgtq|packusdw|pcmpestri|pcmpestrm|pcmpistri|pcmpistrm|crc32|popcnt|movntdqa|extrq|insertq|movntsd|movntss|lzcnt)[ \t]/' YOURFILE.asm
Find the most common AVX instructions (including scalar instructions, AVX2, the AVX-512 family, and some FMA instructions such as vfmadd132pd):
awk '/[ \t](vmovapd|vmulpd|vaddpd|vsubpd|vfmadd213pd|vfmadd231pd|vfmadd132pd|vmulsd|vaddsd|vmovsd|vsubsd|vbroadcastss|vbroadcastsd|vblendpd|vshufpd|vroundpd|vroundsd|vxorpd|vfnmadd231pd|vfnmadd213pd|vfnmadd132pd|vandpd|vmaxpd|vmovmskpd|vcmppd|vpaddd|vbroadcastf128|vinsertf128|vextractf128|vfmsub231pd|vfmsub132pd|vfmsub213pd|vmaskmovps|vmaskmovpd|vpermilps|vpermilpd|vperm2f128|vzeroall|vzeroupper|vpbroadcastb|vpbroadcastw|vpbroadcastd|vpbroadcastq|vbroadcasti128|vinserti128|vextracti128|vpminud|vpmuludq|vgatherdpd|vgatherqpd|vgatherdps|vgatherqps|vpgatherdd|vpgatherdq|vpgatherqd|vpgatherqq|vpmaskmovd|vpmaskmovq|vpermps|vpermd|vpermpd|vpermq|vperm2i128|vpblendd|vpsllvd|vpsllvq|vpsrlvd|vpsrlvq|vpsravd|vblendmpd|vblendmps|vpblendmd|vpblendmq|vpblendmb|vpblendmw|vpcmpd|vpcmpud|vpcmpq|vpcmpuq|vpcmpb|vpcmpub|vpcmpw|vpcmpuw|vptestmd|vptestmq|vptestnmd|vptestnmq|vptestmb|vptestmw|vptestnmb|vptestnmw|vcompresspd|vcompressps|vpcompressd|vpcompressq|vexpandpd|vexpandps|vpexpandd|vpexpandq|vpermb|vpermw|vpermt2b|vpermt2w|vpermi2pd|vpermi2ps|vpermi2d|vpermi2q|vpermi2b|vpermi2w|vpermt2ps|vpermt2pd|vpermt2d|vpermt2q|vshuff32x4|vshuff64x2|vshuffi32x4|vshuffi64x2|vpmultishiftqb|vpternlogd|vpternlogq|vpmovqd|vpmovsqd|vpmovusqd|vpmovqw|vpmovsqw|vpmovusqw|vpmovqb|vpmovsqb|vpmovusqb|vpmovdw|vpmovsdw|vpmovusdw|vpmovdb|vpmovsdb|vpmovusdb|vpmovwb|vpmovswb|vpmovuswb|vcvtps2udq|vcvtpd2udq|vcvttps2udq|vcvttpd2udq|vcvtss2usi|vcvtsd2usi|vcvttss2usi|vcvttsd2usi|vcvtps2qq|vcvtpd2qq|vcvtps2uqq|vcvtpd2uqq|vcvttps2qq|vcvttpd2qq|vcvttps2uqq|vcvttpd2uqq|vcvtudq2ps|vcvtudq2pd|vcvtusi2ps|vcvtusi2pd|vcvtusi2sd|vcvtusi2ss|vcvtuqq2ps|vcvtuqq2pd|vcvtqq2pd|vcvtqq2ps|vgetexppd|vgetexpps|vgetexpsd|vgetexpss|vgetmantpd|vgetmantps|vgetmantsd|vgetmantss|vfixupimmpd|vfixupimmps|vfixupimmsd|vfixupimmss|vrcp14pd|vrcp14ps|vrcp14sd|vrcp14ss|vrndscaleps|vrndscalepd|vrndscaless|vrndscalesd|vrsqrt14pd|vrsqrt14ps|vrsqrt14sd|vrsqrt14ss|vscalefps|vscalefpd|vscalefss|vscalefsd|valignd|valignq|vdbpsadbw|vpabsq|vpmaxsq|vpmaxuq|vpminsq|vpminuq|vprold|vprolvd|vprolq|vprolvq|vprord|vprorvd|vprorq|vprorvq|vpscatterdd|vpscatterdq|vpscatterqd|vpscatterqq|vscatterdps|vscatterdpd|vscatterqps|vscatterqpd|vpconflictd|vpconflictq|vplzcntd|vplzcntq|vpbroadcastmb2q|vpbroadcastmw2d|vexp2pd|vexp2ps|vrcp28pd|vrcp28ps|vrcp28sd|vrcp28ss|vrsqrt28pd|vrsqrt28ps|vrsqrt28sd|vrsqrt28ss|vgatherpf0dps|vgatherpf0qps|vgatherpf0dpd|vgatherpf0qpd|vgatherpf1dps|vgatherpf1qps|vgatherpf1dpd|vgatherpf1qpd|vscatterpf0dps|vscatterpf0qps|vscatterpf0dpd|vscatterpf0qpd|vscatterpf1dps|vscatterpf1qps|vscatterpf1dpd|vscatterpf1qpd|vfpclassps|vfpclasspd|vfpclassss|vfpclasssd|vrangeps|vrangepd|vrangess|vrangesd|vreduceps|vreducepd|vreducess|vreducesd|vpmovm2d|vpmovm2q|vpmovm2b|vpmovm2w|vpmovd2m|vpmovq2m|vpmovb2m|vpmovw2m|vpmullq|vpmadd52luq|vpmadd52huq|v4fmaddps|v4fmaddss|v4fnmaddps|v4fnmaddss|vp4dpwssd|vp4dpwssds|vpdpbusd|vpdpbusds|vpdpwssd|vpdpwssds|vpcompressb|vpcompressw|vpexpandb|vpexpandw|vpshld|vpshldv|vpshrd|vpshrdv|vpopcntd|vpopcntq|vpopcntb|vpopcntw|vpshufbitqmb|gf2p8affineinvqb|gf2p8affineqb|gf2p8mulb|vpclmulqdq|vaesdec|vaesdeclast|vaesenc|vaesenclast)[ \t]/' YOURFILE.asm
NOTE: tested with gawk and nawk.
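As an end-to-end sketch of the workflow (assuming GCC on Linux and a hypothetical source file vec.cpp containing a loop like the one in the question; the final awk line is just a simplified filter for packed multiply instructions):
g++ -O3 -march=native -o vec vec.cpp
objdump -d vec > vec.asm
# run any of the filters above; non-empty output means the binary
# contains instructions from that set, for example:
awk '/[ \t](mulps|mulpd|vmulps|vmulpd)[ \t]/' vec.asm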
There is no need to check the assembly. Most compilers provide optimisation reports that tell you exactly whether or not your loops were vectorised using SIMD instructions.
If you compile using GCC, set -O3 -march=native to make sure vectorisation is performed using whichever SIMD instruction set (SSE, AVX, ...) the CPU you are compiling on supports, and add -fopt-info to make the compiler verbose about optimisations:
g++ -O3 -march=native -fopt-info -o main.o main.cpp
This will give you output like:
main.cpp:12:20: note: loop vectorized
main.cpp:12:20: note: loop peeled for vectorization to enhance alignment
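As a minimal sketch of a source file whose loop the vectoriser should report on (the file name and the __restrict qualifiers are illustrative; __restrict just tells the compiler the arrays don't overlap, which makes vectorisation easier to prove):
// main.cpp -- a loop GCC's vectoriser should pick up
void multiply(float* __restrict a, const float* __restrict b,
              const float* __restrict c, int size) {
    for (int i = 0; i < size; i++)
        a[i] = b[i] * c[i];
}
Compiling it with g++ -O3 -march=native -fopt-info -c main.cpp should print a loop vectorized note for this loop; if a loop is not vectorised, -fopt-info-vec-missed (or -fopt-info-vec-all) also explains why.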
Hope that helps.
Notice that most packed SSE instructions end with ps/pd, so there is a simpler way to check for packed SSEx instructions after dumping the binary content to asmfile:
grep %xmm asmfile | grep -P '([[:xdigit:]]{2}\s)+\s*[[:alnum:]]+p[sd]\s+'
or the xmm check can be combined into the pattern
grep -P '([[:xdigit:]]{2}\s)+\s*[[:alnum:]]+p[sd]\s+.+xmm' asmfile
This will suffice for programs that only use floating-point operations. However, for better coverage you also need to check for instructions beginning with p (the packed-integer instructions), so you need to change the regex a bit:
grep -P '([[:xdigit:]]{2}\s)+\s*([[:alnum:]]+p[sd]\s+|p[[:alnum:]]+).+%xmm' asmfile
To also include MMX instructions in 32-bit code, change the %xmm part at the end to %x?mm.
To check for AVX1/2 you just need to find ymm or %ymm usage instead of checking the instruction name, because AVX1/2 instructions only have the vector version:
grep ymm asmfile
Similarly, AVX-512 can be checked with:
grep zmm asmfile
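As a rough summary heuristic (assuming the binary is called a.out; keep in mind that xmm registers are also used by ordinary scalar floating-point code, so only the ymm/zmm counts are strong evidence of wide vectorisation):
objdump -d a.out | grep -c '%xmm'   # SSE or 128-bit AVX (also plain scalar FP)
objdump -d a.out | grep -c '%ymm'   # 256-bit AVX/AVX2
objdump -d a.out | grep -c '%zmm'   # 512-bit AVX-512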