I have a <code>switch</code> case program: Ascending order switch cases : <pre class="prettyprint"><code>int main() { int a, sc = 1; switch (sc) { case 1: a = 1; break; case 2: a = 2; break; } } </code></pre> Assembly of code: <pre class="prettyprint"><code>main: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov eax, DWORD PTR [rbp-4] cmp eax, 1 je .L3 cmp eax, 2 je .L4 jmp .L2 .L3: mov DWORD PTR [rbp-8], 1 jmp .L2 .L4: mov DWORD PTR [rbp-8], 2 nop .L2: mov eax, 0 pop rbp ret </code></pre> Descending order switch cases: <pre class="prettyprint"><code>int main() { int a, sc = 1; switch (sc) { case 2: a = 1; break; case 1: a = 2; break; } } </code></pre> Assembly of code: <pre class="prettyprint"><code>main: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov eax, DWORD PTR [rbp-4] cmp eax, 1 je .L3 cmp eax, 2 jne .L2 mov DWORD PTR [rbp-8], 1 jmp .L2 .L3: mov DWORD PTR [rbp-8], 2 nop .L2: mov eax, 0 pop rbp ret </code></pre> Here, ascending order cases generated more assembly than descending order. So, if I have more numbers of switch cases, then does the order of cases affect performance?

Compiler optimization of <code>switch</code> statements is tricky. Of course, you need to enable optimizations (e.g. try to compile your code with <code>gcc -O2 -fverbose-asm -S</code> with GCC and look inside the generated <code>.s</code> assembler file). BTW on both of your examples my GCC 7 on Debian/Sid/x86-64 gives simply: <pre class="prettyprint"><code> .type main, @function main: .LFB0: .cfi_startproc # rsp.c:13: } xorl %eax, %eax # ret .cfi_endproc </code></pre> (so there in no trace of <code>switch</code> in that generated code) If you need to understand how a compiler could optimize <code>switch</code>, there are some papers on that subject, such as this one. <blockquote> If I have more numbers of switch cases, then an order of cases effect on performance? </blockquote> Not in general, if you are using some optimizing compiler and asking it to optimize. See also this. If that matters to you so much (but it should not, leave micro-optimizations to your compiler!), you need to benchmark, to profile and perhaps to study the generated assembler code. BTW, cache misses and register allocation could matter much more than order of <code>case</code>-s so I think you should not bother at all. Keep in mind the approximate timing estimates of recent computers. Put the <code>case</code>s in the most readable order (for the next developer working on that same source code). Read also about threaded code. If you have objective (performance related) reasons to re-order the <code>case</code>-s (which is very unlikely and should happen at most once in your lifetime), write some good comment explaining those reasons. If you care that much about performance, be sure to benchmark and profile, and choose a good compiler and use it with relevant optimization options. Perhaps experiment several different optimization settings (and maybe several compilers). You may want to add <code>-march=native</code> (in addition of <code>-O2</code> or <code>-O3</code>). You could consider compiling and linking with <code>-flto -O2</code> to enable link-time optimizations, etc. You might also want profile based optimizations. BTW, many compilers are huge free software projects (in particular GCC and Clang). If you care that much about performance, you might patch the compiler, extend it by adding some additional optimization pass (by forking the source code, by adding some plugin to GCC or some GCC MELT extensions). That requires months or years of work (notably to understand the internal representations and organization of that compiler). (Don't forget to take development costs into account; in most cases, they cost much more)

Does the order of cases in a switch statement affect performance?

Tags:

performance

c

gcc

switch-statement

I have a switch case program:

Ascending order switch cases :

int main() {         int a, sc = 1;         switch (sc)         {                 case 1:                         a = 1;                         break;                 case 2:                         a = 2;                         break;         } }

Assembly of code:

main:         push    rbp         mov     rbp, rsp         mov     DWORD PTR [rbp-4], 1         mov     eax, DWORD PTR [rbp-4]         cmp     eax, 1         je      .L3         cmp     eax, 2         je      .L4         jmp     .L2 .L3:         mov     DWORD PTR [rbp-8], 1         jmp     .L2 .L4:         mov     DWORD PTR [rbp-8], 2         nop .L2:         mov     eax, 0         pop     rbp         ret

Descending order switch cases:

int main() {         int a, sc = 1;         switch (sc)         {                 case 2:                         a = 1;                         break;                 case 1:                         a = 2;                         break;         } }

Assembly of code:

main:         push    rbp         mov     rbp, rsp         mov     DWORD PTR [rbp-4], 1         mov     eax, DWORD PTR [rbp-4]         cmp     eax, 1         je      .L3         cmp     eax, 2         jne     .L2         mov     DWORD PTR [rbp-8], 1         jmp     .L2 .L3:         mov     DWORD PTR [rbp-8], 2         nop .L2:         mov     eax, 0         pop     rbp         ret

Here, ascending order cases generated more assembly than descending order.

So, if I have more numbers of switch cases, then does the order of cases affect performance?

543

asked Nov 03 '17 06:11

msc

2 Answers

You're looking at unoptimized code, so studying it for performance isn't very meaningful. If you look at optimized code for your examples, you'll find that it doesn't do the comparisons at all! The optimizer notices that the switch variable sc always has the value 1, so it removes the unreachable case 2.

The optimizer also sees that the variable a isn't used after it's assigned, so it removes the code in case 1 as well, leaving main() an empty function. And it removes the function prolog/epilog that manipulates rbp since that register is unused.

So the optimized code ends up the same for either version of your main() function:

main:     xor eax, eax     ret

In short, for the code in the question, it doesn't matter which order you put the case statements, because none of that code will be generated at all.

Would the case order matter in a more real-life example where the code actually is generated and used? Probably not. Note that even in your unoptimized generated code, both versions test for the two case values in numeric order, checking first for 1 and then for 2, regardless of the order in the source code. Clearly the compiler is doing some sorting even in the unoptimized code.

Be sure to note Glenn and Lundin's comments: the order of the case sections is not the only change between your two examples, the actual code is different too. In one of them, the case values match the values set into a, but not so in the other.

Compilers use various strategies for switch/case statements depending on the actual values used. They may use a series of comparisons as in these examples, or perhaps a jump table. It can be interesting to study the generated code, but as always, if performance matters, watch your optimization settings and test it in a real-life situation.

111

answered Sep 23 '22 09:09

Michael Geary

Compiler optimization of switch statements is tricky. Of course, you need to enable optimizations (e.g. try to compile your code with gcc -O2 -fverbose-asm -S with GCC and look inside the generated .s assembler file). BTW on both of your examples my GCC 7 on Debian/Sid/x86-64 gives simply:

        .type   main, @function main: .LFB0:         .cfi_startproc # rsp.c:13: }         xorl    %eax, %eax      #         ret         .cfi_endproc

^{(so there in no trace of switch in that generated code)}

If you need to understand how a compiler could optimize switch, there are some papers on that subject, such as this one.

If I have more numbers of switch cases, then an order of cases effect on performance?

Not in general, if you are using some optimizing compiler and asking it to optimize. See also this.

If that matters to you so much (but it should not, leave micro-optimizations to your compiler!), you need to benchmark, to profile and perhaps to study the generated assembler code. BTW, cache misses and register allocation could matter much more than order of case-s so I think you should not bother at all. Keep in mind the approximate timing estimates of recent computers. Put the cases in the most readable order (for the next developer working on that same source code). Read also about threaded code. If you have objective (performance related) reasons to re-order the case-s (which is very unlikely and should happen at most once in your lifetime), write some good comment explaining those reasons.

If you care that much about performance, be sure to benchmark and profile, and choose a good compiler and use it with relevant optimization options. Perhaps experiment several different optimization settings (and maybe several compilers). You may want to add -march=native (in addition of -O2 or -O3). You could consider compiling and linking with -flto -O2 to enable link-time optimizations, etc. You might also want profile based optimizations.

BTW, many compilers are huge free software projects (in particular GCC and Clang). If you care that much about performance, you might patch the compiler, extend it by adding some additional optimization pass (by forking the source code, by adding some plugin to GCC or some GCC MELT extensions). That requires months or years of work (notably to understand the internal representations and organization of that compiler).

^{(Don't forget to take development costs into account; in most cases, they cost much more)}

answered Sep 24 '22 09:09

Basile Starynkevitch

Related questions
                            
                                What's the syntactically proper way to declare a C struct?
                            
                                ARM: link register and frame pointer
                            
                                JNI vs. JNA performance
                            
                                When implementing an infinite loop, is there a difference in using while(1) vs for(;;) vs goto (in C)?
                            
                                Difference between lstat fstat and stat in C
                            
                                What does this ">>=" operator mean in C?
                            
                                Referencing a char* that went out of scope
                            
                                C typedef of pointer to structure
                            
                                Can I get a non-const C string back from a C++ string?
                            
                                Definitions of sqrt, sin, cos, pow etc. in cmath
                            
                                How to use C code in C++
                            
                                Parse string into argv/argc
                            
                                error: expected declaration or statement at end of input in c
                            
                                How can we know the caller function's name?
                            
                                Initialize array of strings
                            
                                Why does m[1] - m[0] return 3 where m is a 3x3 matrix?
                            
                                Variable number of parameters in function in C++
                            
                                Allocating memory for a Structure in C
                            
                                String going crazy if I don't give it a little extra room. Can anyone explain what is happening here?
                            
                                Setup OpenCV-2.3 for Visual Studio 2010

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With