I have a switch
case program:
Ascending order switch cases :
int main() { int a, sc = 1; switch (sc) { case 1: a = 1; break; case 2: a = 2; break; } }
Assembly of code:
main: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov eax, DWORD PTR [rbp-4] cmp eax, 1 je .L3 cmp eax, 2 je .L4 jmp .L2 .L3: mov DWORD PTR [rbp-8], 1 jmp .L2 .L4: mov DWORD PTR [rbp-8], 2 nop .L2: mov eax, 0 pop rbp ret
Descending order switch cases:
int main() { int a, sc = 1; switch (sc) { case 2: a = 1; break; case 1: a = 2; break; } }
Assembly of code:
main: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov eax, DWORD PTR [rbp-4] cmp eax, 1 je .L3 cmp eax, 2 jne .L2 mov DWORD PTR [rbp-8], 1 jmp .L2 .L3: mov DWORD PTR [rbp-8], 2 nop .L2: mov eax, 0 pop rbp ret
Here, ascending order cases generated more assembly than descending order.
So, if I have more numbers of switch cases, then does the order of cases affect performance?
The order of cases does not matter as long as you always have the reserved word “break” at the end of all cases that shouldn't have fall-through. The result of the expression in the switch statement must have the same type and value as a case in order to execute that case's instructions.
A switch statement is significantly faster than an if-else ladder if there are many nested if-else's involved. This is due to the creation of a jump table for switch during compilation. As a result, instead of checking which case is satisfied throughout execution, it just decides which case must be completed.
Execution of the switch statement body begins at the first statement in or after the matching labeled-statement . Execution proceeds until the end of the body, or until a break statement transfers control out of the body.
You're looking at unoptimized code, so studying it for performance isn't very meaningful. If you look at optimized code for your examples, you'll find that it doesn't do the comparisons at all! The optimizer notices that the switch variable sc
always has the value 1
, so it removes the unreachable case 2
.
The optimizer also sees that the variable a
isn't used after it's assigned, so it removes the code in case 1
as well, leaving main()
an empty function. And it removes the function prolog/epilog that manipulates rbp
since that register is unused.
So the optimized code ends up the same for either version of your main()
function:
main: xor eax, eax ret
In short, for the code in the question, it doesn't matter which order you put the case
statements, because none of that code will be generated at all.
Would the case
order matter in a more real-life example where the code actually is generated and used? Probably not. Note that even in your unoptimized generated code, both versions test for the two case
values in numeric order, checking first for 1
and then for 2
, regardless of the order in the source code. Clearly the compiler is doing some sorting even in the unoptimized code.
Be sure to note Glenn and Lundin's comments: the order of the case
sections is not the only change between your two examples, the actual code is different too. In one of them, the case values match the values set into a
, but not so in the other.
Compilers use various strategies for switch
/case
statements depending on the actual values used. They may use a series of comparisons as in these examples, or perhaps a jump table. It can be interesting to study the generated code, but as always, if performance matters, watch your optimization settings and test it in a real-life situation.
Compiler optimization of switch
statements is tricky. Of course, you need to enable optimizations (e.g. try to compile your code with gcc -O2 -fverbose-asm -S
with GCC and look inside the generated .s
assembler file). BTW on both of your examples my GCC 7 on Debian/Sid/x86-64 gives simply:
.type main, @function main: .LFB0: .cfi_startproc # rsp.c:13: } xorl %eax, %eax # ret .cfi_endproc
(so there in no trace of switch
in that generated code)
If you need to understand how a compiler could optimize switch
, there are some papers on that subject, such as this one.
If I have more numbers of switch cases, then an order of cases effect on performance?
Not in general, if you are using some optimizing compiler and asking it to optimize. See also this.
If that matters to you so much (but it should not, leave micro-optimizations to your compiler!), you need to benchmark, to profile and perhaps to study the generated assembler code. BTW, cache misses and register allocation could matter much more than order of case
-s so I think you should not bother at all. Keep in mind the approximate timing estimates of recent computers. Put the case
s in the most readable order (for the next developer working on that same source code). Read also about threaded code. If you have objective (performance related) reasons to re-order the case
-s (which is very unlikely and should happen at most once in your lifetime), write some good comment explaining those reasons.
If you care that much about performance, be sure to benchmark and profile, and choose a good compiler and use it with relevant optimization options. Perhaps experiment several different optimization settings (and maybe several compilers). You may want to add -march=native
(in addition of -O2
or -O3
). You could consider compiling and linking with -flto -O2
to enable link-time optimizations, etc. You might also want profile based optimizations.
BTW, many compilers are huge free software projects (in particular GCC and Clang). If you care that much about performance, you might patch the compiler, extend it by adding some additional optimization pass (by forking the source code, by adding some plugin to GCC or some GCC MELT extensions). That requires months or years of work (notably to understand the internal representations and organization of that compiler).
(Don't forget to take development costs into account; in most cases, they cost much more)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With