Is 'switch' faster than 'if'?

People also ask

Are switches faster than if statements?

A switch statement works much faster than an equivalent if-else ladder. It's because the compiler generates a jump table for a switch during compilation. As a result, during execution, instead of checking which case is satisfied, it only decides which case has to be executed.

Is it better to use switch or if-else?

Use switch every time you have more than 2 conditions on a single variable, take weekdays for example, if you have a different action for every weekday you should use a switch. Other situations (multiple variables or complex if clauses you should Ifs, but there isn't a rule on where to use each.

Is switch faster than if C++?

Generally switch statements are faster than if else statements. But when there are few cases (less than 5) it is better to with if else statements as there will no significant performance improvement. Compliers normally generates a jump table when compiling a switch statement by looking at the cases.

What is faster than if statement?

Switch is generally faster than a long list of ifs because the compiler can generate a jump table. The longer the list, the better a switch statement is over a series of if statements. Copy link CC BY-SA 2.5. Follow this answer to receive notifications.

There are several optimizations a compiler can make on a switch. I don't think the oft-mentioned "jump-table" is a very useful one though, as it only works when the input can be bounded some way.

C Pseudocode for a "jump table" would be something like this -- note that the compiler in practice would need to insert some form of if test around the table to ensure that the input was valid in the table. Note also that it only works in the specific case that the input is a run of consecutive numbers.

If the number of branches in a switch is extremely large, a compiler can do things like using binary search on the values of the switch, which (in my mind) would be a much more useful optimization, as it does significantly increase performance in some scenarios, is as general as a switch is, and does not result in greater generated code size. But to see that, your test code would need a LOT more branches to see any difference.

To answer your specific questions:

Clang generates one that looks like this:

test_switch(char):                       # @test_switch(char)
        movl    %edi, %eax
        cmpl    $19, %edi
        jbe     .LBB0_1
        retq
.LBB0_1:
        jmpq    *.LJTI0_0(,%rax,8)
        jmp     void call<0u>()         # TAILCALL
        jmp     void call<1u>()         # TAILCALL
        jmp     void call<2u>()         # TAILCALL
        jmp     void call<3u>()         # TAILCALL
        jmp     void call<4u>()         # TAILCALL
        jmp     void call<5u>()         # TAILCALL
        jmp     void call<6u>()         # TAILCALL
        jmp     void call<7u>()         # TAILCALL
        jmp     void call<8u>()         # TAILCALL
        jmp     void call<9u>()         # TAILCALL
        jmp     void call<10u>()        # TAILCALL
        jmp     void call<11u>()        # TAILCALL
        jmp     void call<12u>()        # TAILCALL
        jmp     void call<13u>()        # TAILCALL
        jmp     void call<14u>()        # TAILCALL
        jmp     void call<15u>()        # TAILCALL
        jmp     void call<16u>()        # TAILCALL
        jmp     void call<17u>()        # TAILCALL
        jmp     void call<18u>()        # TAILCALL
        jmp     void call<19u>()        # TAILCALL
.LJTI0_0:
        .quad   .LBB0_2
        .quad   .LBB0_3
        .quad   .LBB0_4
        .quad   .LBB0_5
        .quad   .LBB0_6
        .quad   .LBB0_7
        .quad   .LBB0_8
        .quad   .LBB0_9
        .quad   .LBB0_10
        .quad   .LBB0_11
        .quad   .LBB0_12
        .quad   .LBB0_13
        .quad   .LBB0_14
        .quad   .LBB0_15
        .quad   .LBB0_16
        .quad   .LBB0_17
        .quad   .LBB0_18
        .quad   .LBB0_19
        .quad   .LBB0_20
        .quad   .LBB0_21

I can say that it is not using a jump table -- 4 comparison instructions are clearly visible:

13FE81C51 cmp  qword ptr [rsp+30h],1 
13FE81C57 je   testSwitch+73h (13FE81C73h) 
13FE81C59 cmp  qword ptr [rsp+30h],2 
13FE81C5F je   testSwitch+87h (13FE81C87h) 
13FE81C61 cmp  qword ptr [rsp+30h],3 
13FE81C67 je   testSwitch+9Bh (13FE81C9Bh) 
13FE81C69 cmp  qword ptr [rsp+30h],4 
13FE81C6F je   testSwitch+0AFh (13FE81CAFh)

A jump table based solution does not use comparison at all.

Either not enough branches to cause the compiler to generate a jump table, or your compiler simply doesn't generate them. I'm not sure which.

EDIT 2014: There has been some discussion elsewhere from people familiar with the LLVM optimizer saying that the jump table optimization can be important in many scenarios; e.g. in cases where there is an enumeration with many values and many cases against values in said enumeration. That said, I stand by what I said above in 2011 -- too often I see people thinking "if I make it a switch, it'll be the same time no matter how many cases I have" -- and that's completely false. Even with a jump table you get the indirect jump cost and you pay for entries in the table for each case; and memory bandwidth is a Big Deal on modern hardware.

Write code for readability. Any compiler worth its salt is going to see an if / else if ladder and transform it into equivalent switch or vice versa if it would be faster to do so.

To your question:

1.What would a basic jump table look like, in x86 or x64?

Jump table is memory address that holds pointer to the labels in something like array structure. following example will help you understand how jump tables are laid out

00B14538  D8 09 AB 00 D8 09 AB 00 D8 09 AB 00 D8 09 AB 00  Ø.«.Ø.«.Ø.«.Ø.«.
00B14548  D8 09 AB 00 D8 09 AB 00 D8 09 AB 00 00 00 00 00  Ø.«.Ø.«.Ø.«.....
00B14558  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00B14568  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

enter image description here

Where 00B14538 is the pointer to the Jump table , and value like D8 09 AB 00 represents label pointer.

2.Is this code using a jump table? No in this case.

3.Why is there no performance difference in this example?

There is no performance difference because instruction for both case looks same, no jump table.

4.Is there any situation in which there is a significant performance difference?

If you have very long sequence of if check, in that case using a jump table improves performance (branching/jmp instructions are expensive if they don't predict near-perfectly) but comes with the cost of memory.

The code for all the compare instructions has some size, too, so especially with 32-bit pointers or offsets, a single jump table lookup might not cost a lot more size in an executable.

Conclusion: Compiler is smart enough handle such case and generate appropriate instructions :)

The compiler is free to compile the switch statement as a code which is equivalent to if-statement, or to create a jump table. It will likely chose one on the other based on what will execute fastest or generate the smallest code somewhat depending on what you have specified in you compiler options -- so worst case it will be the same speed as if-statements

I would trust the compiler to do the best choice and focus on what makes the code most readable.

If the number of cases becomes very large a jump table will be much faster than a series of if. However if the steps between the values is very large, then the jump table can become large, and the compiler may choose not to generate one.

How do you know your computer was not performing some task unrelated to the test during the switch test loop and performing fewer tasks during the if test loop? Your test results do not show anything as:

the difference is very small
there is only one result, not a series of results
there are too few cases

My results:

I addded:

printf("counter: %u\n", counter);

to the end so that it would not optimise away the loop as counter was never used in your example so why would the compiler perform the loop? Immediately, the switch was always winning even with such a micro-benchmark.

The other problem with your code is:

switch (counter % 4 + 1)

in your switch loop, versus

const size_t c = counter % 4 + 1;

in your if loop. Very big difference if you fix that. I believe that putting the statement inside the switch statement provokes the compiler into sending the value directly into the CPU registers rather than putting it on the stack first. This is therefore in favour of the switch statement and not a balanced test.

Oh and I think you should also reset counter between tests. In fact, you probably should be using some kind of random number instead of +1, +2, +3 etc, as it will probably optimise something there. By random number, I mean a number based on the current time, for example. Otherwise, the compiler could turn both of your functions into one long math operation and not even bother with any loops.

I have modified Ryan's code just enough to make sure the compiler couldn't figure things out before the code had run:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define MAX_COUNT (1 << 26)
size_t counter = 0;

long long testSwitch()
{
    clock_t start = clock();
    size_t i;
    for (i = 0; i < MAX_COUNT; i++)
    {
        const size_t c = rand() % 20 + 1;

        switch (c)
        {
                case 1: counter += 20; break;
                case 2: counter += 33; break;
                case 3: counter += 62; break;
                case 4: counter += 15; break;
                case 5: counter += 416; break;
                case 6: counter += 3545; break;
                case 7: counter += 23; break;
                case 8: counter += 81; break;
                case 9: counter += 256; break;
                case 10: counter += 15865; break;
                case 11: counter += 3234; break;
                case 12: counter += 22345; break;
                case 13: counter += 1242; break;
                case 14: counter += 12341; break;
                case 15: counter += 41; break;
                case 16: counter += 34321; break;
                case 17: counter += 232; break;
                case 18: counter += 144231; break;
                case 19: counter += 32; break;
                case 20: counter += 1231; break;
        }
    }
    return 1000 * (long long)(clock() - start) / CLOCKS_PER_SEC;
}

long long testIf()
{
    clock_t start = clock();
    size_t i;
    for (i = 0; i < MAX_COUNT; i++)
    {
        const size_t c = rand() % 20 + 1;
        if (c == 1) { counter += 20; }
        else if (c == 2) { counter += 33; }
        else if (c == 3) { counter += 62; }
        else if (c == 4) { counter += 15; }
        else if (c == 5) { counter += 416; }
        else if (c == 6) { counter += 3545; }
        else if (c == 7) { counter += 23; }
        else if (c == 8) { counter += 81; }
        else if (c == 9) { counter += 256; }
        else if (c == 10) { counter += 15865; }
        else if (c == 11) { counter += 3234; }
        else if (c == 12) { counter += 22345; }
        else if (c == 13) { counter += 1242; }
        else if (c == 14) { counter += 12341; }
        else if (c == 15) { counter += 41; }
        else if (c == 16) { counter += 34321; }
        else if (c == 17) { counter += 232; }
        else if (c == 18) { counter += 144231; }
        else if (c == 19) { counter += 32; }
        else if (c == 20) { counter += 1231; }
    }
    return 1000 * (long long)(clock() - start) / CLOCKS_PER_SEC;
}

int main()
{
    srand(time(NULL));
    printf("Starting...\n");
    printf("Switch statement: %lld ms\n", testSwitch()); fflush(stdout);
    printf("counter: %d\n", counter);
    counter = 0;
    srand(time(NULL));
    printf("If     statement: %lld ms\n", testIf()); fflush(stdout);
    printf("counter: %d\n", counter);
}

switch: 3740
if: 3980

(similar results over multiple attempts)

I also reduced the number of cases/ifs to 5 and the switch function still won.

A good optimizing compiler such as MSVC can generate:

a simple jump table if the cases are arranged in a nice long range
a sparse (two-level) jump table if there are many gaps
a series of ifs if the number of cases is small or the values are not close together
a combination of above if the cases represent several groups of closely-spaced ranges.

In short, if the switch looks to be slower than a series of ifs, the compiler might just convert it to one. And it's likely to be not just a sequence of comparisons for each case, but a binary search tree. See here for an example.

Here are some results from the old (now hard to find) bench++ benchmark:

Test Name:   F000003                         Class Name:  Style
CPU Time:       0.781  nanoseconds           plus or minus     0.0715
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 2-way if/else if statement
 compare this test with F000004

Test Name:   F000004                         Class Name:  Style
CPU Time:        1.53  nanoseconds           plus or minus     0.0767
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 2-way switch statement
 compare this test with F000003

Test Name:   F000005                         Class Name:  Style
CPU Time:        7.70  nanoseconds           plus or minus      0.385
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way if/else if statement
 compare this test with F000006

Test Name:   F000006                         Class Name:  Style
CPU Time:        2.00  nanoseconds           plus or minus     0.0999
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way switch statement
 compare this test with F000005

Test Name:   F000007                         Class Name:  Style
CPU Time:        3.41  nanoseconds           plus or minus      0.171
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way sparse switch statement
 compare this test with F000005 and F000006

What we can see from this is that (on this machine, with this compiler -- VC++ 9.0 x64), each if test takes about 0.7 nanoseconds. As the number of tests goes up, the time scales almost perfectly linearly.

With the switch statement, there's almost no difference in speed between a 2-way and a 10-way test, as long as the values are dense. The 10-way test with sparse values takes about 1.6x as much time as the 10-way test with dense values -- but even with sparse values, still better than twice the speed of a 10-way if/else if.

Bottom line: using only a 4-way test won't really show you much about the performance of switch vs if/else. If you look at the numbers from this code, it's pretty easy to interpolate the fact that for a 4-way test, we'd expect the two to produce pretty similar results (~2.8 nanoseconds for an if/else, ~2.0 for switch).

Related questions
                            
                                #define macro for debug printing in C?
                            
                                Why does this for loop exit on some platforms and not on others?
                            
                                How can I suppress "unused parameter" warnings in C?
                            
                                Execution time of C program
                            
                                What is Bit Masking?
                            
                                Why does GCC use multiplication by a strange number in implementing integer division?
                            
                                warning: implicit declaration of function
                            
                                C fopen vs open
                            
                                What is exactly the base pointer and stack pointer? To what do they point?
                            
                                Significance of ios_base::sync_with_stdio(false); cin.tie(NULL);
                            
                                How do I check OS with a preprocessor directive?
                            
                                Modulo operation with negative numbers
                            
                                Why is 0 < -0x80000000?
                            
                                size_t vs. uintptr_t
                            
                                What is the difference between char array and char pointer in C?
                            
                                uint8_t vs unsigned char
                            
                                What does {0} mean when initializing an object?
                            
                                Difference between fprintf, printf and sprintf?
                            
                                Understanding typedefs for function pointers in C
                            
                                Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is 'switch' faster than 'if'?

Tags:

performance

c

assembly

switch-statement

jump-table

People also ask

Recent Activity

Donate For Us