Jump Table Switch Case question


A jump table is an abstract structure used to transfer control to another location. goto, continue, and break are similar, except that they always transfer to one specific location rather than to one of several possible locations. In particular, this control flow is not the same as a function call. (Wikipedia's article on branch tables is related.)

In C and C++, a switch statement is how you write a jump table. Only a limited form is provided (you can only switch on integral types) to make implementations easier and faster in this common case. (How to implement jump tables efficiently has been studied much more for integral types than for the general case.) A classic example is Duff's Device.
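For readers who haven't seen it, here is a rough sketch of Duff's Device. The switch jumps into the middle of an unrolled loop, which is only possible because case labels are plain jump targets. The name duff_copy and the incrementing destination pointer are my assumptions for illustration; the original device wrote to a single fixed output location.

```c
#include <stddef.h>

/* Copy count bytes from src to dst with the loop unrolled eight times.
   The switch jumps into the middle of the do-while body so that the
   first, partial pass handles the remainder (count % 8). */
void duff_copy(char *dst, const char *src, size_t count)
{
    if (count == 0)
        return;                       /* the loop below assumes count > 0 */

    size_t passes = (count + 7) / 8;  /* trips through the loop body */

    switch (count % 8) {
    case 0: do { *dst++ = *src++;     /* fall through */
    case 7:      *dst++ = *src++;     /* fall through */
    case 6:      *dst++ = *src++;     /* fall through */
    case 5:      *dst++ = *src++;     /* fall through */
    case 4:      *dst++ = *src++;     /* fall through */
    case 3:      *dst++ = *src++;     /* fall through */
    case 2:      *dst++ = *src++;     /* fall through */
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

None of the cases ends in break, so this is the "full" jump table behaviour rather than the limited one-action-per-case pattern.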

However, the full capability of a jump table is often not required, such as when every case ends with a break statement. These "limited jump tables" are a different pattern, which only takes advantage of a jump table's well-studied efficiency, and they are common when each "action" is independent of the others.


Actual implementations of jump tables take different forms, mostly differing in how the key-to-index mapping is done. That mapping is where terms like "dictionary" and "hash table" come in, and those techniques can be used independently of a jump table. Saying that some code "uses a jump table" doesn't by itself imply that you have O(1) lookup.
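To make the "mapping is separate from the table" point concrete, here is a minimal sketch in which the key-to-index step is a binary search. The code still dispatches through a table of targets, yet the lookup is O(log n), not O(1). The keys, the handler names, and dispatch_sparse are all made up for illustration.

```c
#include <stddef.h>

typedef int (*handler_fn)(void);

/* Hypothetical actions for three sparse keys. */
static int handle_1(void)    { return 10; }
static int handle_50(void)   { return 20; }
static int handle_4907(void) { return 30; }

/* Sorted keys and a parallel table of targets. */
static const int keys[] = {1, 50, 4907};
static const handler_fn targets[] = {handle_1, handle_50, handle_4907};

/* Map key -> index by binary search, then dispatch through the table.
   Returns -1 when the key has no entry (the "default" case). */
int dispatch_sparse(int key)
{
    size_t lo = 0;
    size_t hi = sizeof keys / sizeof keys[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] == key)
            return targets[mid]();
        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```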

The compiler is free to choose the lookup method for each switch statement, and there is no guarantee you'll get one particular implementation; however, compiler options such as optimize-for-speed and optimize-for-size can influence which one it picks.

You should look into studying data structures to get a handle on the different complexity guarantees they offer. Briefly, if by "dictionary" you mean a balanced binary tree, then lookup is O(log n); a hash table depends on its hash function and collision strategy. In the particular case of switch statements, since the compiler has full information about the case values, it can generate a perfect hash function, which means O(1) lookup. However, don't get lost by looking only at overall algorithmic complexity: it hides important factors, such as the constant cost of each step.
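As a sketch of the perfect-hash idea: for the hypothetical case values 1, 10, 100, and 1000, taking the key modulo 7 happens to give four distinct slots (1, 3, 2, 6), so one modulo and one comparison are enough. The modulus was picked by hand for these keys; I am not claiming any particular compiler emits exactly this.

```c
#include <stddef.h>

typedef int (*action_fn)(void);

/* Hypothetical actions, one per case value. */
static int on_1(void)    { return 11; }
static int on_10(void)   { return 12; }
static int on_100(void)  { return 13; }
static int on_1000(void) { return 14; }

struct slot {
    int key;            /* the case value stored in this slot */
    action_fn action;   /* NULL marks an unused slot */
};

/* key % 7 is collision-free for {1, 10, 100, 1000}. */
static const struct slot slots[7] = {
    [1] = {1,    on_1},
    [3] = {10,   on_10},
    [2] = {100,  on_100},
    [6] = {1000, on_1000},
};

/* O(1) lookup: hash, verify the stored key, dispatch.
   Returns -1 for any value that is not one of the cases. */
int dispatch_hashed(int key)
{
    if (key < 0)
        return -1;

    const struct slot *s = &slots[key % 7];
    if (s->action != NULL && s->key == key)
        return s->action();
    return -1;
}
```

The stored key has to be compared because unrelated inputs (8, for example) hash to an occupied slot too.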


A jump table is basically an array of pointers to pieces of code to handle the various cases in the switch statement. It's most likely to be generated when your cases are dense (i.e. you have a case for every possible value in a range). For example, given a statement like:

switch (i) {
   case 1: printf("case 1"); break;
   case 2: printf("case 2"); break;
   case 3: printf("case 3"); break;
}

it could generate code roughly equivalent to something like this:

void case1() { printf("case 1"); }
void case2() { printf("case 2"); }
void case3() { printf("case 3"); }

typedef void (*pfunc)(void);

pfunc functions[3] = {case1, case2, case3};

if ((unsigned)i - 1u < 3u)    /* cases 1..3 map to slots 0..2 */
    functions[i - 1]();

This has O(1) complexity: the work is constant no matter how many cases there are. A typical hash table also has roughly O(1) expected complexity, though the worst case is typically O(N). The jump table will usually be faster, but it will usually only be used if the table would be quite dense, whereas a hash table/dictionary works quite well even when the cases are quite sparse.
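Here is a self-contained version of the dense three-case example, as a sketch only. The one detail worth spelling out is the guard: subtracting 1 as an unsigned value lets a single comparison reject everything below 1 and above 3. The function name dispatch and its return value are my additions.

```c
#include <stdio.h>

static void case1(void) { printf("case 1"); }
static void case2(void) { printf("case 2"); }
static void case3(void) { printf("case 3"); }

typedef void (*pfunc)(void);

static const pfunc functions[3] = {case1, case2, case3};

/* Runs the action for i and returns 1, or returns 0 if i is not 1..3. */
int dispatch(int i)
{
    /* Values below 1 wrap around to huge unsigned numbers, so this one
       comparison checks both ends of the range. */
    unsigned idx = (unsigned)i - 1u;

    if (idx < 3u) {
        functions[idx]();
        return 1;
    }
    return 0;
}
```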


Suppose you had an array of procedures:

void fa() { 
 printf("a\n");
}

...

void fz() { 
 printf("it's z!\n");
}



typedef void (*F)();
F table[26]={fa,fb,...,fz};

Suppose you accept a character c (from a-z) of input from the user and run the function that corresponds to it:

char c;
switch(c) {
   case 'a': fa();break;
   case 'b': fb();break;
   ...
   case 'z': fz();break;       
   default: exit(-1);
}

Ideally this would be replaced with something like:

if (c<'a' || c>'z') exit(-1);
else (*table[c-'a'])();

Naturally, you might make the table bigger so the range check wouldn't be necessary.
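A sketch of that bigger table: one entry for every possible unsigned char value, with every unused slot pointing at a reject handler, so the lookup needs no range check at all. The handlers return an int here (instead of printing or calling exit) only so the behaviour is easy to observe. Only fa and fz are filled in, because the functions in between are elided above; bad_input, init_full_table, and run_for_char are hypothetical names.

```c
#include <limits.h>

typedef int (*F)(void);

static int fa(void)        { return 1; }
static int fz(void)        { return 26; }
static int bad_input(void) { return -1; }   /* stands in for the default */

/* One slot per possible unsigned char value. */
static F full_table[UCHAR_MAX + 1];

/* Point every slot at the default, then install the real handlers. */
void init_full_table(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        full_table[i] = bad_input;
    full_table['a'] = fa;
    full_table['z'] = fz;
}

/* No range check: every index an unsigned char can produce is valid. */
int run_for_char(char c)
{
    return full_table[(unsigned char)c]();
}
```

The cost is a larger table (256 pointers here), which is the usual trade of space for one fewer branch.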

The compiler would do this for arbitrary code, not only function calls, and would do it by storing the address to jump to (essentially, a goto). Standard C doesn't directly support any sort of computed goto (indexing into a table or otherwise), although some compilers, such as GCC, offer one as an extension. The CPU instructions for it are pretty simple.
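To show what "storing the address to jump to" looks like in source form, here is a sketch that uses the GNU C "labels as values" extension (accepted by GCC and Clang, not part of standard C). The three cases, the label names, and run_case are made up for illustration; case 2 deliberately falls through into case 3, which a table of separate function calls cannot express directly.

```c
/* GNU C extension: &&label takes the address of a label and
   goto *ptr jumps to it. This is not standard C. */
int run_case(int i)
{
    /* The jump table itself: one code address per case. */
    static void *const targets[] = { &&case_1, &&case_2, &&case_3 };
    int x = 1;

    /* Guard: anything outside 1..3 takes the default path. */
    if ((unsigned)i - 1u >= 3u)
        goto done;

    goto *targets[i - 1];

case_1:
    x = 6;
    goto done;          /* plays the role of break */
case_2:
    x++;                /* no jump here: falls through into case_3 */
case_3:
    x += 7;
done:
    return x;
}
```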


Compiling a switch statement can take many forms, depending on the cases. If the cases are close together, it is a no-brainer: use a jump table. If the cases are far apart, use a chain of if (value == case) tests or use a map. Or a compiler can use a combination: islands of jump tables, selected by if checks on each table's range.
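A hand-written sketch of the "islands" approach, assuming two hypothetical clusters of case values, 1..3 and 1000..1002: a range check selects the island, then a small dense table does the rest. All names and return values are invented for illustration.

```c
typedef int (*island_fn)(void);

/* Hypothetical actions for the low cluster (1..3). */
static int low_1(void) { return 101; }
static int low_2(void) { return 102; }
static int low_3(void) { return 103; }

/* Hypothetical actions for the high cluster (1000..1002). */
static int high_1000(void) { return 201; }
static int high_1001(void) { return 202; }
static int high_1002(void) { return 203; }

static const island_fn low_island[3]  = {low_1, low_2, low_3};
static const island_fn high_island[3] = {high_1000, high_1001, high_1002};

/* if checks pick the island; each island is its own dense jump table.
   Returns -1 for values outside both islands (the default). */
int dispatch_islands(int v)
{
    if (v >= 1 && v <= 3)
        return low_island[v - 1]();
    if (v >= 1000 && v <= 1002)
        return high_island[v - 1000]();
    return -1;
}
```

A single table covering 1..1002 would need about a thousand entries for six cases; two islands need six.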


A jump table is simply an array of function pointers. You can picture a jump table roughly like so:

int (*functions[10])(); /* Array of 10 Function Pointers */

From my understanding, this is used with a case statement like so: each condition, case _, will be an index into this array, so for example:

switch( a ) {
    case 1:  // (*functions[1])() // Call function containing actions in case of 1
        ...  
    case 2:  // (*functions[2])() // Call function containing actions in case of 2
        ...
}

Each case is transformed into simply functions[a]. This means that accessing functions[9] is just as quick as accessing functions[1], giving you the O(1) time you mentioned.

Obviously, if you have case 1, and case 4907, this isn't going to be a good method, and the hash table/dictionary methods you mentioned may come into play.


To further elaborate on Jerry's answer and others

Given:

int x=1;
switch (i) {
   case 1: x=6; break;
   case 2: x++;
   // Fall through
   case 3: x+=7; break;
}

you could have something like the following:

int f1() {return 6;}
int f2() {return 1+f3();}
int f3() {return 8;}

The compiler could then use a jump table to index {f1, f2, f3}.

The compiler can also inline f1, f2, and f3 when it creates the table, setting x directly to 6, 9, or 8.

If you wrote the functions yourself and rolled your own jump table, f1, f2, and f3 could end up anywhere in memory, whereas the compiler knows to put the case bodies close to the switch, giving much better code locality than you could.
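For comparison, here is a sketch that puts the switch next to a hand-rolled table built from f1, f2, and f3, so you can check that they agree. The names with_switch and with_table are mine. Note that the constants only work because x starts at 1.

```c
/* The functions from the text; f3 is defined first so f2 can call it. */
static int f1(void) { return 6; }
static int f3(void) { return 8; }
static int f2(void) { return 1 + f3(); }   /* fall-through: x++ then x+=7 */

/* The original switch, wrapped in a function. */
int with_switch(int i)
{
    int x = 1;
    switch (i) {
    case 1: x = 6; break;
    case 2: x++;
        /* fall through */
    case 3: x += 7; break;
    }
    return x;
}

/* The hand-rolled equivalent: guard, then index into {f1, f2, f3}. */
int with_table(int i)
{
    static int (*const table[3])(void) = {f1, f2, f3};

    if (i < 1 || i > 3)
        return 1;               /* default: x keeps its initial value */
    return table[i - 1]();
}
```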

Note that in many cases the compiler will generate a guard to check whether i is in range (or to handle the default). If you are sure that i is always one of the cases, you could skip that check.

The interesting thing is that for a small number of cases, or under different compiler flags (this is compiler dependent), the switch might not use a table at all, but just a chain of ifs, similar to:

if (i==1) x=f1();
else if (i==2) x=f2();
else if (i==3) x=f3();

or it might optimize this (where simple tests are one instruction) to:

x=(i==1) ? f1()
: (i==2) ? f2()
: (i==3) ? f3()
: x;

The best advice is to look at the generated assembly (for example, via the compiler's -S option) to see what the compiler did to your code on your architecture. g++ on Linux/Intel will generate something like the following when it uses a jump table.

(Note that I had to go to 5 case statements to force the jump table; below that number of case statements it used ifs.)

Note that small holes in the range of case values get jump table entries that point at the default.

int foo(int i)
{
   int x=1;
   switch (i) {
       case 1: x=6; break;
       case 2: x++;
        // Fall through
       case 3: x+=7; break;
       case 4: x+=2; break;
       case 5: x+=9; break;
    }
  return x;
}

would generate the following assembly code (// comments are mine):

        cmp     edi, 5                     // unsigned compare: is i above 5 (or negative)?
        ja      .L2                        // if so, jump to the default case
        mov     edi, edi                   // zero-extend i into rdi for indexing
        jmp     [QWORD PTR .L4[0+rdi*8]]   // use the jump table at label .L4
.L4:
        .quad   .L2                        // i=0: hole, points at the default (x=1)
        .quad   .L9                        // case 1
        .quad   .L10                       // case 2 (fall-through into case 3 folded in)
        .quad   .L6                        // case 3
        .quad   .L7                        // case 4
        .quad   .L8                        // case 5
.L10:
        mov     eax, 9                     // x=9
        ret
.L9:
        mov     eax, 6                     // x=6
        ret
.L8:
        mov     eax, 10                    // x=10
        ret
.L6:
        mov     eax, 8                     // x=8
        ret
.L7:
        mov     eax, 3                     // x=3
        ret
.L2:
        mov     eax, 1                     // default: x keeps its initial value, 1
        ret
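
Reading the listing back into C, the generated code amounts to one unsigned range check followed by an indexed lookup. Here is a sketch of that, next to the original foo so the two can be compared. foo_lowered and results are hypothetical names. The real table holds code addresses; I collapsed it to a table of values only because every target in the listing just loads a constant and returns.

```c
/* The function from the text, unchanged in behaviour. */
int foo(int i)
{
    int x = 1;
    switch (i) {
    case 1: x = 6; break;
    case 2: x++;
        /* fall through */
    case 3: x += 7; break;
    case 4: x += 2; break;
    case 5: x += 9; break;
    }
    return x;
}

/* Hand-written equivalent of the listing. */
int foo_lowered(int i)
{
    /* Slot 0 is the hole below case 1; it holds the default result. */
    static const int results[6] = {1, 6, 9, 8, 3, 10};

    if ((unsigned)i > 5u)       /* same job as cmp edi, 5 / ja .L2 */
        return 1;
    return results[i];
}
```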