
Neither GCC nor Clang will inline calls through an array of function pointers known at compile time — why?

Tags: c++, gcc

Sample code on Compiler Explorer: https://godbolt.org/g/fPfw4k

I tried to use an array of function pointers as a jump table in place of a switch statement because I find it cleaner. To my surprise, neither GCC nor Clang seems able to inline the calls made through the table.

Is there a specific reason why?

Example code in case the link goes dead:

namespace{
template<int N>
int bar(){
    return N;
}

int foo1(int n){
    if(n < 0 || n > 5){
        __builtin_unreachable();
    }
    #if __clang__
    __builtin_assume(n >= 0 && n <= 5);
    #endif
    static int (* const fns[])() = {
        bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>
    };
    return fns[n]();
}

int foo2(int n){
    #if __clang__
    __builtin_assume(n >= 0 && n <= 5);
    #endif
    switch(n){
        case 0:
            return bar<0>();
        case 1:
            return bar<1>();
        case 2:
            return bar<2>();
        case 3:
            return bar<3>();
        case 4:
            return bar<4>();
        case 5:
            return bar<5>();
        default:
            __builtin_unreachable();
    }
}
}

int main(int argc, char** argv){
    volatile int n = foo1(argc);
    volatile int p = foo2(argc);
}

Applying the always_inline attribute (a language extension provided by GCC and Clang) makes no difference either.

asked Jun 18 '17 10:06 by Rusty Shackleford

1 Answer

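The compiler cannot inline the call in foo1 because the callee is not a compile-time constant: the function pointer is loaded from the table using an index that is only known at run time. If foo1 itself is inlined into a caller that passes a constant argument, the index becomes a constant, the compiler can resolve the table entry, and it inlines the correct function.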

Consider this example:

namespace{
template<int N>
int bar(){
    return N;
}

int foo1(int n){
    if(n < 0 || n > 5){
        __builtin_unreachable();
    }
    #if __clang__
    __builtin_assume(n >= 0 && n <= 5);
    #endif
    static int (* const fns[])() = {
        bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>
    };
    return fns[n]();
}
}

int main(int argc, char** argv){
    int n = foo1(3);

    return n;
}

It is compiled to the following code by both compilers:

main:
        mov     eax, 3
        ret

In the case of foo2, the compiler starts out with six different calls (bar<0> through bar<5>), each with a constant callee, and it inlines all of them. It then optimizes the resulting code further, generating its own jump table if it considers that profitable.
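To make the foo2 case concrete, here is a hand-written sketch of roughly what the function looks like once every bar<N>() call has been replaced by its body. The name foo2_after_inlining is hypothetical, and this is an illustration of the idea, not actual compiler output.

```cpp
// Conceptual view of foo2 after each direct call bar<N>() has been inlined.
// foo2_after_inlining is a hypothetical name used only for illustration.
int foo2_after_inlining(int n) {
    switch (n) {
        case 0: return 0;  // was: return bar<0>();
        case 1: return 1;  // was: return bar<1>();
        case 2: return 2;  // was: return bar<2>();
        case 3: return 3;  // was: return bar<3>();
        case 4: return 4;  // was: return bar<4>();
        case 5: return 5;  // was: return bar<5>();
        default:
            // GCC/Clang builtin, as in the question: n is promised to be in 0..5.
            __builtin_unreachable();
    }
}
```

Every case now returns a plain constant, so the optimizer is free to lower the switch in whatever way it judges cheapest.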

I guess the compiler could try to extract a switch from your jump table and then inline everything, but this would be quite complex and very unlikely to yield a performance improvement in the general case, so neither GCC nor Clang seems to do it.
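If the goal is to avoid writing the switch by hand while still giving the compiler constant callees, one possible approach is to generate the comparisons from an index sequence. The following is a minimal C++17 sketch, not something taken from the question: dispatch and foo3 are hypothetical names, and the -1 fallback for out-of-range input is an assumption made for the example. Whether a given compiler version inlines and simplifies it as well as the switch should be checked on Compiler Explorer.

```cpp
#include <cstddef>
#include <utility>

namespace {
template <int N>
int bar() {
    return N;
}

// Hypothetical helper: expands to a chain of "n == I" tests, each paired with
// a direct call to bar<I>(), so every callee is known at compile time.
template <std::size_t... Is>
int dispatch(int n, std::index_sequence<Is...>) {
    int result = -1;  // assumption for this sketch: returned when n is out of range
    // Fold over ||: evaluation stops at the first index that matches n.
    (void)((n == static_cast<int>(Is) &&
            (result = bar<static_cast<int>(Is)>(), true)) || ...);
    return result;
}

// Hypothetical counterpart to foo1/foo2 from the question.
int foo3(int n) {
    return dispatch(n, std::make_index_sequence<6>{});
}
}
```

Unlike the function pointer table, no call here goes through a pointer loaded at run time, which is the property the answer identifies as necessary for inlining.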

answered Oct 21 '22 02:10 by PaulR