Sample code on Compiler Explorer: https://godbolt.org/g/fPfw4k
I was attempting to use an array of function pointers as a jump table instead of a switch, as I found it cleaner. However, to my surprise, neither GCC nor Clang seems able to inline the calls.
Is there a specific reason why?
Example code in case of a dead link:
namespace{
template<int N>
int bar(){
    return N;
}
int foo1(int n){
    if(n < 0 || n > 5){
        __builtin_unreachable();
    }
#if __clang__
    __builtin_assume(n >= 0 && n <= 5);
#endif
    static int (* const fns[])() = {
        bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>
    };
    return fns[n]();
}
int foo2(int n){
#if __clang__
    __builtin_assume(n >= 0 && n <= 5);
#endif
    switch(n){
        case 0:
            return bar<0>();
        case 1:
            return bar<1>();
        case 2:
            return bar<2>();
        case 3:
            return bar<3>();
        case 4:
            return bar<4>();
        case 5:
            return bar<5>();
        default:
            __builtin_unreachable();
    }
}
}
int main(int argc, char** argv){
    volatile int n = foo1(argc);
    volatile int p = foo2(argc);
}
Using the always_inline attribute (a language extension provided by GCC and Clang) makes no difference either.
The compiler cannot inline the call in foo1 because its callee is not a compile-time constant: the function pointer is loaded from fns[n], and n is only known at run time. If, after inlining foo1 into its caller, the compiler can see that a constant argument was passed, it can resolve fns[n] and will inline the correct function.
Consider this example:
namespace{
template<int N>
int bar(){
    return N;
}
int foo1(int n){
    if(n < 0 || n > 5){
        __builtin_unreachable();
    }
#if __clang__
    __builtin_assume(n >= 0 && n <= 5);
#endif
    static int (* const fns[])() = {
        bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>
    };
    return fns[n]();
}
}
int main(int argc, char** argv){
    int n = foo1(3);
    return n;
}
It is compiled to the following code by both compilers:
main:
    mov eax, 3
    ret
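The key distinction is whether the index is a run-time value or a compile-time constant. The sketch below is illustrative only: call_runtime and call_constant are hypothetical names, not from the question, and the table is made constexpr (rather than just const) so that fns[N] with a template parameter N is itself a constant expression. With a run-time n the call stays indirect unless the optimizer can propagate a constant into it, as happens in the main above.

```cpp
#include <cassert>

namespace {
template <int N>
int bar() { return N; }

// constexpr (rather than just const) so that fns[N] with a constant N is
// itself a constant expression.
constexpr int (*fns[])() = {bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>};

// Run-time index: the callee is whatever fns[n] holds, so this is an
// indirect call unless the optimizer can prove the value of n.
int call_runtime(int n) {
    assert(n >= 0 && n <= 5);
    return fns[n]();
}

// Compile-time index: f is a constant expression naming bar<N>, so the
// callee is known during compilation and the call can be inlined.
template <int N>
int call_constant() {
    static_assert(N >= 0 && N <= 5, "index out of range");
    constexpr int (*f)() = fns[N];
    return f();
}
}  // namespace
```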
In the case of foo2, the compiler starts out with six different calls with constant callees, all of which it inlines. It then optimizes the resulting code further, generating its own jump table if it considers that profitable.
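To make the foo2 case more concrete, here is a rough sketch of what foo2 amounts to once every bar<N>() call has been inlined, next to a table of values, which is one form the optimizer may lower such a switch to when every case just returns a constant. Both function names are hypothetical, and whether GCC or Clang actually emits a jump table, a value table, or something simpler still for foo2 depends on the compiler version and flags; the generated assembly is the authority here.

```cpp
#include <cassert>

namespace {
// Roughly what foo2 from the question looks like after every bar<N>() call
// has been inlined: each case now just returns a constant.
int foo2_after_inlining(int n) {
    assert(n >= 0 && n <= 5);
    switch (n) {
        case 0: return 0;
        case 1: return 1;
        case 2: return 2;
        case 3: return 3;
        case 4: return 4;
        case 5: return 5;
        default: __builtin_unreachable();  // GCC/Clang builtin, as in the question
    }
}

// A switch whose cases only return constants can be lowered to a lookup in a
// table of values instead of a jump table of code addresses.
int foo2_as_value_table(int n) {
    assert(n >= 0 && n <= 5);
    static constexpr int values[] = {0, 1, 2, 3, 4, 5};
    return values[n];
}
}  // namespace
```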
I guess the compiler could try to extract a switch from your jump table and then inline everything, but this would be quite complex and is unlikely to improve performance in the general case, so neither GCC nor Clang seems to do it.
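If you want to keep a table-like interface without relying on the compiler to do this, one option is to generate the switch-like dispatch yourself with templates. This is a minimal sketch, assuming C++17 (it uses a fold expression); dispatch, dispatch_impl and foo3 are hypothetical names, not part of the question. It compares n against each compile-time index and calls bar<I>() directly on a match, so every call has a constant callee and is an inlining candidate. Whether the resulting comparison chain is then turned into a jump table or a value table is left to the optimizer.

```cpp
#include <cassert>
#include <utility>

namespace {
template <int N>
int bar() { return N; }

// Hypothetical helper: compares n against each index I in turn and calls
// bar<I>() directly when it matches. Every call has a constant callee.
template <int... Is>
int dispatch_impl(int n, std::integer_sequence<int, Is...>) {
    int result = 0;
    // Fold over ||: evaluation stops at the first matching index.
    bool found = ((n == Is ? (result = bar<Is>(), true) : false) || ...);
    assert(found && "n out of range");
    (void)found;  // avoid an unused-variable warning when NDEBUG is defined
    return result;
}

// Dispatches a run-time n in [0, Count) to bar<0>() ... bar<Count - 1>().
template <int Count>
int dispatch(int n) {
    return dispatch_impl(n, std::make_integer_sequence<int, Count>{});
}

// Plays the role of foo1/foo2 from the question.
int foo3(int n) {
    return dispatch<6>(n);
}
}  // namespace
```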