Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compel a compiler to generate a code equivalent for a manual switch?

The task is simple: we have a type list (i.e. using list = std::tuple<A, B, C, ...>;) and we want to dispatch its contents based on an index known at runtime. By "dispatch" I mean to call a templated handler, parameterized by a type from this list. It's easy to write a switch manually:

template <class... Args>
auto dispatch(size_t i, Args&& ...args)
{
  switch (i) {
    case 0: return handler<typename std::tuple_element<0, list>::type>(std::forward<Args>(args)...);
    ...
  }
}

But it's tedious, error prone and boring. I'm looking for a way to compel a compiler to do it from some more compact and terse definition.

And I want to achieve the result identical or very close to the manual switch. So, generating a table of function pointers (like here) is not a desired solution (I don't want to introduce a function call for cases where handlers are perfectly inlinable).

So far, I've found two ways:

  1. Inlinable recursion (thanks to Horstling)

    template <size_t N>
    struct dispatch_helper
    {
        template <class... Args>
        static R dispatch(size_t i, Args&& ...args)
        {
            if (N == i)
                return handler<typename std::tuple_element<N, list>::type>(std::forward<Args>(args)...);
            return dispatch_helper<N+1>::dispatch(i, std::forward<Args>(args)...);
        }
    };
    
    template <>
    struct dispatch_helper<200>
    {
        template <class... Args>
        static R dispatch(size_t type, Args&& ...args)
        {
            return {};
        }
    };
    
    template <class... Args>
    auto dispatch_recursion(size_t i, Args&& ...args)
    {
        return dispatch_helper<0>::dispatch(i, std::forward<Args>(args)...);
    }
    
  2. Usage of std::initializer_list

    template <class L>
    struct Iterator;
    
    template <template <class...> class L, class... T>
    struct Iterator<L<T...>>
    {
        template <class... Args>
        static R dispatch(size_t i, Args&& ...args)
        {
            R res;
            std::initializer_list<int> l {
                (
                    i == T::value ? (res = handler<typename T::type>(std::forward<Args>(args)...), 0) : 0
                )...
            };
            (void)l;
            return res;
        }
    };
    
    template <class... Args>
    auto dispatch_type_list_iter(size_t i, Args&& ...args)
    {
        return Iterator<index_list<list>>::dispatch(i, std::forward<Args>(args)...);
    }
    

    I'm omitting some details here, like converting a type list into an indexed type list or how to deal with return type, but the idea is clear, I hope.

I tried those on godbolt and Clang 4.0 generates a "logarithmic" switch (a tree of small switches) for the first approach and a code, identical to the manual switch for the second approach. I played with handler's size, inlinable or not inlinable handler and results look stable.

But GCC and ICC both generate a simple sequence of conditionals (for both approaches) which is very sad.

So, are there other solutions? Especially those working at least in Clang and GCC.

like image 377
Alexander Morozov Avatar asked Aug 25 '17 06:08

Alexander Morozov


2 Answers

You can do the binary search yourself:

#include <tuple>
#include <iostream>

template <class T>
void function(T arg) {
    std::cout << arg << std::endl;
}

If you are able to use c++14 and can use auto for return type deduction and auto for generic lambda expressions, you can go even further:

#include <tuple>
#include <iostream>
#include <string>

template <class Function, size_t begin, size_t end, class Tuple>
struct  call_on_element;

template <class Function, size_t begin, class Tuple>
struct call_on_element<Function, begin, begin, Tuple> {
    static auto call(Function f, size_t, const Tuple& t) {
        return f(std::get<begin>(t));
    }
};

template <class Function, size_t begin, size_t end, class Tuple>
struct call_on_element {
    static auto call(Function f, size_t i, const Tuple& t) {
        constexpr size_t half = begin + (end - begin) / 2;
        if(i <= half) {
            return call_on_element<Function, begin, half, Tuple>::call(f, i, t);
        } else {
            return call_on_element<Function, half + 1, end, Tuple>::call(f, i, t);
        }
    }
};

template <class Function, class... Args>
auto dispatch(Function f, size_t i, Args&&... args) {
    const auto tuple = std::make_tuple(std::forward<decltype(args)>(args)...);
    return call_on_element<Function, 0, sizeof...(Args) - 1, decltype(tuple)>::call(f, i , tuple);
}


struct Functor {
    template <class T>
    std::string operator()(T arg) {
        std::cout << arg << std::endl;
        return "executed functor";
    }
};

int main() {

    int i = 42;
    char c = 'y';
    float f = 1337;

    std::cout << "using Functor" << std::endl;
    dispatch(Functor(), 0, i, c, f);
    dispatch(Functor(), 1, i, c, f);
    auto s = dispatch(Functor(), 2, i, c, f);

    std::cout << "return value of dispatch = " << s << std::endl;

    std::cout << "using lambda" << std::endl;
    dispatch([](auto arg) { std::cout << arg << std::endl;}, 0, i, c, f);
    dispatch([](auto arg) { std::cout << arg << std::endl;}, 1, i, c, f);
    dispatch([](auto arg) { std::cout << arg << std::endl;}, 2, i, c, f);

};

I guess this is harder to inline for the compiler, but much more compfortable to use and also it is not impossible.


To check for inlining i modified the c++14 version to this:

int main() {
    dispatch(1, 42, 'c', 3.14);
};

// compiled with 'g++ -g -O3 -std=c++14 -S main.cpp' (using gcc7.1.1)
// generated objectdump with objdump -d a.out | c++filt

Objdump delivers the following output:

00000000000009a0 <main>:
 9a0:   55                      push   %rbp
 9a1:   53                      push   %rbx
 9a2:   48 8d 3d d7 16 20 00    lea    0x2016d7(%rip),%rdi        # 202080 <std::cout@@GLIBCXX_3.4>
 9a9:   ba 01 00 00 00          mov    $0x1,%edx
 9ae:   48 83 ec 18             sub    $0x18,%rsp
 9b2:   48 8d 74 24 07          lea    0x7(%rsp),%rsi
 9b7:   c6 44 24 07 63          movb   $0x63,0x7(%rsp)
 9bc:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
 9c3:   00 00 
 9c5:   48 89 44 24 08          mov    %rax,0x8(%rsp)
 9ca:   31 c0                   xor    %eax,%eax
 9cc:   e8 7f ff ff ff          callq  950 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
 9d1:   48 89 c5                mov    %rax,%rbp
 9d4:   48 8b 00                mov    (%rax),%rax
 9d7:   48 8b 40 e8             mov    -0x18(%rax),%rax
 9db:   48 8b 9c 05 f0 00 00    mov    0xf0(%rbp,%rax,1),%rbx
 9e2:   00 
 9e3:   48 85 db                test   %rbx,%rbx
 9e6:   74 67                   je     a4f <main+0xaf>
 9e8:   80 7b 38 00             cmpb   $0x0,0x38(%rbx)
 9ec:   74 2d                   je     a1b <main+0x7b>
 9ee:   0f be 73 43             movsbl 0x43(%rbx),%esi
 9f2:   48 89 ef                mov    %rbp,%rdi
 9f5:   e8 86 ff ff ff          callq  980 <std::basic_ostream<char, std::char_traits<char> >::put(char)@plt>
 9fa:   48 89 c7                mov    %rax,%rdi
 9fd:   e8 5e ff ff ff          callq  960 <std::basic_ostream<char, std::char_traits<char> >::flush()@plt>
 a02:   31 c0                   xor    %eax,%eax
 a04:   48 8b 4c 24 08          mov    0x8(%rsp),%rcx
 a09:   64 48 33 0c 25 28 00    xor    %fs:0x28,%rcx
 a10:   00 00 
 a12:   75 36                   jne    a4a <main+0xaa>
 a14:   48 83 c4 18             add    $0x18,%rsp
 a18:   5b                      pop    %rbx
 a19:   5d                      pop    %rbp
 a1a:   c3                      retq   
 a1b:   48 89 df                mov    %rbx,%rdi
 a1e:   e8 fd fe ff ff          callq  920 <std::ctype<char>::_M_widen_init() const@plt>
 a23:   48 8b 03                mov    (%rbx),%rax
 a26:   48 8d 0d 73 01 00 00    lea    0x173(%rip),%rcx        # ba0 <std::ctype<char>::do_widen(char) const>
 a2d:   be 0a 00 00 00          mov    $0xa,%esi
 a32:   48 8b 50 30             mov    0x30(%rax),%rdx
 a36:   48 39 ca                cmp    %rcx,%rdx
 a39:   74 b7                   je     9f2 <main+0x52>
 a3b:   be 0a 00 00 00          mov    $0xa,%esi
 a40:   48 89 df                mov    %rbx,%rdi
 a43:   ff d2                   callq  *%rdx
 a45:   0f be f0                movsbl %al,%esi
 a48:   eb a8                   jmp    9f2 <main+0x52>
 a4a:   e8 21 ff ff ff          callq  970 <__stack_chk_fail@plt>
 a4f:   e8 bc fe ff ff          callq  910 <std::__throw_bad_cast()@plt>
 a54:   66 90                   xchg   %ax,%ax
 a56:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 a5d:   00 00 00 

template <size_t begin, size_t end, class Tuple>
struct  call_on_element;

template <size_t begin, class Tuple>
struct call_on_element<begin, begin, Tuple> {
    static void call(size_t, const Tuple& t) {
        function(std::get<begin>(t));
    }
};

template <size_t begin, size_t end, class Tuple>
struct call_on_element {
    static void call(size_t i, const Tuple& t) {
        constexpr size_t half = begin + (end - begin) / 2;
        if(i <= half) {
            call_on_element<begin, half, Tuple>::call(i, t);
        } else {
            call_on_element<half+1, end, Tuple>::call(i, t);
        }
    }
};

template <class... Args>
void dispatch(size_t i, Args&&... args) {
    const auto tuple = std::make_tuple(std::forward<decltype(args)>(args)...);
    call_on_element<0, sizeof...(Args)-1, decltype(tuple)>::call(i , tuple);
}

int main() {

    int i = 42;
    char c = 'y';
    float f = 1337;

    dispatch(0, i, c, f);
    dispatch(1, i, c, f);
    dispatch(2, i, c, f);

};

The calls to callq std::basic_ostream<...> and je ... make me believe, that it actually inlines the function and does the logarithmic condition tree.

like image 129
OutOfBound Avatar answered Nov 06 '22 10:11

OutOfBound


For keeping a simple point of declaration, I am a huge fan of Boost.Preprocessor. I agree these are macros and you may despise them but why not use them when they are actually improving code readability and maintenance ?

#include <boost/preprocessor.hpp>

#include <iostream>
//the following two #include are only for handler() code
#include <typeinfo>
#include <tuple>

template <class T>
void handler(int x) {
  std::cout << typeid(T).name() << ' ' << x << '\n';
}

#define TYPE_LIST (int)(char)(bool) //just declare your types here

using list = std::tuple<BOOST_PP_SEQ_ENUM(TYPE_LIST)>;

template <class... Args>
auto dispatch(size_t i, Args&& ...args)
{
  switch (i) {
#define TYPE_case(r, data, i, type) case i: return handler<type>(std::forward<Args>(args)...);
    BOOST_PP_SEQ_FOR_EACH_I(TYPE_case,, TYPE_LIST);
#undef TYPE_case
  }
}

int main(int, char**) {
  for (int i = 0; i < std::tuple_size<list>::value; ++i) {
    dispatch(i, i);
  }
}
like image 20
Boris L. Avatar answered Nov 06 '22 10:11

Boris L.