Instantiating a template function in all M * N combinations

Imagine I have N methods that I want to time, along with M timing methods (let's call them clock implementations) [1]. The exact details aren't too important here, but I'm mentioning them so I can give a concrete example.

Now let's say I have a templatized timing method like so:

typedef void (bench_f)(uint64_t);

template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
  auto t0 = CLOCK::now();
  METHOD(loop_count);  // the call being timed
  auto t1 = CLOCK::now();
  return t1 - t0;
}

Basically, it brackets the call to METHOD with calls to CLOCK::now() and returns the difference. Note also that METHOD is not passed as a function pointer but only as a template argument, so you get a unique instantiation for each method rather than a single instantiation that makes an indirect call through a pointer.

This works well for my case because both of the clock calls and the method under test are direct static calls (i.e., something like call <function address> at the assembly level).
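To make that concrete, here is a minimal sketch of one instantiation. The names `counting_clock`, `sum_loop`, `sink` and `timed_sum` are made up for illustration; only `bench_f` and `time_method` come from the code above.

```cpp
#include <cstddef>
#include <cstdint>

typedef void (bench_f)(uint64_t);

// Hypothetical "clock": now() returns a counter that advances by one per
// call, so the readings are deterministic.
struct counting_clock {
    static uint64_t now() {
        static uint64_t ticks = 0;
        return ticks++;
    }
};

// Hypothetical method under test: sums the integers 0 .. n-1.
static volatile uint64_t sink;  // keeps the loop from being optimized away
void sum_loop(uint64_t n) {
    uint64_t total = 0;
    for (uint64_t i = 0; i < n; ++i) total += i;
    sink = total;
}

template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
    auto t0 = CLOCK::now();
    METHOD(loop_count);  // direct call: METHOD is known at compile time
    auto t1 = CLOCK::now();
    return t1 - t0;
}

// Taking the address forces this one instantiation. The resulting pointer
// type, uint64_t (*)(size_t), no longer mentions METHOD or CLOCK.
uint64_t (*const timed_sum)(size_t) = &time_method<sum_loop, counting_clock>;
```

Calling `timed_sum(10)` runs `sum_loop(10)` between two readings of `counting_clock`. Every (method, clock) pair I want needs its own line like the last one, which is exactly the M * N listing I'd like to avoid writing by hand.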

Now I have N methods I want to test (perhaps 50) along with M clock methods (perhaps 5). I want to actually instantiate, at compile time, all M * N methods, so that I can call all the test methods with a specific clock implementation.

Now the "standard" way to do this would just be to pass a function pointer (or a class implementing a virtual function) for both the method under test and the clock implementation, at which point I'd only need a single time_method function and could create whatever combination I want at runtime. In this particular case, the performance impact of the indirect calls is too high, so I want template instantiation and I am willing to pay for the resulting binary bloat (e.g., M * N = 250 instantiated combinations with my numbers).
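For comparison, a rough sketch of that runtime approach might look like the following. Everything here except `bench_f` is hypothetical (`clock_f`, `time_method_indirect`, `fake_now`, `fake_method`); it only shows the shape of the alternative I'm trying to avoid.

```cpp
#include <cstddef>
#include <cstdint>

typedef void (bench_f)(uint64_t);
typedef uint64_t (clock_f)();  // hypothetical signature for a clock's now()

// One non-template timer: the method and the clock are chosen at runtime,
// so all three calls below are indirect calls through pointers.
uint64_t time_method_indirect(bench_f* method, clock_f* now, size_t loop_count) {
    uint64_t t0 = now();
    method(loop_count);
    uint64_t t1 = now();
    return t1 - t0;
}

// Hypothetical clock and method, only to make the sketch complete.
uint64_t fake_now() {
    static uint64_t ticks = 0;
    return ticks += 10;  // advances 10 "ticks" per reading
}

uint64_t work_done = 0;
void fake_method(uint64_t n) { work_done += n; }
```

Here any method can be paired with any clock via `time_method_indirect(&fake_method, &fake_now, 7)`, with no extra instantiations, but at the cost of the indirect calls.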

At runtime I want to be able to get, for example, a list of the N methods combined with a particular clock.

I am fine explicitly listing all N methods and all M clocks, but I don't want to write out the M * N instantiations (DRY and all that).

[1] I'm using the word clock pretty loosely here - some of the "clocks" may in fact measure aspects unrelated to time, such as heap memory use, or some application-specific metric.

2 Answers

template<bench_f* ...> struct method_list {};
template<class...> struct clock_list {};

using time_method_t = uint64_t (*)(size_t);

template<bench_f Method, class...Clocks>
constexpr auto make_single_method_table()
    -> std::array<time_method_t, sizeof...(Clocks)> {
    return { time_method<Method, Clocks>... };
}

template<bench_f*... Methods, class... Clocks>
constexpr auto make_method_table(method_list<Methods...>, clock_list<Clocks...>)
    -> std::array<std::array<time_method_t, sizeof...(Clocks)>, sizeof...(Methods)> {
    return { make_single_method_table<Methods, Clocks...>()... };
}

To keep the amount of code you write linear in the sum of the option counts (M + N) rather than their product (M * N), write template functions that each peel off one layer of options at a time.
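As a usage sketch, the pieces above could be put together roughly as follows. The clocks (`step1_clock`, `step5_clock`), the methods (`method_a`, `method_b`), the `table` variable and the `run_all` helper are all invented for illustration, and `time_method` is repeated from the question so the example compiles on its own.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

typedef void (bench_f)(uint64_t);

template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
    auto t0 = CLOCK::now();
    METHOD(loop_count);
    auto t1 = CLOCK::now();
    return t1 - t0;
}

template<bench_f*...> struct method_list {};
template<class...> struct clock_list {};
using time_method_t = uint64_t (*)(size_t);

// One row: a fixed method paired with every clock.
// Double braces: outer for std::array, inner for its underlying C array.
template<bench_f* Method, class... Clocks>
constexpr std::array<time_method_t, sizeof...(Clocks)> make_single_method_table() {
    return {{ time_method<Method, Clocks>... }};
}

// Whole table: one row per method.
template<bench_f*... Methods, class... Clocks>
constexpr std::array<std::array<time_method_t, sizeof...(Clocks)>, sizeof...(Methods)>
make_method_table(method_list<Methods...>, clock_list<Clocks...>) {
    return {{ make_single_method_table<Methods, Clocks...>()... }};
}

// Hypothetical clocks: each now() advances a private counter by a fixed step.
struct step1_clock { static uint64_t now() { static uint64_t t = 0; return t += 1; } };
struct step5_clock { static uint64_t now() { static uint64_t t = 0; return t += 5; } };

// Hypothetical methods under test.
uint64_t calls_a = 0, calls_b = 0;
void method_a(uint64_t n) { calls_a += n; }
void method_b(uint64_t n) { calls_b += n; }

// Each option is listed exactly once: N methods plus M clocks.
using methods = method_list<method_a, method_b>;
using clocks  = clock_list<step1_clock, step5_clock>;

// table[m][c] is the instantiation that times method m with clock c.
const auto table = make_method_table(methods{}, clocks{});

// Run every method with the clock picked at runtime and sum the readings.
uint64_t run_all(size_t clock_index, size_t loop_count) {
    uint64_t total = 0;
    for (const auto& row : table) total += row[clock_index](loop_count);
    return total;
}
```

Only the table lookup `row[clock_index]` is indirect; inside each `time_method` instantiation the calls to the method and to `now()` remain direct, which is the property the question asks for.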


typedef uint64_t (*benchmark_runner)(size_t loop_count);

benchmark_runner all_runners[NMETHODS][NCLOCKS];

template <bench_f METHOD>
void fill_row(size_t bench_f_index)
{
    // One row of the table: this METHOD paired with each clock in turn.
    benchmark_runner* it = &all_runners[bench_f_index][0];
    *(it++) = &time_method<METHOD, FIRST_CLOCK>;
    *(it++) = &time_method<METHOD, SECOND_CLOCK>;
    *(it++) = &time_method<METHOD, THIRD_CLOCK>;
    *(it++) = &time_method<METHOD, LAST_CLOCK>;
}

void fill_all()
    int row = 0;
