Instantiating a template function in all M * N combinations

Question

Imagine I have N methods that I want to time, along with M timing methods (let's call them clock implementations)¹. The exact details aren't too important here, but I mention them so I can give a concrete example.

Now let's say I have a templatized timing method like so:

typedef void (bench_f)(uint64_t);

template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
  auto t0 = CLOCK::now();
  METHOD(loop_count);
  auto t1 = CLOCK::now();
  return t1 - t0;
}

Basically it brackets the call to METHOD with calls to CLOCK::now() and returns the difference. Note also that METHOD is not passed as a runtime function pointer argument, but only as a template argument - so you get a unique instantiation for each method, rather than a single instantiation that makes an indirect call through a pointer.

This works well for my case because both of the clock calls and the method under test are direct static calls (i.e., something like call <function address> at the assembly level).
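To make the setup concrete, here is a minimal sketch of one instantiation. The template is the one from above; tick_clock, spin, g_ticks and time_spin_with_tick_clock are hypothetical names invented for illustration, and the "clock" simply counts units of work so the result is deterministic.

```cpp
#include <cstddef>
#include <cstdint>

using std::size_t;
using std::uint64_t;

typedef void (bench_f)(uint64_t);

// The timing template from the question, unchanged.
template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
  auto t0 = CLOCK::now();
  METHOD(loop_count);
  auto t1 = CLOCK::now();
  return t1 - t0;
}

// Hypothetical counter that stands in for elapsed time.
uint64_t g_ticks = 0;

// Hypothetical clock: any type with a static now() fits the CLOCK parameter.
struct tick_clock {
  static uint64_t now() { return g_ticks; }
};

// Hypothetical method under test: advances the counter once per iteration.
void spin(uint64_t n) {
  for (uint64_t i = 0; i < n; ++i) ++g_ticks;
}

// Both template arguments are fixed at compile time, so the calls to
// spin() and tick_clock::now() inside this instantiation are direct calls.
uint64_t time_spin_with_tick_clock() {
  return time_method<spin, tick_clock>(1000);
}
```

Each distinct (METHOD, CLOCK) pair produces its own function like this one, which is where the M * N instantiations come from.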

Now I have N methods I want to test (perhaps 50) along with M clock methods (perhaps 5). I want to actually instantiate, at compile time, all M * N combinations, so that I can call all the test methods with a specific clock implementation.

Now the "standard" way to do this would just be to pass a function pointer (or an object of a class implementing a virtual function) for both the method under test and the clock implementation, at which point I'd only need a single time_method function and could create whatever combination I want at runtime. In this particular case, the performance impact of the indirect calls is too high, so I want template instantiation and I am willing to pay for the resulting binary bloat (e.g., M * N = 250 instantiated combinations with my numbers).
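For contrast, the pointer-based variant described above might look roughly like this. It is only a sketch: bench_fp, clock_fp, time_method_indirect, step_clock_now, take_steps and g_steps are all hypothetical names, not part of my real code.

```cpp
#include <cstddef>
#include <cstdint>

using std::size_t;
using std::uint64_t;

// Runtime-dispatch variant: the method and the clock both arrive as
// function pointers, so one compiled function serves every combination.
typedef void (*bench_fp)(uint64_t);
typedef uint64_t (*clock_fp)();

uint64_t time_method_indirect(bench_fp method, clock_fp now, size_t loop_count) {
  uint64_t t0 = now();   // call through a pointer
  method(loop_count);    // call through a pointer
  uint64_t t1 = now();   // call through a pointer
  return t1 - t0;
}

// Hypothetical clock and method, used only to exercise the function.
uint64_t g_steps = 0;
uint64_t step_clock_now() { return g_steps; }
void take_steps(uint64_t n) { g_steps += n; }
```

Any combination can then be chosen at runtime, e.g. time_method_indirect(take_steps, step_clock_now, 7), at the cost of the three calls through pointers that I am trying to avoid.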

At runtime I want to be able to get, for example, a list of all N methods combined with a particular clock.

I am fine with explicitly listing all N methods and all M clocks, but I don't want to write out the M * N instantiations (DRY and all that).

¹ I'm using the word clock pretty loosely here - some of the "clocks" may in fact measure aspects unrelated to time, such as heap memory use, or some application specific metric.

T.C. · Accepted Answer

#include <array>

// Tag types that carry the methods and the clocks as template parameter packs.
template<bench_f* ...> struct method_list {};
template<class...> struct clock_list {};

// Common signature of every time_method<METHOD, CLOCK> instantiation.
using time_method_t = uint64_t (*)(size_t);

// One table row: a single method paired with every clock.
template<bench_f Method, class...Clocks>
constexpr auto make_single_method_table()
    -> std::array<time_method_t, sizeof...(Clocks)> {
    return { time_method<Method, Clocks>... };
}

// The full table: one row per method, one column per clock.
template<bench_f*... Methods, class... Clocks>
constexpr auto make_method_table(method_list<Methods...>, clock_list<Clocks...>)
    -> std::array<std::array<time_method_t, sizeof...(Clocks)>, sizeof...(Methods)> {
    return { make_single_method_table<Methods, Clocks...>()... };
}
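A usage sketch may help here, since the code above only defines the helpers. The version below repeats them together with time_method from the question so that it compiles on its own. The two methods (work_once, work_twice), the two clocks (tick_clock, double_tick_clock), g_ticks, runner_table and run_benchmark are hypothetical names made up for illustration. The return statements use the fully braced form of the same initialization, which is an adjustment on my part to sidestep missing-braces warnings.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using std::size_t;
using std::uint64_t;

typedef void (bench_f)(uint64_t);

template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
  auto t0 = CLOCK::now();
  METHOD(loop_count);
  auto t1 = CLOCK::now();
  return t1 - t0;
}

template<bench_f*...> struct method_list {};
template<class...> struct clock_list {};
using time_method_t = uint64_t (*)(size_t);

template<bench_f Method, class... Clocks>
constexpr auto make_single_method_table()
    -> std::array<time_method_t, sizeof...(Clocks)> {
    return {{ time_method<Method, Clocks>... }};
}

template<bench_f*... Methods, class... Clocks>
constexpr auto make_method_table(method_list<Methods...>, clock_list<Clocks...>)
    -> std::array<std::array<time_method_t, sizeof...(Clocks)>, sizeof...(Methods)> {
    return {{ make_single_method_table<Methods, Clocks...>()... }};
}

// Hypothetical deterministic "clocks" driven by a shared counter.
uint64_t g_ticks = 0;
struct tick_clock        { static uint64_t now() { return g_ticks; } };
struct double_tick_clock { static uint64_t now() { return 2 * g_ticks; } };

// Hypothetical methods under test.
void work_once(uint64_t n)  { g_ticks += n; }
void work_twice(uint64_t n) { g_ticks += 2 * n; }

// Each method and each clock is listed exactly once (N + M entries).
using methods = method_list<work_once, work_twice>;
using clocks  = clock_list<tick_clock, double_tick_clock>;

// All N * M instantiations are generated by the pack expansions.
constexpr auto runner_table = make_method_table(methods{}, clocks{});

// Runtime selection: row = method index, column = clock index.
uint64_t run_benchmark(size_t method_index, size_t clock_index, size_t loop_count) {
    return runner_table[method_index][clock_index](loop_count);
}
```

The indices follow the order in which the methods and clocks are listed, so run_benchmark(1, 0, 10) times work_twice with tick_clock. Looking up the table entry is still an indirect call, but it happens once, outside the timed region; inside each time_method instantiation the calls stay direct.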

Ben Voigt · Answer

To make the code you must write linear in the sum of the number of options instead of their product, write template functions that remove one layer of option at a time.

e.g.

typedef uint64_t (*benchmark_runner)(size_t loop_count);

benchmark_runner all_runners[NMETHODS][NCLOCKS];

template <bench_f METHOD>
void fill_row(size_t bench_f_index)
{
    benchmark_runner* it = &all_runners[bench_f_index][0];
    *(it++) = &time_method<METHOD, FIRST_CLOCK>;
    *(it++) = &time_method<METHOD, SECOND_CLOCK>;
    *(it++) = &time_method<METHOD, THIRD_CLOCK>;
    *(it++) = &time_method<METHOD, LAST_CLOCK>;
}

void fill_all()
{
    int row = 0;
    fill_row<BENCH_A>(row++);
    fill_row<BENCH_B>(row++);
    ...
    fill_row<BENCH_Z>(row++);
}
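As a rough, self-contained sketch of how this could be filled in and then queried for "all methods with one particular clock": the clocks (unit_clock, triple_clock), the methods (bench_a, bench_b), g_units and runners_for_clock below are hypothetical placeholders, and NMETHODS and NCLOCKS are set to 2 just for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using std::size_t;
using std::uint64_t;

typedef void (bench_f)(uint64_t);

template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
  auto t0 = CLOCK::now();
  METHOD(loop_count);
  auto t1 = CLOCK::now();
  return t1 - t0;
}

// Hypothetical clocks and methods, driven by a shared counter.
uint64_t g_units = 0;
struct unit_clock   { static uint64_t now() { return g_units; } };
struct triple_clock { static uint64_t now() { return 3 * g_units; } };
void bench_a(uint64_t n) { g_units += n; }
void bench_b(uint64_t n) { g_units += 5 * n; }

typedef uint64_t (*benchmark_runner)(size_t loop_count);

const size_t NMETHODS = 2;
const size_t NCLOCKS = 2;
benchmark_runner all_runners[NMETHODS][NCLOCKS];

// Clocks are listed once here (M lines).
template <bench_f METHOD>
void fill_row(size_t bench_f_index) {
    benchmark_runner* it = &all_runners[bench_f_index][0];
    *(it++) = &time_method<METHOD, unit_clock>;
    *(it++) = &time_method<METHOD, triple_clock>;
}

// Methods are listed once here (N lines).
void fill_all() {
    size_t row = 0;
    fill_row<bench_a>(row++);
    fill_row<bench_b>(row++);
}

// Collects one column: every method paired with the chosen clock.
std::vector<benchmark_runner> runners_for_clock(size_t clock_index) {
    std::vector<benchmark_runner> column;
    for (size_t m = 0; m < NMETHODS; ++m)
        column.push_back(all_runners[m][clock_index]);
    return column;
}
```

fill_all() has to run once before the table is read. After that, runners_for_clock(1) returns the runners for every method timed with triple_clock, in the order the rows were filled.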

Tags: c++, performance, templates

BeeOnRope
