Imagine I have N methods that I want to time, along with M timing methods (let's call them clock implementations) [1]. The exact details aren't too important here, but I mention them so I can give a concrete example.
Now let's say I have a templatized timing method like so:
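typedef void (bench_f)(uint64_t);

// METHOD and CLOCK are compile-time parameters, so every call below is direct.
template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
    auto t0 = CLOCK::now();   // reading before the call
    METHOD(loop_count);
    auto t1 = CLOCK::now();   // reading after the call
    return t1 - t0;
}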
Basically, it brackets the call to METHOD with calls to CLOCK::now() and returns the difference. Note also that METHOD is not passed as a function pointer but only as a template argument, so you get a unique instantiation for each method rather than a single instantiation that makes an indirect call through a pointer.
This works well for my case because both clock calls and the call to the method under test are direct static calls (i.e., something like call <function address> at the assembly level).
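To make the pattern concrete, here is a minimal sketch of two instantiations. The names `spin`, `work_clock`, `steady_ns_clock`, `work_for`, and `nanos_for` are hypothetical and only serve the illustration; `work_clock` counts units of work rather than time, in the spirit of the loose meaning of "clock" used in this question.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

typedef void (bench_f)(uint64_t);

template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
    auto t0 = CLOCK::now();
    METHOD(loop_count);
    auto t1 = CLOCK::now();
    return t1 - t0;
}

// Hypothetical method under test: one unit of "work" per iteration.
static uint64_t g_work = 0;
void spin(uint64_t n) {
    for (uint64_t i = 0; i < n; ++i) ++g_work;
}

// Hypothetical clock that reports units of work instead of time.
struct work_clock {
    static uint64_t now() { return g_work; }
};

// Hypothetical wall-clock implementation built on std::chrono.
struct steady_ns_clock {
    static uint64_t now() {
        using namespace std::chrono;
        return static_cast<uint64_t>(
            duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
    }
};

// Each (method, clock) pair names a distinct function instantiation.
uint64_t work_for(size_t loops)  { return time_method<spin, work_clock>(loops); }
uint64_t nanos_for(size_t loops) { return time_method<spin, steady_ns_clock>(loops); }
```

The only requirement this sketch places on a clock is a static `now()` whose results can be subtracted and converted to `uint64_t`.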
Now I have N methods I want to test (perhaps 50) along with M clock methods (perhaps 5). I want to actually instantiate, at compile time, all M * N methods, so that I can call all the test methods with a specific clock implementation.
Now the "standard" way to do this would be to pass a function pointer (or a class implementing a virtual function) for both the method under test and the clock implementation, at which point I'd only need a single time_method function and could create whatever combination I want at runtime. In this particular case, the performance impact of the indirect calls is too high, so I want template instantiation, and I am willing to pay for the resulting binary bloat (e.g., M * N = 250 instantiated combinations with my numbers).
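For comparison, the runtime-dispatch version might look roughly like the sketch below. `clock_f`, `time_method_dynamic`, `spin`, and `work_now` are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

typedef void (bench_f)(uint64_t);
typedef uint64_t (clock_f)();  // hypothetical: a clock is any function returning a reading

// A single function serves every combination, but the method call and both
// clock reads go through function pointers.
uint64_t time_method_dynamic(bench_f* method, clock_f* now, size_t loop_count) {
    uint64_t t0 = now();
    method(loop_count);
    uint64_t t1 = now();
    return t1 - t0;
}

// Hypothetical method and "clock" that count units of work.
static uint64_t g_work = 0;
void spin(uint64_t n) { g_work += n; }
uint64_t work_now() { return g_work; }
```

Any pairing can then be chosen at runtime, e.g. `time_method_dynamic(&spin, &work_now, 100)`, with no extra instantiations, at the cost of the indirect calls discussed above.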
At runtime I want to get, for example, a list of all N methods combined with a particular clock.
I am fine with explicitly listing all N methods and all M clocks, but I don't want to write out the M * N instantiations (DRY and all that).
[1] I'm using the word "clock" pretty loosely here: some of the "clocks" may in fact measure aspects unrelated to time, such as heap memory use or some application-specific metric.
#include <array>

// Tag types that carry the lists of methods and clocks as template arguments.
template<bench_f*...> struct method_list {};
template<class...> struct clock_list {};
using time_method_t = uint64_t (*)(size_t);

// One row: a single method paired with every clock.
template<bench_f* Method, class... Clocks>
constexpr auto make_single_method_table()
    -> std::array<time_method_t, sizeof...(Clocks)> {
    return {{ time_method<Method, Clocks>... }};
}

// Full table: one row per method, one column per clock.
template<bench_f*... Methods, class... Clocks>
constexpr auto make_method_table(method_list<Methods...>, clock_list<Clocks...>)
    -> std::array<std::array<time_method_t, sizeof...(Clocks)>, sizeof...(Methods)> {
    return {{ make_single_method_table<Methods, Clocks...>()... }};
}
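A possible way to use this table is sketched below, assuming two made-up methods and two made-up "clocks" that count work rather than time. The names `add_once`, `add_twice`, `add_clock`, `call_clock`, and `run_all_with_clock` are hypothetical; the table-building code is repeated so the example compiles on its own.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

typedef void (bench_f)(uint64_t);

template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
    auto t0 = CLOCK::now();
    METHOD(loop_count);
    auto t1 = CLOCK::now();
    return t1 - t0;
}

template <bench_f*...> struct method_list {};
template <class...> struct clock_list {};
using time_method_t = uint64_t (*)(size_t);

template <bench_f* Method, class... Clocks>
constexpr std::array<time_method_t, sizeof...(Clocks)> make_single_method_table() {
    return {{ &time_method<Method, Clocks>... }};
}

template <bench_f*... Methods, class... Clocks>
constexpr std::array<std::array<time_method_t, sizeof...(Clocks)>, sizeof...(Methods)>
make_method_table(method_list<Methods...>, clock_list<Clocks...>) {
    return {{ make_single_method_table<Methods, Clocks...>()... }};
}

// Hypothetical methods and clocks, for illustration only.
static uint64_t g_adds = 0;   // units of work performed
static uint64_t g_calls = 0;  // number of method invocations
void add_once(uint64_t n)  { ++g_calls; g_adds += n; }
void add_twice(uint64_t n) { ++g_calls; g_adds += 2 * n; }
struct add_clock  { static uint64_t now() { return g_adds; } };
struct call_clock { static uint64_t now() { return g_calls; } };

// Each list is written once; the 2 x 2 instantiations are generated.
using methods = method_list<add_once, add_twice>;
using clocks  = clock_list<add_clock, call_clock>;
constexpr auto table = make_method_table(methods{}, clocks{});

// Runs every method against one clock (one column of the table).
std::array<uint64_t, 2> run_all_with_clock(size_t clock_index, size_t loop_count) {
    std::array<uint64_t, 2> results{};
    for (size_t m = 0; m < table.size(); ++m)
        results[m] = table[m][clock_index](loop_count);
    return results;
}
```

Picking the clock at runtime costs one indirect call to enter the chosen `time_method` instantiation; the calls inside it remain direct.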
To keep the amount of code you have to write linear in the sum of the option counts rather than their product, write template functions that each remove one layer of options.
For example:
typedef uint64_t (*benchmark_runner)(size_t loop_count);
benchmark_runner all_runners[NMETHODS][NCLOCKS];

// Fills one row: a single method paired with every clock.
// The clocks are listed once, here.
template <bench_f METHOD>
void fill_row(size_t bench_f_index)
{
    benchmark_runner* it = &all_runners[bench_f_index][0];
    *(it++) = &time_method<METHOD, FIRST_CLOCK>;
    *(it++) = &time_method<METHOD, SECOND_CLOCK>;
    *(it++) = &time_method<METHOD, THIRD_CLOCK>;
    *(it++) = &time_method<METHOD, LAST_CLOCK>;
}

// Fills the whole table. The methods are listed once, here.
void fill_all()
{
    size_t row = 0;
    fill_row<BENCH_A>(row++);
    fill_row<BENCH_B>(row++);
    ...
    fill_row<BENCH_Z>(row++);
}
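Here is a cut-down, compilable sketch of the same idea with two methods and two clocks. The names `bench_a`, `bench_b`, `unit_clock`, `call_clock`, and `run` are hypothetical stand-ins for the real benchmarks and clocks.

```cpp
#include <cstddef>
#include <cstdint>

typedef void (bench_f)(uint64_t);

template <bench_f METHOD, typename CLOCK>
uint64_t time_method(size_t loop_count) {
    auto t0 = CLOCK::now();
    METHOD(loop_count);
    auto t1 = CLOCK::now();
    return t1 - t0;
}

// Hypothetical methods and clocks, for illustration only.
static uint64_t g_units = 0;  // units of work performed
static uint64_t g_calls = 0;  // number of method invocations
void bench_a(uint64_t n) { ++g_calls; g_units += n; }
void bench_b(uint64_t n) { ++g_calls; g_units += 3 * n; }
struct unit_clock { static uint64_t now() { return g_units; } };
struct call_clock { static uint64_t now() { return g_calls; } };

enum { NMETHODS = 2, NCLOCKS = 2 };
typedef uint64_t (*benchmark_runner)(size_t loop_count);
benchmark_runner all_runners[NMETHODS][NCLOCKS];

// The clocks are listed once, here.
template <bench_f METHOD>
void fill_row(size_t bench_f_index) {
    benchmark_runner* it = &all_runners[bench_f_index][0];
    *(it++) = &time_method<METHOD, unit_clock>;
    *(it++) = &time_method<METHOD, call_clock>;
}

// The methods are listed once, here.
void fill_all() {
    size_t row = 0;
    fill_row<bench_a>(row++);
    fill_row<bench_b>(row++);
}

// Looks up one (method, clock) combination; call fill_all() first.
uint64_t run(size_t method_index, size_t clock_index, size_t loop_count) {
    return all_runners[method_index][clock_index](loop_count);
}
```

Unlike a constexpr table, this one is populated at runtime, so `fill_all()` has to run before the first lookup.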