Consider a function object F
taking a constexpr size_t
argument I
struct F
{
template <size_t I>
constexpr size_t operator()(size <I>) const { return I; }
};
wrapped within a type size <I>
, where (for brevity)
template <size_t N>
using size = std::integral_constant <size_t, N>;
Of course, we could pass I
directly but I want to emphasize that it is constexpr by using it as a template argument. Function F
is dummy here but in reality it could do a variety of useful stuff like retrieving information from the I
th element of a tuple. F
is assumed to have the same return type regardless of I
. I
could be of any integral type but is assumed non-negative.
Problem
Given a constexpr size_t
value I
, we can call F
by
F()(size <I>());
Now, what if we want to call F
with a non-constepr size_t
value i
? Consider the following:
constexpr size_t L = 10;
idx <F, L> f;
for (size_t i = 0; i < L; ++i)
cout << f(i) << " ";
(Why would I need this? To give some context, I am in fact trying to build a composite iterator into a container view that represents a sequence of "joined" (concatenated) heterogeneous containers. This would give the ability to say something like join(a, b) = c;
where arrays join(a, b)
and c
are of equal length. However, i
is iterator state so cannot be constexpr
, yet sub-iterators are stored in a tuple and need to be accessed by a constexpr
index. Individual value_type
's are roughly consistent so that the joined view can take on their common_type
type, but sub-containers and consequently sub-iterators are of different types.)
Solution
Here, I have come up with struct idx <F, L>
, which adapts function F
for this purpose, assuming the input argument is less than L
. This actually compiles fine giving output
0 1 2 3 4 5 6 7 8 9
and here is a live example.
idx
works by recursively decomposing input i
into a binary representation and reconstructing a constexpr counterpart N
:
template <typename F, size_t L, size_t N = 0, bool = (N < L)>
struct idx
{
template <size_t R = 1>
inline constexpr decltype(F()(size <0>()))
operator()(size_t I, size <R> = size <R>()) const
{
return I%2 ? idx <F, L, N+R>()(--I/2, size <2*R>()) :
I ? idx <F, L, N >()( I/2, size <2*R>()) :
F()(size <N>());
}
};
where R
represents a power of 2
at the current iteration. To avoid infinite template instantiation, a specialization is given for N >= L
, returning F()(size <0>())
as a dummy value:
template <typename F, size_t L, size_t N>
struct idx <F, L, N, false>
{
template <size_t R>
inline constexpr decltype(F()(size <0>()))
operator()(size_t I, size <R>) const { return F()(size <0>()); }
};
In fact, this method is a generalization of the more common idiom with a Boolean argument:
bool b = true;
b ? f <true>() : f <false>();
where f
is a function taking a bool
as a template argument. In this case it is evident that all two possible versions of f
are instantiated.
Question
Although this works and its run-time complexity is presumably logarithmic in i
, I am concerned with the compile-time implications, like:
how many combinations of idx
and its template operator()
are instantiated for this to work at run time for any input i
that is not known at compile time? (I understand "all possible" again, but how many?)
is it really possible to inline operator()
?
is there any alternative strategy or variant that is "easier" to compile?
should I forget about this idea as an instance of pure code bloat?
Notes
Here are the compile times (in seconds) and executable sizes (in KB) I have measured for different values of L
:
L Clang(s) GCC(s) Clang(KB) GCC(KB)
10 1.3 1.7 33 36
20 2.1 3.0 48 65
40 3.7 5.8 87 120
80 7.0 11.2 160 222
160 13.9 23.4 306 430
320 28.1 47.8 578 850
640 56.5 103.0 1126 1753
So, although it appears roughly linear in L
, it is quite long and frustratingly large.
Attempting to force operator()
inline fails: probably ignored by Clang (executable even larger), while GCC reports recursive inlining
.
Running nm -C
on the executable e.g. for L = 160
, shows 511/1253
different versions of operator()
(with Clang/GCC). These are all for N < L
so it appears the terminating specialization N >= L
does get inlined.
PS. I would add tag code-bloat
but the system won't let me.
I call this technique the magic switch.
The most efficient way I know of to do this is to build your own jump table.
// first, index list boilerplate. Does log-depth creation as well
// needed for >1000 magic switches:
template<unsigned...Is> struct indexes {typedef indexes<Is...> type;};
template<class lhs, class rhs> struct concat_indexes;
template<unsigned...Is, unsigned...Ys> struct concat_indexes<indexes<Is...>, indexes<Ys...>>{
typedef indexes<Is...,Ys...> type;
};
template<class lhs, class rhs>
using ConcatIndexes = typename concat_indexes<lhs, rhs>::type;
template<unsigned min, unsigned count> struct make_indexes:
ConcatIndexes<
typename make_indexes<min, count/2>::type,
typename make_indexes<min+count/2, (count+1)/2>::type
>
{};
template<unsigned min> struct make_indexes<min, 0>:
indexes<>
{};
template<unsigned min> struct make_indexes<min, 1>:
indexes<min>
{};
template<unsigned max, unsigned min=0>
using MakeIndexes = typename make_indexes<min, max-min>::type;
// This class exists simply because [](blah){code}... `...` expansion
// support is lacking in many compilers:
template< typename L, typename R, unsigned I >
struct helper_helper {
static R func( L&& l ) { return std::forward<L>(l)(size<I>()); }
};
// the workhorse. Creates an "manual" jump table, then invokes it:
template<typename L, unsigned... Is>
auto
dispatch_helper(indexes<Is...>, L&& l, unsigned i)
-> decltype( std::declval<L>()(size<0>()) )
{
// R is return type:
typedef decltype( std::declval<L>()(size<0>()) ) R;
// F is the function pointer type for our jump table:
typedef R(*F)(L&&);
// the jump table:
static const F table[] = {
helper_helper<L, R, Is>::func...
};
// invoke at the jump spot:
return table[i]( std::forward<L>(l) );
};
// create an index pack for each index, and invoke the helper to
// create the jump table:
template<unsigned Max, typename L>
auto
dispatch(L&& l, unsigned i)
-> decltype( std::declval<L>()(size<0>()) )
{
return dispatch_helper( MakeIndexes<Max>(), std::forward<L>(l), i );
};
which requires some static setup, but is pretty fast when run.
An assert that i
is in bounds may also be useful.
live example
If your solution have cap on maximum possible value (say 256) you can use macro magic and switch statement to simplify it:
#define POS(i) case (i): return F<(i)>(); break;
#define POS_4(i) POS(i + 0) POS(i + 1) POS(i + 2) POS(i + 3)
#define POS_16(i) POS_4(i + 0) POS_4(i + 4) POS_4(i + 8) POS_4(i + 12)
int func(int i)
{
switch(i)
{
POS_16(0)
}
}
Another possible solution is (with C++11) use variadic templates:
template<int I>
struct FF
{
static int f() { return I; }
};
template<typename... I>
int func(int i)
{
constexpr int (*Func[])() = { I::f... };
return Func[i]();
}
int main(int argc, char** argv)
{
func<FF<0>,FF<1>>(1);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With