 

Need Help Cleaning Up Template Instantiation Framework

I've been working on a framework to help with function template instantiations. I have a bunch of functions, templated on integer values for optimization purposes, which need to be instantiated at compile time and selected at runtime. Here is a usage example:

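#include <boost/mpl/vector.hpp>
#include <boost/mpl/vector_c.hpp>
#include <boost/mpl/size.hpp>
#include <boost/mpl/at.hpp>
#include <boost/mpl/int.hpp>
namespace mpl = boost::mpl;

// Function template to instantiate.
template<int a, int b, int c> void MyFunction(float, double) {}

// List of values to substitute into each template parameter.
typedef mpl::vector_c< int, 7, 0, 3, 4, 2> valuesToInstantiate;
// Must be a compile-time constant, since it sizes the array below.
const int numberOfValuesPerParameter = mpl::size<valuesToInstantiate>::type::value;

// Function pointer type; needed to declare the array that holds the instantiations.
typedef void (*MyFunctionPointer)(float, double);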

// Array to hold template instantiations.
// Accessed at runtime to get proper instantiation.
MyFunctionPointer arrayOfTemplateInstantiations[numberOfValuesPerParameter*numberOfValuesPerParameter*numberOfValuesPerParameter];

// Passed to template instantiation framework.
// AddTemplate member function will be called once per template value combo (3 int values).
// templateIndex indicates where to store the instantiation in the array.
// templateSequence contains the template value combo (3 int values).
template<int templateIndex, typename templateSequence>
struct MyFunctionTemplateCreator
{
    static void AddTemplate(void)
    {
        // Store template instantiation in array.
        arrayOfTemplateInstantiations[templateIndex] = MyFunction
        <
        mpl::at<templateSequence, mpl::int_<0> >::type::value, 
        mpl::at<templateSequence, mpl::int_<1> >::type::value, 
        mpl::at<templateSequence, mpl::int_<2> >::type::value
        >;
    }
};

// List of lists where each inner list contains values to instantiate
// for the corresponding template parameter. E.g. each value in the first
// inner list will be passed into the first template parameter of MyFunction
typedef mpl::vector< valuesToInstantiate, valuesToInstantiate, valuesToInstantiate > templatesToCreate;

// Call template instantiation framework to instantiate templates.
CreateTemplates<MyFunctionTemplateCreator, templatesToCreate> unusedVariable;

// Call the proper template instantiation at runtime (index 5 chosen arbitrarily).
arrayOfTemplateInstantiations[5](1.5, 2.0);

So in that example, I'm instantiating MyFunction, which takes 3 integer template parameters, with every combination of { {7, 0, 3, 4, 2}, {7, 0, 3, 4, 2}, {7, 0, 3, 4, 2} }. I've omitted the implementation of CreateTemplates because it's quite long, but it's implemented using Boost.MPL for_each. The code above is required for every function I want to do this with, and while it's shorter than writing out every explicit instantiation (5*5*5 = 125 here, 512 in my real code), it's still a bit long.
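The array holds one slot per combination, so with 5 values per parameter it has 5*5*5 = 125 entries (8*8*8 = 512 in the texture case). Below is a minimal sketch of the mixed-radix arithmetic that converts between a combination of value positions and its slot. It assumes the first template parameter varies slowest; the omitted CreateTemplates may assign templateIndex in a different order, and FlatIndex/SplitIndex are hypothetical helper names. Under that assumption, index 5 is positions (0, 1, 0), i.e. MyFunction<7, 0, 7>.

```cpp
#include <cassert>

// Flat position of the combination (i0, i1, i2) in an n*n*n table, where each
// ik is a position in the value list (0 <= ik < n). ASSUMPTION: the first
// template parameter varies slowest; this must match the order in which the
// instantiation framework assigns templateIndex.
inline int FlatIndex(int i0, int i1, int i2, int n) {
    assert(0 <= i0 && i0 < n);
    assert(0 <= i1 && i1 < n);
    assert(0 <= i2 && i2 < n);
    return (i0 * n + i1) * n + i2;
}

// Inverse of FlatIndex: recovers the three value positions from a slot.
inline void SplitIndex(int index, int n, int& i0, int& i1, int& i2) {
    assert(0 <= index && index < n * n * n);
    i2 = index % n;
    i1 = (index / n) % n;
    i0 = index / (n * n);
}
```

For the texture case, three runtime texture indices t0, t1, t2 (each 0-7) would map to FlatIndex(t0, t1, t2, 8), which ranges from 0 to 511.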

Surprisingly, the longest code that has to be written for each function I want to do this with is the typedef of the function pointer, as many of the functions take 10+ arguments. Is there a way to store these template instantiations in an array of a more generic type by wrapping them somehow?

For the sake of argument, you can assume that the template parameters are always integer values, as in the example, so the signatures of the template instantiations are all the same for a given function template. The functions being instantiated are all in the global namespace and are never member functions (they're actually CUDA kernels). Any other tips for cleaning this up would be appreciated.

Note: I'm using C++03.

Edit: I wanted to address TarmoPikaro's question about what I'm trying to accomplish.

I'm working with an application where up to 4 tasks/threads share a GPU to do their work (same work, different data). Since some of our CUDA kernels use textures, we need to hand out available textures dynamically at runtime. We are stuck supporting legacy CUDA compute capabilities, so we can't use texture objects (which can be passed as kernel arguments); we have to use texture references, which must be static global variables. To hand textures out to CPU tasks/threads, we therefore give out texture indices, and our CUDA kernels contain statements like:

// (t_int_2d_0 through t_int_2d_7 are the global texture variables)
if (maskTextureIndex == 0)
    maskValue = tex2D(t_int_2d_0, (float(p) + 0.5f)*maskScale.x + maskShift.x, (float(q) + 0.5f)*maskScale.y + maskShift.y);
else if (maskTextureIndex == 1)
    maskValue = tex2D(t_int_2d_1, (float(p) + 0.5f)*maskScale.x + maskShift.x, (float(q) + 0.5f)*maskScale.y + maskShift.y);
else if (maskTextureIndex == 2)
    maskValue = tex2D(t_int_2d_2, (float(p) + 0.5f)*maskScale.x + maskShift.x, (float(q) + 0.5f)*maskScale.y + maskShift.y);
else if (maskTextureIndex == 3)
    maskValue = tex2D(t_int_2d_3, (float(p) + 0.5f)*maskScale.x + maskShift.x, (float(q) + 0.5f)*maskScale.y + maskShift.y);
else if (maskTextureIndex == 4)
    maskValue = tex2D(t_int_2d_4, (float(p) + 0.5f)*maskScale.x + maskShift.x, (float(q) + 0.5f)*maskScale.y + maskShift.y);
else if (maskTextureIndex == 5)
    maskValue = tex2D(t_int_2d_5, (float(p) + 0.5f)*maskScale.x + maskShift.x, (float(q) + 0.5f)*maskScale.y + maskShift.y);
else if (maskTextureIndex == 6)
    maskValue = tex2D(t_int_2d_6, (float(p) + 0.5f)*maskScale.x + maskShift.x, (float(q) + 0.5f)*maskScale.y + maskShift.y);
else if (maskTextureIndex == 7)
    maskValue = tex2D(t_int_2d_7, (float(p) + 0.5f)*maskScale.x + maskShift.x, (float(q) + 0.5f)*maskScale.y + maskShift.y);

Evaluating that chain inside a kernel loop costs an unacceptable amount of performance. To avoid it, we template the kernel on an integer value (the texture index) so that the conditional is compiled out. The kernel containing the code above is instantiated with maskTextureIndex equal to 0-7, giving 8 different kernels to select from at runtime. Some of our kernels use up to 3 textures, and each texture type (e.g. float 1D, float 2D, float2 2D, int 3D, etc.) can have indices 0-7, so we have to instantiate 8*8*8 = 512 different kernels to compile out 3 different conditional statements like the one above. The code in my original question is used, per texture-using kernel, to help instantiate all the combinations.
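A plain C++ analog of this technique, with no GPU involved: the hypothetical FetchTexture0..FetchTexture3 functions stand in for tex2D reads from the global textures, and only four indices are used instead of eight to keep it short. Because maskTextureIndex is a template parameter, every comparison in the if/else chain is a compile-time constant, so an optimizing compiler can drop the untaken branches from each instantiation; the runtime index only chooses which instantiation to call.

```cpp
#include <cassert>

// Hypothetical CPU stand-ins for the global textures t_int_2d_0 .. t_int_2d_3
// (four instead of eight to keep the sketch short).
int FetchTexture0(int p) { return 100 + p; }
int FetchTexture1(int p) { return 200 + p; }
int FetchTexture2(int p) { return 300 + p; }
int FetchTexture3(int p) { return 400 + p; }

// maskTextureIndex is a template parameter, so every comparison in the chain
// is a compile-time constant and the optimizer can drop the untaken branches.
template<int maskTextureIndex>
int SumMask(int count) {
    int sum = 0;
    for (int p = 0; p < count; ++p) {
        int maskValue = 0;
        if (maskTextureIndex == 0)      maskValue = FetchTexture0(p);
        else if (maskTextureIndex == 1) maskValue = FetchTexture1(p);
        else if (maskTextureIndex == 2) maskValue = FetchTexture2(p);
        else if (maskTextureIndex == 3) maskValue = FetchTexture3(p);
        sum += maskValue;
    }
    return sum;
}

typedef int (*SumMaskPointer)(int);

// One instantiation per texture index; the runtime index picks the entry.
SumMaskPointer sumMaskInstantiations[4] = {
    &SumMask<0>, &SumMask<1>, &SumMask<2>, &SumMask<3>
};

int SumMaskWithTexture(int textureIndex, int count) {
    assert(0 <= textureIndex && textureIndex < 4);
    return sumMaskInstantiations[textureIndex](count);
}
```

With three textures the same idea needs a three-parameter template and a table of 8*8*8 entries, which is where the instantiation framework comes in.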

asked Mar 27 '16 at 23:03 by user1777820



1 Answer

With C++03, I have not found a way to avoid writing the function pointer typedef or to make it shorter. With C++11 and decltype, you could derive it from any single instantiation, like this (assuming the function template has no type parameters, so every instantiation has the same signature):

typedef decltype(&MyFunction<0, 0, 0>) MyFunctionPointer;

On the other hand, you can eliminate some of the code you copy for every function you instantiate. In your example, you declared a struct MyFunctionTemplateCreator. It can be made generic, so that each function only needs a much smaller struct that provides the function pointer for a given instantiation. Here is the generic version of the struct:

template<
    typename Arg,
    template <Arg, Arg, Arg> class TemplateClass,
    typename Func,
    Func* instantiationArray>
struct FunctionTemplateCreator
{
    template<
        int templateIndex,
        typename templateSequence>
    struct InnerStruct
    {
        static void AddTemplate(void)
        {
            instantiationArray[templateIndex] = TemplateClass
                <
                mpl::at<templateSequence, mpl::int_<0> >::type::value,
                mpl::at<templateSequence, mpl::int_<1> >::type::value,
                mpl::at<templateSequence, mpl::int_<2> >::type::value
                >::function();
        }
    };
};

You only have to declare this struct once and put it in a header. It works for every function template with three non-type template parameters of the same type. Here is how you would use it for the function in your example. First, declare the mpl::vector types that provide the values to instantiate. Then create a struct with a static function() method that returns the pointer to the matching instantiation. Here is one for your example function:

template<int a, int b, int c>
struct MyFunctionTypedef
{
    static MyFunctionPointer function()
    {
        return &MyFunction<a, b, c>;
    }
};

The InnerStruct of FunctionTemplateCreator is what actually gets passed to CreateTemplates; FunctionTemplateCreator only forwards its template parameters to the inner struct. Here is what the CreateTemplates variable looks like with these new types:

CreateTemplates<FunctionTemplateCreator<int, MyFunctionTypedef, MyFunctionPointer, arrayOfTemplateInstantiations>::InnerStruct, templatesToCreate> unusedVariable;
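Since CreateTemplates is not shown, here is a self-contained sketch of the same generic idea that swaps Boost.MPL for a tiny hand-written compile-time list and replaces mpl::for_each with C++03-compatible template recursion. Nil, Cons, Length, At, FillTable and CreateMyFunctionTemplates are hypothetical names. As in FunctionTemplateCreator, the array is passed as a non-type template argument and the per-function provider is a template template parameter. It assumes the first template parameter varies slowest when assigning slots; the real CreateTemplates may order them differently.

```cpp
#include <cassert>

// Hypothetical Boost-free stand-in for mpl::vector_c: a compile-time int list.
struct Nil {};
template<int Head, typename Tail> struct Cons {};

// Length<List>::value is the number of elements in the list.
template<typename List> struct Length;
template<> struct Length<Nil> { enum { value = 0 }; };
template<int H, typename T> struct Length<Cons<H, T> > {
    enum { value = 1 + Length<T>::value };
};

// At<List, I>::value is the I-th element (0-based).
template<typename List, int I> struct At;
template<int H, typename T, int I> struct At<Cons<H, T>, I> {
    enum { value = At<T, I - 1>::value };
};
template<int H, typename T> struct At<Cons<H, T>, 0> { enum { value = H }; };

// Generic filler: Provider<a, b, c>::function() supplies the pointer for slot I,
// whose base-N digits pick the values (first parameter varies slowest).
template<template<int, int, int> class Provider, typename Func,
         Func* table, typename Values, int I>
struct FillTable {
    enum { N = Length<Values>::value };
    static void run() {
        table[I] = Provider<At<Values, I / (N * N)>::value,
                            At<Values, (I / N) % N>::value,
                            At<Values, I % N>::value>::function();
        FillTable<Provider, Func, table, Values, I - 1>::run();
    }
};
// Recursion stops below slot 0.
template<template<int, int, int> class Provider, typename Func,
         Func* table, typename Values>
struct FillTable<Provider, Func, table, Values, -1> {
    static void run() {}
};

// Hypothetical example function: records its template arguments for checking.
int g_a = 0, g_b = 0, g_c = 0;
template<int a, int b, int c> void MyFunction(float, double) {
    g_a = a; g_b = b; g_c = c;
}
typedef void (*MyFunctionPointer)(float, double);

// Per-function provider, as in MyFunctionTypedef.
template<int a, int b, int c> struct MyFunctionTypedef {
    static MyFunctionPointer function() { return &MyFunction<a, b, c>; }
};

typedef Cons<7, Cons<0, Cons<3, Cons<4, Cons<2, Nil> > > > > valuesToInstantiate;
const int kN = Length<valuesToInstantiate>::value;
const int kTableSize = kN * kN * kN;
MyFunctionPointer arrayOfTemplateInstantiations[kTableSize];

// Fills all 125 slots; plays the role of the CreateTemplates<...> variable.
void CreateMyFunctionTemplates() {
    FillTable<MyFunctionTypedef, MyFunctionPointer, arrayOfTemplateInstantiations,
              valuesToInstantiate, kTableSize - 1>::run();
}
```

After CreateMyFunctionTemplates() runs once at startup, arrayOfTemplateInstantiations[5] calls MyFunction<7, 0, 7> under this ordering. Each additional function still needs its own pointer typedef, provider struct, array, and one FillTable call.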

If you move to C++11, the function() method in MyFunctionTypedef can be made constexpr.

answered Oct 13 '22 at 12:10 by phantom
