CUDA C++ Templating of Kernel Parameter

Question

I'm trying to templatize a CUDA kernel based on a boolean variable (as shown here: Should I unify two similar kernels with an 'if' statement, risking performance loss?), but I keep getting a compiler error that says my function is not a template. I think that I'm just missing something obvious so it's pretty frustrating.

The following does NOT work:

util.cuh

#include "kernels.cuh"
//Utility functions

kernels.cuh

#ifndef KERNELS
#define KERNELS
template<bool approx>
__global__ void kernel(...params...);
#endif

kernels.cu

template<bool approx>
__global__ void kernel(...params...)
{
    if(approx)
    {
       //Approximate calculation
    }
    else
    {
      //Exact calculation
    }
}

template __global__ void kernel<false>(...params...); //Error occurs here

main.cu

#include "kernels.cuh"
kernel<false><<<dimGrid,dimBlock>>>(...params...);

The following DOES work:

util.cuh

#include "kernels.cuh"
//Utility functions

kernels.cuh

#ifndef KERNELS
#define KERNELS
template<bool approx>
__global__ void kernel(...params...);
template<bool approx>
__global__ void kernel(...params...)
{
    if(approx)
    {
       //Approximate calculation
    }
    else
    {
      //Exact calculation
    }
}
#endif

main.cu

#include "kernels.cuh"
kernel<false><<<dimGrid,dimBlock>>>(...params...);

If I throw in the

template __global__ void kernel<false>(...params...);

line at the end of kernels.cuh it also works.

I get the following errors (both referring to the marked line above):

kernel is not a template
invalid explicit instantiation declaration

If it makes a difference I compile all of my .cu files in one line, like:

nvcc -O3 -arch=sm_21 -I. main.cu kernels.cu -o program

Vitality · Accepted Answer

All explicit instantiation declarations must be visible at the point where the template is instantiated. Your explicit instantiation (the template __global__ void kernel<false>(...params...); line) is visible only in the kernels.cu translation unit, not in main.cu.

The following single-file example works correctly (note the __global__ qualifier on the explicit instantiation).

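#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>

template<bool approx>
__global__ void kernel()
{
    // approx is known at compile time, so each instantiation
    // only ever takes one of the two branches.
    if(approx)
    {
        printf("True branch\n");
    }
    else
    {
        printf("False branch\n");
    }
}

// Explicit instantiation for approx == false
template __global__ void kernel<false>();

int main(void) {
    kernel<false><<<1,1>>>();
    // Wait for the kernel to finish so that its printf output is flushed
    // (replaces the Windows-only getch() pause).
    cudaDeviceSynchronize();
    return 0;
}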
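If the choice between the approximate and the exact path is only known at run time, a small dispatch function can map the runtime flag onto the two instantiations. The following is a minimal sketch in plain host C++ (so it compiles without nvcc); `mean` and `mean_dispatch` are hypothetical names standing in for the kernel and its launcher, and the "approximate" computation is invented purely for illustration.

```cpp
// Hypothetical stand-in for the kernel body. `approx` is a template
// parameter, so `step` is a compile-time constant in each instantiation.
template <bool approx>
double mean(const double* data, int n)
{
    const int step = approx ? 2 : 1;   // approximate version samples every other element
    double sum = 0.0;
    int count = 0;
    for (int i = 0; i < n; i += step) {
        sum += data[i];
        ++count;
    }
    return count > 0 ? sum / count : 0.0;
}

// The runtime flag is tested once, outside the loop, and selects
// which instantiation to call.
double mean_dispatch(bool approx, const double* data, int n)
{
    return approx ? mean<true>(data, n) : mean<false>(data, n);
}
```

With a CUDA kernel the same pattern would presumably place the two launches, kernel<true><<<...>>>(...) and kernel<false><<<...>>>(...), in the two arms of the conditional.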

EDIT

In C++, function templates are not compiled until they are instantiated, either implicitly by a use such as kernel<false><<<...>>>(...) or by an explicit instantiation. In this respect CUDA, which now fully supports templates, behaves exactly like C++.

To make a concrete example, when the compiler finds something like

template<class T>
__global__ void kernel(...params...)
{
    ...
    T a;
    ...
}

it just checks the function syntax but produces no object code. So, if you compile a file containing only a function template like the one above, you get an "empty" object file. This is reasonable, since the compiler cannot know which type to assign to a.
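To make the point about instantiation concrete, here is a small sketch in plain host C++ (no nvcc needed). The function name `first_or_default` is hypothetical and not part of the question's code; the same mechanics should apply to a __global__ kernel template.

```cpp
#include <vector>

// The compiler can only check syntax here: the type of `a` depends on T,
// which is unknown until the template is instantiated.
template <class T>
typename T::value_type first_or_default(const T& container)
{
    typename T::value_type a{};          // value-initialized (0 for int)
    if (!container.empty()) {
        a = *container.begin();
    }
    return a;
}

// Explicit instantiation: requests code for T = std::vector<int>
// in this translation unit, even if nothing here calls it.
template int first_or_default<std::vector<int>>(const std::vector<int>&);

// Implicit instantiation: calling the template with another type
// (std::vector<double>) makes the compiler generate that version too.
double first_double(const std::vector<double>& v)
{
    return first_or_default(v);
}
```

Deleting both the explicit instantiation and `first_double` would leave a file that still compiles but contains no code for `first_or_default`.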

The compiler produces object code only when it encounters an instantiation of the function template. This is how compilation of function templates currently works, and it imposes a restriction on multi-file projects: the definition of a function template must be visible in every file that instantiates it, which in practice means keeping it in the same file as its declaration. So you cannot keep the interface in kernels.cuh and the implementation in a separate kernels.cu, which is the main reason why the first version of your code does not compile. Accordingly, you must include both the interface and the implementation in any file that uses the template; that is, main.cu must include both kernels.cuh and kernels.cu.
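The layout that follows from this, matching the version of the question that does work, can be sketched as below. It is written as a single listing in plain host C++ so that it compiles on its own; the file names in the comments and the function `scale` are hypothetical, and the "approximate" arithmetic is only a placeholder.

```cpp
// --- kernels.hpp (hypothetical header name) ---------------------------
// Declaration and definition live together, so every file that includes
// this header can instantiate the template for whatever argument it needs.
#ifndef KERNELS_HPP
#define KERNELS_HPP

template <bool approx>
int scale(int x)
{
    if (approx) {
        return (x / 10) * 10;   // "approximate": round down to a multiple of 10
    } else {
        return x;               // "exact"
    }
}

#endif  // KERNELS_HPP

// --- main.cpp (would normally start with #include "kernels.hpp") ------
// Using scale<false> and scale<true> here instantiates them implicitly;
// no explicit instantiation line is needed.
int exact_value(int x)  { return scale<false>(x); }
int approx_value(int x) { return scale<true>(x); }
```

In the CUDA case the header would hold the __global__ kernel template and main.cu would launch kernel<false><<<dimGrid,dimBlock>>>(...params...) directly, as in the question.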

Since no code is generated until a template is instantiated, compilers tolerate the same template file, containing both declarations and definitions, being included more than once in a project without producing linker errors.

There are several tutorials on using templates in C++. An Idiot's Guide to C++ Templates - Part 1, apart from the irritating title, will provide you with a step-by-step introduction to the topic.

Tags:

c++

templates

cuda

Adam27X

1 Answers

Vitality
