
CUDA: Why is it not possible to define static global member functions?

When I compile the code below with nvcc (CUDA 5.0), I get the error "illegal combination of memory qualifiers". Apparently it is impossible to declare a __global__ kernel as a class member.

class A
{
public:
    __global__ static void kernel();
};

__global__ void A::kernel()
{}

I can understand this restriction for non-static members, but why does the error still occur when the kernel is declared static? Calling such a member would be no different from calling a function declared in a namespace (A in this case):

A::kernel <<< 1, 1 >>> ();

Is there a reason I'm missing as to why this hasn't been implemented (yet)?

EDIT: Judging by the answers and comments, I wasn't clear enough in my question. I am not asking why the error appears: obviously, that is because the feature hasn't been implemented. I am asking why it hasn't been implemented. So far, I haven't been able to think of a reason that prevents it. I realize that I might have overlooked a special case that would complicate matters, hence the question.

The reasons I believe this to be a reasonable feature are:

  • A static function doesn't have a this pointer. So even if the kernel is called on an object that lives on the host, there is no conflict in accessing its data, as this data is inaccessible in the first place (data from which object?).
  • You could argue that if the class has static data associated with it, living on the host, this should in principle be accessible from the static kernel. However, static data isn't supported either, so again no conflict.
  • Calling a static kernel on an object on the host (A a; a.staticKernel<<<...,...>>>();) would be entirely equivalent to calling it without the object at all (A::staticKernel<<<...,...>>>();), as we are used to in regular C++.
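
The equivalence claimed in the last bullet can be checked in ordinary host-side C++. This is only a minimal sketch: the class B, its members, and the helper callBothWays are hypothetical names made up for illustration, and no CUDA is involved.

```cpp
#include <cassert>

// Hypothetical host-only class, for illustration only.
class B
{
public:
    // A static member function has no `this` pointer.
    static int staticFunction() { return ++calls; }
    static int calls;  // counts invocations of staticFunction
};

int B::calls = 0;

// Calls the static member once through an object and once through the
// class name, and returns how many calls this function added.
int callBothWays()
{
    const int before = B::calls;
    B b;
    b.staticFunction();   // the object only serves to name the class
    B::staticFunction();  // same function, no object involved
    assert(B::calls - before == 2);
    return B::calls - before;
}
```

Both calls reach the same function and touch only static state, which is the behaviour I would expect a static kernel launch to mirror.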

What am I missing?

JorenHeit asked Aug 31 '13 19:08


1 Answer

Fortunately, about 4 years after this question was asked, clang 4.0 can compile the CUDA language. Consider this example:

class A
{
public:
    __global__ static void kernel();
};

__global__ void A::kernel()
{}

int main()
{
    A::kernel <<< 1, 1 >>> ();
}

When I try to compile it with clang 4.0, I get the following error:

test.cu:7:1: error: kernel function 'kernel' must be a free function or static member function
__global__ void A::kernel()
^
/usr/local/cuda/include/host_defines.h:191:9: note: expanded from macro '__global__'
        __location__(global)
        ^
/usr/local/cuda/include/host_defines.h:88:9: note: expanded from macro '__location__'
        __annotate__(a)
        ^
/usr/local/cuda/include/host_defines.h:86:9: note: expanded from macro '__annotate__'
        __attribute__((a))
        ^
test.cu:7:20: error: __host__ function 'kernel' cannot overload __global__ function 'kernel'
__global__ void A::kernel()
                   ^
test.cu:4:28: note: previous declaration is here
    __global__ static void kernel();
                           ^
2 errors generated.

To resolve these errors, I moved the kernel definition inline into the class declaration:

class A
{
public:
    __global__ static void kernel()
    {
        // implementation would go here
    }
};

With this change, clang 4.0 compiles the code successfully and the program runs without any errors. So this is clearly not a limitation of the CUDA language itself, but of nvcc, its de facto standard compiler. By the way, nvcc has many similar unjustified limitations that clang does not have.

Jakub Klinkovský answered Sep 28 '22 22:09