When compiling the code below using nvcc (CUDA 5.0), the error "illegal combination of memory qualifiers" appears, as it is apparently impossible to declare a __global__ kernel as a class member.
class A
{
public:
__global__ static void kernel();
};
__global__ void A::kernel()
{}
I can understand this restriction when dealing with non-static members, but why does the error still occur when the kernel is declared static? Calling such a member would be no different from calling the function when it is declared in a namespace (A in this case).
A::kernel <<< 1, 1 >>> ();
Is there a reason I'm missing as to why this hasn't been implemented (yet)?
EDIT: Based on the responses in both the answers and comments, I haven't been clear enough on my question. My question is not why an error appears. Obviously, this is because it hasn't been implemented. My question is why it hasn't been implemented. So far, I haven't been able to think of a reason that keeps this feature from being implemented. I realize that I might have forgotten about a special case which would complicate matters, hence the question.
The reasons I believe this to be a reasonable feature are:
A static member function has no this pointer. So even if the kernel is called on an object that lives on the host, there is no conflict in accessing its data, as this data is inaccessible in the first place (data from what object??).
Calling a static kernel on an object (A a; a.staticKernel<<<...,...>>>();) would be entirely equivalent to calling it without the object at all (A::staticKernel<<<...,...>>>();), as we are used to in regular C++.
What am I missing?
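For reference, the regular C++ behaviour referred to here can be checked on the host alone. The following is a minimal sketch with no CUDA qualifiers. The names staticFunction, calls and call_both_ways are hypothetical and only stand in for the kernel in the question:

```cpp
#include <cassert>

// Hypothetical host-only stand-in for the class in the question.
// No CUDA qualifiers are used, so this compiles with any C++ compiler.
struct A {
    static int calls;              // counts invocations, for illustration only
    static int staticFunction() {  // static: no `this` pointer is available here
        return ++calls;
    }
};

int A::calls = 0;

// Calls the static member both ways and returns how many calls were recorded.
int call_both_ways() {
    A a;                     // an object living on the host
    a.staticFunction();      // call through an object; `a` itself is not passed
    A::staticFunction();     // qualified call without any object
    return A::calls;
}
```

Both calls resolve to the same function and neither receives the object, which is the equivalence the argument relies on. Whether a kernel launch through an object could be treated the same way is exactly what the question asks, so this sketch says nothing about nvcc itself.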
Fortunately, about 4 years after this question was asked, clang 4.0 can compile the CUDA language. Consider this example:
class A
{
public:
__global__ static void kernel();
};
__global__ void A::kernel()
{}
int main()
{
A::kernel <<< 1, 1 >>> ();
}
When I try to compile it with clang 4.0, I get the following error:
test.cu:7:1: error: kernel function 'kernel' must be a free function or static member function
__global__ void A::kernel()
^
/usr/local/cuda/include/host_defines.h:191:9: note: expanded from macro '__global__'
__location__(global)
^
/usr/local/cuda/include/host_defines.h:88:9: note: expanded from macro '__location__'
__annotate__(a)
^
/usr/local/cuda/include/host_defines.h:86:9: note: expanded from macro '__annotate__'
__attribute__((a))
^
test.cu:7:20: error: __host__ function 'kernel' cannot overload __global__ function 'kernel'
__global__ void A::kernel()
^
test.cu:4:28: note: previous declaration is here
__global__ static void kernel();
^
2 errors generated.
To resolve these errors, I moved the kernel definition inline into the class declaration:
class A
{
public:
__global__ static void kernel()
{
// implementation would go here
}
};
clang 4.0 then compiles it successfully, and it runs without any errors. So this is clearly not a limitation of the CUDA language, but of its de facto standard compiler, nvcc. By the way, nvcc has many similar unjustified limitations which clang does not have.