I have the following code:
main.cu:
#include "class.h"
int main() {}
class.h:
class Class {
    __global__
    void Function() {}
};
When I compile this code with the command nvcc -c main.cu -o main.o, I get the following warning and error:
class.h(3): warning: inline qualifier ignored for "global" function
class.h(3): error: illegal combination of memory qualifiers
I have a question about each of these messages. Why does it "ignore" the __global__ qualifier for the function, and why is the __global__ memory qualifier illegal in this context? I have read in the documentation that
E.2.10.2. Function Members
Static member functions cannot be __global__ functions.
However, my function is not a static member, as far as I know. Removing the __global__ line allows it to compile, and so does moving the __global__ and void Function(); lines into main.cu. If this actually ISN'T allowed, why does CUDA impose this limitation, and what is a way to get around it while still maintaining structured code?
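For reference, here is the file-scope version that compiles for me (the launch in main is added only to show that the kernel is usable):
// main.cu, with the kernel moved to file scope
__global__ void Function() {}

int main() {
    Function<<<1, 1>>>();  // launching from ordinary host code works
    cudaDeviceSynchronize();
    return 0;
}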
To clarify, I know of no other way to make classes with functions that can launch GPU kernels. It seems to me that kernels can only be created as free functions in main.cu. I am fairly new to CUDA programming, so I may just be missing some CUDA conventions that were unclear to me. If that is the case, please let me know so I can keep up with proper programming practice.
If you google "cuda global class member" you'll find a number of treatments of this, including SO questions like here and here; your question is arguably a duplicate of those. As a simple suggestion, you could wrap your CUDA kernels in host-callable class member functions to "keep up with proper programming practice."
My understanding is that you want to use CUDA kernels in an OOP fashion. If that is the case, the class structure below should work:
// myclass.h
class MyClass {
public:
    void call_kernel( ... );
};

// myclass.cu
__global__
void my_kernel( ... ) {
    // do some work
}

void MyClass::call_kernel( ... ) {
    // prepare data for the kernel, e.g. allocate device memory, copy from host to device, etc.
    // run the kernel
    my_kernel<<< ... >>>( ... );
    // copy results from device to host, clean up, etc.
}
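For concreteness, here is one way the sketch might be filled in; the element-doubling kernel and the parameter names are hypothetical, just to show the allocate/copy/launch/copy-back flow:
// myclass.h
class MyClass {
public:
    void call_kernel(const float* host_in, float* host_out, int n);
};

// myclass.cu
#include "myclass.h"
#include <cuda_runtime.h>

// hypothetical kernel: doubles each input element
__global__
void my_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

void MyClass::call_kernel(const float* host_in, float* host_out, int n) {
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));    // allocate device memory
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);

    int block = 256;
    int grid = (n + block - 1) / block;
    my_kernel<<<grid, block>>>(d_in, d_out, n);  // run the kernel

    cudaMemcpy(host_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);    // clean up
    cudaFree(d_out);
}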
Please note that if you have multiple classes containing kernel code, their source files should all use the .cu extension, and you should enable separate compilation.
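For example, a build with two such classes might look like the following; the file names are placeholders, and -rdc=true (relocatable device code) is what enables separate compilation:
nvcc -rdc=true -c myclass.cu -o myclass.o
nvcc -rdc=true -c otherclass.cu -o otherclass.o
nvcc -rdc=true main.cu myclass.o otherclass.o -o app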