CUDA kernel as member function of a class

Tags: c++, windows, cuda

I am using CUDA 5.0 and a Compute Capability 2.1 card.

The question is quite straightforward: Can a kernel be part of a class? For example:

class Foo
{
private:
 //...
public:
 __global__ void kernel();
};

__global__ void Foo::kernel()
{
 //implementation here
}

If not, is the solution to make a wrapper function that is a member of the class and calls the kernel internally?

And if so, will it have access to the private attributes, as a normal private member function would?

(I'm not simply trying it to see what happens because my project currently has several other errors, and I also think this makes a good reference question. It was difficult for me to find references on using CUDA with C++: basic examples are easy to find, but strategies for structuring code are not.)

George Aprilis, asked Dec 06 '12 16:12


1 Answer

Let me leave CUDA dynamic parallelism out of the discussion for the moment (i.e. assume compute capability 3.0 or lower; dynamic parallelism requires compute capability 3.5).

Remember that __global__ is used for CUDA functions that will (only) be called from the host but execute on the device. If you instantiate this object on the device, that won't work, because device code cannot launch a __global__ function without dynamic parallelism. Furthermore, for device-accessible private data to be available to the member function, the object would have to be instantiated on the device.

So you could embed a kernel invocation (i.e. mykernel<<<blocks,threads>>>(...);) in a member function of a host object, but the kernel definition (i.e. the function definition with the __global__ decorator) would normally precede the class definition in your source code. As stated already, this approach could not be used for an object instantiated on the device. The kernel would also not have access to ordinary private data defined elsewhere in the object. (It may be possible to devise a scheme for a host-only object that creates device data, using pointers to global memory that would then be accessible on the device, but such a scheme seems quite convoluted to me at first glance.)
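
A minimal sketch of the wrapper approach, written for a current CUDA toolkit with C++11 or later rather than CUDA 5.0 specifically. The names scaleKernel and DeviceArray are hypothetical. The kernel is a free function defined before the class; the member function launches it and hands over the object's private device pointer as an ordinary kernel argument, which is the only way the kernel sees that data. Error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical free kernel, defined before the class that launches it.
// It cannot see DeviceArray's private members; it only sees its arguments.
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host-only class that owns a buffer in device global memory.
class DeviceArray
{
private:
    float* d_data_ = nullptr;  // device pointer, private to the class
    int n_ = 0;
public:
    explicit DeviceArray(const std::vector<float>& h)
        : n_(static_cast<int>(h.size()))
    {
        cudaMalloc(&d_data_, n_ * sizeof(float));
        cudaMemcpy(d_data_, h.data(), n_ * sizeof(float), cudaMemcpyHostToDevice);
    }
    ~DeviceArray() { cudaFree(d_data_); }
    DeviceArray(const DeviceArray&) = delete;             // avoid double cudaFree
    DeviceArray& operator=(const DeviceArray&) = delete;

    // Member wrapper: the launch lives here, and the private device
    // pointer is passed to the kernel as a plain argument.
    void scale(float factor)
    {
        if (n_ == 0) return;  // a launch with zero blocks is invalid
        const int threads = 256;
        const int blocks = (n_ + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(d_data_, n_, factor);
        cudaDeviceSynchronize();  // error checking omitted for brevity
    }

    // Copies the device buffer back to a host vector.
    std::vector<float> toHost() const
    {
        std::vector<float> h(n_);
        cudaMemcpy(h.data(), d_data_, n_ * sizeof(float), cudaMemcpyDeviceToHost);
        return h;
    }
};
```

Usage would look like DeviceArray a(v); a.scale(2.0f); auto r = a.toHost();. Deleting the copy operations is a design choice so that two objects never free the same device pointer.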

Normally, device-usable member functions would be decorated with __device__. In this case, all the code in the device member function executes within the thread that called it.
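
A minimal sketch of a __device__ member function, assuming a small hypothetical class Affine that is passed to a hypothetical kernel applyKernel by value. Because the kernel receives its own copy of the object, the member function can read the private fields, and each call runs entirely inside the calling thread. The host helper applyOnDevice is also an illustration, not part of the original answer.

```cuda
#include <cuda_runtime.h>

// Hypothetical value type whose member function is callable from device code.
class Affine
{
private:
    float a_, b_;  // private state, copied to the device with the object
public:
    Affine(float a, float b) : a_(a), b_(b) {}  // host-side constructor
    // __device__ member: executes inside whichever thread calls it.
    __device__ float apply(float x) const { return a_ * x + b_; }
};

// The object is a by-value kernel parameter, so device code holds a copy
// and can call its __device__ member function.
__global__ void applyKernel(Affine f, float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = f.apply(data[i]);  // runs in thread i only
}

// Hypothetical host helper: applies f to n > 0 host floats via the GPU.
void applyOnDevice(const Affine& f, float* h, int n)
{
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    applyKernel<<<(n + 255) / 256, 256>>>(f, d, n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```

Passing the object by value works here because Affine holds only plain floats; an object holding host pointers would need its pointed-to data copied separately.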

This question gives an example (in my edited answer) of a C++ object with a member function callable from both the host and the device, with appropriate data copying between host and device objects.
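
A minimal sketch in the same spirit, not the code from the linked answer: a hypothetical struct Counter with a __host__ __device__ member function, so the identical method is compiled for both sides. The object is copied into device global memory with cudaMemcpy, the hypothetical kernel fillKernel calls the method through a device pointer, and the hypothetical helper fillAndCompare checks the device results against host calls of the same method.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical type whose member function compiles for host and device.
struct Counter
{
    int base;
    int step;
    __host__ __device__ int valueAt(int k) const { return base + k * step; }
};

// Each thread reads the device-resident object through a pointer
// and calls the same member function the host can call.
__global__ void fillKernel(const Counter* c, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = c->valueAt(i);
}

// Copies the object to the device, fills n > 0 values there, copies them
// back, and returns true if they match host-side calls of valueAt.
bool fillAndCompare(const Counter& hostCounter, int n)
{
    Counter* dCounter = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dCounter, sizeof(Counter));
    cudaMalloc(&dOut, n * sizeof(int));
    // A bitwise copy is enough because Counter contains no pointers.
    cudaMemcpy(dCounter, &hostCounter, sizeof(Counter), cudaMemcpyHostToDevice);

    fillKernel<<<(n + 255) / 256, 256>>>(dCounter, dOut, n);

    std::vector<int> hOut(n);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dCounter);
    cudaFree(dOut);

    for (int k = 0; k < n; ++k)
        if (hOut[k] != hostCounter.valueAt(k)) return false;  // host-side call
    return true;
}
```

If the object owned pointers to host arrays, copying the object alone would not be enough: each array would also need to be allocated and copied to the device, and the device copy of the object pointed at those device arrays.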

Robert Crovella, answered Sep 20 '22 19:09