Appendix D of the 3.2 version of the CUDA documentation refers to C++ support in CUDA device code.
It clearly states that CUDA supports "Classes for devices of compute capability 2.x". However, I'm working with devices of compute capability 1.1 and 1.3, and I can use this feature anyway!
For instance, this code works:
// class definition, voluntarily simplified
class Foo {
private:
    int x_;
public:
    __device__ Foo() { x_ = 42; }
    __device__ int bar() { return x_; }
};

// kernel using the previous class
__global__ void testKernel(uint32_t* ddata) {
    Foo f;
    ddata[threadIdx.x] = f.bar();
}
I'm also able to use widespread libraries such as Thrust's random number generation classes (`thrust::random`).
My only guess is that this works thanks to the automatic inlining of `__device__`-marked functions, but that does not explain the handling of member variables.
Have you ever used such features under the same conditions, or can you explain why my CUDA code behaves this way? Is there something wrong in the reference guide?
CUDA stands for Compute Unified Device Architecture. It is NVIDIA's parallel computing platform, exposed as an extension of C/C++ that targets the Graphics Processing Unit (GPU).
CUDA C is essentially C/C++ with a few extensions that let you execute functions on the GPU across many threads in parallel.
Using the CUDA Toolkit you can accelerate your C or C++ applications by moving the computationally intensive portions of your code onto the GPU. You can call functions from drop-in libraries, or develop custom applications in languages including C, C++, Fortran, and Python.
`__global__`: a qualifier added to standard C. It tells the compiler that a function should be compiled to run on the device (GPU) instead of the host (CPU).
Officially, CUDA has no support for classes on devices prior to compute capability 2.0.
Practically, in my experience you can use all C++ features on all devices as long as the functionality can be resolved at compile time. Devices prior to 2.0 do not support function calls (all functions are inlined) and do not support jumps to a variable address (only jumps to constant addresses).
This means you can use the following C++ constructs:
- classes and structs with member functions (fully inlined)
- operator overloading
- templates
- inheritance, as long as every call can be resolved statically

You cannot use the following:
- virtual functions (the call target is a runtime address)
- function pointers
- recursion
In fact, all the examples in Section D.6 of the CUDA Programming Guide compile for devices of compute capability < 2.0.