I have a beginner's question about how CUDA kernels work.
I have the following code (which uses the cuPrintf function
taken from here):
    #include "cuPrintf.cu"

    __global__ void testKernel(int param){
        cuPrintf("Param value: %d\n", param);
    }

    int main(void){
        // initialize cuPrintf
        cudaPrintfInit();

        int a = 456;
        testKernel<<<4,1>>>(a);

        // display the device's greeting
        cudaPrintfDisplay();

        // clean up after cuPrintf
        cudaPrintfEnd();
    }
The output of the execution is:
Param value: 456
Param value: 456
Param value: 456
Param value: 456
I don't understand how the kernel can read the correct value of the parameter I pass. Isn't it allocated in host memory? Can the GPU read from host memory?
Thanks,
Andrea
According to section E.2.5.2, "Function Parameters", of the CUDA C Programming Guide, __global__ function parameters are passed to the device as part of the kernel launch.
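To see this in action without cuPrintf, here is a minimal sketch, assuming a CUDA-capable device and the standard runtime API. The names echoParam and echoThroughDevice are made up for illustration. The int argument reaches the kernel with no explicit cudaMemcpy; only the output buffer, which the kernel accesses through a pointer, has to live in device memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block stores the by-value argument it received into a device buffer.
__global__ void echoParam(int param, int *out) {
    out[blockIdx.x] = param;
}

// Hypothetical helper: launches `blocks` blocks and returns the value that
// block 0 saw. Returns -1 if the allocation or the copy back fails.
int echoThroughDevice(int value, int blocks) {
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, blocks * sizeof(int)) != cudaSuccess) return -1;

    // `value` is copied to the device as part of the launch itself.
    echoParam<<<blocks, 1>>>(value, dOut);

    int first = -1;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(&first, dOut, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        first = -1;
    cudaFree(dOut);
    return first;
}

int main() {
    int a = 456;
    printf("Block 0 received: %d\n", echoThroughDevice(a, 4));
    return 0;
}
```

The only explicit transfer is the one that brings the result back; the argument itself travels with the launch.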
The declaration void testKernel(int param) says that param is passed by value, not by reference. In other words, the argument list contains a copy of a's value, not a pointer to a. At launch, CUDA copies the argument values to the GPU, where the kernel reads them.
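The by-value behaviour can be checked with a small experiment. This is only a sketch under the assumption of a working CUDA device; bumpCopy, bumpThroughPointer and the two host helpers are hypothetical names. The first kernel changes its private copy, so the host variable is untouched. The second one works through a pointer, which must refer to device memory, so the value has to be copied over and back explicitly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Receives a copy of the host value; the increment never leaves the kernel.
__global__ void bumpCopy(int param) {
    param += 1;
}

// Receives a pointer; it must point to device memory, not to a host variable.
__global__ void bumpThroughPointer(int *p) {
    *p += 1;
}

// Hypothetical helper: launch bumpCopy and return the host value afterwards.
int afterBumpCopy(int a) {
    bumpCopy<<<1, 1>>>(a);
    cudaDeviceSynchronize();
    return a;  // the host variable was never visible to the kernel
}

// Hypothetical helper: round-trip the value through device memory.
// Returns the input unchanged if the allocation fails.
int afterBumpThroughPointer(int a) {
    int *dA = nullptr;
    if (cudaMalloc(&dA, sizeof(int)) != cudaSuccess) return a;
    cudaMemcpy(dA, &a, sizeof(int), cudaMemcpyHostToDevice);
    bumpThroughPointer<<<1, 1>>>(dA);
    cudaMemcpy(&a, dA, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    return a;
}

int main() {
    int a = 456;
    printf("after bumpCopy:           %d\n", afterBumpCopy(a));
    printf("after bumpThroughPointer: %d\n", afterBumpThroughPointer(a));
    return 0;
}
```

Passing &a directly to bumpThroughPointer would hand the kernel a host address, which an ordinary kernel cannot dereference; that is the case the question was worried about, and it does not arise for plain value parameters.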