When I call a kernel with ill-set parameters (e.g. more than 512 threads per block) or when the operations inside it require more than what my device has to offer (e.g. too many registers) the kernel is simply not executed. There is no exception or return value to indicate what happened though.
I'd like to know if there's a way to verify if a kernel was executed or not.
In a CUDA program implementation, both development and production code, always check the return value of each CUDA runtime synchronous or asynchronous API call to see if there is any CUDA synchronous error, always run CUDA runtime error capturing API calls, such as cudaGetLastError , after kernel launch calls to see if ...
In order to launch a CUDA kernel we need to specify the block dimension and the grid dimension from the host code. I'll consider the same Hello World! code considered in the previous article. In the above code, to launch the CUDA kernel two 1's are initialised between the angle brackets.
The kernel is a function executed on the GPU. Every CUDA kernel starts with a __global__ declaration specifier. Programmers provide a unique global ID to each thread by using built-in variables. Figure 2. CUDA kernels are subdivided into blocks.
CUDA Dynamic Parallelism Under dynamic parallelism, one kernel may launch another kernel, and that kernel may launch another, and so on. Each subordinate launch is considered a new “nesting level,” and the total number of levels is the “nesting depth” of the program.
try this
kernel<<<blocks, threads>>>(params); cudaError_t err = cudaGetLastError(); if (err != cudaSuccess)      printf("Error: %s\n", cudaGetErrorString(err)); This should give you a detailed error about what went wrong.
EDIT: Here's a more detailed answer about how to properly check errors in CUDA:
Also you could print something from the kernel. This may be useful for debugging.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With