What is the equivalent technique of an assertion in CUDA kernel code?
There does not seem to be an assert for CUDA kernel code. I want a way to catch programmer mistakes easily in kernel code. A mechanism where I can set conditions that need to be true and the kernel should bail out when the condition is false with an error message.
In order to run a kernel on the CUDA threads, we need two things. First, in the main() function of the program, we call the function to be executed by each thread on the GPU. This invocation is called Kernel Launch and with it we need provide the number of threads and their grouping.
When a kernel is called, its execution configuration is provided through <<<...>>> syntax, e.g. cuda_hello<<<1,1>>>() . In CUDA terminology, this is called "kernel launch".
Basically, a child CUDA kernel can be called from within a parent CUDA kernel and then optionally synchronize on the completion of that child CUDA Kernel. The parent CUDA kernel can consume the output produced from the child CUDA kernel, all without CPU involvement [136].
Figure 1 shows that the CUDA kernel is a function that gets executed on GPU. The parallel portion of your applications is executed K times in parallel by K different CUDA threads, as opposed to only one time like regular C/C++ functions. Figure 1. The kernel is a function executed on the GPU.
Such function can be called through host code, e.g. the main () function in the example, and is also known as " kernels ". When a kernel is called, its execution configuration is provided through <<<...>>> syntax, e.g. cuda_hello<<<1,1>>> (). In CUDA terminology, this is called " kernel launch ".
Compiling a CUDA program is similar to C program. NVIDIA provides a CUDA compiler called nvcc in the CUDA toolkit to compile CUDA code, typically stored in a file with extension .cu. For example You might see following warning when compiling a CUDA program using above command
CUDA C++ extends C++ by allowing the programmer to define C++ functions, called kernels, that, when called, are executed N times in parallel by N different CUDA threads, as opposed to only once like regular C++ functions.
When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to multiprocessors with available execution capacity. The threads of a thread block execute concurrently on one multiprocessor, and multiple thread blocks can execute concurrently on one multiprocessor.
For devices of cc 2.x or above, assertion , void assert(int expression)
, could be used within a kernel such that threads with expression == 0
send a message to stderr once a host synchronization function is called.
For other cases or when assertion cannot be used (e.g. on MacOS), you won't be able to return an error message or error code to the host from a kernel.
Instead, I would set a error state and check it from the host. Use device global memory or (better) mapped host memory for storing an error state, passed as a parameter to each kernel call. Use if statements in the kernel, and of if the statements fail, set the error code and return. You will be able to check the error code from the host after the kernel call, but keep in mind that you will have synchronize the host and device after the kernel launch before checking the error code. I guess this will work fine for development but not so much for production.
As to printing an error message straight from the device
I would like to point out that an assert may occur in one thread only, but if you decide to early terminate that thread its absense may cause other bugs (and probably other asserts) happening later; possibly leading to a complete kernel crash and loose of all information on the GPU.
Also, the answer given at " Using assert within kernel invocation " will work only if the assert is used directly in the __ global__ function and not deeper, somewhere inside __ device__ function.
My suggestion is, that even an assert fails, you proceed normally with your code, but leave an error message. You can use mapped, pinned memory (you map host RAM memory into GPU address space) to store error codes/messages. That way, even if your kernel crashes and GPU is reset, you are likely to obtain valuable information in that mapped memory. If I am not mistaken, mapped, pinned memory is supported by almost all devices of Compute Capability 1.1 and higher.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With