Hi, I have a question about programming in CUDA. I have the following code:
int main () {
    for (;;) {
        kernel_1 (x1, x2, ....);
        kernel_2 (x1, x2 ...);
        kernel_3_Reduction (x1);
        // The host code below manipulates host_x1
        // Copy x1 from the device to the host
        cpy (host_x1, x1, DeviceToHost);
        cpu_code_x1_manipulation;
        kernel_ (x1, x2, ....);
    }
}
So when is the copy made, and how do I ensure that kernel_1, kernel_2 and kernel_3 have completed their tasks?
Synchronization between Threads
CUDA provides a device-side function, __syncthreads(), to synchronize the threads of a block. When the call is encountered in a kernel, every thread in the block waits at that point until all threads in the same block have reached it. It does not synchronize threads in different blocks.
Figure 1 shows that a CUDA kernel is a function that is executed on the GPU. The parallel portion of your application is executed K times in parallel by K different CUDA threads, as opposed to only once like a regular C/C++ function.
Figure 1. The kernel is a function executed on the GPU.
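To make the "K times by K threads" idea concrete, here is a minimal sketch. The kernel name fill_indices, the helper run_fill_indices and the block size of 128 are assumptions chosen for illustration; they do not come from the original post.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each of the K threads runs this body once and writes its own global index.
__global__ void fill_indices(int* out, int k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < k) {  // guard: the last block may contain spare threads
        out[i] = i;
    }
}

// Hypothetical host helper: launches at least k threads and returns what they wrote.
// On allocation failure the returned vector still holds its initial -1 values.
std::vector<int> run_fill_indices(int k) {
    std::vector<int> host(k, -1);
    int* dev = nullptr;
    if (cudaMalloc(&dev, k * sizeof(int)) != cudaSuccess) return host;

    const int threads = 128;                         // threads per block (assumed)
    const int blocks = (k + threads - 1) / threads;  // round up to cover all k
    fill_indices<<<blocks, threads>>>(dev, k);

    // Blocking copy on the default stream: it runs after the kernel has finished.
    cudaMemcpy(host.data(), dev, k * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling run_fill_indices(1000) from host code should give a vector in which element i equals i, one element per thread that passed the bounds check.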
Kernel launches are asynchronous from the point of view of the CPU, so if you launch two kernels in succession, the second launch is issued without waiting for the first kernel to finish. This only means that control returns to the CPU immediately; by itself it says nothing about the order in which the kernels run on the GPU.
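A minimal sketch of what this means on the host side, assuming a made-up kernel add_one and a made-up helper launch_twice_and_wait: both launches return immediately, and cudaDeviceSynchronize() is one way to make the CPU wait until the GPU work has finished.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel (not from the original post): increments every element.
__global__ void add_one(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: launches add_one twice, waits for the GPU,
// and returns data[0], or -1 if anything went wrong.
int launch_twice_and_wait(int n) {
    int* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(dev, 0, n * sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    add_one<<<blocks, threads>>>(dev, n);  // control returns to the host at once
    add_one<<<blocks, threads>>>(dev, n);  // queued behind the first launch

    cudaError_t launch_err = cudaGetLastError();     // reports launch errors
    cudaError_t sync_err = cudaDeviceSynchronize();  // blocks until both kernels are done

    int first = -1;
    if (launch_err == cudaSuccess && sync_err == cudaSuccess) {
        cudaMemcpy(&first, dev, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return first;
}
```

Both launches go to the default stream, so the second increment is applied after the first, and launch_twice_and_wait(1024) should return 2 on a working device.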
The simplest form of synchronization in CUDA is the __syncthreads() function, which works like a barrier for the threads of one block. Once a thread reaches the __syncthreads() call, it waits until all other threads in its block have reached it too.
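The barrier is typically used around shared memory. The sketch below is one possible per-block sum, loosely in the spirit of the reduction kernel in the question; the names block_sum, sum_of_ones and kBlockSize are assumptions, and the final addition of the per-block results is done on the CPU to keep the example short.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 256;  // must be a power of two for the loop below

// Sums up to kBlockSize inputs per block and writes one result per block.
__global__ void block_sum(const int* in, int* out, int n) {
    __shared__ int tile[kBlockSize];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0;
    __syncthreads();  // every load into shared memory is complete past this point

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();  // placed outside the if so all threads of the block reach it
    }
    if (tid == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical host helper: sums n ones; returns -1 if allocation fails.
int sum_of_ones(int n) {
    const int blocks = (n + kBlockSize - 1) / kBlockSize;
    std::vector<int> ones(n, 1), partial(blocks, 0);
    int* d_in = nullptr;
    int* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, ones.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    block_sum<<<blocks, kBlockSize>>>(d_in, d_out, n);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int total = 0;
    for (int p : partial) total += p;  // combine per-block sums on the CPU
    return total;
}
```

Note that __syncthreads() only orders threads within a block. Combining results across blocks needs something else, here the copy back to the host.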
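All operations launched on the same stream are executed in the order in which they were issued. In the code above no stream is specified, so all kernels go to the default stream and run one after another. If cpy is a plain blocking cudaMemcpy, it also waits for the kernels issued before it, so host_x1 holds the result of kernel_3_Reduction by the time the copy returns. You have to specify streams explicitly if you need kernel_1 and kernel_2 to run in parallel.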
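For the case where two kernels are independent, a sketch with explicit streams might look like the following. The kernel fill and the helper fill_on_two_streams are hypothetical, and the two kernels work on separate buffers, which is not true of kernel_1 and kernel_2 in the question since both take x1. Whether the kernels really overlap depends on the device and on how many resources each one uses.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel (not from the original post): writes value into every element.
__global__ void fill(int* data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper: fills two buffers on two streams and returns a[0] + b[0],
// or -1 if allocation fails.
int fill_on_two_streams(int n) {
    int* a = nullptr;
    int* b = nullptr;
    if (cudaMalloc(&a, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&b, n * sizeof(int)) != cudaSuccess) {
        cudaFree(a);
        return -1;
    }

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    // The fourth launch parameter selects the stream; the third is dynamic shared memory.
    fill<<<blocks, threads, 0, s1>>>(a, n, 1);
    fill<<<blocks, threads, 0, s2>>>(b, n, 2);  // may overlap with the launch on s1

    // Wait for each stream before the host reads the results.
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    int ha = 0, hb = 0;
    cudaMemcpy(&ha, a, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&hb, b, sizeof(int), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return ha + hb;
}
```

Within each stream the order of operations is still preserved; only work on different streams is allowed to run concurrently.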