Does the host wait for the device to finish its execution completely? For example, suppose the program has the following structure:
// cpu code segment
// data transfer from host to device
QUESTION - WILL THE CPU WAIT FOR THE DEVICE TO FINISH THE TRANSFER? IF NOT, CAN IT BE MADE TO WAIT? IF SO, HOW?
// kernel launch
QUESTION - WILL THE CPU WAIT FOR THE DEVICE TO FINISH KERNEL EXECUTION (ASSUMING THE KERNEL TAKES A NOTABLE TIME, SAY 5 SECONDS)? IF NOT, CAN IT BE MADE TO WAIT? IF SO, HOW?
// data transfer from device to host
// program terminates after printing some information
The kernel execution configuration defines the dimensions of a grid and its blocks. Unique coordinates in blockIdx and threadIdx variables allow threads of a grid to identify themselves and their domains of data.
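As a minimal sketch of how threads identify themselves, the example below launches a one-dimensional grid and has each thread compute a global index from blockIdx, blockDim, and threadIdx. The kernel name writeIndex, the array size, and the block size of 256 are assumptions chosen only for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its own global index.
__global__ void writeIndex(int *out, int n)
{
    // Block offset plus the thread's position inside its block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // the last block may have spare threads
        out[i] = i;
    }
}

int main()
{
    const int n = 1000;
    const int threadsPerBlock = 256;  // assumed block size
    // Round up so that every element is covered by some thread.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Execution configuration: <<<grid dimensions, block dimensions>>>.
    writeIndex<<<blocks, threadsPerBlock>>>(d_out, n);

    int h_out[n];
    // A plain cudaMemcpy on the default stream runs after the kernel.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "%s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        assert(h_out[i] == i);  // each element was written by "its" thread
    }
    cudaFree(d_out);
    return 0;
}
```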
cudaDeviceSynchronize() returns any error code that has occurred in any of those kernels. Note that when a thread calls cudaDeviceSynchronize(), it is not aware of which kernel launch constructs have already been executed by other threads in the block.
The synchronization functions of the CUDA runtime let you achieve what you want.
cudaDeviceSynchronize():
When you call this function, the CPU waits until the device has completed ALL its work, whether it is a memory copy or a kernel execution.
cudaStreamSynchronize(cudaStream):
This function will block the CPU until the specified CUDA stream has finished its execution. Other CUDA streams will continue their execution asynchronously.
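The two calls above can be sketched in one small program. It assumes two streams, pinned host buffers (so that cudaMemcpyAsync can actually overlap with host code), and a hypothetical kernel named scale; none of these names come from the original question. Kernel launches and cudaMemcpyAsync return control to the host without waiting, so the host only blocks where it synchronizes explicitly.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t e_ = (call);                                     \
        if (e_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s\n",                      \
                    cudaGetErrorString(e_));                         \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Hypothetical kernel: multiply every element by a factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *h_a, *h_b, *d_a, *d_b;
    CHECK(cudaMallocHost(&h_a, bytes));  // pinned host memory
    CHECK(cudaMallocHost(&h_b, bytes));
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 1.0f; }

    cudaStream_t s1, s2;
    CHECK(cudaStreamCreate(&s1));
    CHECK(cudaStreamCreate(&s2));

    // Queue copy-in, kernel, copy-out on each stream. These calls are
    // asynchronous: the host continues without waiting for the device.
    CHECK(cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s1));
    scale<<<blocks, threads, 0, s1>>>(d_a, n, 2.0f);
    CHECK(cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, s1));

    CHECK(cudaMemcpyAsync(d_b, h_b, bytes, cudaMemcpyHostToDevice, s2));
    scale<<<blocks, threads, 0, s2>>>(d_b, n, 3.0f);
    CHECK(cudaMemcpyAsync(h_b, d_b, bytes, cudaMemcpyDeviceToHost, s2));

    // Block the host until stream s1 is done; s2 may still be running.
    CHECK(cudaStreamSynchronize(s1));
    printf("stream 1 result: %.1f\n", h_a[0]);

    // Block the host until ALL outstanding device work is done.
    CHECK(cudaDeviceSynchronize());
    printf("stream 2 result: %.1f\n", h_b[0]);

    CHECK(cudaStreamDestroy(s1));
    CHECK(cudaStreamDestroy(s2));
    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    CHECK(cudaFreeHost(h_a));
    CHECK(cudaFreeHost(h_b));
    return 0;
}
```

Reading h_a before cudaStreamSynchronize(s1) returns would race with the copy back from the device, which is why the synchronization call comes before the first printf.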