Does the CPU wait for the device to finish its kernel execution?

Tags:

cuda

Does the host wait for the device to finish its execution completely? For example, the program has the following structure:

// cpu code segment

// data transfer from host to device

Question: Will the CPU wait for the device to finish the transfer? If not, is it possible to make it wait? If so, how?

// kernel launch

Question: Will the CPU wait for the device to finish executing the kernel (assuming the kernel takes a noticeable amount of time, say 5 seconds)? If not, is it possible to make it wait? If so, how?

// data transfer from device to host

// program terminates after printing some information


asked Sep 28 '12 11:09

Jitendra

1 Answer

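Kernel launches are asynchronous with respect to the host: the launch call returns immediately and the CPU carries on while the kernel runs. A plain cudaMemcpy() is blocking for most use cases, and a copy issued on the default stream does not start until earlier work on that stream (such as your kernel) has finished. When you need the CPU to wait explicitly, the synchronization functions of the CUDA runtime let you do that.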
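For the program structure in the question, a minimal sketch might look like the following. The kernel `scale` and the helper `run_pipeline` are hypothetical names used only for illustration; the comments mark where the host can be expected to block and where it does not, assuming everything runs on the default stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration only: doubles every element.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper following the question's structure. Returns 0 on success.
int run_pipeline(float *host, int n) {
    float *dev = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return 1;

    // Host -> device: cudaMemcpy blocks the CPU for most use cases,
    // so the host buffer is safe to reuse once it returns.
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // Kernel launch: asynchronous, control returns to the CPU immediately.
    scale<<<(n + 255) / 256, 256>>>(dev, n);

    // Device -> host on the default stream: the copy waits for the kernel
    // to finish, and the CPU blocks until the copy itself has completed.
    cudaError_t err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;
    int rc = run_pipeline(host, n);
    printf("rc=%d host[10]=%.1f\n", rc, host[10]);
    return rc;
}
```

Because the final copy already orders itself after the kernel, no explicit synchronization call is needed in this particular layout; the functions below matter when the CPU must wait without doing a copy, or when streams are involved.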

cudaDeviceSynchronize():

When you call this function, the CPU blocks until the device has completed all previously issued work, whether memory copies or kernel executions.
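A minimal sketch of this behaviour, assuming a busy-wait kernel stands in for the question's long-running kernel. The kernel `spin`, the helper `launch_and_wait`, and the cycle count are hypothetical choices for illustration; the host-side timings typically show the launch returning almost at once and the wait happening inside cudaDeviceSynchronize().

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel standing in for a long-running kernel:
// spins for roughly `cycles` GPU clock cycles, then sets *flag = 1.
__global__ void spin(long long cycles, int *flag) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
    *flag = 1;
}

// Hypothetical helper: launches the kernel, times the launch and the wait
// on the host, then copies the flag into *result. Returns 0 on success.
int launch_and_wait(int *result) {
    using Clock = std::chrono::steady_clock;
    int *dflag = nullptr;
    if (cudaMalloc(&dflag, sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(dflag, 0, sizeof(int));

    auto t0 = Clock::now();
    spin<<<1, 1>>>(200000000LL, dflag);  // asynchronous: returns at once
    auto t1 = Clock::now();

    // Blocks the CPU until all previously issued device work is done.
    cudaError_t err = cudaDeviceSynchronize();
    auto t2 = Clock::now();

    auto ms = [](Clock::duration d) {
        return std::chrono::duration<double, std::milli>(d).count();
    };
    printf("launch returned after %.3f ms, sync waited %.3f ms\n",
           ms(t1 - t0), ms(t2 - t1));

    cudaMemcpy(result, dflag, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dflag);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    int flag = 0;
    int rc = launch_and_wait(&flag);
    printf("rc=%d flag=%d\n", rc, flag);
    return rc;
}
```

Checking the return value of cudaDeviceSynchronize() is also a convenient place to catch errors from the kernel, since they are reported asynchronously.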

cudaStreamSynchronize(cudaStream):

This function blocks the CPU until all work previously issued to the specified CUDA stream has completed. Work queued in other CUDA streams continues to execute asynchronously.
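A minimal sketch with two streams, assuming the GPU can run the two kernels concurrently (typical on current hardware, but not guaranteed). The kernel `spin`, the helper `demo_streams`, and the cycle counts are hypothetical. The CPU waits only on the stream `fast`, then uses the non-blocking cudaStreamQuery() to see whether `slow` still has pending work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: spins for roughly `cycles` GPU clock cycles.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: queues long work on one stream and short work on
// another, waits on the short stream only, and records whether the long
// stream was still busy at that point. Returns 0 on success.
int demo_streams(int *slow_was_running) {
    cudaStream_t slow, fast;
    if (cudaStreamCreate(&slow) != cudaSuccess) return 1;
    if (cudaStreamCreate(&fast) != cudaSuccess) return 1;

    spin<<<1, 1, 0, slow>>>(500000000LL);  // long-running work
    spin<<<1, 1, 0, fast>>>(1000LL);       // short work

    // Blocks the CPU only until the work queued on `fast` has completed.
    cudaError_t err = cudaStreamSynchronize(fast);

    // cudaStreamQuery does not block: cudaErrorNotReady means `slow`
    // still has pending work.
    *slow_was_running = (cudaStreamQuery(slow) == cudaErrorNotReady);

    // Now wait for the long-running stream as well.
    if (err == cudaSuccess) err = cudaStreamSynchronize(slow);

    cudaStreamDestroy(fast);
    cudaStreamDestroy(slow);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    int running = 0;
    int rc = demo_streams(&running);
    printf("rc=%d, slow stream still running after fast sync: %s\n",
           rc, running ? "yes" : "no");
    return rc;
}
```

Whether the output reports "yes" depends on the hardware actually overlapping the two kernels, so treat it as an illustration of the blocking scope rather than a guaranteed result.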

answered Oct 13 '22 21:10

sgarizvi
