Just a general question about CUBLAS. For a single thread, if there is no memory transfer from GPU to CPU (e.g. cublasGetVector), will the CUBLAS kernel functions (e.g. cublasDgemm) automatically be synchronized with the host?
cublasDgemm();
//cublasGetVector();
host_functions();
Furthermore, what about between two adjacent kernel calls?
cublasDgemm();
cublasDgemm();
And what about a synchronous transfer that does not involve the global memory used by the previous kernel?
cublasDgemm(...gA...gB...gC);
cublasGetVector(...gD...D...);
No. With the exception of a few Level 1 routines which return a scalar value, the CUBLAS API is asynchronous. Level 3 routines like cublasDgemm don't block the host; you need to call a blocking API routine, such as a synchronous memory transfer or an explicit host-GPU synchronisation call, to ensure that the CUBLAS call has completed.