cudaMemcpy & blocking

Tags:

cuda

I'm confused by some comments I've seen about blocking and cudaMemcpy. It is my understanding that the Fermi HW can simultaneously execute kernels and do a cudaMemcpy.

I read that the library function cudaMemcpy() is a blocking function. Does this mean the function blocks further execution until the copy has fully completed? Or does it mean the copy won't start until the previous kernels have finished?

For example, do these two snippets block in the same way?

SomeCudaCall<<<25,34>>>(someData);
cudaThreadSynchronize();

vs

SomeCudaCall<<<25,34>>>(someParam);
cudaMemcpy(toHere, fromHere, sizeof(int), cudaMemcpyHostToDevice);
Doug asked Jul 23 '12 19:07


1 Answer

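Your examples are equivalent as far as blocking is concerned: when everything runs in the default stream, cudaMemcpy() does not start until the preceding kernel has finished, and it does not return to the host until the copy is complete. If you want asynchronous execution, you can use streams or contexts together with cudaMemcpyAsync, so that you can overlap kernel execution with the copy.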
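As a rough illustration of the stream-based approach, here is a minimal sketch. It assumes a device that supports copy/kernel overlap, and the kernel `addOne`, the function `runOverlapDemo`, the `CHECK` macro, and the buffer names are all made up for this example. The host buffers are allocated with cudaMallocHost because cudaMemcpyAsync generally needs pinned host memory to be truly asynchronous.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: return -1 from the enclosing function on any CUDA error.
// (For brevity, this sketch does not free resources on the error path.)
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            return -1;                                                     \
        }                                                                  \
    } while (0)

// Hypothetical kernel for illustration: adds 1 to every element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Returns the number of wrong results (0 on success), or -1 on a CUDA error.
int runOverlapDemo() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);
    int *hostA = nullptr, *hostB = nullptr;
    int *devA = nullptr, *devB = nullptr;

    // Pinned (page-locked) host memory, so the async copy can overlap.
    CHECK(cudaMallocHost((void **)&hostA, bytes));
    CHECK(cudaMallocHost((void **)&hostB, bytes));
    CHECK(cudaMalloc((void **)&devA, bytes));
    CHECK(cudaMalloc((void **)&devB, bytes));
    for (int i = 0; i < n; ++i) { hostA[i] = i; hostB[i] = 2 * i; }

    // Blocking copy: the host waits here until devA holds the data.
    CHECK(cudaMemcpy(devA, hostA, bytes, cudaMemcpyHostToDevice));

    cudaStream_t kernelStream, copyStream;
    CHECK(cudaStreamCreate(&kernelStream));
    CHECK(cudaStreamCreate(&copyStream));

    // The kernel and the copy are issued to different streams and touch
    // different buffers, so the hardware is free to run them concurrently.
    addOne<<<(n + 255) / 256, 256, 0, kernelStream>>>(devA, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(devB, hostB, bytes, cudaMemcpyHostToDevice, copyStream));

    // Both calls above return immediately; wait for each stream explicitly.
    CHECK(cudaStreamSynchronize(kernelStream));
    CHECK(cudaStreamSynchronize(copyStream));

    // Copy results back (blocking) and verify them on the host.
    CHECK(cudaMemcpy(hostA, devA, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(hostB, devB, bytes, cudaMemcpyDeviceToHost));
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hostA[i] != i + 1 || hostB[i] != 2 * i) ++mismatches;

    CHECK(cudaStreamDestroy(kernelStream));
    CHECK(cudaStreamDestroy(copyStream));
    CHECK(cudaFree(devA));
    CHECK(cudaFree(devB));
    CHECK(cudaFreeHost(hostA));
    CHECK(cudaFreeHost(hostB));
    return mismatches;
}

int main() {
    int result = runOverlapDemo();
    if (result == 0) printf("ok\n");
    else printf("failed (%d)\n", result);
    return result == 0 ? 0 : 1;
}
```

Whether the kernel and the copy actually overlap depends on the device and driver; a profiler timeline is the usual way to check. Also note that cudaThreadSynchronize() from the question is deprecated in current toolkits, with cudaDeviceSynchronize() as its replacement.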

perreal answered Nov 11 '22 23:11