
How do I know that cudaMemcpyAsync is done reading host memory?

Tags:

cuda

For example... Here's what I see in NVIDIA's docs:

cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, 0);
kernel<<<grid, block>>>(a_d);
cpuFunction();

Let's say this is wrapped in a function...

void consume() {
  cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, 0);
  kernel<<<grid, block>>>(a_d);
}

What if I also have a function

void produce() {
  // do stuff
  a_h[0] = 1;
  a_h[1] = 3;
  a_h[2] = 5;
  //...
}

If I call:

produce();
consume();
produce(); // problem??

The second produce() call will start changing the host memory at a_h.

How do I know that CUDA isn't still reading the host memory during the asynchronous memory copy routine?

How can I safely write to the host a_h memory without disrupting that asynchronous mem copy?

EDIT---

I know I can call cudaDeviceSynchronize() or cudaStreamSynchronize(), but those will also wait for the kernel to complete. I would prefer not to wait until the kernel is done.

I want to start writing to host a_h as soon as possible, while not waiting for kernel to finish.

asked Mar 10 '17 by tmsimont


1 Answer

If you use a stream for the cudaMemcpyAsync call, you can insert an event into the stream after the asynchronous transfer and then use cudaEventSynchronize to synchronize on that event. This guarantees that the copy has finished, but doesn't rely on the device being idle or the stream being empty.
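A minimal sketch of that approach, reusing a_d, a_h, size, and the kernel from the question. The stream variable is an assumption here — it stands for a stream previously created with cudaStreamCreate (the default stream 0 from the question would also work):

cudaEvent_t copyDone;
cudaEventCreate(&copyDone);

cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, stream);
cudaEventRecord(copyDone, stream);        // event fires once the copy has finished
kernel<<<grid, block, 0, stream>>>(a_d);  // kernel queued in the same stream

cudaEventSynchronize(copyDone);           // blocks until the copy completes,
                                          // NOT until the kernel finishes
// a_h can now be overwritten while the kernel may still be running
produce();

Because the event is recorded after the copy but before the kernel launch, cudaEventSynchronize returns as soon as the transfer is done, which is exactly the "write to a_h as early as possible" behavior the question asks for.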

answered Sep 21 '22 by 2 revs