For example... Here's what I see in NVIDIA's docs:
```cuda
cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, 0);
kernel<<<grid, block>>>(a_d);
cpuFunction();
```
Let's say this is wrapped in a function...
```cuda
void consume() {
    cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, 0);
    kernel<<<grid, block>>>(a_d);
}
```
What if I also have a function:

```cuda
void produce() {
    // do stuff
    a_h[0] = 1;
    a_h[1] = 3;
    a_h[2] = 5;
    //...
}
```
If I call:
```cuda
produce();
consume();
produce(); // problem??
```
The second `produce()` call will start overwriting the host memory at `a_h`. How do I know that CUDA isn't still reading that host memory as part of the asynchronous copy? How can I safely write to `a_h` without disrupting the asynchronous memcpy?
EDIT: I know I can call `cudaDeviceSynchronize()` or `cudaStreamSynchronize()`, but those will also wait for `kernel` to complete. I would prefer not to wait until `kernel` is done: I want to start writing to the host memory `a_h` as soon as possible, without waiting for `kernel` to finish.
Pinned memory speeds up CPU-to-GPU copies (as triggered by e.g. `tensor.cuda()` in PyTorch) because page-locked memory cannot be paged out by the operating system. Pageable host memory must first be staged through a pinned buffer before it can be transferred to the GPU, i.e. it is effectively copied twice; pinned memory avoids that extra copy, and it is also required for `cudaMemcpyAsync` to be truly asynchronous with respect to the host.
`cudaMallocHost` allocates page-locked memory on the host.
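A minimal sketch of allocating pinned host memory with `cudaMallocHost` and using it for an asynchronous copy (the buffer size and names here are illustrative, not from the question):

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t N = 1 << 20;
    const size_t size = N * sizeof(float);

    float *a_h = nullptr, *a_d = nullptr;
    cudaMallocHost(&a_h, size);   // page-locked (pinned) host allocation
    cudaMalloc(&a_d, size);       // device allocation

    for (size_t i = 0; i < N; ++i) a_h[i] = (float)i;

    // This copy can only overlap with host work because a_h is pinned;
    // with pageable memory the runtime may fall back to a blocking copy.
    cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, 0);
    cudaDeviceSynchronize();

    cudaFree(a_d);
    cudaFreeHost(a_h);           // pinned memory is freed with cudaFreeHost
    return 0;
}
```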
`cudaDeviceSynchronize()` also returns any error code that has occurred in any of those kernels. Note that when a thread calls `cudaDeviceSynchronize()`, it is not aware of which kernel launches have already been executed by other threads in the block.
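As a sketch of that error-reporting behavior (assuming some kernels have already been launched before this point):

```cuda
// cudaDeviceSynchronize() blocks until all preceding work is done and
// surfaces any error raised by kernels launched before this call.
cudaError_t err = cudaDeviceSynchronize();
if (err != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
}
```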
A stream in CUDA is a sequence of operations that execute on the device in the order in which they are issued by the host code. While operations within a stream are guaranteed to execute in the prescribed order, operations in different streams can be interleaved and, when possible, they can even run concurrently.
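A minimal sketch of issuing work into a non-default stream (`stream`, `a_d`, `a_h`, `grid`, and `block` are assumed to exist as in the question's snippet):

```cuda
cudaStream_t stream;
cudaStreamCreate(&stream);

// Both operations go into the same stream, so they execute in order.
cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, stream);
kernel<<<grid, block, 0, stream>>>(a_d);

// Work issued into other streams may interleave or run concurrently
// with the two operations above.
cudaStreamSynchronize(stream);  // waits for just this stream
cudaStreamDestroy(stream);
```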
If you use a stream for the `cudaMemcpyAsync` call, you can record an event into the stream right after the asynchronous transfer and then use `cudaEventSynchronize` to wait on that event. This guarantees that the copy has finished, but doesn't rely on the device being idle or the stream being empty.
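A sketch of that approach applied to the `produce()`/`consume()` pattern from the question (the event name, helper function, and parameter list are assumptions for illustration; the stream and event must have been created with `cudaStreamCreate`/`cudaEventCreate`):

```cuda
__global__ void kernel(float *a_d);  // as in the question

cudaStream_t stream;   // created once with cudaStreamCreate(&stream)
cudaEvent_t copyDone;  // created once with cudaEventCreate(&copyDone)

void consume(float *a_d, const float *a_h, size_t size,
             dim3 grid, dim3 block) {
    cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, stream);
    cudaEventRecord(copyDone, stream);  // marks the point after the copy
    kernel<<<grid, block, 0, stream>>>(a_d);
}

void wait_for_copy() {
    // Returns as soon as the memcpy has completed; the kernel launched
    // after the event may still be running on the device.
    cudaEventSynchronize(copyDone);
}
```

With this, the calling sequence becomes `produce(); consume(...); wait_for_copy(); produce();` — the second `produce()` can safely overwrite `a_h` while `kernel` is still executing, because the kernel reads only the device copy `a_d`.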