Device sync during async memcpy in CUDA

Question

Suppose I want to perform an asynchronous host-to-device memcpy in CUDA and then immediately launch a kernel. How can I test, from inside the kernel, whether the asynchronous transfer has completed?

Jared Hoberock · Accepted Answer

Sequencing your asynchronous copy and kernel launch using a CUDA "stream" ensures that the kernel executes after the asynchronous transfer has completed. The following code example demonstrates:

#include <stdio.h>

__global__ void kernel(const int *ptr)
{
  printf("Hello, %d\n", *ptr);
}

int main()
{
  int *h_ptr = 0;

  // allocate pinned host memory with cudaMallocHost
  // pinned memory is needed for the copy to be truly asynchronous
  // with respect to the host
  cudaMallocHost(&h_ptr, sizeof(int));

  // look for thirteen in the output
  *h_ptr = 13;

  // allocate device memory
  int *d_ptr = 0;
  cudaMalloc(&d_ptr, sizeof(int));

  // create a stream
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // sequence the asynchronous copy on our stream
  cudaMemcpyAsync(d_ptr, h_ptr, sizeof(int), cudaMemcpyHostToDevice, stream);

  // sequence the kernel on our stream after the copy
  // the kernel will execute after the copy has completed
  kernel<<<1,1,0,stream>>>(d_ptr);

  // wait for the copy and the kernel to finish before cleaning up
  cudaStreamSynchronize(stream);

  // clean up after ourselves
  cudaStreamDestroy(stream);
  cudaFree(d_ptr);
  cudaFreeHost(h_ptr);
}

And the output:

$ nvcc -arch=sm_20 async.cu -run
Hello, 13

I don't believe there's any supported way to test from within a kernel whether some asynchronous condition (such as the completion of an asynchronous transfer) has been met. CUDA thread blocks are assumed to execute completely independently from other threads of execution.
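Although the kernel itself cannot test for completion, the host can, and the same mechanism orders work across two different streams. The following is a minimal sketch, not part of the original answer: the names copy_stream, compute_stream, and copy_done are hypothetical, and it assumes the copy and the kernel are deliberately placed on separate streams. An event recorded after the copy can be polled from the host with cudaEventQuery, and cudaStreamWaitEvent makes the second stream wait for it on the device side.

```cuda
#include <stdio.h>

__global__ void kernel(const int *ptr)
{
  printf("Hello, %d\n", *ptr);
}

int main()
{
  int *h_ptr = 0;
  int *d_ptr = 0;

  // pinned host memory and device memory, as in the example above
  cudaMallocHost(&h_ptr, sizeof(int));
  cudaMalloc(&d_ptr, sizeof(int));
  *h_ptr = 13;

  // hypothetical setup: one stream for copies, one for kernels
  cudaStream_t copy_stream, compute_stream;
  cudaStreamCreate(&copy_stream);
  cudaStreamCreate(&compute_stream);

  // the event marks the point in copy_stream just after the copy
  cudaEvent_t copy_done;
  cudaEventCreateWithFlags(&copy_done, cudaEventDisableTiming);

  cudaMemcpyAsync(d_ptr, h_ptr, sizeof(int), cudaMemcpyHostToDevice, copy_stream);
  cudaEventRecord(copy_done, copy_stream);

  // host-side test: cudaEventQuery returns cudaSuccess once the copy has
  // completed and cudaErrorNotReady while it is still in flight
  cudaError_t status = cudaEventQuery(copy_done);
  if (status == cudaErrorNotReady)
    printf("copy still in flight\n");
  else if (status == cudaSuccess)
    printf("copy already complete\n");

  // device-side ordering: work submitted to compute_stream after this call
  // does not start until copy_done has completed
  cudaStreamWaitEvent(compute_stream, copy_done, 0);
  kernel<<<1,1,0,compute_stream>>>(d_ptr);

  // wait for the kernel before releasing resources
  cudaStreamSynchronize(compute_stream);

  cudaEventDestroy(copy_done);
  cudaStreamDestroy(copy_stream);
  cudaStreamDestroy(compute_stream);
  cudaFree(d_ptr);
  cudaFreeHost(h_ptr);
  return 0;
}
```

Which of the two host-side messages appears depends on timing, so neither should be relied on. When the copy and the kernel share a single stream, as in the example above, the event is unnecessary because the stream already provides the ordering.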

Tags: asynchronous, cuda, memcpy

Stefano Borini
