Better or the same: CPU memcpy() vs device cudaMemcpy() on pinned, mapped memory in CUDA?

Question

I have:

Host memory that has been successfully pinned and mapped using cudaHostAlloc(..., cudaHostAllocMapped) or cudaHostRegister(..., cudaHostRegisterMapped);
Device pointers have been obtained using cudaHostGetDevicePointer(...).

I initiate cudaMemcpy(..., cudaMemcpyDeviceToDevice) on src and dest device pointers that point to two different regions of pinned+mapped memory obtained by the technique above. Everything works fine.
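
For concreteness, the setup described above might look roughly like the following. This is a minimal sketch, not the asker's actual code: the helper name `copy_between_mapped_buffers` and the fill pattern are hypothetical, and the `cudaDeviceSynchronize()` call is a precaution in case the device-to-device copy returns before the data is visible to the host.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: allocates two pinned, mapped host buffers, copies one
// to the other with cudaMemcpy through their device pointers, and returns
// true if the destination then matches the source.
bool copy_between_mapped_buffers(size_t bytes) {
    void *h_src = nullptr, *h_dst = nullptr;
    if (cudaHostAlloc(&h_src, bytes, cudaHostAllocMapped) != cudaSuccess) return false;
    if (cudaHostAlloc(&h_dst, bytes, cudaHostAllocMapped) != cudaSuccess) {
        cudaFreeHost(h_src);
        return false;
    }
    memset(h_src, 0xAB, bytes);  // arbitrary test pattern
    memset(h_dst, 0, bytes);

    // Device-side aliases of the same system memory.
    void *d_src = nullptr, *d_dst = nullptr;
    bool ok = cudaHostGetDevicePointer(&d_src, h_src, 0) == cudaSuccess &&
              cudaHostGetDevicePointer(&d_dst, h_dst, 0) == cudaSuccess &&
              cudaMemcpy(d_dst, d_src, bytes, cudaMemcpyDeviceToDevice) == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;  // precaution before reading on the host

    ok = ok && memcmp(h_src, h_dst, bytes) == 0;
    cudaFreeHost(h_src);
    cudaFreeHost(h_dst);
    return ok;
}
```

Calling `copy_between_mapped_buffers(1 << 20)` from `main` exercises the same path with 1 MiB buffers.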

Question: should I continue doing this or just use a traditional CPU-style memcpy() since everything is in system memory anyway? ...or are they the same (i.e. does cudaMemcpy map to a straight memcpy when both src and dest are pinned)?

(I am still using the cudaMemcpy method because previously everything was in device global memory, but have since switched to pinned memory due to gmem size constraints)

harrism · Accepted Answer

With cudaMemcpy the CUDA driver detects that you are copying from a host pointer to a host pointer and the copy is done on the CPU. You can of course use memcpy on the CPU yourself if you prefer.

If you use cudaMemcpy, there may be an extra stream synchronize performed before doing the copy (which you may see in the profiler, but I'm guessing there—test and see).
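
One way to "test and see" is to time both paths on the same pair of pinned, mapped buffers. The sketch below is only an illustration: `CopyTimings` and `time_copies` are hypothetical names, wall-clock timing with `std::chrono` is a rough measure, and the numbers will depend on the driver, hardware, and buffer size.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstddef>
#include <cstring>

// Hypothetical result type for this sketch: average milliseconds per copy.
struct CopyTimings {
    double memcpy_ms;
    double cuda_memcpy_ms;
    bool ok;  // false if any CUDA call failed
};

CopyTimings time_copies(size_t bytes, int iterations) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;
    CopyTimings t = {0.0, 0.0, false};

    void *h_src = nullptr, *h_dst = nullptr;
    if (cudaHostAlloc(&h_src, bytes, cudaHostAllocMapped) != cudaSuccess) return t;
    if (cudaHostAlloc(&h_dst, bytes, cudaHostAllocMapped) != cudaSuccess) {
        cudaFreeHost(h_src);
        return t;
    }
    memset(h_src, 1, bytes);

    void *d_src = nullptr, *d_dst = nullptr;
    bool ok = cudaHostGetDevicePointer(&d_src, h_src, 0) == cudaSuccess &&
              cudaHostGetDevicePointer(&d_dst, h_dst, 0) == cudaSuccess;

    // Plain CPU copy through the host pointers.
    auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) memcpy(h_dst, h_src, bytes);
    t.memcpy_ms = Ms(Clock::now() - start).count() / iterations;

    // Same copy through the runtime, using the mapped device pointers.
    start = Clock::now();
    for (int i = 0; ok && i < iterations; ++i)
        ok = cudaMemcpy(d_dst, d_src, bytes, cudaMemcpyDeviceToDevice) == cudaSuccess;
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;  // wait for any pending work
    t.cuda_memcpy_ms = Ms(Clock::now() - start).count() / iterations;

    cudaFreeHost(h_src);
    cudaFreeHost(h_dst);
    t.ok = ok;
    return t;
}
```

Running this for a few buffer sizes, and looking at the same run in the profiler, should show whether the extra synchronization matters for a given workload.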

On a UVA system you can just use cudaMemcpyDefault, as talonmies says in his answer. But if you don't have UVA (which requires sm_20+ and a 64-bit OS), then you have to call the right copy (e.g. cudaMemcpyDeviceToDevice). If you cudaHostRegister() everything you are interested in, then cudaMemcpyDeviceToDevice will end up doing the following, depending on where the memory is located:

Host <-> Host: performed by the CPU (memcpy)
Host <-> Device: DMA (device copy engine)
Device <-> Device: Memcpy CUDA kernel (runs on the SMs, launched by driver)
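
To check which of these cases applies to a given pointer, one option is to ask the runtime where the memory lives and whether the device reports unified addressing. The sketch below assumes CUDA 10 or newer (where `cudaPointerAttributes` has a `type` field); `describe_location` and `copy_with_default` are hypothetical helper names, not part of the CUDA API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <string>

// Hypothetical helper: reports where the runtime thinks a pointer lives.
std::string describe_location(const void* p) {
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, p) != cudaSuccess) {
        cudaGetLastError();  // older toolkits return an error for plain host memory
        return "unregistered host";
    }
    switch (attr.type) {
        case cudaMemoryTypeHost:    return "pinned host";
        case cudaMemoryTypeDevice:  return "device";
        case cudaMemoryTypeManaged: return "managed";
        default:                    return "unregistered host";
    }
}

// Hypothetical helper: uses cudaMemcpyDefault only when the current device
// reports UVA; otherwise the caller has to choose an explicit direction.
bool copy_with_default(void* dst, const void* src, size_t bytes) {
    int dev = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&dev) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
        return false;
    }
    if (!prop.unifiedAddressing) return false;
    return cudaMemcpy(dst, src, bytes, cudaMemcpyDefault) == cudaSuccess &&
           cudaDeviceSynchronize() == cudaSuccess;
}
```

With both buffers reported as "pinned host", the copy falls under the Host <-> Host case in the list above.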

Tags:

cuda

memcpy

mikepcw
