
When to use cudaHostRegister() and cudaHostAlloc()? What is the meaning of "pinned" or "page-locked" memory? What are the equivalents in OpenCL?

I am new to these NVIDIA APIs and some expressions are not so clear to me. I was wondering if somebody could help me understand when and how to use these CUDA commands in a simple way. To be more precise:

While studying how it is possible to speed up some applications with parallel execution of a kernel (with CUDA, for example), at some point I faced the problem of speeding up the host-device interaction. I have some information, gathered by surfing the web, but I am a little bit confused. It is clear that you can go faster when it is possible to use cudaHostRegister() and/or cudaHostAlloc(). Here it is explained that

"you can use the cudaHostRegister() command to take some data (already allocated) and pin it avoiding extra copy to take into the GPU".

What is the meaning of "pin the memory"? Why is it so fast? How can I do this in advance, on memory that has already been allocated? Later, in the same video linked above, they continue explaining that

"if you are transferring PINNED memory, you can use the asynchronous memory transfer, cudaMemcpyAsync(), which let's the CPU keep working during the memory transfer".

Are the PCIe transactions managed entirely by the CPU? Is there a bus manager that takes care of this? Partial answers are also really appreciated; I will re-compose the puzzle at the end.

Links about the equivalent APIs in OpenCL would also be appreciated.

asked Sep 12 '16 by Leos313

People also ask

What does pinned memory mean?

Pinned memory consists of virtual memory pages that are specially marked so that they cannot be paged out. They are allocated with special system API function calls. The important point for us is that CPU memory which serves as the source or destination of a DMA transfer must be allocated as pinned memory.

What is pinned memory in Cuda?

For CUDA 8.x and below, pinned memory is "non-pageable", which means that the shared memory region will not be coherent. In a non-coherent environment, pages are not cached and every access by the GPU (device) uses the system memory directly (skipping the cache), causing higher latency and bandwidth usage.

What is page locked memory?

With paged memory, memory that is allowed to be paged in or paged out is called pageable memory. Conversely, memory that is not allowed to be paged in or paged out is called page-locked memory or pinned memory. Page-locked memory is never swapped out to the hard drive.


2 Answers

What is the meaning of "pin the memory"?

It means making the memory page-locked, i.e. telling the operating system's virtual memory manager that the memory pages must stay in physical RAM so that they can be directly accessed by the GPU across the PCI Express bus.
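
As an illustrative sketch (assuming a device that supports mapped host memory; error checking omitted), a page-locked buffer can even be mapped into the device address space so that a kernel reads it in place across PCI Express:

    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void scale(float *p, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p[i] *= 2.0f;                       // each access travels over PCI Express
    }

    int main(void)
    {
        const int n = 1024;

        cudaSetDeviceFlags(cudaDeviceMapHost);         // allow mapping of pinned memory

        float *h_buf;                                  // page-locked, mapped host buffer
        cudaHostAlloc((void **)&h_buf, n * sizeof(float), cudaHostAllocMapped);
        for (int i = 0; i < n; ++i) h_buf[i] = (float)i;

        float *d_alias;                                // device-side view of the same pages
        cudaHostGetDevicePointer((void **)&d_alias, h_buf, 0);

        scale<<<(n + 255) / 256, 256>>>(d_alias, n);
        cudaDeviceSynchronize();

        printf("h_buf[1] = %f\n", h_buf[1]);           // result visible on the host, no cudaMemcpy
        cudaFreeHost(h_buf);
        return 0;
    }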

Why is it so fast? 

In one word: DMA. When the memory is page-locked, the GPU's DMA engine can run the transfer directly, without involving the host CPU, which reduces overall latency and decreases net transfer times. With pageable memory, by contrast, the driver must first copy the data into an internal pinned staging buffer before the DMA engine can transfer it.
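
One way to see the difference is to time the same host-to-device copy from a pageable buffer and from a pinned one; a rough sketch of such a comparison (illustrative only, error checking omitted):

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    // Time a single host-to-device cudaMemcpy with CUDA events.
    static float time_copy(void *dst, const void *src, size_t bytes)
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }

    int main(void)
    {
        const size_t bytes = 64 << 20;                 // 64 MiB

        float *d_buf;
        cudaMalloc((void **)&d_buf, bytes);

        float *h_pageable = (float *)malloc(bytes);    // ordinary malloc: pageable
        float *h_pinned;
        cudaMallocHost((void **)&h_pinned, bytes);     // page-locked allocation

        printf("pageable: %.2f ms\n", time_copy(d_buf, h_pageable, bytes));
        printf("pinned:   %.2f ms\n", time_copy(d_buf, h_pinned, bytes));

        cudaFree(d_buf);
        cudaFreeHost(h_pinned);
        free(h_pageable);
        return 0;
    }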

Are the PCIe transactions managed entirely by the CPU?

No. See above.

Is there a bus manager that takes care of this?

No. The GPU manages the transfers itself; there is no separate bus arbiter involved. The GPU's DMA engine initiates the PCIe transactions directly.

answered Sep 23 '22

EDIT: It seems that CUDA treats pinned and page-locked as the same thing, as per the Pinned Host Memory section in this blog post written by Mark Harris. This means my answer is moot and the accepted answer should be taken as is.

I bumped into this question while looking for something else. For all future users, I think @talonmies answers the question perfectly, but I'd like to point out a slight difference between locking and pinning pages: the former ensures that the memory is not pageable, but the kernel is still free to move it around; the latter ensures that it stays in memory (i.e. non-pageable) and also remains mapped at the same address. Here's a reference to the same.

answered Sep 24 '22 by Kshitij Lakhani