Why is CUDA pinned memory so fast?

I observe substantial speedups when I use pinned memory for CUDA data transfers. On Linux, the underlying system call for achieving this is mlock. The mlock man page states that locking a page prevents it from being swapped out:

mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully;
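For reference, here is a minimal host-only sketch of what locking a buffer with mlock looks like on Linux. The helper names `alloc_and_try_lock` and `unlock_and_free` are hypothetical and exist only for illustration. The sketch does not involve CUDA at all, and mlock may fail if the process's RLIMIT_MEMLOCK limit is too low, so the code reports that case rather than assuming success.

```cpp
#include <sys/mman.h>  // mlock, munlock (POSIX)
#include <unistd.h>    // sysconf
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: allocate `bytes` of page-aligned memory and try to lock it.
// Returns the buffer (nullptr if allocation fails); *locked reports whether
// mlock succeeded, since it can fail with ENOMEM or EPERM under a low
// RLIMIT_MEMLOCK.
void* alloc_and_try_lock(std::size_t bytes, bool* locked) {
    *locked = false;
    const long page = sysconf(_SC_PAGESIZE);
    void* p = nullptr;
    if (page <= 0 ||
        posix_memalign(&p, static_cast<std::size_t>(page), bytes) != 0) {
        return nullptr;
    }
    if (mlock(p, bytes) == 0) {
        *locked = true;  // pages are now resident and will not be swapped out
    } else {
        std::fprintf(stderr, "mlock failed: %s\n", std::strerror(errno));
    }
    return p;
}

// Hypothetical helper: undo alloc_and_try_lock.
void unlock_and_free(void* p, std::size_t bytes, bool locked) {
    if (locked) {
        munlock(p, bytes);
    }
    std::free(p);
}
```

A caller would allocate with `alloc_and_try_lock(size, &locked)`, use the buffer as ordinary memory, and release it with `unlock_and_free`.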

In my tests, I had a few gigabytes of free memory on my system, so there was never any risk that the memory pages could have been swapped out, yet I still observed the speedup. Can anyone explain what's really going on here? Any insight or info is much appreciated.


asked Apr 20 '11 21:04

Gearoid Murphy

1 Answer

The CUDA driver checks whether the memory range is locked and chooses a code path accordingly. Locked memory is guaranteed to reside in physical memory (RAM), so the device can fetch it without help from the CPU (DMA, i.e. an asynchronous copy; the device only needs the list of physical pages). Non-locked memory can generate a page fault on access, and it may not be resident in RAM (for example, it can be in swap), so the driver has to access every page of non-locked memory, copy it into a pinned buffer, and pass that buffer to DMA (a synchronous, page-by-page copy).
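The difference between the two paths can be illustrated with a host-only toy model. This is not driver code: `fake_dma` is a hypothetical stand-in that simply calls memcpy, the staging buffer size is an arbitrary assumption, and the function names are made up for illustration. The sketch only shows why the pageable path involves an extra CPU copy per chunk, while the pinned path hands the whole range to the transfer engine at once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a DMA transfer; in this toy model it is a memcpy.
void fake_dma(unsigned char* device, const unsigned char* pinned, std::size_t n) {
    std::memcpy(device, pinned, n);
}

// Pinned path: the source is already locked, so a single transfer covers the
// whole range. Returns the number of extra CPU staging copies (none).
std::size_t copy_from_pinned(unsigned char* device, const unsigned char* src,
                             std::size_t n) {
    fake_dma(device, src, n);
    return 0;
}

// Pageable path: the CPU first copies each chunk into a pinned staging buffer,
// then the chunk is transferred. Returns the number of CPU staging copies.
std::size_t copy_from_pageable(unsigned char* device, const unsigned char* src,
                               std::size_t n, std::vector<unsigned char>& staging) {
    assert(!staging.empty());
    std::size_t cpu_copies = 0;
    for (std::size_t off = 0; off < n; off += staging.size()) {
        const std::size_t chunk = std::min(staging.size(), n - off);
        std::memcpy(staging.data(), src + off, chunk);  // extra CPU copy
        fake_dma(device + off, staging.data(), chunk);  // then the transfer
        ++cpu_copies;
    }
    return cpu_copies;
}
```

In this model, copying 10000 bytes through a 4096-byte staging buffer takes three staging copies, while the pinned path takes none. In a real driver the transfer from a pinned source can also overlap with CPU work, which this toy model does not attempt to show.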

As described here: http://forums.nvidia.com/index.php?showtopic=164661

host memory used by the asynchronous mem copy call needs to be page locked through cudaMallocHost or cudaHostAlloc.

I also recommend checking the cudaMemcpyAsync and cudaHostAlloc manuals at developer.download.nvidia.com. The cudaHostAlloc documentation says that the CUDA driver can detect pinned memory:

The driver tracks the virtual memory ranges allocated with this (cudaHostAlloc) function and automatically accelerates calls to functions such as cudaMemcpy().

answered Sep 23 '22 02:09

osgx


Tags: c++, c, linux, cuda
