CUDA and pinned (page locked) memory not page locked at all?

I am trying to figure out whether CUDA (or the OpenCL implementation) tells the truth when I request pinned (page-locked) memory.

I tried cudaMallocHost and looked at the /proc/meminfo values Mlocked and Unevictable: both stay at 0 and never go up (/proc/<pid>/status also reports VmLck as 0). When I use mlock to page-lock memory instead, the values go up as expected.
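For reference, this is roughly how I checked the mlock() behavior (a minimal sketch; the 1 MiB size is arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* print the VmLck line of /proc/self/status */
static void print_vmlck(const char *label) {
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) { perror("fopen"); return; }
    while (fgets(line, sizeof(line), f))
        if (strncmp(line, "VmLck:", 6) == 0)
            printf("%s %s", label, line);
    fclose(f);
}

int main(void) {
    size_t size = 1048576;                /* 1 MiB */
    void *buf = malloc(size);
    print_vmlck("before mlock:");
    if (mlock(buf, size) != 0) perror("mlock");
    print_vmlck("after mlock:");          /* VmLck goes up by ~1024 kB */
    munlock(buf, size);
    free(buf);
    return 0;
}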

So two possible reasons for this behavior might be:

  1. I don't actually get page-locked memory from the CUDA API, and the cudaSuccess return value is misleading
  2. CUDA does page-lock the memory, but bypasses the OS counters for page-locked memory because it does some magic with the Linux kernel

So the actual question is: Why can’t I get the values for page locked memory from the OS when I use CUDA to allocate page locked memory?

Additionally: Where can I get the right values if not from /proc/meminfo or /proc/<pid>/status?

Thanks!

System: Ubuntu 14.04.01 LTS; CUDA 6.5; Nvidia Driver 340.29; Nvidia Tesla K20c

asked Nov 12 '14 by Michael Haidl

2 Answers

It would seem that the pinned allocator on CUDA 6.5 is, under the hood, using mmap() with MAP_FIXED. Although I am not an OS expert, this apparently has the effect of "pinning" memory, i.e. ensuring that its (virtual) address never changes. However, this is not a complete explanation. Refer to the answer by @Jeff, which points out what is almost certainly the "missing piece".

Let's consider a short test program:

#include <stdio.h>
#include <stdlib.h>            // for system()
#include <cuda_runtime.h>

#define DSIZE (1048576*1024)   // exactly 1 GiB

int main(){

  int *data;
  cudaFree(0);                                      // force CUDA context creation
  system("cat /proc/meminfo > out1.txt");
  printf("*$*before alloc\n");
  cudaHostAlloc((void **)&data, DSIZE, cudaHostAllocDefault);
  printf("*$*after alloc\n");
  system("cat /proc/meminfo > out2.txt");
  cudaFreeHost(data);
  system("cat /proc/meminfo > out3.txt");
  return 0;
}

If we run this program under strace (e.g. strace ./test 2> trace.txt) and excerpt the portion of the output between the two printf statements, we have:

write(1, "*$*before alloc\n", 16*$*before alloc)       = 16
mmap(0x204500000, 1073741824, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS, 0, 0) = 0x204500000
ioctl(11, 0xc0304627, 0x7fffcf72cce0)   = 0
ioctl(3, 0xc0384657, 0x7fffcf72cd70)    = 0
write(1, "*$*after alloc\n", 15*$*after alloc)        = 15

(note that 1073741824 is exactly one gigabyte, i.e. the same as the requested 1048576*1024)

Reviewing the description of mmap, we have:

address gives a preferred starting address for the mapping. NULL expresses no preference. Any previous mapping at that address is automatically removed. The address you give may still be changed, unless you use the MAP_FIXED flag.

Therefore, assuming the mmap command is successful, the virtual address requested will be fixed, which is probably useful, but not the whole story.

As I mentioned, I am not an OS expert, and it's not obvious to me what exactly about this system call would create a "pinned" mapping/allocation. It may be that the combination of MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS somehow creates a pinned underlying allocation, but I've not found any evidence to support that.
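As a quick sanity check (my own sketch, not anything the CUDA allocator actually does), we can issue an equivalent mmap() call outside of CUDA and look at the lock counters. The hint address is simply the one observed in the trace, and I pass -1 as the fd, which is the portable choice for MAP_ANONYMOUS (the trace showed 0, which Linux also accepts):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void) {
    size_t size = 1073741824UL;             /* 1 GiB, as in the trace */
    void *want = (void *)0x204500000UL;     /* address taken from the strace output */
    /* MAP_FIXED replaces any existing mapping at this address, so this is
       only safe in a fresh process where the range is known to be unused */
    void *p = mmap(want, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    printf("mapped at %p\n", p);            /* with MAP_FIXED, equal to want */
    system("grep -E 'Mlocked|Unevictable' /proc/meminfo");
    munmap(p, size);
    return 0;
}

Both counters should stay at zero here, just as with the CUDA allocation, which is consistent with these flags by themselves not pinning anything.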

Based on this article it seems that even mlock()-ed pages would not meet the needs of DMA activity, which is one of the key goals of pinned host pages in CUDA. Therefore, it seems that something else is providing the actual "pinning" (i.e. guaranteeing that the underlying physical pages are always memory-resident, and that their virtual-to-physical mapping doesn't change -- the latter part of this is possibly accomplished by MAP_FIXED along with whatever mechanism guarantees that the underlying physical pages don't move in any way).

This mechanism apparently does not use mlock(), and so the mlock statistics don't change before and after the allocation. However, we would expect a change in the mapping statistic, and if we diff the out1.txt and out2.txt files produced by the above program, we see (excerpted):

< Mapped:            87488 kB
---
> Mapped:          1135904 kB

The difference is approximately a gigabyte, the amount of "pinned" memory requested.
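If you want that number programmatically instead of diffing snapshots, you can parse the Mapped: field directly. Here is a minimal sketch (same 1 GiB allocation as above, error handling omitted):

#include <stdio.h>
#include <cuda_runtime.h>

/* return the "Mapped:" value from /proc/meminfo, in kB, or -1 on error */
static long mapped_kb(void) {
    char line[256];
    long kb = -1;
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) return -1;
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "Mapped: %ld kB", &kb) == 1)
            break;
    fclose(f);
    return kb;
}

int main(void) {
    int *data;
    cudaFree(0);                         /* create the CUDA context first */
    long before = mapped_kb();
    cudaHostAlloc((void **)&data, 1048576UL * 1024UL, cudaHostAllocDefault);
    long after = mapped_kb();
    printf("Mapped grew by %ld kB\n", after - before);   /* ~1048576 kB expected */
    cudaFreeHost(data);
    return 0;
}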

answered by Robert Crovella


Page-locked can mean different things. For user-space applications it usually means keeping the page in memory to avoid a page fault:

"A page that has been locked into memory with a call like mlock() is required to always be physically present in the system's RAM. At a superficial level, locked pages should thus never cause a page fault when accessed by an application. But there is nothing that requires a locked page to always be present in the same place; the kernel is free to move a locked page if the need arises." [1]

Note that these locked pages can still be moved around and aren't suitable for I/O device access.

Instead, another notion of page-locked is called pinning. A pinned page keeps the same physical mapping, i.e. the underlying physical page never changes. Drivers that need this typically do it rather directly and bypass locked-page accounting. cudaMallocHost almost certainly uses the CUDA driver to pin the pages in this fashion, which is why the mlock counters never move.

More info at [1] below.

[1] https://lwn.net/Articles/600502/

answered by Jeff