Why does the CUDA runtime reserve 80 GiB of virtual memory upon initialization?

Tags:

cuda

I was profiling my CUDA 4 program and it turned out that at some stage the running process used over 80 GiB of virtual memory. That was a lot more than I would have expected. After examining how the memory map evolved over time and comparing it to the line of code being executed, it turned out that after these simple instructions the virtual memory usage jumped to over 80 GiB:

  #include <stdio.h>
  #include <cuda_runtime.h>

  int deviceCount = 0;
  // First CUDA runtime call in the program; it implicitly initializes the runtime.
  cudaGetDeviceCount(&deviceCount);
  if (deviceCount == 0) {
    fprintf(stderr, "No devices supporting CUDA\n");
  }

Clearly, this is the first CUDA call, so this is where the runtime gets initialized. Afterwards the memory map looks like this (truncated):

Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000   89796   14716       0 r-x--  prg
0000000005db1000      12      12       8 rw---  prg
0000000005db4000      80      76      76 rw---    [ anon ]
0000000007343000   39192   37492   37492 rw---    [ anon ]
0000000200000000    4608       0       0 -----    [ anon ]
0000000200480000    1536    1536    1536 rw---    [ anon ]
0000000200600000 83879936       0       0 -----    [ anon ]

The last line shows the huge (~80 GiB) area now mapped into the virtual address space.
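
For reference, here is a minimal way to reproduce the jump (a sketch assuming Linux, where /proc/self/status reports the process's VmSize; cudaFree(0) is just a common idiom to force the lazy runtime initialization):

  #include <stdio.h>
  #include <string.h>
  #include <cuda_runtime.h>

  /* Print the VmSize line from /proc/self/status (Linux-specific). */
  static void print_vmsize(const char *label) {
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return;
    char line[256];
    while (fgets(line, sizeof line, f)) {
      if (strncmp(line, "VmSize:", 7) == 0) {
        printf("%s %s", label, line);
        break;
      }
    }
    fclose(f);
  }

  int main(void) {
    print_vmsize("before CUDA init:");
    cudaFree(0); /* first CUDA call; forces context/runtime initialization */
    print_vmsize("after CUDA init: ");
    return 0;
  }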

Okay, it's maybe not a big problem, since reserving memory on Linux costs little unless you actually write to it. But it's really annoying because, for example, MPI jobs have to be submitted with the maximum amount of vmem the job may use. For CUDA jobs, 80 GiB is then just a lower bound - one has to add all the other stuff on top.

I can imagine that it has to do with the so-called scratch space that CUDA maintains: a kind of memory pool for kernel code that can dynamically grow and shrink. But that's speculation, and scratch space is supposed to be allocated in device memory anyway.

Any insights?

asked Jul 24 '12 by ritter

1 Answer

This has nothing to do with scratch space; it is the result of the addressing system that allows unified addressing and peer-to-peer access between the host and multiple GPUs. The CUDA driver registers all of the GPUs' memory plus the host memory in a single virtual address space, using the kernel's virtual memory system. It isn't actually memory consumption per se; it is just a "trick" to map all the available address spaces into one linear virtual space for unified addressing.
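
To see this from the API side, here is a minimal sketch using the standard cudaDeviceProp query (the unifiedAddressing field has been part of the runtime API since CUDA 4.0):

  #include <stdio.h>
  #include <cuda_runtime.h>

  int main(void) {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int dev = 0; dev < deviceCount; ++dev) {
      cudaDeviceProp prop;
      cudaGetDeviceProperties(&prop, dev);
      // unifiedAddressing is 1 when the device shares a single virtual
      // address space with the host (UVA) - the mechanism behind the
      // large reservation seen in the memory map above.
      printf("Device %d (%s): unifiedAddressing = %d\n",
             dev, prop.name, prop.unifiedAddressing);
    }
    return 0;
  }

With UVA in place, cudaMemcpy also accepts cudaMemcpyDefault, letting the runtime infer the transfer direction from the pointer values alone.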

answered Oct 24 '22 by talonmies