Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is CUDA memory managed?

Tags:

cuda

gpu

nvidia

When I run my CUDA program which allocates only a small amount of global memory (below 20 M), I got a "out of memory" error. (From other people's posts, I think the problem is related to memory fragmentation) I try to understand this problem, and realize I have a couple of questions related to CUDA memory management.

  1. Is there a virtual memory concept in CUDA?

  2. If only one kernel is allowed to run on CUDA simultaneously, after its termination, will all of the memory it used or allocated released? If not, when these memory got free released?

  3. If more than one kernel are allowed to run on CUDA, how can they make sure the memory they use do not overlap?

Can anyone help me answer these questions? Thanks

Edit 1: operating system: x86_64 GNU/Linux CUDA version: 4.0 Device: Geforce 200, It is one of the GPUS attached to the machine, and I don't think it is a display device.

Edit 2: The following is what I got after doing some research. Feel free to correct me.

  1. CUDA will create one context for each host thread. This context will keep information such as what portion of memory (pre allocated memory or dynamically allocated memory) has been reserved for this application so that other application can not write to it. When this application terminates (not kernel) , this portion of memory will be released.

  2. CUDA memory is maintained by a link list. When an application needs to allocate memory, it will go through this link list to see if there is continuous memory chunk available for allocation. If it fails to find such a chunk, a "out of memory" error will report to the users even though the total available memory size is greater than the requested memory. And that is the problem related to memory fragmentation.

  3. cuMemGetInfo will tell you how much memory is free, but not necessarily how much memory you can allocate in a maximum allocation due to memory fragmentation.

  4. On Vista platform (WDDM), GPU memory virtualization is possible. That is, multiple applications can allocate almost the whole GPU memory and WDDM will manage swapping data back to main memory.

New questions: 1. If the memory reserved in the context will be fully released after the application has been terminated, memory fragmentation should not exist. There must be some kind of data left in the memory. 2. Is there any way to restructure the GPU memory ?

like image 373
xhe8 Avatar asked Dec 30 '11 22:12

xhe8


People also ask

What memory system is used in CUDA?

CUDA also uses an abstract memory type called local memory. Local memory is not a separate memory system per se but rather a memory location used to hold spilled registers. Register spilling occurs when a thread block requires more register storage than is available on an SM.

Is CUDA shared memory?

Shared memory is a powerful feature for writing well optimized CUDA code. Access to shared memory is much faster than global memory access because it is located on chip. Because shared memory is shared by threads in a thread block, it provides a mechanism for threads to cooperate.

Which functions are used in CUDA to allocate de allocate and free the memory in CUDA?

allocate and free memory dynamically from a fixed-size heap in global memory. The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory or NULL if insufficient memory exists to fulfill the request.

Does CUDA use C or C++?

CUDA C is essentially C/C++ with a few extensions that allow one to execute functions on the GPU using many threads in parallel.


2 Answers

The device memory available to your code at runtime is basically calculated as

Free memory =   total memory                - display driver reservations                - CUDA driver reservations               - CUDA context static allocations (local memory, constant memory, device code)               - CUDA context runtime heap (in kernel allocations, recursive call stack, printf buffer, only on Fermi and newer GPUs)               - CUDA context user allocations (global memory, textures) 

if you are getting an out of memory message, then it is likely that one or more of the first three items is consuming most of the GPU memory before your user code ever tries to get memory in the GPU. If, as you have indicated, you are not running on a display GPU, then the context static allocations are the most likely source of your problem. CUDA works by pre-allocating all the memory a context requires at the time the context is established on the device. There are a lot of things which get allocated to support a context, but the single biggest consumer in a context is local memory. The runtime must reserve the maximum amount of local memory which any kernel in a context will consume for the maximum number of threads which each multiprocessor can run simultaneously, for each multiprocess on the device. This can run into hundreds of Mb of memory if a local memory heavy kernel is loaded on a device with a lot of multiprocessors.

The best way to see what might be going on is to write a host program with no device code which establishes a context and calls cudaMemGetInfo. That will show you how much memory the device has with the minimal context overhead on it. Then run you problematic code, adding the same cudaMemGetInfo call before the first cudaMalloc call that will then give you the amount of memory your context is using. That might let you get a handle of where the memory is going. It is very unlikely that fragmentation is the problem if you are getting failure on the first cudaMalloc call.

like image 85
talonmies Avatar answered Oct 07 '22 18:10

talonmies


  1. GPU off-chip memory is separated in global, local and constant memory. This three memory types are a virtual memory concept. Global memory is free for all threads, local is just for one thread only (mostly used for register spilling) and constant memory is cached global memory (writable only from host code). Have a look at 5.3.2 from the CUDA C Programming Guide.

  2. EDIT: removed

  3. Memory allocated via cudaMalloc does never overlap. For the memory a kernel allocates during runtime should be enough memory available. If you are out of memory and try to start a kernel (only a guess from me) you should get the "unknown error" error message. The driver than was unable to start and/or executes the kernel.

like image 26
Michael Haidl Avatar answered Oct 07 '22 18:10

Michael Haidl