
CUDA: stack and heap

Tags: c++, c, cuda

As in the title, can someone explain the heap and stack in CUDA to me in more detail? Are they any different from the ordinary heap and stack in CPU memory?

I ran into a problem when increasing the stack size in CUDA; there seems to be a limit, because when I set the stack size above 1024*300 (on a Tesla M2090) with cudaDeviceSetLimit, I got an error: invalid argument.

Another problem I want to ask about: when I set the heap size to a very large number (about 2 GB) to allocate my R-tree (data structure) with 2000 elements, I got a runtime error: too many resources requested to launch.

Any idea?

P.S.: I launch with only a single thread (kernel<<<1,1>>>).
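For context, a minimal sketch of the kind of setup described above (the sizes and kernel name are illustrative, not my actual code):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel() { /* builds the R-tree in a single thread */ }

int main() {
    // Requesting a per-thread stack above the hardware limit makes this
    // call fail with "invalid argument".
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, 1024 * 300);
    printf("cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));

    kernel<<<1, 1>>>();
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaDeviceSynchronize();
    return 0;
}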

asked Dec 19 '22 by Hoang Thong

1 Answer

About stack and heap

The stack is allocated per thread and has a hardware limit (see below). The heap resides in global memory, can be allocated using malloc(), and must be explicitly freed using free() (CUDA doc).

These device functions:

void* malloc(size_t size);
void free(void* ptr);

can be useful, but I would recommend using them only when they are really needed. A better approach is usually to rethink the code so that memory is allocated with the host-side functions (such as cudaMalloc).
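For illustration, a minimal sketch contrasting in-kernel malloc()/free() with the host-side cudaMalloc() approach (buffer sizes and names are made up):

#include <cuda_runtime.h>

// In-kernel allocation: the memory comes from the device heap and must be
// freed on the device with free().
__global__ void device_alloc_kernel() {
    int* buf = (int*)malloc(100 * sizeof(int));
    if (buf == NULL) return;   // malloc() returns NULL when the heap is exhausted
    buf[0] = 42;
    free(buf);
}

int main() {
    // Usually preferable: allocate from the host with cudaMalloc and pass
    // the pointer to the kernel.
    int* d_buf = NULL;
    cudaMalloc(&d_buf, 100 * sizeof(int));
    // ... launch kernels that use d_buf ...
    cudaFree(d_buf);

    device_alloc_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}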


The stack size has a hardware limit, which can be computed (according to this answer by @njuffa) as the minimum of:

  • amount of local memory per thread
  • available GPU memory / number of SMs / maximum resident threads per SM

Since you are increasing the size and running only one thread, I guess you are hitting the second limit, which in your case (Tesla M2090) should be: 6144 MB / 16 SMs / 512 threads = 750 KB.
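A small sketch of how the stack limit can be queried and changed (the requested size here is illustrative; a request above the hardware limit is rejected with "invalid argument" rather than clamped):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t stack_size = 0;
    cudaDeviceGetLimit(&stack_size, cudaLimitStackSize);
    printf("default stack size per thread: %zu bytes\n", stack_size);

    // A request above the hardware limit fails instead of being clamped.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, 512 * 1024);
    printf("cudaDeviceSetLimit(512 KB): %s\n", cudaGetErrorString(err));
    return 0;
}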


The heap has a fixed size (default 8 MB) that must be set with cudaDeviceSetLimit before launching any kernel that calls malloc(). Be aware that the memory granted will be at least the size requested, due to some allocation overhead. It is also worth mentioning that allocated memory is not per-thread: it has the lifetime of the CUDA context (until released by a call to free()) and can be used by a thread in a subsequent kernel launch.
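A minimal sketch of the workflow described above (the heap size and kernel names are illustrative): the limit is raised before the first kernel that calls malloc(), and the allocation is freed by a later kernel in the same context:

#include <cuda_runtime.h>

__global__ void alloc_kernel(int** out) {
    // The allocation lives for the lifetime of the CUDA context,
    // not just for this kernel launch.
    *out = (int*)malloc(2000 * sizeof(int));
}

__global__ void free_kernel(int** out) {
    free(*out);   // a later kernel can free memory allocated by an earlier one
}

int main() {
    // Must be set before the first kernel that calls malloc() is launched.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);   // 128 MB, illustrative

    int** d_ptr = NULL;
    cudaMalloc(&d_ptr, sizeof(int*));

    alloc_kernel<<<1, 1>>>(d_ptr);
    free_kernel<<<1, 1>>>(d_ptr);
    cudaDeviceSynchronize();

    cudaFree(d_ptr);
    return 0;
}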

Related posts on stack: ... stack frame for kernels, ... local memory per cuda thread

Related posts on heap: ... heap memory ..., ... heap memory limitations per thread

answered Jan 03 '23 by terence hill