Is it possible to dynamically allocate memory on a GPU's Global memory inside the Kernel?
i don't know how big will my answer be, therefore i need a way to allocate memory for each part of the answer. CUDA 4.0 alloww us to use the RAM... is it a good idea or will it reduce the speed??
Dynamic memory allocation is the process of assigning the memory space during the execution time or the run time. Reasons and Advantage of allocating memory dynamically: When we do not know how much amount of memory would be needed for the program beforehand.
Dynamic memory allocation and deallocation are very slow operations when compared to automatic memory allocation and deallocation. In other words, the heap is much slower than the stack.
In this memory allocation scheme, execution is faster than dynamic memory allocation. In this memory allocation scheme, execution is slower than static memory allocation. In this memory is allocated at compile time.
So one of the reasons why allocators are slow is that the allocation algorithm needs some time to find an available block of a given size. But that is not all. As time progresses it gets harder and harder to find the block of the appropriate size. The reason for this is called memory fragmentation.
it is possible to use malloc inside a kernel. check the following which is taken from nvidia cuda guide:
__global__ void mallocTest()
{
char* ptr = (char*)malloc(123);
printf(“Thread %d got pointer: %p\n”, threadIdx.x, ptr);
free(ptr);
}
void main()
{
cudaThreadSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
mallocTest<<<1, 5>>>();
cudaThreadSynchronize();
}
will output:
Thread 0 got pointer: 00057020
Thread 1 got pointer: 0005708c
Thread 2 got pointer: 000570f8
Thread 3 got pointer: 00057164
From CUDA 4.0 you will be able to use new
and delete
operators from c++ instead of malloc
and free
from c.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With