Is there a way in CUDA to allocate memory dynamically in device-side functions? I could not find any examples of doing this.
From the CUDA C Programming Guide:

B.15 Dynamic Global Memory Allocation

void* malloc(size_t size);
void free(void* ptr);

allocate and free memory dynamically from a fixed-size heap in global memory.

The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory, or NULL if insufficient memory exists to fulfill the request. The returned pointer is guaranteed to be aligned to a 16-byte boundary.

The CUDA in-kernel free() function deallocates the memory pointed to by ptr, which must have been returned by a previous call to malloc(). If ptr is NULL, the call to free() is ignored. Repeated calls to free() with the same ptr have undefined behavior.

The memory allocated by a given CUDA thread via malloc() remains allocated for the lifetime of the CUDA context, or until it is explicitly released by a call to free(). It can be used by any other CUDA thread, even from subsequent kernel launches. Any CUDA thread may free memory allocated by another thread, but care should be taken to ensure that the same pointer is not freed more than once.
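As a minimal sketch of what "remains allocated ... even from subsequent kernel launches" allows, the following illustrative example (the kernel and variable names are my own, not from the manual) has one kernel allocate per-thread buffers with device-side malloc() and a second kernel read and free them:

```cuda
#include <cstdio>

// Device-global array holding each thread's pointer so a later
// kernel launch can find the allocation again.
__device__ int* perThreadData[64];

__global__ void allocPerThread()
{
    // Device-side malloc: the allocation outlives this kernel launch.
    int* p = (int*)malloc(4 * sizeof(int));
    if (p != NULL)
        p[0] = threadIdx.x;
    perThreadData[threadIdx.x] = p;
}

__global__ void usePerThread()
{
    int* p = perThreadData[threadIdx.x];
    if (p != NULL) {
        printf("Thread %d sees value %d\n", threadIdx.x, p[0]);
        free(p); // any thread may free it, but only once per pointer
    }
}

int main()
{
    allocPerThread<<<1, 64>>>();
    usePerThread<<<1, 64>>>(); // reuses memory allocated by the first kernel
    cudaDeviceSynchronize();
    return 0;
}
```

Note that every malloc() return value is checked against NULL before use, since the fixed-size device heap can run out.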
Memory management on a CUDA device is similar to how it is done in CPU programming. From the host, you allocate memory space on the device, transfer the input data to the device using the runtime API, launch your kernels, retrieve the results (transfer the data back to the host), and finally free the allocated memory.
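That host-side workflow can be sketched as follows (the kernel, array size, and names here are illustrative, not from the question):

```cuda
#include <cstdio>

// Trivial kernel that doubles each element in place.
__global__ void doubleElements(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

int main()
{
    const int n = 256;
    int host[n];
    for (int i = 0; i < n; ++i)
        host[i] = i;

    // 1. Allocate device memory from the host.
    int* dev = NULL;
    cudaMalloc(&dev, n * sizeof(int));

    // 2. Transfer input data to the device.
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    // 3. Launch the kernel.
    doubleElements<<<(n + 255) / 256, 256>>>(dev, n);

    // 4. Transfer results back to the host.
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);

    // 5. Free the device allocation.
    cudaFree(dev);

    printf("host[10] = %d\n", host[10]);
    return 0;
}
```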
Constant memory, by contrast, is used for storing data that will not change over the course of kernel execution. It supports short-latency, high-bandwidth, read-only access by the device when all threads simultaneously access the same location. There is a total of 64 KB of constant memory on a CUDA-capable device.
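A short sketch of how constant memory is declared and filled (the symbol and kernel names are illustrative):

```cuda
#include <cstdio>

// A __constant__ array lives in the device's 64 KB constant memory;
// this one uses 1 KB of it.
__constant__ float coeffs[256];

__global__ void applyCoeffs(float* out)
{
    // Reads are fastest when all threads in a warp access the
    // same constant location.
    out[threadIdx.x] = coeffs[threadIdx.x % 256] * 2.0f;
}

int main()
{
    float hostCoeffs[256];
    for (int i = 0; i < 256; ++i)
        hostCoeffs[i] = (float)i;

    // Constant memory is written from the host with cudaMemcpyToSymbol,
    // not with an in-kernel store.
    cudaMemcpyToSymbol(coeffs, hostCoeffs, sizeof(hostCoeffs));

    float* out = NULL;
    cudaMalloc(&out, 256 * sizeof(float));
    applyCoeffs<<<1, 256>>>(out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```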
Definition. cudaMalloc is a function, normally called from the host, that allocates memory on the device, much like malloc does for the host. Memory allocated with cudaMalloc must be freed with cudaFree.
According to http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf you should be able to use malloc() and free() in a device function.
Page 122
B.15 Dynamic Global Memory Allocation

void* malloc(size_t size);
void free(void* ptr);

allocate and free memory dynamically from a fixed-size heap in global memory.
The example given in the manual:
__global__ void mallocTest()
{
    char* ptr = (char*)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);
}

int main()
{
    // Set a heap size of 128 megabytes. Note that this must
    // be done before any kernel is launched.
    // (In current CUDA versions these calls are deprecated in favor
    // of cudaDeviceSetLimit and cudaDeviceSynchronize.)
    cudaThreadSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
    mallocTest<<<1, 5>>>();
    cudaThreadSynchronize();
    return 0;
}
You need the compiler parameter -arch=sm_20 (e.g. nvcc -arch=sm_20 mallocTest.cu) and a card with compute capability 2.0 or higher, since device-side malloc() is not supported on earlier architectures.