CUDA allocate memory in __device__ function

Is there a way in CUDA to allocate memory dynamically in device-side functions? I could not find any examples of doing this.

From the CUDA C Programming Guide:

B.15 Dynamic Global Memory Allocation

void* malloc(size_t size); 
void free(void* ptr); 

allocate and free memory dynamically from a fixed-size heap in global memory.

The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory or NULL if insufficient memory exists to fulfill the request. The returned pointer is guaranteed to be aligned to a 16-byte boundary.

The CUDA in-kernel free() function deallocates the memory pointed to by ptr, which must have been returned by a previous call to malloc(). If ptr is NULL, the call to free() is ignored. Repeated calls to free() with the same ptr have undefined behavior.

The memory allocated by a given CUDA thread via malloc() remains allocated for the lifetime of the CUDA context, or until it is explicitly released by a call to free(). It can be used by any other CUDA threads even from subsequent kernel launches. Any CUDA thread may free memory allocated by another thread, but care should be taken to ensure that the same pointer is not freed more than once.
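As a sketch of what that lifetime rule permits (this is not from the manual; the names d_ptr, allocKernel, and useAndFreeKernel are invented here), one kernel can allocate from the device heap, stash the pointer in a __device__ variable, and a later kernel launch can read and free it:

#include <cstdio>

// Device-global slot used to pass the pointer between kernel launches.
__device__ int* d_ptr;

__global__ void allocKernel()
{
    // One thread allocates from the device heap; the allocation survives
    // this kernel, because the heap lives as long as the CUDA context.
    d_ptr = (int*)malloc(sizeof(int));
    if (d_ptr) *d_ptr = 42;
}

__global__ void useAndFreeKernel()
{
    // A later launch may read and free memory allocated by an earlier kernel.
    if (d_ptr) {
        printf("Value written by earlier kernel: %d\n", *d_ptr);
        free(d_ptr);
        d_ptr = NULL;  // guard against a second free of the same pointer
    }
}

int main()
{
    allocKernel<<<1, 1>>>();
    useAndFreeKernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}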

asked Jan 17 '11 by SparcU


People also ask

How do I allocate device memory in CUDA?

Memory management on a CUDA device is similar to how it is done in CPU programming. You allocate memory on the device, transfer data from the host to the device using the built-in API, retrieve the results (transfer the data back to the host), and finally free the allocated device memory.
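A minimal sketch of that workflow, assuming a simple integer array (the names h_data and d_data are invented for the example):

#include <cuda_runtime.h>

int main()
{
    const int n = 256;
    int h_data[n];                       // host buffer
    for (int i = 0; i < n; ++i) h_data[i] = i;

    // 1. Allocate memory on the device.
    int* d_data = NULL;
    cudaMalloc(&d_data, n * sizeof(int));

    // 2. Transfer the data to the device.
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);

    // ... launch kernels that operate on d_data here ...

    // 3. Retrieve the data (transfer it back to the host).
    cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);

    // 4. Free the allocated device memory.
    cudaFree(d_data);
    return 0;
}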

Which function is allocated memory from GPU?

The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory or NULL if insufficient memory exists to fulfill the request.
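Since the request can fail, a kernel should test the returned pointer before using it; a minimal sketch (the kernel name guardedAlloc is invented here):

#include <cstdio>

__global__ void guardedAlloc(size_t bytes)
{
    char* buf = (char*)malloc(bytes);
    if (buf == NULL) {
        // The device heap could not satisfy the request.
        printf("Thread %d: malloc failed\n", threadIdx.x);
        return;
    }
    // ... use buf here; it is guaranteed 16-byte aligned ...
    free(buf);
}

int main()
{
    guardedAlloc<<<1, 4>>>(1024);
    cudaDeviceSynchronize();
    return 0;
}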

What is constant memory in CUDA?

It is used for storing data that will not change over the course of kernel execution. It supports short-latency, high-bandwidth, read-only access by the device when all threads simultaneously access the same location. There is a total of 64 KB of constant memory on a CUDA-capable device.
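A minimal sketch of declaring, filling, and reading constant memory (the symbol name c_coeffs and the kernel are invented for this example):

__constant__ float c_coeffs[64];   // uses 256 bytes of the 64 KB budget

__global__ void scale(float* out)
{
    // All threads reading the same constant location is the fast case.
    out[threadIdx.x] = c_coeffs[0] * threadIdx.x;
}

int main()
{
    float h_coeffs[64] = { 2.0f };   // first coefficient 2.0, rest zero
    cudaMemcpyToSymbol(c_coeffs, h_coeffs, sizeof(h_coeffs));

    float* d_out = NULL;
    cudaMalloc(&d_out, 64 * sizeof(float));
    scale<<<1, 64>>>(d_out);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}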

What does CUDA malloc do?

cudaMalloc is a function that can be called from the host (or, with the device runtime, from the device) to allocate memory on the device, much like malloc on the host. The memory allocated with cudaMalloc must be freed with cudaFree.
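Because cudaMalloc reports failure through its return status rather than through the pointer, that status is worth checking; a minimal host-side sketch:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    float* d_buf = NULL;
    cudaError_t err = cudaMalloc(&d_buf, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // ... use d_buf ...
    cudaFree(d_buf);   // every cudaMalloc must be paired with a cudaFree
    return 0;
}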


1 Answer

According to http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf you should be able to use malloc() and free() in a device function.

Page 122

B.15 Dynamic Global Memory Allocation

void* malloc(size_t size);
void free(void* ptr);

allocate and free memory dynamically from a fixed-size heap in global memory.

The example given in the manual:

#include <cstdio>

__global__ void mallocTest()
{
    char* ptr = (char*)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);
}

int main()
{
    // Set a heap size of 128 megabytes. Note that this must
    // be done before any kernel is launched. (In current CUDA
    // releases these calls are named cudaDeviceSetLimit and
    // cudaDeviceSynchronize; the cudaThread* forms are deprecated.)
    cudaThreadSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
    mallocTest<<<1, 5>>>();
    cudaThreadSynchronize();
    return 0;
}

You need the compiler parameter -arch=sm_20 and a card with compute capability 2.0 or higher (Fermi or newer).
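For example, assuming the code above is saved in a file named mallocTest.cu:

nvcc -arch=sm_20 mallocTest.cu -o mallocTest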

answered Oct 22 '22 by Nate