Is there a way in CUDA to allocate memory dynamically in device-side functions? I could not find any examples of doing this.
From the CUDA C Programming Guide:

B.15 Dynamic Global Memory Allocation

void* malloc(size_t size);
void free(void* ptr);

allocate and free memory dynamically from a fixed-size heap in global memory.

The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory, or NULL if insufficient memory exists to fulfill the request. The returned pointer is guaranteed to be aligned to a 16-byte boundary.

The CUDA in-kernel free() function deallocates the memory pointed to by ptr, which must have been returned by a previous call to malloc(). If ptr is NULL, the call to free() is ignored. Repeated calls to free() with the same ptr have undefined behavior.

The memory allocated by a given CUDA thread via malloc() remains allocated for the lifetime of the CUDA context, or until it is explicitly released by a call to free(). It can be used by any other CUDA thread, even from subsequent kernel launches. Any CUDA thread may free memory allocated by another thread, but care should be taken to ensure that the same pointer is not freed more than once.
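As a minimal sketch of what "remains allocated ... even from subsequent kernel launches" allows, the following illustrative example (the kernel and variable names are my own, not from the manual) has one kernel allocate per-thread buffers with device-side malloc() and a second kernel read and free them:

```cuda
#include <cstdio>

// Device-global array holding each thread's pointer so a later
// kernel launch can find the allocation again.
__device__ int* perThreadData[64];

__global__ void allocPerThread()
{
    // Device-side malloc: the allocation outlives this kernel launch.
    int* p = (int*)malloc(4 * sizeof(int));
    if (p != NULL)
        p[0] = threadIdx.x;
    perThreadData[threadIdx.x] = p;
}

__global__ void usePerThread()
{
    int* p = perThreadData[threadIdx.x];
    if (p != NULL) {
        printf("Thread %d sees value %d\n", threadIdx.x, p[0]);
        free(p); // any thread may free it, but only once per pointer
    }
}

int main()
{
    allocPerThread<<<1, 64>>>();
    usePerThread<<<1, 64>>>(); // reuses memory allocated by the first kernel
    cudaDeviceSynchronize();
    return 0;
}
```

Note that every malloc() return value is checked against NULL before use, since the fixed-size device heap can run out.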
Memory management on a CUDA device is similar to how it is done in CPU programming. From the host, you allocate memory space on the device, transfer the input data to the device using the runtime API, launch your kernels, retrieve the results (transfer the data back to the host), and finally free the allocated memory.
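That host-side workflow can be sketched as follows (the kernel, array size, and names here are illustrative, not from the question):

```cuda
#include <cstdio>

// Trivial kernel that doubles each element in place.
__global__ void doubleElements(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

int main()
{
    const int n = 256;
    int host[n];
    for (int i = 0; i < n; ++i)
        host[i] = i;

    // 1. Allocate device memory from the host.
    int* dev = NULL;
    cudaMalloc(&dev, n * sizeof(int));

    // 2. Transfer input data to the device.
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    // 3. Launch the kernel.
    doubleElements<<<(n + 255) / 256, 256>>>(dev, n);

    // 4. Transfer results back to the host.
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);

    // 5. Free the device allocation.
    cudaFree(dev);

    printf("host[10] = %d\n", host[10]);
    return 0;
}
```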
Constant memory, by contrast, is used for storing data that will not change over the course of kernel execution. It supports short-latency, high-bandwidth, read-only access by the device when all threads simultaneously access the same location. There is a total of 64 KB of constant memory on a CUDA-capable device.
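A short sketch of how constant memory is declared and filled (the symbol and kernel names are illustrative):

```cuda
#include <cstdio>

// A __constant__ array lives in the device's 64 KB constant memory;
// this one uses 1 KB of it.
__constant__ float coeffs[256];

__global__ void applyCoeffs(float* out)
{
    // Reads are fastest when all threads in a warp access the
    // same constant location.
    out[threadIdx.x] = coeffs[threadIdx.x % 256] * 2.0f;
}

int main()
{
    float hostCoeffs[256];
    for (int i = 0; i < 256; ++i)
        hostCoeffs[i] = (float)i;

    // Constant memory is written from the host with cudaMemcpyToSymbol,
    // not with an in-kernel store.
    cudaMemcpyToSymbol(coeffs, hostCoeffs, sizeof(hostCoeffs));

    float* out = NULL;
    cudaMalloc(&out, 256 * sizeof(float));
    applyCoeffs<<<1, 256>>>(out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```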
Definition. cudaMalloc is a function, normally called from the host, that allocates memory on the device, much like malloc does for the host. Memory allocated with cudaMalloc must be freed with cudaFree.
According to http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf you should be able to use malloc() and free() in a device function.
Page 122
B.15 Dynamic Global Memory Allocation

void* malloc(size_t size);
void free(void* ptr);

allocate and free memory dynamically from a fixed-size heap in global memory.
The example given in the manual:
__global__ void mallocTest()
{
    char* ptr = (char*)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);
}

int main()
{
    // Set a heap size of 128 megabytes. Note that this must
    // be done before any kernel is launched.
    // (In current CUDA versions these calls are deprecated in favor
    // of cudaDeviceSetLimit and cudaDeviceSynchronize.)
    cudaThreadSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
    mallocTest<<<1, 5>>>();
    cudaThreadSynchronize();
    return 0;
}
You need the compiler parameter -arch=sm_20 (e.g. nvcc -arch=sm_20 mallocTest.cu) and a card with compute capability 2.0 or higher, since device-side malloc() is not supported on earlier architectures.