Dynamically allocating memory on the GPU

Tags:

cuda

Is it possible to dynamically allocate memory in a GPU's global memory from inside a kernel?
I don't know how big my answer will be, so I need a way to allocate memory for each part of the answer. CUDA 4.0 allows us to use the RAM... is that a good idea, or will it reduce the speed?


asked Mar 09 '11 16:03

linda

2 Answers

It is possible to use malloc inside a kernel. See the following example, which is taken from the NVIDIA CUDA guide:

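#include <cstdio>

__global__ void mallocTest()
{
  // Each thread allocates 123 bytes from the device heap.
  char* ptr = (char*)malloc(123);
  printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
  free(ptr);
}
int main()
{
  // Set the device heap size before launching any kernel that calls malloc.
  // cudaDeviceSetLimit replaces the deprecated cudaThreadSetLimit.
  cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
  mallocTest<<<1, 5>>>();
  // cudaDeviceSynchronize replaces the deprecated cudaThreadSynchronize.
  cudaDeviceSynchronize();
  return 0;
}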

This prints one line per thread, for example (the pointer values will vary):
Thread 0 got pointer: 00057020 
Thread 1 got pointer: 0005708c 
Thread 2 got pointer: 000570f8 
Thread 3 got pointer: 00057164
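
The guide's sample does not check the pointer it gets back. In-kernel malloc returns NULL when the device heap is exhausted, so when the size of the result is not known in advance it is worth checking. Below is a minimal sketch of that check, assuming a device of compute capability 2.0 or higher; the kernel name fillPerThread, the per-thread sizes, and the failure counter are made up for illustration and are not from the guide.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates a buffer whose size is
// only decided at run time, and reports failed allocations.
__global__ void fillPerThread(int* failures)
{
    size_t n = (threadIdx.x + 1) * 16;         // illustrative per-thread size
    int* buf = (int*)malloc(n * sizeof(int));  // comes from the device heap
    if (buf == NULL) {                         // heap exhausted
        atomicAdd(failures, 1);
        return;
    }
    for (size_t i = 0; i < n; ++i)
        buf[i] = (int)i;
    free(buf);  // device-heap memory is released with device-side free
}

int main()
{
    // Set the heap size before launching any kernel that calls malloc.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);

    int* dFailures = NULL;
    cudaMalloc(&dFailures, sizeof(int));
    cudaMemset(dFailures, 0, sizeof(int));

    fillPerThread<<<4, 64>>>(dFailures);
    cudaError_t err = cudaDeviceSynchronize();

    int failures = 0;
    cudaMemcpy(&failures, dFailures, sizeof(int), cudaMemcpyDeviceToHost);
    printf("status: %s, failed allocations: %d\n",
           cudaGetErrorString(err), failures);

    cudaFree(dFailures);
    return 0;
}
```

Memory obtained with in-kernel malloc lives in a separate device heap, so it cannot be released with cudaFree from the host; it has to be freed by device code.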

answered Oct 14 '22 01:10

scatman

From CUDA 4.0 onwards, you can also use the C++ new and delete operators inside a kernel instead of C's malloc and free.
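
As a rough sketch of what that looks like (assuming CUDA 4.0 or later and a device of compute capability 2.0 or higher; the kernel name newDeleteTest and the array size are illustrative only):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates a small array with new[],
// uses it, and releases it with delete[].
__global__ void newDeleteTest()
{
    const int n = 8;           // illustrative size
    int* data = new int[n];    // allocated from the same device heap as malloc

    // Exceptions are not supported in device code, so this sketch
    // checks for a null pointer instead of catching std::bad_alloc.
    if (data == NULL)
        return;

    int sum = 0;
    for (int i = 0; i < n; ++i) {
        data[i] = i * (int)threadIdx.x;
        sum += data[i];
    }
    printf("Thread %d sum: %d\n", threadIdx.x, sum);

    delete[] data;             // must match new[]
}

int main()
{
    newDeleteTest<<<1, 5>>>();
    cudaDeviceSynchronize();   // wait for the kernel and flush device printf
    return 0;
}
```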

answered Oct 14 '22 01:10

kokosing
