How to define a CUDA shared memory with a size known at run time?

Tags:

cuda

gpu-shared-memory

The __shared__ memory in CUDA seems to require a size known at compile time. However, in my problem, the __shared__ memory size is only known at run time, i.e.

int size=get_size();
__shared__ mem[size];

This will end up with "error: constant value is not known", and I'm not sure how to get around this problem.


asked Mar 30 '12 02:03

1 Answer

The purpose of shared memory is to allow the threads in a block to collaborate. When you declare an array as __shared__, each thread in the block sees the same memory, so it would not make sense for a given thread to be able to set its own size for an array in shared memory.

However, the special case of dynamically specifying the size of a single __shared__ array that is the same size for all threads IS supported. See allocating shared memory.
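A minimal sketch of that special case, assuming the array holds floats and fits in a single block: the kernel declares the array as extern __shared__ with no size, and the host passes the size in bytes as the third launch-configuration argument. The names reverse_in_block and reverse_on_device are hypothetical and chosen only for illustration; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Reverses an n-element array inside one block, staging the data in
// dynamically sized shared memory. No compile-time size is given here.
__global__ void reverse_in_block(float *data, int n)
{
    extern __shared__ float mem[];   // size is set by the launch configuration

    int i = threadIdx.x;
    if (i < n) mem[i] = data[i];
    __syncthreads();                 // make all writes to mem visible to the block
    if (i < n) data[i] = mem[n - 1 - i];
}

// Hypothetical host helper. Assumes n fits in one block
// (at most 1024 threads on current GPUs).
std::vector<float> reverse_on_device(const std::vector<float> &host)
{
    int n = static_cast<int>(host.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    size_t shared_bytes = n * sizeof(float);           // known only at run time
    // Third launch argument: bytes of dynamic shared memory per block.
    reverse_in_block<<<1, n, shared_bytes>>>(dev, n);

    cudaMemcpy(out.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return out;
}
```

Calling reverse_on_device on a four-element vector launches one block of four threads with 16 bytes of shared memory. Every thread in the block sees the same mem array, which is why its size is a property of the launch rather than of any single thread.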

If you do need to dynamically allocate memory for each thread, you can use new or malloc inside a kernel (on Fermi and later, i.e. compute capability 2.0 or higher), but they allocate from global memory, which is likely to be slow.
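For comparison, here is a rough sketch of the per-thread route, assuming each thread wants a private scratch buffer of len ints. The names per_thread_scratch and run_per_thread_scratch are hypothetical; the buffer comes from the device heap in global memory, not from shared memory, and error checking on the host side is omitted.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>
#include <vector>

// Each thread allocates its own scratch buffer on the device heap
// (global memory), fills it, sums it, and frees it again.
__global__ void per_thread_scratch(int *sums, int len)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    int *scratch = static_cast<int *>(malloc(len * sizeof(int)));
    if (scratch == nullptr) {        // the device heap can run out
        sums[tid] = -1;
        return;
    }

    int total = 0;
    for (int k = 0; k < len; ++k) scratch[k] = tid + k;
    for (int k = 0; k < len; ++k) total += scratch[k];
    sums[tid] = total;

    free(scratch);                   // must be freed on the device
}

// Hypothetical host helper: launches one block of `threads` threads.
std::vector<int> run_per_thread_scratch(int threads, int len)
{
    std::vector<int> out(threads);
    int *dev = nullptr;
    cudaMalloc(&dev, threads * sizeof(int));

    per_thread_scratch<<<1, threads>>>(dev, len);

    cudaMemcpy(out.data(), dev, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return out;
}
```

The device heap has a fixed size; if allocations fail, it can be enlarged before the first kernel launch with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes).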


answered Oct 16 '22 03:10

Roger Dahl

Hailiang Zhang
