
CUDA shared memory occupies twice the space needed

I just noticed that my CUDA kernel uses exactly twice as much shared memory as I calculate by hand. For example:

__global__ void foo( )
{
    __shared__ double t;
    t = 1;
}

The ptxas output shows:
ptxas info : Function properties for _Z3foov, 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 16 bytes smem, 32 bytes cmem[0]

But the size of a double is only 8 bytes.

Another example, where 1024 ints should need only 4096 bytes:

__global__ void foo( )
{
    __shared__ int t[1024];
    t[0] = 1;
}

ptxas info : Used 3 registers, 8192 bytes smem, 32 bytes cmem[0]

Could someone explain why?

Rainn asked Nov 12 '22 18:11



1 Answer

It seems that the problem is gone in the current CUDA compiler.

__shared__ int a[1024];

compiled with command 'nvcc -m64 -Xptxas -v -ccbin /opt/gcc-4.6.3/bin/g++-4.6.3 shmem.cu' gives

ptxas info    : Used 1 registers, 4112 bytes smem, 4 bytes cmem[1]

There is some shared memory overhead in this case (4112 bytes reported versus 4096 bytes for the array, so 16 bytes extra), but the usage is no longer doubled.
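To cross-check the ptxas numbers from inside a program, one option is to ask the runtime how much static shared memory it records for each kernel. Below is a minimal sketch using cudaFuncGetAttributes and the sharedSizeBytes field of cudaFuncAttributes. The kernel names foo_double and foo_int_array and the helper report are made up for this example (the question calls both kernels foo), and the values printed will depend on the toolkit version and target architecture, so no expected output is given.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The two kernels from the question, renamed (hypothetical names)
// so that both fit in one file.
__global__ void foo_double()
{
    __shared__ double t;
    t = 1;
}

__global__ void foo_int_array()
{
    __shared__ int t[1024];
    t[0] = 1;
}

// Hypothetical helper: print the static shared memory the runtime
// records for a kernel next to the size expected from its declaration.
static void report(const char *name, void (*kernel)(), size_t expected_bytes)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, kernel);
    if (err != cudaSuccess) {
        std::printf("%s: %s\n", name, cudaGetErrorString(err));
        return;
    }
    std::printf("%s: expected %zu bytes, runtime reports %zu bytes\n",
                name, expected_bytes, attr.sharedSizeBytes);
}

int main()
{
    report("foo_double", foo_double, sizeof(double));           // 8 bytes expected
    report("foo_int_array", foo_int_array, 1024 * sizeof(int)); // 4096 bytes expected
    return 0;
}
```

Building this with the same 'nvcc -Xptxas -v' flags lets you compare the compile-time report with what the runtime sees for the same binary.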

Rainn answered Jan 04 '23 01:01
