I just noticed that my CUDA kernel uses exactly twice the shared memory that 'theory' predicts. For example:
__global__ void foo()
{
    __shared__ double t;
    t = 1;
}
The ptxas info shows:
ptxas info : Function properties for _Z3foov, 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 16 bytes smem, 32 bytes cmem[0]
But a double is only 8 bytes, so I would expect 8 bytes of smem, not 16.
Another example:
__global__ void foo()
{
    __shared__ int t[1024];
    t[0] = 1;
}
ptxas info : Used 3 registers, 8192 bytes smem, 32 bytes cmem[0]
Again, 1024 ints should take 1024 * 4 = 4096 bytes, yet ptxas reports 8192, exactly double. Could someone explain why?
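For what it's worth, the static shared memory size can also be queried at runtime with cudaFuncGetAttributes as a cross-check against the ptxas numbers. A minimal sketch, reusing the first foo() kernel above:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void foo()
{
    __shared__ double t;  // 8 bytes of static shared memory, in theory
    t = 1;
}

int main()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, foo) != cudaSuccess)
        return 1;
    // sharedSizeBytes is the statically allocated shared memory per block
    printf("static smem: %zu bytes\n", attr.sharedSizeBytes);
    return 0;
}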
It seems the problem has gone away in the current CUDA compiler.
__shared__ int a[1024];
compiled with the command 'nvcc -m64 -Xptxas -v -ccbin /opt/gcc-4.6.3/bin/g++-4.6.3 shmem.cu' gives
ptxas info : Used 1 registers, 4112 bytes smem, 4 bytes cmem[1]
There is some shared memory overhead in this case (4112 reported minus 4096 for the array leaves 16 bytes), but the usage is no longer doubled.
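For reference, a complete shmem.cu matching that command might look like the sketch below; the kernel wrapper, the write to a[0], and the empty main are assumptions, only the array declaration comes from the post.

// shmem.cu
__global__ void foo()
{
    __shared__ int a[1024];  // 1024 * sizeof(int) = 4096 bytes expected
    a[0] = 1;                // touch the array so it is not optimized away
}

int main() { return 0; }  // ptxas -v prints the resource usage at compile time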