 

What CUDA shared memory size means

I am trying to solve this problem myself, but I can't, so I would like your advice.

I am writing kernel code like this. The GPU is a GTX 580.

xxxx <<< blockNum, threadNum, SharedSize >>> (... threadNum ...)
(Note: SharedSize is set to 2*threadNum.)

__global__ void xxxx(..., int threadNum, ...)
{
    extern __shared__ int shared[];
    int* sub_arr = &shared[0];
    int* sub_numCounting = &shared[threadNum];
    ...
}
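For reference, the third launch parameter (the dynamic shared memory size) is specified in bytes, not elements. A minimal sketch of the launch, reusing the names from the question (xxxx, blockNum, threadNum) and assuming two int arrays of threadNum elements each:

```cuda
// The dynamic shared memory size is given in BYTES, so for two
// int arrays of threadNum elements each we need a sizeof(int) factor:
size_t sharedSize = 2 * threadNum * sizeof(int);
xxxx<<<blockNum, threadNum, sharedSize>>>(/* ... */ threadNum /* ... */);
```

If SharedSize were set to just 2*threadNum (elements rather than bytes), the kernel would be given only a quarter of the shared memory the two arrays actually index into.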

My program creates about 1085 blocks with 1024 threads per block.

(I am trying to handle a huge array.)

So the size of shared memory per block is 8192 bytes (1024 * 2 * 4), right?

Using cudaDeviceProp, I found that I can use at most 49152 bytes of shared memory per block on the GTX 580.
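The per-block limit mentioned here can be read from the device properties. A minimal host-side sketch (device index 0 assumed):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0
    // On a GTX 580 (compute capability 2.0) this prints 49152 bytes.
    printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    printf("Multiprocessor count:    %d\n", prop.multiProcessorCount);
    return 0;
}
```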

And I know the GTX 580 has 16 multiprocessors, and that each thread block runs on one multiprocessor.

But my program gets an error, even though 8192 bytes < 49152 bytes.

I use "printf" in the kernel to see whether it operates correctly, and several blocks do not run. (Although I create 1085 blocks, only about 50~100 blocks actually run.)

And I want to know whether blocks running on the same multiprocessor share the same shared memory address. (If not, is other memory allocated for shared memory?)

I can't quite understand what the maximum size of shared memory per block means.

Give me advice.

Umbrella asked Dec 09 '22 22:12

1 Answer

Yes, blocks on the same multiprocessor share the same pool of shared memory, which is 48 KB per multiprocessor on your GPU card (compute capability 2.0). So if you have N blocks resident on the same multiprocessor, the maximum size of shared memory per block is (48/N) KB.

chaohuang answered Mar 06 '23 18:03