
CUDA: bank conflicts between different warps?

I just learned (from "Why only one of the warps is executed by a SM in cuda?") that Kepler GPUs can actually execute instructions from several (apparently 4) warps at once.

Can a shared memory bank also serve four requests at once? If not, that would mean that bank conflicts can occur between threads of different warps that happen to be executed concurrently, even though there are no bank conflicts within any of the individual warps, right? Is there any information on this?

user3314215 asked Feb 15 '14 19:02




1 Answer

Compute capability 3.x devices (Kepler) have 4 warp schedulers per SM. On each cycle, each warp scheduler selects a warp and issues 1-2 instructions from it. However, the SM has only one load/store unit (LSU) servicing L1 and shared memory requests, so only 1 of the 8 potential instructions can be dispatched to the LSU. Bank conflicts between warps therefore will not occur.
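Since only one warp's shared memory instruction reaches the LSU at a time, conflicts can be reasoned about one warp-wide request at a time. Below is a minimal host-side sketch of that per-warp analysis in plain C++ (no GPU needed). It is a simplified model, not NVIDIA's actual hardware logic: it assumes 32 banks that are each 4 bytes wide with `bank = (byte_address / 4) % 32`, whereas Kepler also offers an 8-byte bank mode that this model ignores. The helper names `conflict_degree` and `strided_warp_addresses` are hypothetical and made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: worst-case number of distinct words that map to the
// same bank for ONE warp-wide shared memory request.
// Assumed model (a simplification): bank = (byte_address / bank_width) % num_banks.
// For a non-empty request, 1 means conflict-free and N means an N-way conflict.
int conflict_degree(const std::vector<std::uint32_t>& byte_addresses,
                    int num_banks = 32, int bank_width_bytes = 4) {
    assert(num_banks > 0 && bank_width_bytes > 0);
    std::vector<std::set<std::uint32_t>> words_per_bank(num_banks);
    for (std::uint32_t addr : byte_addresses) {
        std::uint32_t word = addr / bank_width_bytes;
        // Threads reading the same word are served by a broadcast,
        // so each distinct word is counted only once per bank.
        words_per_bank[word % num_banks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}

// Byte addresses touched by a 32-thread warp reading float data[tid * stride].
std::vector<std::uint32_t> strided_warp_addresses(int stride_in_floats) {
    std::vector<std::uint32_t> addrs;
    for (int tid = 0; tid < 32; ++tid) {
        addrs.push_back(static_cast<std::uint32_t>(tid * stride_in_floats) * 4u);
    }
    return addrs;
}
```

Under this model, `conflict_degree(strided_warp_addresses(1))` is 1 (conflict-free), a stride of 2 floats gives a 2-way conflict, and a stride of 32 floats puts every thread in the same bank. Addresses from a second warp never enter the calculation, which matches the point of the answer.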

Greg Smith answered Sep 30 '22 13:09