
Why is this a conflict-free memory bank access?

Tags: memory, cuda, gpgpu

Here is an image taken from the CUDA C Programming Guide:

[Image from the CUDA C Programming Guide: a shared memory access pattern in which threads 3, 4, 6, 7 and 9 all read the same word in bank 5, while the remaining threads access separate banks.]

The guide says that this is an example of a Conflict-free access since threads 3, 4, 6, 7 and 9 access the same word within bank 5.

I don't quite understand why this is conflict-free: not only do threads 3, 4, 6, 7 and 9 access the same word within the same bank (shouldn't that be an example of a bank conflict?), but thread 5 also has to access bank 4.

Could you please explain to me this case?

asked Mar 19 '14 by syntagma

People also ask

What is memory bank conflict?

The shared memory that can be accessed in parallel is divided into modules (also called banks). If two memory locations (addresses) occur in the same bank, then you get a bank conflict during which the access is done serially, losing the advantages of parallel access.

What are bank conflicts in CUDA?

A bank conflict happens when multiple threads in a warp access different words that fall in the same shared memory bank. When a bank conflict happens, accesses that would otherwise proceed in parallel are executed sequentially, which lowers performance.


1 Answer

Note that a bank is not the same thing as a word or location in shared memory. A bank refers collectively to all words in shared memory that satisfy a certain address pattern condition.

In general, shared memory bank conflicts can be avoided if all accesses from a warp (or half-warp in cc 1.x) go to separate banks. These accesses need not be in warp order, i.e. they can be scrambled, as long as the request from each thread targets a separate bank.

The above description covers every arrow in your diagram except those arrows pointing to bank 5.

If we had no other information, then multiple arrows targeting a single bank would indicate a potential bank conflict.

However, there is an exception: when the accesses not only target the same bank but also target the same word in memory. When multiple shared memory requests target the same word, the shared memory system has a broadcast mechanism that takes the data contained in that word and services it to all the requesting threads in a single cycle.

From the documentation (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory-1-x):

Shared memory features a broadcast mechanism whereby a 32-bit word can be read and broadcast to several threads simultaneously when servicing one memory read request. This reduces the number of bank conflicts when several threads read from an address within the same 32-bit word.

answered Sep 20 '22 by Robert Crovella