Here is an image taken from the CUDA C Programming Guide:
The guide says that this is an example of a Conflict-free access since threads 3, 4, 6, 7 and 9 access the same word within bank 5.
I don't quite understand why is this conflict-free, since not only threads 3, 4, 6, 7 and 9 access the same work within same bank (shouldn't that be an example of memory conflict?) but also thread 5 has to access bank 4.
Could you please explain to me this case?
The shared memory that can be accessed in parallel is divided into modules (also called banks). If two memory locations (addresses) occur in the same bank, then you get a bank conflict during which the access is done serially, losing the advantages of parallel access.
Bank conflict happens when some processing units in GPU access the same shared memory. When bank conflict happens, the program instruction executed in parallel is executed in sequential. As a result, it makes the performance lower.
Note that a bank is not the same thing as a word or location in shared memory. A bank refers collectively to all words in shared memory that satisfy a certain address pattern condition.
In general, shared memory bank conflicts can be avoided if all accesses from a warp (or half-warp in cc 1.x) go to separate banks. These accesses need not be in warp order, i.e. they can be scrambled, as long as the request from each thread targets a separate bank.
The above description covers every arrow in your diagram except those arrows pointing to bank 5.
If we had no other information, then multiple arrows targetting a single bank would indicate a potential bank conflict.
However, there is an exception, when not only are the accesses targetting the same bank, but they are targetting the same word in memory. When multiple shared memory requests target the same word in memory, then the shared memory system has a broadcast mechanism to take the data contained in that word, and service it to all the requesting threads, in a single cycle.
From the documentation(http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory-1-x):
Shared memory features a broadcast mechanism whereby a 32-bit word can be read and broadcast to several threads simultaneously when servicing one memory read request. This reduces the number of bank conflicts when several threads read from an address within the same 32-bit word.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With