
What is the difference between a bank conflict and channel conflict on AMD hardware?

Tags: amd, opencl

I am learning OpenCL programming and running some programs on an AMD GPU. I referred to the AMD OpenCL Programming Guide to read about global memory optimization for the GCN architecture, but I cannot understand the difference between a bank conflict and a channel conflict.

Can someone explain the difference between them? Thanks in advance.

Manideep asked Jul 25 '15 17:07


1 Answer

If two memory access requests are directed to the same memory controller, the hardware serializes them. This is called a channel conflict. Each integrated memory controller can serve only one request at a time, so if two requests map to addresses on the same channel, they are served one after the other.

Similarly, if two memory access requests go to the same memory bank, the hardware serializes them. This is called a bank conflict. When memory is spread across multiple chips, you should avoid access strides that are a multiple of the hardware's channel or bank interleaving width.

Example with 4 channels and 2 banks (not a real-world configuration, since the number of banks must be greater than or equal to the number of channels):

address   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17
channel   1  2  3  4  1  2  3  4  1  2   3   4   1   2   3   4   1
bank      1  2  1  2  1  2  1  2  1  2   1   2   1   2   1   2   1
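
The table above follows a plain modulo interleaving. A minimal Python sketch of that toy mapping, assuming 1-based addresses as in the table; the helper names `channel_of` and `bank_of` are hypothetical, and real GCN hardware picks the channel and bank from specific address bits rather than from a simple modulo of an element index:

```python
NUM_CHANNELS = 4  # toy value taken from the example table
NUM_BANKS = 2     # toy value taken from the example table


def channel_of(address, num_channels=NUM_CHANNELS):
    """Return the 1-based channel serving a 1-based address (toy model)."""
    return (address - 1) % num_channels + 1


def bank_of(address, num_banks=NUM_BANKS):
    """Return the 1-based bank serving a 1-based address (toy model)."""
    return (address - 1) % num_banks + 1


if __name__ == "__main__":
    addresses = range(1, 18)
    # Rebuild the three rows of the table for addresses 1..17.
    print("address", *addresses)
    print("channel", *(channel_of(a) for a in addresses))
    print("bank   ", *(bank_of(a) for a in addresses))
```

Two addresses conflict on a channel when `channel_of` returns the same value for both, and likewise for banks with `bank_of`.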

So you should not read with a stride of 2, like this:

   address    1  3  5  7   9
   channel    1  3  1  3   1  // 50% channel conflict
   bank       1  1  1  1   1  // 100% bank conflict, serialized at the bank level

nor with a stride of 4, like this:

   address    1    5     9    13
   channel    1    1     1    1     // 100% channel conflict, serialized
   bank       1    1     1    1     // 100% bank conflict, serialized

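but a stride of 5 could be OK:

   address    1    6     11    16
   channel    1    2     3     4   // no conflict, 100% channel usage
   bank       1    2     1     2   // no conflict, 100% bank usage

because the stride (5) shares no common factor with the number of channels (4) or the number of banks (2), so consecutive accesses rotate through every channel and every bank.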
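
The three access patterns can be compared numerically. The sketch below is only an illustration under the same toy modulo model (1-based addresses, 4 channels, 2 banks); the function names are hypothetical. It counts how many distinct channels or banks a strided walk touches and checks that count against the closed form `num_units / gcd(stride, num_units)`, which holds for this modulo model:

```python
from math import gcd


def strided(start, stride, count):
    """Return `count` 1-based addresses starting at `start`, `stride` apart."""
    return [start + i * stride for i in range(count)]


def units_touched(addresses, num_units):
    """Count the distinct channels (or banks) hit by the given addresses."""
    return len({(a - 1) % num_units for a in addresses})


def units_reachable(stride, num_units):
    """Closed form for the modulo model: units a long strided walk can reach."""
    return num_units // gcd(stride, num_units)


if __name__ == "__main__":
    num_channels, num_banks = 4, 2  # toy values from the example
    for stride in (2, 4, 5):
        addrs = strided(1, stride, 4)
        print(
            f"stride {stride}: addresses {addrs} use "
            f"{units_touched(addrs, num_channels)} of {num_channels} channels, "
            f"{units_touched(addrs, num_banks)} of {num_banks} banks"
        )
```

In this toy configuration a stride of 2 reaches 2 of the 4 channels and 1 of the 2 banks, a stride of 4 reaches a single channel and a single bank, and a stride of 5 reaches all of them. The fewer units a pattern reaches, the more of its requests are serialized.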

Edit: If your algorithms are optimized mainly around local storage, you should also pay attention to conflicts in the local data share (LDS). In addition, some cards can use constant memory as an independent channel to speed up reads.

Edit: You can use multiple wavefronts to hide conflict-induced latency, or you can use instruction-level parallelism.

Edit: LDS (local data share) channels are much faster and more numerous than global memory channels, so optimizing for the LDS is very important. Gathering uniformly from global channels and then scattering on local channels should therefore be less problematic than scattering on global channels and gathering uniformly on local ones.

http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401334_pgfId-472173

For an AMD APU with a decent mainboard, you should be able to select n-way channel interleaving or n-way bank interleaving to suit your access pattern if the software itself cannot be changed.

huseyin tugrul buyukisik answered Nov 05 '22 21:11