
The number of coalesced and uncoalesced memory transactions on GPUs of compute capability 1.3

The CUDA profiler manual states that, due to the more relaxed coalescing policy, the number of uncoalesced memory transactions will always be zero. But I'm sure that uncoalesced accesses still occur. How can I count them? Are there any tools or simulators that can help, and which one is the most accurate? Thanks.

Zk1001 asked Dec 21 '22 00:12


1 Answer

On devices of compute capability 1.0 and 1.1, a half-warp's memory access had only two possible outcomes:

  • The access is coalesced and all data is fetched in one memory transaction.
  • The access is uncoalesced and the data is fetched one thread at a time, so it always costs 16 memory transactions (one per thread of the half-warp).

On devices of compute capability 1.2 and 1.3, however, this is done differently. Imagine your device memory divided into aligned chunks of 128 bytes each. A half-warp needs as many memory transactions as the number of chunks it hits. So:

  • if the access is perfectly coalesced, you get 1 memory transaction
  • if the access is merely misaligned, you may get 2 memory transactions
  • if every thread accesses every n-th word, you can get 3, 4, or even more memory transactions
  • in the worst case you get 16 memory transactions
  • but even if the access is somewhat random yet localised, two threads may happen to fall into the same chunk, and you will need fewer than 16 memory transactions
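To make the chunk-counting rule concrete, here is a minimal host-side sketch in Python. It is not a profiler or a simulator of the real hardware, just the simplified model above: the function names `count_transactions` and `strided_addresses` are made up for illustration, and it assumes 4-byte words and 16 threads per half-warp.

```python
SEGMENT_BYTES = 128   # chunk size used in the simplified model above
HALF_WARP = 16        # threads whose accesses are serviced together


def count_transactions(addresses, segment_bytes=SEGMENT_BYTES):
    """Count the distinct aligned chunks touched by one half-warp.

    `addresses` holds one byte address per thread. In the simplified
    model, each distinct chunk costs one memory transaction.
    """
    return len({addr // segment_bytes for addr in addresses})


def strided_addresses(base, stride_words, word_bytes=4, threads=HALF_WARP):
    """Hypothetical helper: thread t reads the word at base + t * stride."""
    return [base + t * stride_words * word_bytes for t in range(threads)]


coalesced = count_transactions(strided_addresses(0, 1))    # bytes 0..63
misaligned = count_transactions(strided_addresses(96, 1))  # bytes 96..159
strided = count_transactions(strided_addresses(0, 8))      # every 8th word
worst = count_transactions(strided_addresses(0, 32))       # one chunk per thread

print(coalesced, misaligned, strided, worst)  # 1 2 4 16
```

The same number of load instructions is issued in all four cases; only the address pattern changes the transaction count.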

There are so many cases that sorting them into just two categories, coalesced and uncoalesced, no longer makes sense. That is why the CUDA profiler takes a different approach: it simply counts the number of memory transactions. The more random your access pattern is, the higher the memory transaction count, even for the same number of memory access instructions.

The above is a slightly simplified model. In reality, a memory transaction can cover a 128-byte, 64-byte or 32-byte segment, which saves bandwidth when only part of a chunk is needed. Look for the columns load 128b, load 64b, load 32b, and store 128b, store 64b, store 32b in your profiler.
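The size reduction can be sketched as follows. This follows the half-warp protocol as I recall it from the CUDA programming guide for compute capability 1.2/1.3: take the segment hit by the lowest-numbered unserviced thread, halve the transaction while only one half of it is used, then repeat for the remaining threads. Treat it as an approximation under those assumptions, not as a statement of what the hardware does in every case. The name `half_warp_transactions` is hypothetical, and addresses are assumed to be aligned to the word size.

```python
from collections import Counter


def half_warp_transactions(addresses, word_bytes=4):
    """Return a list of (start_address, size_bytes), one per transaction.

    `addresses` holds one byte address per thread, in thread order, and
    each address is assumed to be a multiple of `word_bytes`.
    """
    # Initial segment size depends on the word size: 32 bytes for 1-byte
    # words, 64 bytes for 2-byte words, 128 bytes for anything larger.
    segment = {1: 32, 2: 64}.get(word_bytes, 128)
    pending = list(addresses)
    transactions = []
    while pending:
        # Segment containing the lowest-numbered unserviced thread.
        seg_start = (pending[0] // segment) * segment
        seg_end = seg_start + segment
        served = [a for a in pending if seg_start <= a < seg_end]
        lo = min(served)                   # first byte actually used
        hi = max(served) + word_bytes - 1  # last byte actually used
        start, size = seg_start, segment
        # Halve the transaction while only its lower or upper half is used.
        while size > 32:
            half = size // 2
            if hi < start + half:
                size = half
            elif lo >= start + half:
                start += half
                size = half
            else:
                break
        transactions.append((start, size))
        pending = [a for a in pending if not seg_start <= a < seg_end]
    return transactions


# Tally by transaction size, similar in spirit to the per-size profiler columns.
misaligned = half_warp_transactions([96 + 4 * t for t in range(16)])
print(misaligned)                                # [(96, 32), (128, 32)]
print(Counter(size for _, size in misaligned))   # Counter({32: 2})
```

In this model a misaligned access still costs two transactions, but each is only 32 bytes wide rather than 128, which is why the per-size counters are more informative than a single coalesced/uncoalesced flag.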

CygnusX1 answered Dec 29 '22 12:12