The number of coalesced and uncoalesced memory transactions on GPUs with compute capability 1.3

Question

The CUDA profiler manual states that, due to the more relaxed coalescing policy, the number of uncoalesced memory transactions will always be zero. But I'm sure that uncoalesced accesses still happen. How can I calculate how many there are? Are there any tools or simulators that can help? If so, which one is the most accurate? Thanks.

CygnusX1 · Accepted Answer

On devices of compute capability 1.0, you had only two options:

Memory access is coalesced, and all data for the half-warp is fetched in one memory transaction.
Memory access is uncoalesced, and data is fetched one by one, one transaction per thread. Hence you always get 16 memory transactions (one per thread of the half-warp).

On devices of compute capability 1.2 and 1.3, however, this is done differently. Imagine your device memory divided into aligned chunks of 128 bytes each. A half-warp needs as many memory transactions as the number of chunks it hits. So:

if you get perfectly coalesced access, you get 1 memory transaction
if the access is merely misaligned, you may get 2 memory transactions
if every thread accesses every n-th word, you can get 3, 4, or even more memory transactions
in the worst case you get 16 memory transactions
but even if the access is somewhat random, yet localised, two threads may happen to fall into the same chunk, and you will need fewer than 16 memory transactions
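
To make the chunk-counting rule above concrete, here is a minimal host-side sketch in Python. It only models the simplified rule (one transaction per distinct 128-byte chunk touched by a half-warp); it does not query the hardware or the profiler. The function names `count_transactions` and `strided_addresses` are made up for this illustration.

```python
CHUNK_BYTES = 128  # chunk size used by the simplified model
HALF_WARP = 16     # threads whose accesses are served together


def count_transactions(addresses, chunk_bytes=CHUNK_BYTES):
    """Count the distinct aligned chunks touched by the given byte addresses."""
    return len({addr // chunk_bytes for addr in addresses})


def strided_addresses(base, stride_words, word_bytes=4, threads=HALF_WARP):
    """Byte address accessed by each thread when thread t reads word t * stride_words."""
    return [base + t * stride_words * word_bytes for t in range(threads)]


print(count_transactions(strided_addresses(0, 1)))    # aligned, consecutive words: 1
print(count_transactions(strided_addresses(96, 1)))   # same pattern, misaligned: 2
print(count_transactions(strided_addresses(0, 8)))    # every 8th word: 4
print(count_transactions(strided_addresses(0, 32)))   # every 32nd word: 16
```

Feeding in the addresses your own kernel computes for one half-warp gives a rough estimate of the transaction count under this model.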

There are so many cases that sorting them into just two categories, coalesced and uncoalesced, does not make sense anymore. That is why the CUDA Profiler takes a different approach: it simply counts the number of memory transactions. The more random your access pattern is, the higher the memory transaction count, even if the number of memory access instructions stays the same.

The above is a slightly simplified model. In reality, a memory transaction can access a 128-byte, 64-byte or 32-byte wide chunk, which saves bandwidth when only part of a chunk is needed. Look for the columns load 128b, load 64b, load 32b, and store 128b, store 64b, store 32b in your profiler.
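
The sketch below extends the model to the three transaction sizes. It rests on an assumption that the answer does not spell out: the hardware starts from the 128-byte chunk hit by the lowest-numbered pending thread, then halves the transaction (128 to 64 to 32 bytes) as long as all the words it serves fit in one half. It also assumes word-aligned accesses of 4 bytes or more, and `half_warp_transactions` is a hypothetical name. Treat it as an estimate, not as a statement of what the device does.

```python
def half_warp_transactions(addresses, word_bytes=4):
    """Return a list of (start_byte, size_bytes) transactions for one half-warp.

    Assumed model: serve the chunk of the first pending thread, shrink the
    transaction while only one half of it is used, then repeat for the rest.
    """
    pending = list(addresses)  # order = thread order
    transactions = []
    while pending:
        seg_start = (pending[0] // 128) * 128
        seg_end = seg_start + 128
        served = [a for a in pending if seg_start <= a < seg_end]
        lo = min(served)                    # first byte actually needed
        hi = max(served) + word_bytes - 1   # last byte actually needed
        start, size = seg_start, 128
        while size > 32:
            half = size // 2
            if hi < start + half:       # only the lower half is used
                size = half
            elif lo >= start + half:    # only the upper half is used
                start += half
                size = half
            else:
                break
        transactions.append((start, size))
        # Threads that fell into this chunk are done
        pending = [a for a in pending if not (seg_start <= a < seg_end)]
    return transactions


# 16 threads reading consecutive 4-byte words from an aligned base
print(half_warp_transactions([4 * t for t in range(16)]))
# The same pattern shifted to start at byte 96
print(half_warp_transactions([96 + 4 * t for t in range(16)]))
```

Under this assumed model, the per-size totals can be compared against the load and store columns the profiler reports.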

Tags:

cuda

gpgpu

gpu

opencl

Zk1001
