What is the difference between dram_read_transactions and gld_transactions in CUDA profiler?

Question

In the CUDA profiler, there are two metrics called dram_read_transactions and gld_transactions. The CUDA profiler user guide says gld_transactions is the number of global memory load transactions, while dram_read_transactions is the number of device memory read transactions. I cannot tell the difference between these descriptions, because reading data means loading data and global memory is dram. But the profiling results for these two metrics are different. I tested with one kernel. For the same kernel with different thread settings, gld_transactions is always the same value, 33554432, and this value is stable. But for dram_read_transactions, two different thread settings lead to different values, roughly 4486636 and 4197096. By "roughly" I mean these values are not stable: they change slightly from one execution to another. We can also see that dram_read_transactions is much smaller than gld_transactions. So my questions can be summarized as follows:

1. What is the real difference between gld_transactions and dram_read_transactions?
2. Why is dram_read_transactions much smaller than gld_transactions?
3. For different thread settings, why is the gld_transactions value stable while dram_read_transactions is unstable?

I think once we know the answer to question (1), questions (2) and (3) can be easily explained. Can anyone explain this? Thanks in advance.

Robert Crovella · Accepted Answer

A global load refers to a logical memory space. A dram read refers to a transaction on a physical resource. This statement of yours:

"reading data means loading data and global memory is dram."

is either incorrect or glossing over important details.

Fundamentally, global loads are issued by instructions executed by a warp. The initial target of these loads will be L1 or L2 cache (usually). A global load, if satisfied by cache contents, will never become a dram read transaction. On the other hand, if the target of the global load is not in a cache, then it will become a dram read transaction (typically/usually).
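
The relationship between the two counts can be sketched with a toy model. The code below is only an illustration under simplifying assumptions (a single LRU cache in front of DRAM, one transaction per load, hypothetical names such as `ToyMemory` and `run`); it does not model real GPU caches or the profiler's actual counters.

```python
from collections import OrderedDict


class ToyMemory:
    """Toy model: one LRU cache in front of DRAM (not real GPU behavior)."""

    def __init__(self, cache_lines):
        self.cache = OrderedDict()   # line address -> None, kept in LRU order
        self.cache_lines = cache_lines
        self.load_transactions = 0   # logical loads issued (analogous to gld_transactions)
        self.dram_reads = 0          # loads that missed the cache and went to DRAM

    def load(self, line_addr):
        self.load_transactions += 1
        if line_addr in self.cache:
            self.cache.move_to_end(line_addr)   # hit: refresh LRU position, no DRAM read
            return
        self.dram_reads += 1                    # miss: fetch the line from DRAM
        self.cache[line_addr] = None
        if len(self.cache) > self.cache_lines:
            self.cache.popitem(last=False)      # evict the least recently used line


def run(num_lines, passes, cache_lines):
    """Sweep over num_lines lines, passes times; return (loads, dram_reads)."""
    mem = ToyMemory(cache_lines)
    for _ in range(passes):
        for line in range(num_lines):
            mem.load(line)
    return mem.load_transactions, mem.dram_reads


print(run(64, 8, 128))  # working set fits in the cache
print(run(64, 8, 32))   # working set does not fit; LRU thrashes
```

In this toy model the logical load count depends only on the access pattern (512 in both runs), while the DRAM read count also depends on what the cache holds at the time of each load (64 in the first run, 512 in the second).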

Furthermore, the global memory space is not the only memory space. There are other memory spaces, such as local. Transactions to "local" memory can also ultimately be serviced in a variety of ways, one of which would be actually triggering a dram read. Such a transaction would not show up in any "global" metric but would show up in the dram read transaction metric.
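
As a rough illustration of that last point, here is a toy counter model. It assumes, purely for the sketch, an unbounded cache and hypothetical names (`ToyCounters`, `demo`); the only thing it is meant to show is that a per-space logical counter and a shared physical counter measure different things.

```python
class ToyCounters:
    """Toy counters: two logical spaces sharing one path to DRAM."""

    def __init__(self):
        self.global_loads = 0   # logical loads to the global space
        self.local_loads = 0    # logical loads to the local space
        self.dram_reads = 0     # physical reads, whatever the logical space
        self._cached = set()    # (space, line) pairs already fetched; unbounded

    def _access(self, space, line_addr):
        key = (space, line_addr)
        if key not in self._cached:   # first touch misses and reads DRAM
            self._cached.add(key)
            self.dram_reads += 1

    def load_global(self, line_addr):
        self.global_loads += 1
        self._access("global", line_addr)

    def load_local(self, line_addr):
        self.local_loads += 1
        self._access("local", line_addr)


def demo():
    c = ToyCounters()
    for line in range(4):
        c.load_global(line)   # 4 global loads, each a first touch
    for line in range(4):
        c.load_local(line)    # 4 local loads, each a first touch
    return c.global_loads, c.local_loads, c.dram_reads


print(demo())
```

Here the global counter sees only the four global loads, while the DRAM counter sees all eight misses, including those caused by local loads.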

I find this diagram/chart in the Nsight VSE documentation (and tool help), showing the logical and physical arrangement of memory on a GPU, to be helpful in understanding this. I have excerpted the chart here, and highlighted in red the "links" that correspond to the metrics you identified:

[Figure: GPU logical/physical memory diagram showing two different transaction types]

This answer gives a more detailed decoding of the above diagram, for relevant metrics.

Tags:

cuda

silence_lamb
