What is a transaction and a request in the 'gld_transactions_per_request' metric of the CUDA profiler?

Question

For perfectly coalesced accesses to an array of 4096 doubles (8 bytes each), nvprof reports the following metrics on an NVIDIA Tesla V100:

global_load_requests: 128
gld_transactions: 1024
gld_transactions_per_request: 8.000000

I cannot find a specific definition of what a transaction and a request to global memory are exactly, so I am having trouble understanding these metrics. Therefore my questions:

How is a memory request defined?
How is a memory transaction defined?
Does gld_transactions_per_request = 8.000000 indicate perfectly coalesced accesses to doubles?

In an attempt to explain it to myself, this is what I have come up with:

Request: a load at the warp level, i.e. one warp-level instruction merged from 32 threads. In this scenario that is a 32 threads * 8 bytes = 256-byte load. -- Is this correct?
Transaction: a 32-byte load. In this scenario one transaction defined this way can load 32 bytes / 8 bytes = 4 doubles. -- Is this correct? If so, is this the largest load CUDA implements?

Using these definitions, I arrive at the same values as nvprof: accessing 4096 array items requires 4096 / 32 = 128 warp-level instructions (requests) of 32 threads each. Each request covers 256 bytes, so with 32-byte loads (transactions) it takes 256 / 32 = 8 transactions per request, or 128 * 8 = 1024 transactions in total.
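
The arithmetic above can be written out as a small host-side sketch. It only encodes the question's working hypothesis (a request is one warp-level load by 32 threads, a transaction moves 32 bytes); the names `LoadCounts` and `expected_coalesced_counts` are made up for illustration and are not part of CUDA or nvprof.

```cpp
#include <cassert>
#include <cstddef>

// Assumptions taken from the question, not from NVIDIA documentation:
// a warp has 32 threads and one transaction moves 32 bytes.
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kTransactionBytes = 32;

// Hypothetical helper type holding the two expected profiler counters.
struct LoadCounts {
    std::size_t requests;      // expected global_load_requests
    std::size_t transactions;  // expected gld_transactions
};

// Expected counts when every thread loads exactly one element and the
// accesses are contiguous and aligned (perfectly coalesced).
LoadCounts expected_coalesced_counts(std::size_t num_elements,
                                     std::size_t element_bytes) {
    assert(element_bytes > 0);
    assert(num_elements % kWarpSize == 0);  // keep the sketch to whole warps

    const std::size_t requests = num_elements / kWarpSize;
    const std::size_t bytes_per_request = kWarpSize * element_bytes;
    // Round up in case a request does not fill a whole number of transactions.
    const std::size_t transactions_per_request =
        (bytes_per_request + kTransactionBytes - 1) / kTransactionBytes;

    return {requests, requests * transactions_per_request};
}
```

Calling `expected_coalesced_counts(4096, 8)` gives 128 requests and 1024 transactions, matching the nvprof output above. Under the same assumptions, 4-byte floats would give 4 transactions per request instead of 8.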

s.feng · Accepted Answer

A memory "request" is an instruction which accesses memory, and a "transaction" is the movement of a unit of data between two regions of memory.
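
To make the distinction concrete, here is a rough host-side model, not a description of how the hardware or nvprof actually counts. It assumes the unit of data moved per transaction is an aligned 32-byte sector, which is the size the question's numbers are consistent with, and it counts how many distinct sectors a single warp-level load (one request) touches. The function name `sectors_touched_by_warp` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct 32-byte sectors touched by one warp-level load.
// Lane i reads element_bytes bytes starting at byte offset
// base_byte + i * stride_elements * element_bytes.
// Assumption (not from the answer): a transaction is one aligned 32-byte sector.
std::size_t sectors_touched_by_warp(std::size_t base_byte,
                                    std::size_t element_bytes,
                                    std::size_t stride_elements) {
    const std::size_t warp_size = 32;
    const std::size_t sector_bytes = 32;
    assert(element_bytes > 0 && stride_elements > 0);

    std::set<std::size_t> sectors;
    for (std::size_t lane = 0; lane < warp_size; ++lane) {
        const std::size_t first = base_byte + lane * stride_elements * element_bytes;
        const std::size_t last = first + element_bytes - 1;
        // An element may straddle a sector boundary, so record every sector it covers.
        for (std::size_t s = first / sector_bytes; s <= last / sector_bytes; ++s) {
            sectors.insert(s);
        }
    }
    return sectors.size();
}
```

In this model, `sectors_touched_by_warp(0, 8, 1)` (contiguous, aligned doubles) returns 8, the value reported for gld_transactions_per_request in the question. A stride of 4 doubles returns 32, and a base offset of 8 bytes returns 9, so under these assumptions a ratio above 8 for doubles would point to strided or misaligned access.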

Tags: cuda, nvprof

anroesti
