
What is a transaction and a request in the 'gld_transactions_per_request' metric of the CUDA profiler?

Tags:

cuda

nvprof

For perfectly coalesced accesses to an array of 4096 doubles (8 bytes each), nvprof reports the following metrics on an NVIDIA Tesla V100:

global_load_requests: 128
gld_transactions: 1024
gld_transactions_per_request: 8.000000

I cannot find a specific definition of what a transaction and a request to global memory are exactly, so I am having trouble understanding these metrics. Therefore my questions:

  1. How is a memory request defined?
  2. How is a memory transaction defined?
  3. Does gld_transactions_per_request = 8.000000 indicate perfectly coalesced accesses to doubles?

In an attempt to explain it to myself, this is what I have come up with:

  • Request: a load at the warp level, i.e. one warp-level instruction merged from 32 threads. In this scenario, that is a 32 threads * 8 bytes = 256-byte load. -- Is this correct?
  • Transaction: a 32-byte load instruction. In this scenario, one transaction defined this way can load 32 bytes / 8 bytes = 4 doubles. -- Is this correct? If so, is this the largest load instruction CUDA implements?

Using these definitions, I arrive at the same values as nvprof: accessing 4096 array items requires 4096 / 32 = 128 warp-level instructions (= requests) of 32 threads each. With 32-byte loads (= transactions), each 256-byte request splits into 256 / 32 = 8 transactions, which gives 128 * 8 = 1024 transactions in total.
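
To sanity-check the arithmetic above, here is a minimal host-side C++ sketch that counts requests and transactions under the definitions proposed in this question (request = one warp-level load by 32 threads, transaction = one 32-byte sector touched by that request). These definitions are my working assumptions rather than NVIDIA's documented ones, and the names count_loads, LoadCounts, kWarpSize and kSectorBytes are hypothetical helpers made up for illustration. It only models address arithmetic and does not run anything on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Assumptions taken from the question, not from NVIDIA documentation:
//   request     = one warp-level load instruction (32 threads)
//   transaction = one distinct 32-byte sector touched by that request
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kSectorBytes = 32;

struct LoadCounts {
    std::size_t requests;
    std::size_t transactions;
};

// Thread t loads element (t * stride_elems) of an array whose elements are
// elem_bytes wide. The array is assumed to start on a sector boundary, and
// elem_bytes must divide the sector size so no element straddles two sectors.
LoadCounts count_loads(std::size_t n_threads,
                       std::size_t elem_bytes,
                       std::size_t stride_elems) {
    assert(n_threads % kWarpSize == 0);
    assert(elem_bytes > 0 && kSectorBytes % elem_bytes == 0);

    LoadCounts counts{0, 0};
    for (std::size_t warp_start = 0; warp_start < n_threads; warp_start += kWarpSize) {
        // Collect the distinct sectors touched by the 32 lanes of this warp.
        std::set<std::size_t> sectors;
        for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
            std::size_t byte_addr = (warp_start + lane) * stride_elems * elem_bytes;
            sectors.insert(byte_addr / kSectorBytes);
        }
        counts.requests += 1;                    // one warp-level load
        counts.transactions += sectors.size();   // one per distinct sector
    }
    return counts;
}
```

Under these assumptions, count_loads(4096, 8, 1) yields 128 requests and 1024 transactions, i.e. 8 transactions per request, matching the nvprof output. For comparison, a stride of 4 doubles (count_loads(4096, 8, 4)) puts every thread in its own sector and yields 32 transactions per request.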

anroesti asked Dec 11 '25 22:12



1 Answer

A memory "request" is an instruction which accesses memory, and a "transaction" is the movement of a unit of data between two regions of memory.

s.feng answered Dec 14 '25 05:12



