When using the CUDA profiler nvvp, several "overhead" metrics are reported that correlate with instructions, for example:
My questions are:
Attachment: I've found all the formulas for computing these overheads in the 'CUDA Profiler Users Guide' bundled with the CUDA 5 toolkit.
You can find some of the answers to your question here:
Why does CUDA Profiler indicate replayed instructions: 82% != global replay + local replay + shared replay?
Replayed Instructions (%): The percentage of instructions replayed during kernel execution. Replayed instructions are the difference between the number of instructions actually issued by the hardware and the number of instructions to be executed by the kernel. Ideally this should be zero. Calculated as:
100 * (instructions issued - instructions executed) / instructions issued

Global memory replay (%): The percentage of replayed instructions caused by global memory accesses. Calculated as:
100 * (l1 global load miss) / instructions issued

Local memory replay (%): The percentage of replayed instructions caused by local memory accesses. Calculated as:
100 * (l1 local load miss + l1 local store miss) / instructions issued

Shared bank conflict replay (%): The percentage of replayed instructions caused by shared memory bank conflicts. Calculated as:
100 * (l1 shared conflict) / instructions issued
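The four formulas above can be sketched as a small helper that derives the percentages from raw counter values. This is a minimal sketch: the function name and parameter names are assumptions that merely mirror the counter names used in the guide's wording, not an actual profiler API.

```python
def replay_overheads(inst_issued, inst_executed,
                     l1_global_load_miss,
                     l1_local_load_miss, l1_local_store_miss,
                     l1_shared_conflict):
    """Compute the replay-overhead percentages from raw hardware
    counter values, using the formulas quoted from the CUDA
    Profiler Users Guide. Counter names are illustrative."""
    return {
        # 100 * (issued - executed) / issued
        "replayed_pct": 100.0 * (inst_issued - inst_executed) / inst_issued,
        # 100 * l1 global load miss / issued
        "global_replay_pct": 100.0 * l1_global_load_miss / inst_issued,
        # 100 * (l1 local load miss + l1 local store miss) / issued
        "local_replay_pct": 100.0 * (l1_local_load_miss +
                                     l1_local_store_miss) / inst_issued,
        # 100 * l1 shared conflict / issued
        "shared_replay_pct": 100.0 * l1_shared_conflict / inst_issued,
    }
```

Note that all four ratios share the same denominator (instructions issued), but only the first numerator accounts for every replay cause, which is why the total replayed percentage can exceed the sum of the three listed causes.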