What is the difference between using a CPU timer and the CUDA timer event to measure the time taken for the execution of some CUDA code?
Which of these should a CUDA programmer use?
And why?
What I know:

A CPU timer would involve calling cudaThreadSynchronize (now deprecated in favor of cudaDeviceSynchronize) before any time is noted.
For noting the time, one of these could be used:
clock()
QueryPerformanceCounter (on Windows)
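A minimal host-timer sketch, assuming a placeholder kernel myKernel and a placeholder launch configuration, and using std::chrono in place of clock()/QueryPerformanceCounter for portability:

    #include <chrono>
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void myKernel(float *data) { /* placeholder kernel */ }

    void timeWithHostTimer(float *d_data)
    {
        cudaDeviceSynchronize();                 // drain any GPU work already queued
        auto t0 = std::chrono::high_resolution_clock::now();

        myKernel<<<256, 256>>>(d_data);          // the launch itself is asynchronous
        cudaDeviceSynchronize();                 // wait for the kernel to finish

        auto t1 = std::chrono::high_resolution_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        printf("host timer: %f ms\n", ms);
    }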
A CUDA timer event would involve recording events before and after the timed work using cudaEventRecord. At a later time, the elapsed time would be obtained by calling cudaEventSynchronize on the stop event, followed by cudaEventElapsedTime.
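A minimal sketch of the event approach, again with a placeholder kernel and launch configuration:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void myKernel(float *data) { /* placeholder kernel */ }

    void timeWithEvents(float *d_data)
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);               // record in the default stream
        myKernel<<<256, 256>>>(d_data);
        cudaEventRecord(stop, 0);

        cudaEventSynchronize(stop);              // block until the stop event has completed
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
        printf("event timer: %f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }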
The answer to the first part of the question is that cudaEvents timers are based on high-resolution counters on board the GPU, and they have lower latency and better resolution than a host timer because they come "off the metal". You should expect sub-microsecond resolution from the cudaEvents timers, and you should prefer them for timing GPU operations for precisely that reason. The per-stream nature of cudaEvents can also be useful for instrumenting asynchronous operations like simultaneous kernel execution and overlapped copy and kernel execution. That sort of time measurement is just about impossible with host timers.
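As an illustration of the per-stream point, here is a rough sketch of timing an asynchronous copy and a kernel that run in separate streams; the stream names, buffers, kernel, and launch configuration below are all placeholders:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void myKernel(float *data) { /* placeholder kernel */ }

    void timeOverlappedWork(float *h_src, float *d_dst, float *d_other, size_t bytes)
    {
        // h_src should be pinned (cudaMallocHost) or the copy will not overlap.
        cudaStream_t copyStream, execStream;
        cudaStreamCreate(&copyStream);
        cudaStreamCreate(&execStream);

        cudaEvent_t copyStart, copyStop, kernStart, kernStop;
        cudaEventCreate(&copyStart);  cudaEventCreate(&copyStop);
        cudaEventCreate(&kernStart);  cudaEventCreate(&kernStop);

        // Time the async copy in its own stream.
        cudaEventRecord(copyStart, copyStream);
        cudaMemcpyAsync(d_dst, h_src, bytes, cudaMemcpyHostToDevice, copyStream);
        cudaEventRecord(copyStop, copyStream);

        // Time the kernel in a second stream; it works on an unrelated buffer,
        // so the copy and the kernel are free to overlap.
        cudaEventRecord(kernStart, execStream);
        myKernel<<<256, 256, 0, execStream>>>(d_other);
        cudaEventRecord(kernStop, execStream);

        cudaEventSynchronize(copyStop);
        cudaEventSynchronize(kernStop);

        float copyMs = 0.0f, kernMs = 0.0f;
        cudaEventElapsedTime(&copyMs, copyStart, copyStop);
        cudaEventElapsedTime(&kernMs, kernStart, kernStop);
        printf("copy: %f ms, kernel: %f ms\n", copyMs, kernMs);

        cudaEventDestroy(copyStart);  cudaEventDestroy(copyStop);
        cudaEventDestroy(kernStart);  cudaEventDestroy(kernStop);
        cudaStreamDestroy(copyStream);
        cudaStreamDestroy(execStream);
    }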
EDIT: I won't answer the last paragraph of the question, because you deleted it.