I am a bit confused about the usage of cudaEvent_t. Currently, I am using the clock() call like this to find the duration of a kernel call:
cudaThreadSynchronize();
clock_t begin = clock();
fooKernel<<< x, y >>>( z, w );
cudaThreadSynchronize();
clock_t end = clock();
// Print time difference: ( end - begin )
Looking for a higher-resolution timer, I am considering using cudaEvent_t. Do I need to call cudaThreadSynchronize() before I note down the time using cudaEventRecord(), or is it redundant?
The reason I am asking is that there is another call, cudaEventSynchronize(), which seems to wait until the event is recorded. If the recording is delayed, won't the calculated time difference include some extra time after the kernel has finished execution?
cudaEventSynchronize() makes the host wait until the given event has actually been recorded, that is, until all work queued in the stream before the corresponding cudaEventRecord() has completed. Elapsed times are ordinary floats, in milliseconds.
Actually there are even more synchronization functions (cudaStreamSynchronize, for example). The programming guide has a detailed description of what each of them does. Using events as timers basically comes down to this:
//create events
cudaEvent_t event1, event2;
cudaEventCreate(&event1);
cudaEventCreate(&event2);
//record events around kernel launch
cudaEventRecord(event1, 0); //where 0 is the default stream
kernel<<<grid,block>>>(...); //also using the default stream
cudaEventRecord(event2, 0);
//synchronize
cudaEventSynchronize(event1); //optional
cudaEventSynchronize(event2); //wait for the event to be executed!
//calculate time
float dt_ms;
cudaEventElapsedTime(&dt_ms, event1, event2);
It's important to synchronize on event2 because you want to make sure everything got executed before calculating the time. As both events and the kernel are on the same stream (order is preserved), event1 and kernel got executed too.
You could call cudaStreamSynchronize or even cudaThreadSynchronize instead, but both are overkill in this case.
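For completeness, here is the same pattern as a small program that should compile on its own with nvcc. Treat it as a sketch rather than a reference implementation: scaleKernel, the CHECK macro, the buffer size and the launch configuration are all made up for illustration and are not part of the question. It also destroys the events when done, which the snippet above leaves out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: doubles each element.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: print the error and leave main() if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main()
{
    const int n = 1 << 20;               // arbitrary problem size
    float *d_data = nullptr;
    CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Both events and the kernel go into the default stream (0), so the GPU
    // processes them in this order. No host synchronization is needed
    // before recording.
    CHECK(cudaEventRecord(start, 0));
    scaleKernel<<<grid, block>>>(d_data, n);
    CHECK(cudaGetLastError());           // reports launch errors, does not block
    CHECK(cudaEventRecord(stop, 0));

    // Block the host until 'stop' has been recorded; by stream ordering,
    // 'start' and the kernel have completed by then as well.
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("kernel time: %.3f ms\n", ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_data));
    return 0;
}
```

The timestamps are taken on the GPU when the stream reaches each event, so a late cudaEventSynchronize() on the host does not inflate the measured interval. As a side note, cudaThreadSynchronize() has been deprecated in favor of cudaDeviceSynchronize(), so if you do want a full device-wide wait, that is the call to use in current code.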