CUDA: cudaEvent_t and cudaThreadSynchronize usage

I am a bit confused about the usage of cudaEvent_t. Currently, I am using the clock() call like this to find the duration of a kernel call:

cudaThreadSynchronize();
clock_t begin = clock();

fooKernel<<< x, y >>>( z, w );

cudaThreadSynchronize();
clock_t end = clock();

// Print time difference: ( end - begin )

Looking for a timer of higher-resolution I am considering using cudaEvent_t. Do I need to call cudaThreadSynchronize() before I note down the time using cudaEventRecord() or is it redundant?

The reason I am asking is because there is another call cudaEventSynchronize(), which seems to wait until the event is recorded. If the recording is delayed, won't the time difference that is calculated show some extra time after the kernel has finished execution?

What is cudaEventSynchronize?

cudaEventSynchronize() tells CUDA to wait until everything is done before logging the previous cudaEventRecord. total and execution are ordinary floats. Times in milliseconds.

What is a kernel in CUDA?

The kernel is a function executed on the GPU. Every CUDA kernel starts with a __global__ declaration specifier. Programmers provide a unique global ID to each thread by using built-in variables. Figure 2. CUDA kernels are subdivided into blocks.

How do I launch a CUDA kernel?

In order to launch a CUDA kernel we need to specify the block dimension and the grid dimension from the host code. I'll consider the same Hello World! code considered in the previous article. In the above code, to launch the CUDA kernel two 1's are initialised between the angle brackets.

Actually there are even more synchronization functions (cudaStreamSynchronize). The programming guide has a detailed description what every one of those does. Using events as timers basically comes down to this:

//create events
cudaEvent_t event1, event2;
cudaEventCreate(&event1);
cudaEventCreate(&event2);

//record events around kernel launch
cudaEventRecord(event1, 0); //where 0 is the default stream
kernel<<<grid,block>>>(...); //also using the default stream
cudaEventRecord(event2, 0);

//synchronize
cudaEventSynchronize(event1); //optional
cudaEventSynchronize(event2); //wait for the event to be executed!

//calculate time
float dt_ms;
cudaEventElapsedTime(&dt_ms, event1, event2);

It's important to synchronize on event2 because you want to make sure everything got executed before calculating the time. As both events and the kernel are on the same stream (order is preserved) event1 and kernel got executed too.

You could call cudaStreamSynchronize or even cudaThreadSynchronize instead but both are overkill in this case.

CUDA: cudaEvent_t and cudaThreadSynchronize usage

Tags:

cuda

timer

Ashwin Nanjappa

People also ask

1 Answers

Jonas Bötel

Recent Activity

Donate For Us

CUDA: cudaEvent_t and cudaThreadSynchronize usage

Tags:

cuda

timer

Ashwin Nanjappa

People also ask

1 Answers

Jonas Bötel

Related questions

Recent Activity

Donate For Us