I need to time a CUDA kernel execution. The Best Practices Guide says that we can use either events or standard timing functions like clock()
in Windows. My problem is that using these two functions gives me a totally different result.
In fact, the result given by events seems to be huge compared to the actual speed in practice.
What I actually need all this for is to be able to predict the running time of a computation by first running a reduced version of it on a smaller data set. Unfortunately, the results of this benchmark are totally unrealistic, being either too optimistic (clock()
) or waaaay too pessimistic (events).
You can use the compute visula profiler which will be great for your purpose. it measures the time of every cuda function and tells you how many times you called it .
CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on its own GPUs (graphics processing units). CUDA enables developers to speed up compute-intensive applications by harnessing the power of GPUs for the parallelizable part of the computation.
In a typical CUDA program, data are first send from main memory to the GPU memory, then the CPU sends instructions to the GPU, then the GPU schedules and executes the kernel on the available parallel hardware, and finally results are copied back from the GPU memory to the CPU memory. ...
Most CUDA calls are synchronous (often called “blocking”). An example of a blocking call is cudaMemcpy().
You could do something along the lines of :
#include <sys/time.h>
struct timeval t1, t2;
gettimeofday(&t1, 0);
kernel_call<<<dimGrid, dimBlock, 0>>>();
HANDLE_ERROR(cudaThreadSynchronize();)
gettimeofday(&t2, 0);
double time = (1000000.0*(t2.tv_sec-t1.tv_sec) + t2.tv_usec-t1.tv_usec)/1000.0;
printf("Time to generate: %3.1f ms \n", time);
or:
float time;
cudaEvent_t start, stop;
HANDLE_ERROR( cudaEventCreate(&start) );
HANDLE_ERROR( cudaEventCreate(&stop) );
HANDLE_ERROR( cudaEventRecord(start, 0) );
kernel_call<<<dimGrid, dimBlock, 0>>>();
HANDLE_ERROR( cudaEventRecord(stop, 0) );
HANDLE_ERROR( cudaEventSynchronize(stop) );
HANDLE_ERROR( cudaEventElapsedTime(&time, start, stop) );
printf("Time to generate: %3.1f ms \n", time);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With