Timing CUDA operations

Tags:

c

benchmarking

cuda

I need to time a CUDA kernel execution. The Best Practices Guide says that we can use either events or standard timing functions like clock() in Windows. My problem is that these two approaches give me totally different results. In fact, the result given by events seems to be huge compared to the actual speed in practice.

What I actually need all this for is to be able to predict the running time of a computation by first running a reduced version of it on a smaller data set. Unfortunately, the results of this benchmark are totally unrealistic, being either too optimistic (clock()) or waaaay too pessimistic (events).

408

asked Oct 24 '11 13:10

Tudor

1 Answer

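You could do something along the lines of:

#include <sys/time.h>

struct timeval t1, t2;

gettimeofday(&t1, 0);

kernel_call<<<dimGrid, dimBlock, 0>>>();

// Kernel launches are asynchronous, so wait for the GPU before reading the clock.
// (cudaDeviceSynchronize() replaces the deprecated cudaThreadSynchronize().)
HANDLE_ERROR( cudaDeviceSynchronize() );

gettimeofday(&t2, 0);

// Elapsed wall-clock time in milliseconds.
double time = (1000000.0*(t2.tv_sec-t1.tv_sec) + t2.tv_usec-t1.tv_usec)/1000.0;

printf("Time to generate:  %3.1f ms \n", time);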

or:

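float time;
cudaEvent_t start, stop;

HANDLE_ERROR( cudaEventCreate(&start) );
HANDLE_ERROR( cudaEventCreate(&stop) );

// Record the start event on the default stream (stream 0).
HANDLE_ERROR( cudaEventRecord(start, 0) );

kernel_call<<<dimGrid, dimBlock, 0>>>();

// Record the stop event, then block the host until the GPU has reached it.
HANDLE_ERROR( cudaEventRecord(stop, 0) );
HANDLE_ERROR( cudaEventSynchronize(stop) );

// Elapsed GPU time between the two events, in milliseconds.
HANDLE_ERROR( cudaEventElapsedTime(&time, start, stop) );

printf("Time to generate:  %3.1f ms \n", time);

// Release the events once the measurement is done.
HANDLE_ERROR( cudaEventDestroy(start) );
HANDLE_ERROR( cudaEventDestroy(stop) );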
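
For what it's worth, here is a fuller sketch that puts the two approaches side by side. It is not the asker's code: scale_kernel, make_device_buffer, time_with_events and time_with_host_clock are hypothetical names, and HANDLE_ERROR is defined locally because it is a helper macro rather than part of the CUDA runtime. The sketch assumes two likely causes of mismatched numbers: reading a host timer without synchronizing first (a kernel launch returns before the kernel finishes), and timing the very first launch, which usually also pays one-time initialization costs. So each function does an untimed warm-up launch and averages over several launches.

```cuda
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the HANDLE_ERROR macro used above:
// print the CUDA error string and abort on any failure.
#define HANDLE_ERROR(call)                                              \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s at %s:%d\n", cudaGetErrorString(err_),  \
                    __FILE__, __LINE__);                                \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Placeholder kernel (an assumption): scales each element in place.
__global__ void scale_kernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Allocates a zero-filled device buffer of n floats.
float *make_device_buffer(int n)
{
    float *d = NULL;
    HANDLE_ERROR( cudaMalloc((void **)&d, n * sizeof(float)) );
    HANDLE_ERROR( cudaMemset(d, 0, n * sizeof(float)) );
    return d;
}

// One untimed launch, so one-time startup costs stay out of the measurement.
static void warm_up(float *d_data, int n, int blocks, int threads)
{
    scale_kernel<<<blocks, threads>>>(d_data, n, 1.0f);
    HANDLE_ERROR( cudaGetLastError() );
    HANDLE_ERROR( cudaDeviceSynchronize() );
}

// Average time per launch in ms, measured with CUDA events.
float time_with_events(float *d_data, int n, int reps)
{
    assert(reps > 0);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    float ms = 0.0f;

    HANDLE_ERROR( cudaEventCreate(&start) );
    HANDLE_ERROR( cudaEventCreate(&stop) );
    warm_up(d_data, n, blocks, threads);

    HANDLE_ERROR( cudaEventRecord(start, 0) );
    for (int r = 0; r < reps; ++r)
        scale_kernel<<<blocks, threads>>>(d_data, n, 1.0f);
    HANDLE_ERROR( cudaEventRecord(stop, 0) );
    HANDLE_ERROR( cudaEventSynchronize(stop) );
    HANDLE_ERROR( cudaEventElapsedTime(&ms, start, stop) );

    HANDLE_ERROR( cudaEventDestroy(start) );
    HANDLE_ERROR( cudaEventDestroy(stop) );
    return ms / reps;
}

// Average time per launch in ms, measured with a host wall clock.
float time_with_host_clock(float *d_data, int n, int reps)
{
    assert(reps > 0);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    warm_up(d_data, n, blocks, threads);

    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        scale_kernel<<<blocks, threads>>>(d_data, n, 1.0f);
    // Without this synchronization the timer would only see the launches.
    HANDLE_ERROR( cudaDeviceSynchronize() );
    auto t1 = std::chrono::steady_clock::now();

    return std::chrono::duration<float, std::milli>(t1 - t0).count() / reps;
}
```

Calling time_with_events(d, n, 20) and time_with_host_clock(d, n, 20) on a buffer from make_device_buffer(n) should give figures in the same ballpark. For very short kernels the host figure will tend to be somewhat higher, since it also includes launch overhead. For predicting the run time of a larger job, measuring at a few data sizes rather than one is probably safer, because a small input may not keep the whole GPU busy.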

154

answered Sep 29 '22 12:09

fbielejec
