
How to measure the inner kernel time in NVIDIA CUDA?


I want to measure the time taken by a section of code inside a GPU kernel. How can I do that in NVIDIA CUDA? For example:

    __global__ void kernelSample()
    {
        // some code here
        // get start time
        // some code here
        // get stop time
        // some code here
    }
Amin asked May 14 '12 15:05




1 Answer

You can do something like this:

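    __global__ void kernelSample(int *runtime)
    {
        // Global thread index, so each thread stores its own measurement
        int tidx = blockIdx.x * blockDim.x + threadIdx.x;

        // ....
        clock_t start_time = clock();
        // some code here
        clock_t stop_time = clock();
        // ....

        runtime[tidx] = (int)(stop_time - start_time);
    }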

This gives the number of clock cycles between the two calls. Be a little careful, though: the timer will overflow after a couple of seconds, so you should make sure that the code between successive calls runs for only a short time. You should also be aware that the compiler and assembler do reorder instructions, so you might want to check that the two clock() calls don't wind up next to each other in the SASS output (use cuobjdump to check).
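As a fuller illustration, here is a minimal sketch of the same idea with host code around it. It is not part of the original answer, and it makes some assumptions: it uses clock64() instead of clock() for a wider counter, and the names timedKernel, measureCycles and cyclesToMilliseconds are made up for this example. The conversion to milliseconds assumes the clock rate reported by cudaDevAttrClockRate (in kHz), which may differ from the clock the GPU was actually running at, so treat the result as approximate.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread times a short loop with clock64().
__global__ void timedKernel(long long *cycles, float *out, int n)
{
    int tidx = blockIdx.x * blockDim.x + threadIdx.x;
    if (tidx >= n) return;

    float acc = static_cast<float>(tidx);

    long long start = clock64();
    for (int i = 0; i < 1000; ++i)
        acc = acc * 1.0001f + 0.5f;   // the code being timed
    long long stop = clock64();

    // Store the result so the compiler cannot discard the loop.
    // Reordering is still possible, so inspect the SASS if in doubt.
    out[tidx] = acc;
    cycles[tidx] = stop - start;
}

// Convert a cycle count to milliseconds: cycles / kHz = ms.
double cyclesToMilliseconds(long long cycles, int clockRateKHz)
{
    assert(clockRateKHz > 0);
    return static_cast<double>(cycles) / clockRateKHz;
}

// Launch the kernel on n threads and return the per-thread cycle counts.
// Returns an empty vector if any CUDA call fails.
std::vector<long long> measureCycles(int n)
{
    long long *dCycles = nullptr;
    float *dOut = nullptr;
    if (cudaMalloc(&dCycles, n * sizeof(long long)) != cudaSuccess)
        return {};
    if (cudaMalloc(&dOut, n * sizeof(float)) != cudaSuccess) {
        cudaFree(dCycles);
        return {};
    }

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    timedKernel<<<blocks, threads>>>(dCycles, dOut, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<long long> hCycles(n);
    cudaError_t err = cudaMemcpy(hCycles.data(), dCycles,
                                 n * sizeof(long long),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dCycles);
    cudaFree(dOut);
    if (err != cudaSuccess)
        return {};
    return hCycles;
}
```

To use it, query the clock rate with cudaDeviceGetAttribute(&clockRateKHz, cudaDevAttrClockRate, 0), call measureCycles(256), and pass any element of the result to cyclesToMilliseconds. Each thread reads the counter of the multiprocessor it runs on, so values from different blocks are best compared as durations rather than as absolute timestamps.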

talonmies answered Sep 20 '22 18:09