I want to measure time inside a kernel on the GPU. How can I measure it in NVIDIA CUDA? e.g.

    __global__ void kernelSample()
    {
        // some code here
        // get start time
        // some code here
        // get stop time
        // some code here
    }
You can use the Compute Visual Profiler, which will be great for your purpose. It measures the time of every CUDA function and tells you how many times you called it.
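If all you need is the total runtime of the kernel (rather than timing a region inside it), CUDA events are a common host-side approach. A minimal sketch, reusing the `kernelSample` name from the question; the launch configuration here is an arbitrary example:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernelSample()
{
    // some work here
}

int main()
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);        // record just before the launch
    kernelSample<<<4, 256>>>();
    cudaEventRecord(stop);         // record just after the launch
    cudaEventSynchronize(stop);    // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
    printf("kernel time: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

Unlike `clock()` inside the kernel, this measures the whole launch from the host, including all blocks.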
Figure 1 shows that a CUDA kernel is a function that gets executed on the GPU. The parallel portion of your application is executed K times in parallel by K different CUDA threads, as opposed to only once like a regular C/C++ function. Figure 1. The kernel is a function executed on the GPU.
There is a maximum number of CUDA instructions per kernel: 2 million before compute capability 2.0, and 512 million from 2.0 onward. OK, thank you.
In order to run a kernel on the CUDA threads, we need two things. First, in the main() function of the program, we call the function to be executed by each thread on the GPU. This invocation is called a kernel launch, and with it we need to provide the number of threads and their grouping.
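As a sketch, the launch configuration goes in triple angle brackets between the kernel name and its argument list; the block and grid sizes below are arbitrary examples, not values from the question:

```cuda
// Launch kernelSample with 4 blocks of 256 threads each.
// <<<numBlocks, threadsPerBlock>>> is the launch configuration.
kernelSample<<<4, 256>>>();

// Kernel launches are asynchronous with respect to the host,
// so wait for completion before using the results.
cudaDeviceSynchronize();
```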
You can do something like this:
    __global__ void kernelSample(int *runtime)
    {
        // compute a global thread index so each thread writes its own slot
        int tidx = blockIdx.x * blockDim.x + threadIdx.x;
        // ....
        clock_t start_time = clock();
        // some code here
        clock_t stop_time = clock();
        // ....
        runtime[tidx] = (int)(stop_time - start_time);
    }
Which gives the number of clock cycles between the two calls. Be a little careful, though: the timer will overflow after a couple of seconds, so you should be sure that the duration of the code between successive calls is quite short. You should also be aware that the compiler and assembler do perform instruction re-ordering, so you might want to check that the clock() calls don't wind up getting placed next to each other in the SASS output (use cuobjdump to check).
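To get the per-thread cycle counts back on the host, you can allocate the `runtime` array on the device, copy it back after the launch, and convert cycles to time with the device clock rate. A minimal sketch under those assumptions (the dummy loop in the kernel just gives `clock()` something to measure):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernelSample(int *runtime)
{
    int tidx = blockIdx.x * blockDim.x + threadIdx.x;
    clock_t start_time = clock();
    // some work to time (dummy loop as a placeholder)
    for (volatile int i = 0; i < 1000; ++i) { }
    clock_t stop_time = clock();
    runtime[tidx] = (int)(stop_time - start_time);
}

int main()
{
    const int numThreads = 256;
    int *d_runtime;
    cudaMalloc(&d_runtime, numThreads * sizeof(int));

    kernelSample<<<1, numThreads>>>(d_runtime);
    cudaDeviceSynchronize();

    int h_runtime[numThreads];
    cudaMemcpy(h_runtime, d_runtime, sizeof(h_runtime), cudaMemcpyDeviceToHost);

    // clockRate is reported in kHz, i.e. cycles per millisecond,
    // so cycles / clockRate gives milliseconds.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("thread 0: %d cycles (~%f ms)\n",
           h_runtime[0], h_runtime[0] / (float)prop.clockRate);

    cudaFree(d_runtime);
    return 0;
}
```

Note that each streaming multiprocessor has its own clock, so comparing counts between threads in different blocks is only meaningful if they ran on the same SM.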