Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is the warmup code necessary when measuring CUDA kernel running time?

Tags:

cuda

gpu

In page 85, professional CUDA C programming:

int main()
{
    ......
    // run a warmup kernel to remove overhead
    size_t iStart,iElaps;
    cudaDeviceSynchronize();
    iStart = seconds();
    warmingup<<<grid, block>>> (d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("warmup <<< %4d %4d >>> elapsed %d sec \n",grid.x,block.x, iElaps );

    // run kernel 1
    iStart = seconds();
    mathKernel1<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel1 <<< %4d %4d >>> elapsed %d sec \n",grid.x,block.x,iElaps );

    // run kernel 3
    iStart = seconds();
    mathKernel2<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds () - iStart;
    printf("mathKernel2 <<< %4d %4d >>> elapsed %d sec \n",grid.x,block.x,iElaps );

    // run kernel 3
    iStart = seconds ();
    mathKernel3<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds () - iStart;
    printf("mathKernel3 <<< %4d %4d >>> elapsed %d sec \n",grid.x,block.x,iElaps);
    ......
}

We can see there is a warmup before measuring the running time of different kernels.

From GPU cards warming up?, I know the reason is:

If they are non-display cards, it might well be the driver shutting itself down after a period of inactivity. So what you are seeing on the first run might well be initialization overhead that only happens once.

So if my GPU card isn't inactive for a long time, e.g, I just use it to run some programs, it should not need to run any warmup code. Is my understanding right?

like image 991
Nan Xiao Avatar asked Feb 06 '23 13:02

Nan Xiao


1 Answers

Besides the GPU being in a power saving state there can be a number of other reasons why the first launch of a kernel could be slower than further runs:

  • just-in-time compilation
  • transfer of kernel to GPU memory
  • cache content
  • ...

For these reasons it is always good practice to perform at least one "warmup run" before the timed kernel run, if you are interested in the sustained speed that consecutive kernel launches achieve.

If however you have a specific application and use case in mind, it always makes sense to benchmark that application under the relevant circumstances. Be prepared though for much larger variations in runtime in that less controlled measurement.

like image 134
tera Avatar answered Feb 19 '23 04:02

tera