
How can I override the CUDA kernel execution time limit on Windows with a secondary GPU?

Nvidia's website explains the time-out problem:

Q: What is the maximum kernel execution time? On Windows, individual GPU program launches have a maximum run time of around 5 seconds. Exceeding this time limit usually will cause a launch failure reported through the CUDA driver or the CUDA runtime, but in some cases can hang the entire machine, requiring a hard reset. This is caused by the Windows "watchdog" timer that causes programs using the primary graphics adapter to time out if they run longer than the maximum allowed time.

For this reason it is recommended that CUDA is run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.

Source: https://developer.nvidia.com/cuda-faq

So it seems that Nvidia believes, or at least strongly implies, that having multiple (Nvidia) GPUs with the proper configuration can prevent this from happening?

But how? So far I have tried lots of ways, but the annoying time-out still occurs on a GK110 GPU that is: (1) plugged into the secondary PCIe 16x slot; (2) not connected to any monitor; (3) set as a dedicated PhysX card in the driver control panel (as recommended by some other people). The block-out is still there.

Asked Mar 03 '13 by user2003564



1 Answer

If your GK110 is a Tesla K20c GPU, then you should switch the device from WDDM mode to TCC mode. This can be done with the nvidia-smi.exe tool that gets installed with the driver. Use the Windows search function to find this file (nvidia-smi.exe), then use the command-line help (`nvidia-smi --help`) to discover the commands necessary to switch a GPU from WDDM to TCC mode.

Once you have done this, the Windows watchdog mechanism will no longer pay attention to your GK110 device.
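As a sketch only (the GPU index here is an assumption for your system, and the exact option spelling can vary by driver version, so verify with `nvidia-smi --help`), the switch typically looks like this from an elevated command prompt:

```
:: List GPUs and note the index of the compute-only device (assumed to be 1 here)
nvidia-smi -L

:: Switch the device at index 1 from the WDDM driver model (0) to TCC (1)
nvidia-smi -i 1 -dm 1

:: Reboot, then confirm the driver model took effect
nvidia-smi -i 1 -q
```

A reboot is required before the new driver model takes effect, and TCC mode disables all display output on that device.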

If, on the other hand, it is a GeForce GPU, there is no way to switch it to TCC mode. Your only option is to modify the TDR (Timeout Detection and Recovery) registry settings, which is somewhat difficult. Your mileage may vary, as the exact structure of the registry keys varies by OS.
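For illustration, the key names below are Microsoft's documented TDR settings, but their effect is system-wide (every GPU, including the display adapter), the layout varies by Windows version, and you should back up the registry before touching it. A `.reg` fragment that relaxes the watchdog might look like:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
; TdrDelay: seconds a GPU task may run before the watchdog fires (default is 2)
"TdrDelay"=dword:0000003c
; TdrLevel: 0 disables timeout detection entirely; 3 is the default (recover on timeout)
"TdrLevel"=dword:00000000
```

Disabling TDR entirely means a genuinely hung kernel can freeze the display until a hard reset, so raising TdrDelay is usually the safer of the two knobs.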

If a GPU is in WDDM mode, it is subject to the watchdog timer.

Answered Sep 22 '22 by Robert Crovella