When running long OpenCL computations on Windows using the GPU that also runs the main display, the OS may interrupt the computation with Timeout Detection and Recovery (TDR).
In my experience (Java, using JavaCL by NativeLibs4Java, with an NVIDIA GPU) this manifests as an "Out of Resources" (CL_OUT_OF_RESOURCES) error when invoking clEnqueueReadBuffer.
The problem is that I get the exact same message when the OpenCL program fails for other reasons (e.g., because of accessing invalid memory).
Is there a (semi) reliable way to distinguish between an "Out of Resources" caused by TDR and an "Out of Resources" caused by other problems?
Alternately, can I at least reliably (in Java / through OpenCL API) determine that the GPU used for computation is also running the display?
I am aware of this question; however, the answer there is concerned with scenarios where clFinish does not return, which is not a problem for me (my code has so far never frozen inside the OpenCL API).
Is there a (semi) reliable way to distinguish between an "Out of Resources" caused by TDR and an "Out of Resources" caused by other problems?
1)
If you can read

KeyPath   : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue  : TdrDelay
ValueType : REG_DWORD
ValueData : Number of seconds to delay. 2 seconds is the default value.

(e.g., through WMI) and multiply it by

KeyPath   : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue  : TdrLimitCount
ValueType : REG_DWORD
ValueData : Number of TDRs before crashing. The default value is 5.

(again through WMI), you get 10 seconds with the default values. You should also read

KeyPath   : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue  : TdrLimitTime
ValueType : REG_DWORD
ValueData : Number of seconds before crashing. 60 seconds is the default value.

which should be 60 seconds on a default setup.
With these defaults, it takes 5 x 2-second delays (plus one extra) within the 60-second limit before a crash. Your application can then check whether its last stopwatch reading exceeded those limits; if it did, the failure was probably a TDR. There is also a thread-exit-from-driver time limit on top of these:

KeyPath   : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue  : TdrDdiDelay
ValueType : REG_DWORD
ValueData : Number of seconds to leave the driver. 5 seconds is the default value.

which is 5 seconds by default. Accessing an invalid memory segment should make the kernel exit much faster than that. You could also increase these TDR time limits (through the registry/WMI) to several minutes so the program can finish without being crashed by preemption starvation. But changing the registry can be dangerous: if you set the TDR time limit to 1 second or a fraction of it, Windows may never boot without constant TDR crashes, so just reading those values is safer.
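As a minimal sketch of the stopwatch heuristic (assuming the TdrDelay and TdrLimitCount values have already been read from the registry keys above; the class and method names are made up for illustration):

```java
// Sketch: decide whether a CL_OUT_OF_RESOURCES failure is more likely a
// TDR (long-running kernel killed by Windows) or a memory fault (which
// usually fails almost immediately). The TDR values are assumed to have
// been read from HKLM\...\GraphicsDrivers; defaults shown in main().
public class TdrHeuristic {

    /** True if the kernel's wall-clock time exceeded the point where
     *  Windows would already have triggered TDR, so the failure is
     *  probably a timeout rather than an invalid-memory crash. */
    static boolean likelyTdr(long elapsedMillis,
                             int tdrDelaySeconds, int tdrLimitCount) {
        long budgetMillis = 1000L * tdrDelaySeconds * tdrLimitCount;
        return elapsedMillis > budgetMillis;
    }

    public static void main(String[] args) {
        // Defaults: TdrDelay = 2 s, TdrLimitCount = 5 -> ~10 s budget.
        long t0 = System.currentTimeMillis();
        // ... enqueue kernel, clEnqueueReadBuffer fails here ...
        long elapsed = System.currentTimeMillis() - t0;

        System.out.println(likelyTdr(12_000, 2, 5)); // long run: TDR suspect
        System.out.println(likelyTdr(500, 2, 5));    // fast fail: memory suspect
    }
}
```

With the default 2 s x 5 budget, a kernel that ran well past ~10 seconds before failing is a TDR suspect, while a failure after a few milliseconds points toward a memory fault.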
2)
Separate the total work into much smaller parts. If the data is not separable, copy it once, then enqueue the long-running kernel as very short-ranged kernels n times, with a short wait between any two. Then you can be sure TDR is eliminated: if this version runs but the long-running kernel does not, it is TDR's fault; if it is the opposite, it is a memory crash. It looks like this:
short running x 1024 times
long running
long running <---- fail? TDR! a memory fault would crash the short version too
long running
another try:
short running x 1024 times <---- fail? memory! each kernel runs only ~1 ms
long running
long running
long running
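The chunking arithmetic for the short-kernel version can be sketched in plain Java with no OpenCL dependency (in real code each (offset, size) pair would feed the global work offset and global work size of clEnqueueNDRangeKernel, or JavaCL's equivalent, with a short pause between launches):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split one long-running 1-D NDRange into many short launches.
public class KernelChunker {

    /** Returns {offset, size} pairs covering [0, globalSize) in
     *  chunks of at most chunkSize work-items. */
    static List<long[]> chunkRanges(long globalSize, long chunkSize) {
        List<long[]> ranges = new ArrayList<>();
        for (long off = 0; off < globalSize; off += chunkSize) {
            ranges.add(new long[] { off, Math.min(chunkSize, globalSize - off) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 1000 work-items in chunks of 300: the last chunk shrinks to 100.
        for (long[] r : chunkRanges(1000, 300)) {
            // real code: enqueue the kernel with global offset r[0] and
            // global size r[1], then wait briefly so the display can draw
            System.out.println("offset=" + r[0] + " size=" + r[1]);
        }
    }
}
```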
Alternately, can I at least reliably (in Java / through OpenCL API) determine that the GPU used for computation is also running the display?
1)
Use the interoperability properties of the devices:

// adapted from Intel's site:
std::vector<cl_device_id> devs(devNum);
// query the devices that can share the current GL context
size_t bytes = devNum * sizeof(cl_device_id);
clGetGLContextInfoKHR(props, CL_DEVICES_FOR_GL_CONTEXT_KHR, bytes, devs.data(), NULL);

This gives the list of interoperable devices. Get their ids so you can exclude them if you don't want to use them.
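Once the interoperable (display) device ids are known, excluding them is a simple filter. A hypothetical Java sketch, with placeholder long values standing in for cl_device_id handles:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: remove the display (GL-interoperable) devices from the full
// device list, leaving compute-only candidates. The ids are placeholders
// for the handles returned by clGetGLContextInfoKHR / clGetDeviceIDs.
public class DeviceFilter {

    static List<Long> computeOnlyDevices(List<Long> allDevices,
                                         Set<Long> displayDevices) {
        List<Long> result = new ArrayList<>();
        for (Long id : allDevices) {
            if (!displayDevices.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> all = List.of(1L, 2L, 3L);  // all device handles
        Set<Long> display = Set.of(2L);        // interoperable with GL
        System.out.println(computeOnlyDevices(all, display)); // [1, 3]
    }
}
```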
2)
Have another thread run some OpenGL or DirectX code that draws a static object, to keep one of the GPUs busy. Then test all GPUs simultaneously, from another thread, with some trivial OpenCL kernels. Note: do not copy any data between devices while doing this, so CPU/RAM does not become the bottleneck.
3)
If the data is separable, you can use a divide-and-conquer algorithm so that each GPU takes its own piece of work only when it is available, giving the display more flexibility. (This is a performance-aware solution, similar to the short-running-kernel version, but with scheduling done across multiple GPUs.)
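The divide-and-conquer scheduling can be sketched as a greedy chunk queue: one worker thread per device claims the next free chunk whenever its device is idle, so a display GPU that is busy drawing naturally ends up with fewer chunks. Device work is stubbed out here, and the class and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: greedy chunk scheduler across several (hypothetical) GPUs.
public class GreedyScheduler {

    /** Runs `workers` threads that greedily claim chunks 0..chunks-1;
     *  returns the total number of chunks executed (always == chunks). */
    static long runWorkers(int workers, long chunks) {
        AtomicLong next = new AtomicLong();
        AtomicLong done = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                long i;
                while ((i = next.getAndIncrement()) < chunks) {
                    // real code: enqueue chunk i on this worker's device
                    // and wait for it; a busy display GPU simply claims
                    // fewer chunks than an idle compute GPU
                    done.incrementAndGet();
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(3, 100)); // prints 100
    }
}
```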
4)
I didn't check this because I sold my second GPU, but you should try

CL_DEVICE_TYPE_DEFAULT

in your multi-GPU system and see whether it returns the display GPU. Shut down the PC, plug the monitor cable into the other card, and try again. Shut down, swap the cards between slots, and try again. Shut down, remove one of the cards so only one GPU and the CPU are left, and try again. If all these runs return only the display GPU, the platform is marking the display GPU as the default.