According to the CUDA programming guide, you can disable asynchronous kernel launches at run time by setting an environment variable (CUDA_LAUNCH_BLOCKING=1).
This is a helpful debugging tool, and I also want to use it to measure how much my code benefits from using concurrent kernels and transfers.
I also want to disable other concurrent calls, in particular cudaMemcpyAsync. Does CUDA_LAUNCH_BLOCKING affect these kinds of calls in addition to kernel launches? I suspect not. What would be the best alternative? I can add cudaStreamSynchronize calls, but I would prefer a run-time solution. I can run in the debugger, but that would affect the timing and defeat the purpose.
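For illustration, this is roughly what I mean by sprinkling in cudaStreamSynchronize calls (the kernel, buffers, and sizes below are just placeholders, not my actual code):

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for whatever the real code launches.
__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int N = 1 << 20;
    float *h_data, *d_data;
    cudaMallocHost(&h_data, N * sizeof(float));  // pinned, so the copies really are async
    cudaMalloc(&d_data, N * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaMemcpyAsync(d_data, h_data, N * sizeof(float), cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);   // extra call after every async copy

    myKernel<<<(N + 255) / 256, 256, 0, stream>>>(d_data, N);
    cudaStreamSynchronize(stream);   // and after every kernel launch

    cudaMemcpyAsync(h_data, d_data, N * sizeof(float), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);   // and after the copy back

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```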
Setting CUDA_LAUNCH_BLOCKING won't affect the streams API at all. If you add some debug code to force all of your streams code to use stream 0, all of the calls other than kernel launches will revert to synchronous behaviour.
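One way to do that without editing every call site is a small compile-time switch that substitutes the default stream. This is only a sketch: the DEBUG_SYNC flag, the MAYBE_STREAM macro, and myKernel are made up for illustration, and it assumes the legacy default stream (i.e. not compiling with --default-stream per-thread), in which stream 0 serialises with work in other streams.

```cuda
#include <cuda_runtime.h>

// Hypothetical debug switch: compile with -DDEBUG_SYNC and every stream
// argument collapses to the default stream (0), so the async copies and
// kernel launches no longer overlap with each other.
#ifdef DEBUG_SYNC
#define MAYBE_STREAM(s) 0
#else
#define MAYBE_STREAM(s) (s)
#endif

__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int N = 1 << 20;
    float *h_data, *d_data;
    cudaMallocHost(&h_data, N * sizeof(float));
    cudaMalloc(&d_data, N * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // In a normal build these run asynchronously in "stream"; with
    // -DDEBUG_SYNC they all go to the legacy default stream instead.
    cudaMemcpyAsync(d_data, h_data, N * sizeof(float),
                    cudaMemcpyHostToDevice, MAYBE_STREAM(stream));
    myKernel<<<(N + 255) / 256, 256, 0, MAYBE_STREAM(stream)>>>(d_data, N);
    cudaMemcpyAsync(h_data, d_data, N * sizeof(float),
                    cudaMemcpyDeviceToHost, MAYBE_STREAM(stream));

    cudaDeviceSynchronize();
    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```

Flipping DEBUG_SYNC on and off then lets you compare timings with and without concurrency while leaving the rest of the code untouched.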