If I register a callback via cudaStreamAddCallback(), what thread is going to run it?
The CUDA documentation says that cudaStreamAddCallback
"adds a callback to be called on the host after all currently enqueued items in the stream have completed. For each cudaStreamAddCallback call, a callback will be executed exactly once. The callback will block later work in the stream until it is finished."
but says nothing about how the callback itself is called.
Just to flesh out comments so that this question has an answer and will fall off the unanswered queue:
The short answer is that this is an internal implementation detail of the CUDA runtime and you don't need to worry about it.
The longer answer is that if you look carefully at the operation of the CUDA runtime, you will notice that context establishment on a device (whether explicit via the driver API or implicit via the runtime API) spawns a small thread pool. It is these threads that are used to implement runtime features such as stream command queues and callback operations, so the callback runs on one of them rather than on the thread that registered it. Again, this is an internal implementation detail that the programmer doesn't need to know about.
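If you want to observe this for yourself, a minimal sketch along the following lines records the ID of the thread that runs the callback and compares it with the ID of the thread that registered it. The names Probe, recordThread and runProbe are hypothetical helpers made up for this illustration. Whether the two IDs differ is only an observation about a particular runtime version, not something the documentation guarantees.

```cuda
#include <iostream>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical helper type: holds what the callback observed.
struct Probe {
    std::thread::id callbackThread;  // thread that executed the callback
    bool ran = false;                // set to true once the callback has run
};

// Matches the cudaStreamCallback_t signature expected by cudaStreamAddCallback.
// Note: CUDA API calls must not be made from inside a stream callback.
void CUDART_CB recordThread(cudaStream_t /*stream*/, cudaError_t /*status*/,
                            void *userData) {
    Probe *probe = static_cast<Probe *>(userData);
    probe->callbackThread = std::this_thread::get_id();
    probe->ran = true;
}

// Hypothetical helper: registers the callback on a fresh stream and waits for it.
Probe runProbe() {
    Probe probe;
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        return probe;  // no usable device or context; probe.ran stays false
    }
    // The flags argument is reserved and must be 0.
    if (cudaStreamAddCallback(stream, recordThread, &probe, 0) == cudaSuccess) {
        // Returns only after all work in the stream, including the callback, is done.
        cudaStreamSynchronize(stream);
    }
    cudaStreamDestroy(stream);
    return probe;
}

int main() {
    Probe probe = runProbe();
    std::cout << "registering thread: " << std::this_thread::get_id() << "\n";
    if (probe.ran) {
        std::cout << "callback thread:    " << probe.callbackThread << "\n";
    } else {
        std::cout << "callback did not run (is a CUDA device available?)\n";
    }
    return probe.ran ? 0 : 1;
}
```

Compile it with nvcc and run it on a machine with a CUDA device. Since the thread is owned by the runtime, avoid relying on its identity or on thread-local state in real code.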