When creating a CUDA event, you can optionally turn on the cudaEventBlockingSync flag. But what is the difference between creating an event with or without the flag? I read the fine manual; it just doesn't make sense to me. What is the "calling host thread", and what "blocks" when you don't use the flag?
4.6.2.7 cudaError_t cudaEventSynchronize(cudaEvent_t event)
Blocks until the event has actually been recorded. ... Waiting for an event that was created with the cudaEventBlockingSync flag will cause the calling host thread to block until the event has actually been recorded.
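cudaEventBlockingSync determines how the host thread waits for the event to complete. The "calling host thread" is the CPU thread that calls cudaEventSynchronize(). With or without the flag, that call does not return until the event has been recorded; the flag only changes what the thread does while it waits.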
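To make this concrete, here is a minimal sketch (assuming a CUDA-capable GPU and nvcc) in which the only difference between the two modes is the flag passed to cudaEventCreateWithFlags(); the cudaEventSynchronize() call is identical either way. The fill kernel, the CHECK macro, and runAndWait() are illustrative names of my own, not part of the CUDA API.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(1);                                            \
        }                                                            \
    } while (0)

// Trivial kernel so there is some GPU work queued ahead of the event.
__global__ void fill(float* out, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Launch work, record an event behind it, then wait for the event.
// Returns the event status after the wait (cudaSuccess once recorded).
cudaError_t runAndWait(bool blockingSync, int n) {
    assert(n > 0);
    float* d = nullptr;
    CHECK(cudaMalloc(&d, n * sizeof(float)));

    // The flag is the ONLY difference between the two modes.
    cudaEvent_t done;
    unsigned flags = blockingSync ? cudaEventBlockingSync : cudaEventDefault;
    CHECK(cudaEventCreateWithFlags(&done, flags));

    fill<<<(n + 255) / 256, 256>>>(d, n, 1.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(done, 0));  // enqueue the event on the default stream

    // Default flag: this thread spins until the event completes.
    // cudaEventBlockingSync: this thread sleeps and is woken on completion.
    CHECK(cudaEventSynchronize(done));

    cudaError_t status = cudaEventQuery(done);
    CHECK(cudaEventDestroy(done));
    CHECK(cudaFree(d));
    return status;
}
```

Both runAndWait(false, n) and runAndWait(true, n) return only after the kernel has finished; from the caller's point of view they behave the same, and the difference shows up only in how the waiting thread uses the CPU.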
When cudaEventBlockingSync is SET, the waiting host thread gives up the CPU: it is put to sleep, and the OS scheduler can hand the CPU to a different thread (possibly belonging to another process). The host thread reacquires the CPU later, once the event has been recorded. With this approach the host thread does not monopolize the CPU, so other threads and processes can get work done while it waits.
When cudaEventBlockingSync is NOT SET, the host thread busy-waits: it sits in a loop repeatedly checking whether the event has occurred. The CPU core just spins, which usually pegs the core it runs on at 100% in a CPU performance monitor. With this approach the host thread monopolizes that core for the whole wait.
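A rough way to see this difference is to measure how much CPU time the waiting thread burns inside cudaEventSynchronize(). The sketch below assumes Linux (it uses the POSIX clock_gettime() with CLOCK_THREAD_CPUTIME_ID), a CUDA-capable GPU, nvcc, and a GPU on which a spin kernel of a few hundred million cycles stays well under any display watchdog timeout. spin() and cpuMsSpentWaiting() are hypothetical helpers, not CUDA API.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <time.h>  // clock_gettime, CLOCK_THREAD_CPUTIME_ID (POSIX, assumed Linux)

// Busy kernel: a single GPU thread spins on the SM clock for ~`cycles`
// cycles, so the host has something to wait for.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// CPU time (ms) consumed so far by the calling host thread only.
static double threadCpuMs() {
    timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

// Returns the CPU time (ms) this thread spent inside cudaEventSynchronize(),
// or -1.0 if a CUDA call failed.
double cpuMsSpentWaiting(unsigned eventFlags, long long gpuCycles) {
    assert(gpuCycles > 0);
    cudaEvent_t done;
    if (cudaEventCreateWithFlags(&done, eventFlags) != cudaSuccess) return -1.0;

    spin<<<1, 1>>>(gpuCycles);        // asynchronous: returns immediately
    cudaEventRecord(done, 0);

    double before = threadCpuMs();
    cudaError_t st = cudaEventSynchronize(done);  // spin or sleep, per the flag
    double after = threadCpuMs();

    cudaEventDestroy(done);
    return st == cudaSuccess ? after - before : -1.0;
}
```

With cudaEventDefault the returned CPU time should be roughly the kernel's run time, because the thread was spinning the whole time; with cudaEventBlockingSync it should be much smaller. Exact numbers depend on the driver, the GPU clock, and the OS.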
Not setting cudaEventBlockingSync gives the lowest latency between the GPU finishing the work ahead of the event and control returning to the host thread. Which setting you want depends on what the kernel is doing, i.e. how long the event will take to occur versus how much scheduling overhead the CPU incurs when the thread blocks and is later woken up. The cost of not setting the flag is that the CPU core cannot run any other work (other threads) while the thread waits for the event.
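The latency side of the trade-off can be probed with a tight loop of empty kernels, where almost all of the measured time is launch overhead plus how quickly the waiting thread notices that the event completed. This is a sketch assuming a CUDA-capable GPU and nvcc; noop() and avgRoundTripUs() are made-up names, and the iteration count you pass is arbitrary.

```cuda
#include <cassert>
#include <chrono>
#include <cuda_runtime.h>

// Empty kernel: the GPU work itself is negligible.
__global__ void noop() {}

// Average wall-clock microseconds per launch + record + cudaEventSynchronize(),
// using an event created with the given flags.
double avgRoundTripUs(unsigned eventFlags, int iterations) {
    assert(iterations > 0);
    cudaEvent_t done;
    cudaEventCreateWithFlags(&done, eventFlags);

    noop<<<1, 1>>>();          // warm-up: context setup and module loading
    cudaDeviceSynchronize();

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        noop<<<1, 1>>>();
        cudaEventRecord(done, 0);
        cudaEventSynchronize(done);   // spin or sleep, per eventFlags
    }
    auto stop = std::chrono::steady_clock::now();

    cudaEventDestroy(done);
    return std::chrono::duration<double, std::micro>(stop - start).count()
           / iterations;
}
```

Calling it once with cudaEventDefault and once with cudaEventBlockingSync and comparing the two averages shows what the blocking wait costs on a given machine. The spinning version is typically the faster of the two, but the size of the gap varies with the OS scheduler and driver, so measure on your own system before choosing.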