A stream in CUDA is a sequence of operations that execute on the device in the order in which they are issued by the host code. While operations within a stream are guaranteed to execute in the prescribed order, operations in different streams can be interleaved and, when possible, they can even run concurrently.
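This can be sketched with two streams, each carrying its own copy/kernel/copy pipeline (the kernel name, pointers, and launch configuration below are illustrative, not from the original post):

```cuda
cudaStream_t s1, s2;
cudaStreamCreate(&s1);
cudaStreamCreate(&s2);

// Operations issued into the same stream execute in issue order;
// work in s1 and s2 may be interleaved or run concurrently.
cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s1);
myKernel<<<grid, block, 0, s1>>>(d_a);
cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, s1);

cudaMemcpyAsync(d_b, h_b, bytes, cudaMemcpyHostToDevice, s2);
myKernel<<<grid, block, 0, s2>>>(d_b);
cudaMemcpyAsync(h_b, d_b, bytes, cudaMemcpyDeviceToHost, s2);

cudaStreamDestroy(s1);
cudaStreamDestroy(s2);
```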
Kernel calls are asynchronous from the point of view of the CPU, so if you call two kernels in succession, the second one is issued without waiting for the first to finish. That only means control returns to the CPU immediately; it says nothing about when the kernels actually complete on the device.
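A minimal sketch of that behavior (kernel names and arguments are illustrative): both launches return immediately, and the host only blocks at the explicit synchronization call.

```cuda
kernel1<<<grid, block>>>(d_data);   // returns to the CPU immediately
kernel2<<<grid, block>>>(d_data);   // also returns immediately; queued
                                    // behind kernel1 in the default stream
cudaDeviceSynchronize();            // host blocks here until both finish
```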
CUDA events are synchronization markers that can be used to monitor the device's progress, to accurately measure timing, and to synchronize CUDA streams. The underlying CUDA events are lazily initialized when the event is first recorded or exported to another process.
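The timing use case can be sketched as follows (the kernel name and launch configuration are placeholders):

```cuda
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);          // record into the default stream
myKernel<<<grid, block>>>(d_data);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);         // wait until the stop event is reached

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

cudaEventDestroy(start);
cudaEventDestroy(stop);
```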
There is no realistic limit to the number of streams you can create (at least into the thousands). However, there is a limit to the number of streams you can use effectively to achieve concurrency.
On Fermi, the architecture supports 16-way concurrent kernel execution, but there is only a single connection (work queue) from the host to the GPU. So even if you use 16 CUDA streams, their work eventually gets funneled into one hardware queue. This can create false data dependencies and limit the amount of concurrency you can actually achieve.
With Kepler, the number of connections between the host and the GPU is 32 (instead of one on Fermi). With the new Hyper-Q technology, it is now much easier to keep the GPU busy with concurrent work.
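On Kepler and later hardware, the number of host-to-device hardware connections is configurable via an environment variable (to my understanding the default is 8, not the full 32):

```shell
# Raise the number of hardware work queues available to streams.
export CUDA_DEVICE_MAX_CONNECTIONS=32
```

More connections reduce false dependencies between streams, at the cost of some extra resources per connection.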
I haven't seen a limit in any documentation, but that doesn't mean all streams will execute concurrently, since concurrency is ultimately bounded by hard hardware limits (multiprocessors, registers, etc.).
According to this NVIDIA presentation, the maximum is 16 concurrent kernels (on Fermi): http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
To clarify, I've successfully created more than 16 streams, but I think the hardware can only support 16 concurrent kernels, so the excess ones are wasted in terms of concurrency.
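Whether a device supports concurrent kernel execution at all can be queried from its properties; this is a sketch for device 0 (the actual number of kernels that overlap is still bounded by the hardware generation and resource availability, not reported by this flag):

```cuda
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
if (prop.concurrentKernels) {
    // Device can execute kernels from different streams concurrently
    // (e.g. up to 16 at once on Fermi-class hardware).
}
```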
Kepler is probably different.