
Is there a maximum number of streams in CUDA?

Tags:

cuda

People also ask

What are streams in CUDA?

A stream in CUDA is a sequence of operations that execute on the device in the order in which they are issued by the host code. While operations within a stream are guaranteed to execute in the prescribed order, operations in different streams can be interleaved and, when possible, they can even run concurrently.
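For illustration, here is a minimal sketch of two independent streams (the `scale` kernel and the buffer sizes are made up for the example). Operations inside each stream run in issue order, while work in the two streams may overlap on the device:

```cpp
#include <cuda_runtime.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    float *h_a, *h_b, *d_a, *d_b;
    // Pinned host memory is required for truly asynchronous copies.
    cudaMallocHost(&h_a, n * sizeof(float));
    cudaMallocHost(&h_b, n * sizeof(float));
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Within each stream the copy-kernel-copy chain runs in issue order;
    // the two streams are independent and may overlap on the device.
    cudaMemcpyAsync(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice, s1);
    scale<<<(n + 255) / 256, 256, 0, s1>>>(d_a, n);
    cudaMemcpyAsync(h_a, d_a, n * sizeof(float), cudaMemcpyDeviceToHost, s1);

    cudaMemcpyAsync(d_b, h_b, n * sizeof(float), cudaMemcpyHostToDevice, s2);
    scale<<<(n + 255) / 256, 256, 0, s2>>>(d_b, n);
    cudaMemcpyAsync(h_b, d_b, n * sizeof(float), cudaMemcpyDeviceToHost, s2);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_a); cudaFree(d_b);
    cudaFreeHost(h_a); cudaFreeHost(h_b);
    return 0;
}
```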

Are CUDA kernels asynchronous?

Kernel calls are asynchronous from the point of view of the CPU, so if you call two kernels in succession, the second one is launched without waiting for the first to finish. It only means that control returns to the CPU immediately.
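A small sketch of that behaviour (kernelA and kernelB are placeholder kernels used only to show the launch ordering): both launches return to the host right away, and the host only blocks at cudaDeviceSynchronize():

```cpp
#include <cuda_runtime.h>

// Placeholder kernels; the bodies are not the point.
__global__ void kernelA(int *d) { d[threadIdx.x] += 1; }
__global__ void kernelB(int *d) { d[threadIdx.x] *= 2; }

int main()
{
    int *d_data;
    cudaMalloc(&d_data, 256 * sizeof(int));
    cudaMemset(d_data, 0, 256 * sizeof(int));

    kernelA<<<1, 256>>>(d_data);   // returns to the host immediately
    kernelB<<<1, 256>>>(d_data);   // queued behind kernelA in the default stream

    // The CPU is free to do other work here while the GPU runs the kernels.

    cudaDeviceSynchronize();       // block the host until both kernels finish
    cudaFree(d_data);
    return 0;
}
```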

What is a CUDA event?

CUDA events are synchronization markers that can be used to monitor the device's progress, to accurately measure timing, and to synchronize CUDA streams. The underlying CUDA events are lazily initialized when the event is first recorded or exported to another process.
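As a sketch of the common timing pattern (myKernel and the buffer size are placeholders chosen for the example):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; the event timing pattern is what matters.
__global__ void myKernel(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * d[i];
}

int main()
{
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                    // marker enqueued in the default stream
    myKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                // wait until the stop marker is reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);    // elapsed GPU time in milliseconds
    printf("kernel took %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```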


There is no realistic limit to the number of streams you can create (at least thousands). However, there is a limit to the number of streams you can use effectively to achieve concurrency.

On Fermi, the architecture supports 16-way concurrent kernel launches, but there is only a single connection from the host to the GPU. So even if you have 16 CUDA streams, they all eventually get funneled into one hardware queue. This can create false data dependencies and limit the amount of concurrency you can easily get.

With Kepler, the number of connections between the host and the GPU is 32 (instead of one on Fermi). With the new Hyper-Q technology, it is now much easier to keep the GPU busy with concurrent work.
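As a rough sketch of what this looks like in practice (the touch kernel and the stream count of 64 are arbitrary choices for the example): creating far more streams than the hardware can run concurrently is perfectly legal; the runtime just maps them onto the available hardware queues.

```cpp
#include <cuda_runtime.h>

// Placeholder kernel; each stream gets its own slice of the buffer.
__global__ void touch(float *d, int offset)
{
    d[offset + threadIdx.x] += 1.0f;
}

int main()
{
    const int NSTREAMS = 64;                   // arbitrary, well above 16
    float *d_buf;
    cudaMalloc(&d_buf, NSTREAMS * 64 * sizeof(float));
    cudaMemset(d_buf, 0, NSTREAMS * 64 * sizeof(float));

    cudaStream_t streams[NSTREAMS];
    for (int i = 0; i < NSTREAMS; ++i)
        cudaStreamCreate(&streams[i]);         // creating 64 streams works fine

    // How many of these actually overlap depends on the hardware queues
    // (1 on Fermi, up to 32 with Hyper-Q on Kepler and later).
    for (int i = 0; i < NSTREAMS; ++i)
        touch<<<1, 64, 0, streams[i]>>>(d_buf, i * 64);

    cudaDeviceSynchronize();
    for (int i = 0; i < NSTREAMS; ++i)
        cudaStreamDestroy(streams[i]);
    cudaFree(d_buf);
    return 0;
}
```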


I haven't seen a limit in any documentation, but that doesn't mean all streams will execute concurrently, since concurrency is bounded by hard hardware limits (multiprocessors, registers, etc.).


According to this NVIDIA presentation, the maximum is 16 concurrent kernels (on Fermi): http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf

To clarify, I've successfully created more than 16 streams, but I think the hardware can only support 16 concurrent kernels, so the excess streams add nothing in terms of concurrency.

Kepler is probably different.