Does TensorFlow automatically use CUDA streams for concurrent execution of the computation graph on a single GPU, or do streams have to be assigned manually to ops/tensors?
A stream in CUDA is a sequence of operations that execute on the device in the order in which they are issued by the host code. While operations within a stream are guaranteed to execute in the prescribed order, operations in different streams can be interleaved and, when possible, they can even run concurrently.
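To make that concrete, here is a minimal sketch of issuing independent work on two streams. It uses CuPy rather than TensorFlow (TensorFlow's Python API does not expose raw CUDA streams); the array sizes and the `non_blocking` flag are illustrative:

```python
import cupy as cp

# Work issued on two different streams may overlap on the GPU;
# work within each stream still runs in issue order.
s1 = cp.cuda.Stream(non_blocking=True)
s2 = cp.cuda.Stream(non_blocking=True)

with s1:
    a = cp.random.random((2048, 2048))
    a = a @ a          # enqueued on s1
with s2:
    b = cp.random.random((2048, 2048))
    b = b @ b          # enqueued on s2; may run concurrently with s1's work

s1.synchronize()
s2.synchronize()
```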
The GPU-enabled version of TensorFlow has the following requirements: 64-bit Linux, Python 2.7, and CUDA 7.5 (CUDA 8.0 required for Pascal GPUs).
What is CUDA? CUDA is a parallel computing platform and application programming interface (API) model that provides everything you need to develop GPU-accelerated applications; it enables developers to speed up compute-intensive applications by harnessing the power of GPUs for the parallelizable parts of the computation.
By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method, as sketched below.
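For example, a short sketch assuming the TensorFlow 2.x API (in 1.x the equivalent knobs lived on `tf.ConfigProto`'s `gpu_options`):

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Make only the first GPU visible to this process.
    tf.config.set_visible_devices(gpus[0], 'GPU')
    # Allocate GPU memory on demand instead of mapping nearly all of it up front.
    tf.config.experimental.set_memory_growth(gpus[0], True)
```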
For now, TensorFlow uses only one compute stream, plus multiple copy streams. Some kernels may choose to use multiple streams for computation internally, while maintaining single-stream semantics.
Our experiments showed that enabling multi-stream execution automatically does not bring much performance gain, since most of our kernels are large enough to utilize all the processors in the GPU. Enabling multi-stream would also conflict with our current design, which recycles GPU memory aggressively.
This is a decision we might revisit in the future. If that happens, TensorFlow would likely assign ops/kernels to different CUDA streams automatically, without exposing them to users.
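In practice, then, you simply write independent ops and let the runtime schedule them; no stream management is needed on the user side. A minimal sketch (TensorFlow 2.x API; shapes are arbitrary):

```python
import tensorflow as tf

@tf.function
def two_branches(x, y):
    # These two matmuls are independent in the dataflow graph, but on a
    # single GPU TensorFlow enqueues both onto its one compute stream;
    # host-to-device copies go on separate copy streams.
    a = tf.matmul(x, x)
    b = tf.matmul(y, y)
    return a + b

x = tf.random.normal((1024, 1024))
y = tf.random.normal((1024, 1024))
z = two_branches(x, y)
```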