This example shows how to profile TensorFlow programs. I used this tool to profile my program, a simple LSTM, and the results are shown as:
/gpu:0/stream:all Compute(pid 5)
/job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)
My questions:
a) What is the meaning of each row?
b) In particular, what is the difference between /gpu:0/stream:all Compute(pid 5) and /job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)?
c) Why are their execution times different, namely 0.072ms and 0.094ms?
Profiling helps you understand the hardware resource consumption (time and memory) of the various TensorFlow operations (ops) in your model, resolve performance bottlenecks, and ultimately make the model execute faster.
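For reference, the timeline rows quoted above come from the session-based tracing API. Below is a minimal sketch of how to collect such a trace; it uses tf.compat.v1 so it also runs under TF 2.x, and the matmul graph is just a stand-in for the LSTM in the question:

```python
import tensorflow as tf
from tensorflow.python.client import timeline

# Session-based tracing needs graph mode under TF 2.x.
tf.compat.v1.disable_eager_execution()

# A trivial graph as a stand-in for the model being profiled.
a = tf.random.normal([1000, 1000])
b = tf.random.normal([1000, 1000])
c = tf.matmul(a, b)

with tf.compat.v1.Session() as sess:
    # Ask the runtime to collect a full trace of this step.
    run_options = tf.compat.v1.RunOptions(
        trace_level=tf.compat.v1.RunOptions.FULL_TRACE)
    run_metadata = tf.compat.v1.RunMetadata()
    sess.run(c, options=run_options, run_metadata=run_metadata)

    # Convert the step stats to Chrome trace format and save them;
    # open the file in chrome://tracing to see rows like
    # "/gpu:0/stream:all Compute" quoted above.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())
```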
Users can enable these CPU optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=1 in the official x86-64 TensorFlow builds after v2.5. Most of the recommendations work on both the official x86-64 TensorFlow and Intel® Optimization for TensorFlow.
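Note that the flag is read when TensorFlow is imported, so it must already be in the environment at import time; a minimal sketch:

```python
import os

# Set the oneDNN flag before importing TensorFlow (or export it in
# the shell before launching Python).
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'

import tensorflow as tf
print(tf.__version__)
```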
Limiting GPU memory growth: to limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method. In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow the memory usage only as the process needs it.
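A sketch of both settings, assuming TF 2.x; note that both must be applied before any GPU has been initialized:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Restrict TensorFlow to the first GPU only ...
        tf.config.set_visible_devices(gpus[0], 'GPU')
        # ... and let it grow allocations on demand instead of
        # reserving (nearly) all GPU memory up front.
        tf.config.experimental.set_memory_growth(gpus[0], True)
    except RuntimeError as e:
        # Raised if the GPUs were already initialized when this ran.
        print(e)
```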
OOM (Out Of Memory) errors can occur when building and training a neural network model on the GPU; the size of the model is limited by the memory available on the GPU. When a model has exhausted that memory, TensorFlow raises a Resource Exhausted Error, an error message that indicates Out Of Memory (OOM).
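One common mitigation is to catch tf.errors.ResourceExhaustedError and retry with a smaller batch size. The helper below (try_batch_sizes is a hypothetical name, not a TensorFlow API) sketches the idea for a Keras model:

```python
import tensorflow as tf

def try_batch_sizes(model, x, y, sizes=(256, 128, 64, 32)):
    """Hypothetical helper: fall back to smaller batches on OOM."""
    for batch_size in sizes:
        try:
            model.fit(x, y, batch_size=batch_size, epochs=1)
            return batch_size
        except tf.errors.ResourceExhaustedError:
            # The GPU ran out of memory at this batch size; retry smaller.
            print(f'OOM at batch_size={batch_size}, retrying smaller')
    raise RuntimeError('model does not fit in GPU memory at any tried size')
```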
Here's an update from one of the engineers:
The '/gpu:0/stream:*' timelines are hardware traces of CUDA kernel execution times.
The '/gpu:0' lines are the TF software device enqueueing the ops on the CUDA stream (this usually takes almost zero time).