This example shows how to profile TensorFlow programs. I used this tool to profile my program, a simple LSTM, and the results are shown as:
/gpu:0/stream:all Compute(pid 5)
/job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)
My questions:
a) What is the meaning of each row?
b) In particular, what is the difference between /gpu:0/stream:all Compute(pid 5) and /job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)?
c) Why are their execution times different, namely 0.072ms and 0.094ms?
Profiling helps you understand the hardware resource consumption (time and memory) of the various TensorFlow operations (ops) in your model, resolve performance bottlenecks, and ultimately make the model execute faster.
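For reference, the timeline rows quoted above come from the session-based tracing API. Below is a minimal sketch of how to collect such a trace; it uses tf.compat.v1 so it also runs under TF 2.x, and the matmul graph is just a stand-in for the LSTM in the question:

```python
import tensorflow as tf
from tensorflow.python.client import timeline

# Session-based tracing needs graph mode under TF 2.x.
tf.compat.v1.disable_eager_execution()

# A trivial graph as a stand-in for the model being profiled.
a = tf.random.normal([1000, 1000])
b = tf.random.normal([1000, 1000])
c = tf.matmul(a, b)

with tf.compat.v1.Session() as sess:
    # Ask the runtime to collect a full trace of this step.
    run_options = tf.compat.v1.RunOptions(
        trace_level=tf.compat.v1.RunOptions.FULL_TRACE)
    run_metadata = tf.compat.v1.RunMetadata()
    sess.run(c, options=run_options, run_metadata=run_metadata)

    # Convert the step stats to Chrome trace format and save them;
    # open the file in chrome://tracing to see rows like
    # "/gpu:0/stream:all Compute" quoted above.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())
```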
Users can enable these CPU optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=1 in the official x86-64 TensorFlow builds after v2.5. Most of the recommendations work on both the official x86-64 TensorFlow and Intel® Optimization for TensorFlow.
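Note that the flag is read when TensorFlow is imported, so it must already be in the environment at import time; a minimal sketch:

```python
import os

# Set the oneDNN flag before importing TensorFlow (or export it in
# the shell before launching Python).
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'

import tensorflow as tf
print(tf.__version__)
```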
Limiting GPU memory growth: to limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method. In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow the memory usage only as the process needs it.
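A sketch of both settings, assuming TF 2.x; note that both must be applied before any GPU has been initialized:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Restrict TensorFlow to the first GPU only ...
        tf.config.set_visible_devices(gpus[0], 'GPU')
        # ... and let it grow allocations on demand instead of
        # reserving (nearly) all GPU memory up front.
        tf.config.experimental.set_memory_growth(gpus[0], True)
    except RuntimeError as e:
        # Raised if the GPUs were already initialized when this ran.
        print(e)
```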
OOM (Out Of Memory) errors can occur when building and training a neural network model on the GPU; the size of the model is limited by the memory available on the GPU. When a model has exhausted that memory, TensorFlow raises a Resource Exhausted Error, an error message that indicates Out Of Memory (OOM).
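One common mitigation is to catch tf.errors.ResourceExhaustedError and retry with a smaller batch size. The helper below (try_batch_sizes is a hypothetical name, not a TensorFlow API) sketches the idea for a Keras model:

```python
import tensorflow as tf

def try_batch_sizes(model, x, y, sizes=(256, 128, 64, 32)):
    """Hypothetical helper: fall back to smaller batches on OOM."""
    for batch_size in sizes:
        try:
            model.fit(x, y, batch_size=batch_size, epochs=1)
            return batch_size
        except tf.errors.ResourceExhaustedError:
            # The GPU ran out of memory at this batch size; retry smaller.
            print(f'OOM at batch_size={batch_size}, retrying smaller')
    raise RuntimeError('model does not fit in GPU memory at any tried size')
```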
Here's an update from one of the engineers:
The '/gpu:0/stream:*' timelines are hardware traces of CUDA kernel execution times.
The '/gpu:0' lines are the TF software device enqueueing the ops on the CUDA stream (this usually takes almost zero time).