Is there a way to calculate the time cost for each node in a TensorFlow network?
I find it hard to locate the performance bottlenecks.
EDIT: The Timeline profiler is really awesome (https://stackoverflow.com/a/37774470/3632556).
Profiling shows how much time and memory each TensorFlow operation (op) in your model consumes, which helps you find and resolve performance bottlenecks and, ultimately, make the model run faster.
Limiting GPU memory growth: To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method. In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed.
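As a minimal sketch of the two settings just described, assuming the TensorFlow 2.x tf.config API (the rest of this thread uses the older session-style API), the snippet below exposes a single GPU and asks TensorFlow to grow its memory on demand. The helper name configure_gpus and its gpu_index parameter are made up for illustration, and the import guard is only there so the snippet does nothing on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the sketch load cleanly on machines without TensorFlow
    tf = None


def configure_gpus(gpu_index=0):
    """Expose one GPU to TensorFlow and let its memory grow on demand.

    Hypothetical helper. Returns the GPUs left visible, or an empty list
    when TensorFlow is not installed or no GPU is present.
    """
    if tf is None:
        return []
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return []
    chosen = gpus[gpu_index]
    try:
        # Hide every other GPU from this process.
        tf.config.set_visible_devices([chosen], "GPU")
        # Allocate GPU memory as needed instead of reserving it all up front.
        tf.config.experimental.set_memory_growth(chosen, True)
    except RuntimeError as err:
        # Both calls must run before TensorFlow has initialized the GPUs.
        print("GPU configuration skipped:", err)
    return tf.config.get_visible_devices("GPU")


visible = configure_gpus()
print("Visible GPUs:", visible)
```

Call this before building any model or tensor; once the runtime has touched the GPUs, the settings can no longer be changed and the RuntimeError branch is taken.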
OOM (out-of-memory) errors can occur when building and training a neural network model on the GPU, because the size of the model is limited by the memory available on the GPU. When a model has exhausted that memory, TensorFlow raises a ResourceExhaustedError, whose message indicates the out-of-memory condition.
If you want to find out how much time was spent on each TensorFlow operation, you can do this in TensorBoard using runtime statistics. You will need to do something like this (check the full example in the above-mentioned link):
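    # Ask the session to record a full trace for this step.
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(<values_you_want_to_execute>, options=run_options, run_metadata=run_metadata)
    # Attach the collected statistics to the summary writer under a per-step tag.
    your_writer.add_run_metadata(run_metadata, 'step%d' % i)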
Rather than just printing it, you can view it in TensorBoard.
Additionally, clicking on a node will display the exact total memory, compute time, and tensor output sizes.
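If you would rather have the per-node times as text than click through the graph, one option is to sum them from the Chrome-trace JSON that the Timeline profiler mentioned in the question writes out. The sketch below is plain Python and rests on an assumption about that file's layout: a top-level "traceEvents" list whose op entries have "ph" equal to "X", a "dur" in microseconds, and the node name under "args". The function name time_per_node and the sample data are made up for illustration.

```python
import json
from collections import defaultdict


def time_per_node(trace):
    """Sum the duration (microseconds) of complete events per node name.

    Returns (node, total_microseconds) pairs, slowest first.
    """
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") != "X":  # skip metadata and other non-duration events
            continue
        # Prefer the node name in args; fall back to the event's own name.
        node = event.get("args", {}).get("name", event.get("name", "?"))
        totals[node] += event.get("dur", 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hand-written sample in the assumed layout, not real profiler output.
sample = json.loads("""
{"traceEvents": [
  {"ph": "M", "name": "process_name", "pid": 0, "args": {"name": "device"}},
  {"ph": "X", "cat": "Op", "name": "Conv2D", "pid": 0, "tid": 0,
   "ts": 100, "dur": 900, "args": {"name": "conv1/Conv2D", "op": "Conv2D"}},
  {"ph": "X", "cat": "Op", "name": "MatMul", "pid": 0, "tid": 0,
   "ts": 1000, "dur": 250, "args": {"name": "fc/MatMul", "op": "MatMul"}},
  {"ph": "X", "cat": "Op", "name": "Conv2D", "pid": 0, "tid": 0,
   "ts": 1300, "dur": 400, "args": {"name": "conv1/Conv2D", "op": "Conv2D"}}
]}
""")

for node, micros in time_per_node(sample):
    print(f"{node:20s} {micros:8d} us")
```

For a real trace, load the saved file with json.load and pass the resulting dict to time_per_node in place of the sample.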
TensorFlow also now has a debugger (tfdbg); see its tutorial for how to use it.