
Training TensorFlow model with summary operations is much slower than without summary operations

I am training an Inception-like model using TensorFlow r1.0 on an Nvidia Titan X GPU.

I added some summary operations to visualize the training procedure, using the following code:

import tensorflow as tf

def variable_summaries(var):
    """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)

With these operations enabled, training one epoch takes about 400 seconds. With them turned off, it takes just 90 seconds.

How can I optimize the graph to minimize the time cost of the summary operations?

Da Tong asked Feb 23 '17 04:02





1 Answer

Summaries do slow down training, because you run extra operations and have to write their results to disk. Histogram summaries slow training even more, because a histogram needs more data copied from the GPU to the CPU than a scalar value does. So I would log histograms less often than the other summaries; that could make some difference.

The usual solution is to compute summaries only every X batches. Since you compute summaries only once per epoch rather than every batch, it might be worth trying to log summaries even less often.

It depends on how many batches your dataset has, but you usually don't lose much information by collecting somewhat fewer logs.
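As a rough illustration of logging every X batches, with histograms logged less often than scalars, here is a minimal sketch. The helper `build_fetches`, the interval values 100 and 1000, and the names `scalar_op` and `histogram_op` are all hypothetical and not part of TensorFlow. It assumes the scalar and histogram summaries have been merged into two separate ops; in TensorFlow 1.x one way to do that should be to pass `collections=[...]` when creating each summary and then call `tf.summary.merge_all(key=...)` per collection, but check this against the docs for your version.

```python
def build_fetches(step, train_op, scalar_summaries, histogram_summaries,
                  scalar_every=100, histogram_every=1000):
    """Choose what to fetch at this step (hypothetical helper, not a TensorFlow API).

    train_op is always fetched; the merged scalar and histogram summary ops
    are added only on steps that are multiples of their respective intervals.
    """
    fetches = {'train': train_op}
    if step % scalar_every == 0:
        fetches['scalars'] = scalar_summaries
    if step % histogram_every == 0:
        fetches['histograms'] = histogram_summaries
    return fetches


# Intended use inside a TensorFlow 1.x training loop (assumes `sess`, `writer`,
# `train_op`, and the two merged summary ops already exist in your script):
#
#     results = sess.run(build_fetches(step, train_op, scalar_op, histogram_op))
#     for key in ('scalars', 'histograms'):
#         if key in results:
#             writer.add_summary(results[key], step)

# Stand-in strings show which ops would be fetched at a few steps.
for step in (0, 7, 100, 1000):
    print(step, sorted(build_fetches(step, 'train_op', 'scalar_op', 'histogram_op')))
```

On most steps only the training op runs, so the summary computation, the GPU-to-CPU copy, and the disk write are skipped entirely. The intervals are arbitrary starting points to tune against how many batches an epoch contains.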

Matěj Račinský answered Oct 05 '22 06:10
