Training TensorFlow model with summary operations is much slower than without summary operations

I am training an Inception-like model using TensorFlow r1.0 with GPU Nvidia Titan X.

I added some summary operations to visualize the training procedure, using code as follows:

import tensorflow as tf

def variable_summaries(var):
    """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)

With these operations enabled, training one epoch takes about 400 seconds. When I turn them off, one epoch takes only 90 seconds.

How can I optimize the graph to minimize the time cost of the summary operations?

646

asked Feb 23 '17 04:02

Da Tong

1 Answer

Summaries inevitably slow down training, because you run more operations and have to write their results to disk. Histogram summaries slow training even more, because a histogram needs more data copied from GPU to CPU than a scalar value does. So I would try logging histograms less often than the other summaries; that could make some difference.

The usual solution is to compute summaries only every X batches. Since you compute summaries only once per epoch and not every batch, it might be worth trying to log summaries even less often.

It depends on how many batches you have in your dataset, but usually you don't lose much information by gathering slightly fewer logs.
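
As a rough sketch of the "every X batches" idea, the helper below decides which graph nodes to fetch at a given step. It assumes you have built two separate merged summary ops, one for scalars and one for histograms (for example with tf.summary.merge over separately collected lists). The function name, its parameters, and the intervals 100 and 1000 are made up for illustration and are not from the question.

```python
def summary_fetches(step, train_op, scalar_summaries, histogram_summaries,
                    scalar_every=100, histogram_every=1000):
    """Pick what to run at this step: always train, summaries only sometimes."""
    fetches = [train_op]
    if step % scalar_every == 0:
        fetches.append(scalar_summaries)      # cheap: a handful of floats
    if step % histogram_every == 0:
        fetches.append(histogram_summaries)   # costly: whole tensors leave the GPU
    return fetches


# Hypothetical TF 1.x training loop (sess, writer, train_op, scalar_op,
# hist_op and num_steps are assumed to exist in your own script):
#
#   for step in range(num_steps):
#       results = sess.run(summary_fetches(step, train_op, scalar_op, hist_op))
#       for serialized in results[1:]:        # results[0] belongs to train_op
#           writer.add_summary(serialized, global_step=step)
```

On most steps only train_op is run, so the summary ops are never executed and cost nothing. Tune the two intervals to the number of batches per epoch.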

117

answered Oct 05 '22 06:10

Matěj Račinský

Tags:

tensorflow

nvidia-titan

tensorboard
