What do I need K.clear_session() and del model for (Keras with Tensorflow-gpu)?

What I am doing
I am training and using a convolutional neural network (CNN) for image classification, using Keras with Tensorflow-gpu as the backend.

What I am using
- PyCharm Community 2018.1.2
- both Python 2.7 and 3.5 (but not both at a time)
- Ubuntu 16.04
- Keras 2.2.0
- Tensorflow-GPU 1.8.0 as backend

What I want to know
In a lot of code I see people using

from keras import backend as K

# Do some code, e.g. train and save model

K.clear_session()

or deleting the model after using it:

del model

The Keras documentation says of clear_session: "Destroys the current TF graph and creates a new one. Useful to avoid clutter from old models / layers." - https://keras.io/backend/

What is the point of doing that, and should I do it as well? When I load or create a new model, my old model gets overwritten anyway, so why bother?

900

asked Jun 17 '18 08:06

benjamin

1 Answer

K.clear_session() is useful when you're creating multiple models in succession, such as during hyperparameter search or cross-validation. Each model you train adds nodes (potentially numbering in the thousands) to the graph, and those nodes stay there after you stop using the model. Every call you (or Keras) make to tf.Session.run() or tf.Tensor.eval() then works against an ever-larger graph, so your models will become slower and slower to train, and you may also run out of memory. Clearing the session removes all the nodes left over from previous models, freeing memory and preventing slowdown.
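As a rough sketch of that pattern, the loop below trains one model per configuration and clears the session after each trial. The names run_trials, train_and_score and keras_example are made up for illustration, and the tiny Dense network with random data is only a placeholder for a real model. The Keras imports sit inside keras_example so the helper itself runs without Keras installed.

```python
def run_trials(configs, train_and_score, clear_session):
    """Train one model per config, resetting backend state after each trial."""
    scores = []
    for config in configs:
        try:
            scores.append(train_and_score(config))
        finally:
            # Drop the graph nodes left behind by this trial's model,
            # even if training raised an exception.
            clear_session()
    return scores


def keras_example():
    # Hypothetical usage; assumes Keras with a TensorFlow backend is installed.
    import numpy as np
    from keras import backend as K
    from keras.layers import Dense
    from keras.models import Sequential

    # Random placeholder data, only there to make the sketch complete.
    x = np.random.rand(64, 8)
    y = np.random.randint(0, 2, size=(64, 1))

    def train_and_score(units):
        model = Sequential([
            Dense(units, activation="relu", input_shape=(8,)),
            Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy")
        history = model.fit(x, y, epochs=1, verbose=0)
        return history.history["loss"][-1]

    # One trial per hidden-layer size, with a fresh graph for each.
    return run_trials([4, 8, 16], train_and_score, K.clear_session)
```

Calling keras_example() would run three short trials. Putting clear_session in a finally block is a design choice, so a failed trial does not leave its nodes behind for the next one.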

Edit 21/06/19:

TensorFlow is lazy-evaluated by default. TensorFlow operations aren't evaluated immediately: creating a tensor or applying operations to it creates nodes in a dataflow graph. The results are calculated by evaluating the relevant parts of the graph in one go when you call tf.Session.run() or tf.Tensor.eval(). This lets TensorFlow build an execution plan that assigns operations that can run in parallel to different devices. It can also fold adjacent nodes together or remove redundant ones (e.g. if you concatenated two tensors and later split them apart again unchanged). For more details, see https://www.tensorflow.org/guide/graphs
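To make the idea of deferred evaluation concrete, here is a toy dataflow graph in plain Python. It is not TensorFlow code and not how TensorFlow is implemented; Node, constant, add, mul and run are invented names. It only shows that building an expression creates nodes, and that a run call evaluates just the nodes the requested result depends on.

```python
class Node:
    """One deferred operation in a toy dataflow graph (not a TensorFlow class)."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value


def constant(value):
    return Node("const", value=value)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def run(node, evaluated):
    """Evaluate `node` and its dependencies, appending each visited op to `evaluated`."""
    if node.op == "const":
        result = node.value
    else:
        left, right = (run(n, evaluated) for n in node.inputs)
        result = left + right if node.op == "add" else left * right
    evaluated.append(node.op)
    return result


a = constant(2)
b = constant(3)
total = add(a, b)    # nothing is computed here, only a node is created
unused = mul(a, b)   # also just a node; it is never evaluated below

trace = []
print(run(total, trace))  # 5
print(trace)              # ['const', 'const', 'add']
```

The unused multiplication node never shows up in the trace, but it still exists as an object, which is the kind of leftover that clearing the session gets rid of.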

All of your TensorFlow models are stored in the graph as a series of tensors and tensor operations. The basic operation of machine learning is the tensor dot product - the output of a neural network is the dot product of the input matrix and the network weights. If you have a single-layer perceptron and 1,000 training samples, then each epoch creates at least 1,000 tensor operations. If you have 1,000 epochs, then your graph contains at least 1,000,000 nodes at the end, before taking into account preprocessing, postprocessing, and more complex models such as recurrent nets, encoder-decoder, attention models, etc.

The problem I ran into was that the graph eventually grew too large to fit into video memory (6 GB in my case), so TF would shuttle parts of the graph from video to main memory and back. Eventually it even got too large for main memory (12 GB) and started moving between main memory and the hard disk. Needless to say, this made things incredibly, and increasingly, slow as training went on. Before developing this save-model/clear-session/reload-model flow, I calculated that, at the per-epoch rate of slowdown I experienced, my model would have taken longer than the age of the universe to finish training.
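A save-model/clear-session/reload-model flow could look roughly like the sketch below. This is a guess at the general shape, not the exact code I used: epoch_chunks, train_with_resets, build_model, chunk_size and the checkpoint.h5 path are all hypothetical, and saving to HDF5 assumes h5py is installed. build_model is assumed to return a compiled Keras model.

```python
def epoch_chunks(total_epochs, chunk_size):
    """Yield (start, stop) epoch ranges that together cover total_epochs."""
    for start in range(0, total_epochs, chunk_size):
        yield start, min(start + chunk_size, total_epochs)


def train_with_resets(build_model, x, y, total_epochs, chunk_size,
                      path="checkpoint.h5"):
    # Imported here so epoch_chunks stays usable without Keras installed.
    from keras import backend as K
    from keras.models import load_model

    model = build_model()  # assumed to return a compiled model
    for start, stop in epoch_chunks(total_epochs, chunk_size):
        # Keras trains from initial_epoch up to (not including) epoch index `epochs`.
        model.fit(x, y, initial_epoch=start, epochs=stop, verbose=0)
        model.save(path)          # architecture, weights and optimizer state
        del model                 # drop the Python reference to the old model
        K.clear_session()         # discard the old graph and start a new one
        model = load_model(path)  # rebuild the model in the fresh graph
    return model
```

For example, list(epoch_chunks(10, 4)) gives [(0, 4), (4, 8), (8, 10)], so training for 10 epochs with chunk_size=4 would save, clear and reload three times. How often to reset is a trade-off between the cost of reloading and how quickly the graph grows.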

Disclaimer: I haven't used TensorFlow in almost a year, so this might have changed. I remember there being quite a few GitHub issues around this so hopefully it has since been fixed.

answered Sep 23 '22 18:09

Chris Swinchatt



Tags: python, memory-management, tensorflow, keras
