What I am doing
I am training and using a convolutional neuron network (CNN) for image-classification using Keras with Tensorflow-gpu as backend.
What I am using
- PyCharm Community 2018.1.2
- both Python 2.7 and 3.5 (but not both at a time)
- Ubuntu 16.04
- Keras 2.2.0
- Tensorflow-GPU 1.8.0 as backend
What I want to know
In many codes I see people using
from keras import backend as K # Do some code, e.g. train and save model K.clear_session()
or deleting the model after using it:
del model
The keras documentation says regarding clear_session
: "Destroys the current TF graph and creates a new one. Useful to avoid clutter from old models / layers." - https://keras.io/backend/
What is the point of doing that and should I do it as well? When loading or creating a new model my model gets overwritten anyway, so why bother?
Calling clear_session() releases the global state: this helps avoid clutter from old models and layers, especially when memory is limited. # and memory consumption is constant over time.
Keras is a neural network Application Programming Interface (API) for Python that is tightly integrated with TensorFlow, which is used to build machine learning models. Keras' models offer a simple, user-friendly way to define a neural network, which will then be built for you by TensorFlow.
keras models will transparently run on a single GPU with no code changes required.
Currently only TensorFlow backend supports proper cleaning up of the session. This can be done by calling K. clear_session() . This will remove EVERYTHING from memory (models, optimizer objects and anything that has tensors internally).
K.clear_session()
is useful when you're creating multiple models in succession, such as during hyperparameter search or cross-validation. Each model you train adds nodes (potentially numbering in the thousands) to the graph. TensorFlow executes the entire graph whenever you (or Keras) call tf.Session.run()
or tf.Tensor.eval()
, so your models will become slower and slower to train, and you may also run out of memory. Clearing the session removes all the nodes left over from previous models, freeing memory and preventing slowdown.
Edit 21/06/19:
TensorFlow is lazy-evaluated by default. TensorFlow operations aren't evaluated immediately: creating a tensor or doing some operations to it creates nodes in a dataflow graph. The results are calculated by evaluating the relevant parts of the graph in one go when you call tf.Session.run()
or tf.Tensor.eval()
. This is so TensorFlow can build an execution plan that allocates operations that can be performed in parallel to different devices. It can also fold adjacent nodes together or remove redundant ones (e.g. if you concatenated two tensors and later split them apart again unchanged). For more details, see https://www.tensorflow.org/guide/graphs
All of your TensorFlow models are stored in the graph as a series of tensors and tensor operations. The basic operation of machine learning is tensor dot product - the output of a neural network is the dot product of the input matrix and the network weights. If you have a single-layer perceptron and 1,000 training samples, then each epoch creates at least 1,000 tensor operations. If you have 1,000 epochs, then your graph contains at least 1,000,000 nodes at the end, before taking into account preprocessing, postprocessing, and more complex models such as recurrent nets, encoder-decoder, attentional models, etc.
The problem is that eventually the graph would be too large to fit into video memory (6 GB in my case), so TF would shuttle parts of the graph from video to main memory and back. Eventually it would even get too large for main memory (12 GB) and start moving between main memory and the hard disk. Needless to say, this made things incredibly, and increasingly, slow as training went on. Before developing this save-model/clear-session/reload-model flow, I calculated that, at the per-epoch rate of slowdown I experienced, my model would have taken longer than the age of the universe to finish training.
Disclaimer: I haven't used TensorFlow in almost a year, so this might have changed. I remember there being quite a few GitHub issues around this so hopefully it has since been fixed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With