I trained a ResNet50 model using TensorFlow 2.0 by transfer learning. I slightly modified the architecture (new classification layer) and saved the model with the ModelCheckpoint callback https://keras.io/callbacks/#modelcheckpoint during training. Training was fine. The model saved by the callback takes ~206 MB on the hard drive.
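For reference, a minimal sketch of this setup (the base model, the new classification layer, the number of classes and the file name are assumptions, not my exact code):

import tensorflow as tf

# Hypothetical transfer-learning setup: ResNet50 base plus a new classification layer.
base = tf.keras.applications.ResNet50(include_top=False, pooling='avg', weights='imagenet')
model = tf.keras.Sequential([base, tf.keras.layers.Dense(10, activation='softmax')])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# ModelCheckpoint writes the full model (architecture, weights and optimizer state) to HDF5.
checkpoint = tf.keras.callbacks.ModelCheckpoint('my_model.hdf5', save_best_only=True)
# model.fit(train_data, validation_data=val_data, epochs=10, callbacks=[checkpoint])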
To predict using the model I did:
I started a Jupyter Lab notebook and used my_model = tf.keras.models.load_model('../models_using/my_model.hdf5') to load the model. (By the way, the same occurs using IPython.)
I used the free Linux command-line tool to measure the free RAM just before and after loading. Loading the model takes about 5 GB of RAM.
I saved the weights of the model and the config as JSON. Together they take about 105 MB on disk.
I loaded the model from the JSON config and weights (see the sketch below). This takes about 200 MB of RAM.
I compared the predictions of both models: they were exactly the same.
I tested the same procedure with a slightly different architecture (trained the same way) and the results were the same.
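A sketch of the JSON + weights save/load path mentioned above (file names are placeholders; my_model is the model loaded earlier):

import tensorflow as tf

# Save the architecture as JSON and the weights separately.
with open('model_json.json', 'w') as f:
    f.write(my_model.to_json())
my_model.save_weights('model_weights.h5')

# Rebuild the model from the JSON config and load the weights.
with open('model_json.json') as f:
    restored = tf.keras.models.model_from_json(f.read())
restored.load_weights('model_weights.h5')
# restored is not compiled, so it can predict but carries no optimizer state.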
Can anyone explain the huge RAM usage, and the difference in size of the models on the hard drive?
By the way, given a model in Keras, can you find out the compilation procedure (optimizer, ...)? Model.summary() does not help.
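One way to inspect how a loaded model was compiled, using standard tf.keras attributes (the printed values depend on the model):

import tensorflow as tf

my_model = tf.keras.models.load_model('../models_using/my_model.hdf5')
print(type(my_model.optimizer).__name__)   # e.g. Adam
print(my_model.optimizer.get_config())     # learning rate and other hyperparameters
print(my_model.loss)                       # loss set at compile time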
2019-12-07 - EDIT: Thanks to this answer, I conducted a series of tests:
I used the !free command in JupyterLab to measure the available memory before and after each test. Since get_weights returns a list, I used copy.deepcopy to really copy the objects. Note that the commands below were run in separate Jupyter cells and the memory comments were added just for this post.
import copy
import tensorflow as tf

!free
model = tf.keras.models.load_model('model.hdf5', compile=True)
# 25278624 - 21491888 = 3786.736 MB used
!free
weights = copy.deepcopy(model.get_weights())
# 21491888 - 21440272 = 51.616 MB used
!free
optimizer_weights = copy.deepcopy(model.optimizer.get_weights())
# 21440272 - 21339404 = 100.868 MB used
!free
model2 = tf.keras.models.load_model('model.hdf5', compile=False)
# 21339404 - 21140176 = 199.228 MB used
!free
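The same deltas can be measured from inside the notebook without !free, for example with the third-party psutil package (a sketch, not part of the original tests):

import psutil
import tensorflow as tf

def used_mb():
    # System memory currently in use, in MB, as reported by psutil.
    return psutil.virtual_memory().used / 1e6

before = used_mb()
model = tf.keras.models.load_model('model.hdf5', compile=True)
print(used_mb() - before, 'MB used by load_model')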
Loading the model from json:
!free
# loading from json
with open('model_json.json') as f:
    model_json_weights = tf.keras.models.model_from_json(f.read())
model_json_weights.load_weights('model_weights.h5')
!free
# 21132664 - 20971616 = 161.048 MB used
The difference between the checkpoint and JSON + weights is in the optimizer: model.save() saves the optimizer and its weights (and load_model compiles the model), while JSON + weights saves only the architecture and the model weights.
Unless you are using a very simple optimizer, it's normal for it to have about the same number of weights as the model (a tensor of "momentum" for each weight tensor, for instance).
Some optimizers might take two times the size of the model, because they keep two tensors of optimizer weights for each tensor of model weights (Adam, for example, keeps both a first- and a second-moment estimate per weight).
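A quick way to check this on a loaded model is to compare raw parameter counts (a sketch; it assumes the model was loaded with compile=True and has been trained, so the optimizer slots exist):

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model('model.hdf5', compile=True)

model_params = sum(w.size for w in model.get_weights())
opt_params = sum(w.size for w in model.optimizer.get_weights())
print(model_params, opt_params)
# For Adam, opt_params is roughly 2x model_params (one "m" and one "v" slot
# per weight tensor, plus a scalar iteration counter).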
Saving and loading the optimizer is important if you want to continue training. Starting training again with a new optimizer without proper weights will sort of destroy the model's performance (at least in the beginning).
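In practice, resuming training looks different for the two formats (a sketch; the optimizer and loss passed to compile are placeholders):

import tensorflow as tf

# Full HDF5 checkpoint: load_model restores the optimizer state,
# so training continues where it left off.
resumed = tf.keras.models.load_model('model.hdf5', compile=True)
# resumed.fit(train_data, epochs=more_epochs)

# JSON + weights: the model must be compiled again with a fresh optimizer,
# whose momentum/slot variables start empty.
with open('model_json.json') as f:
    rebuilt = tf.keras.models.model_from_json(f.read())
rebuilt.load_weights('model_weights.h5')
rebuilt.compile(optimizer='adam', loss='categorical_crossentropy')
# rebuilt.fit(train_data, epochs=more_epochs)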
Now, the 5 GB is not really clear to me, but I suppose it is related to the extra tensors and operations created when the model is compiled for training (see the tests below).
Interesting tests: check how much memory is used by the results of model.get_weights() and model.optimizer.get_weights() (these will be numpy arrays, copied from the original tensors), and compare the memory used by load_model(name, compile=True) versus load_model(name, compile=False).
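One possible way to run the first test, measuring the raw numpy size of the copied weights (a sketch; it reuses the model loaded with compile=True above):

model_mb = sum(w.nbytes for w in model.get_weights()) / 1e6
optimizer_mb = sum(w.nbytes for w in model.optimizer.get_weights()) / 1e6
print(model_mb, 'MB of model weights,', optimizer_mb, 'MB of optimizer weights')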