I am running Fast R-CNN with a ResNet50 architecture. I load the model checkpoint and run inference like this:
import tensorflow as tf

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'model/model.ckpt')
    sess.run(y_pred, feed_dict={x: input_data})
Everything seems to be working great. The model takes 0.08s to actually perform inference.
But I noticed that when I do this, my GPU memory usage explodes to 15637MiB / 16280MiB according to nvidia-smi.
I found that you can set the option config.gpu_options.allow_growth to stop TensorFlow from allocating the entire GPU up front and instead grab GPU memory as needed:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
saver = tf.train.Saver()
with tf.Session(config=config) as sess:
    saver.restore(sess, 'model/model.ckpt')
    sess.run(y_pred, feed_dict={x: input_data})
Doing this decreases memory usage to 4875MiB / 16280MiB. The model still takes 0.08s to run.
Finally, I tried allocating a fixed amount of memory using per_process_gpu_memory_fraction:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.05
saver = tf.train.Saver()
with tf.Session(config=config) as sess:
    saver.restore(sess, 'model/model.ckpt')
    sess.run(y_pred, feed_dict={x: input_data})
Doing this brings usage down to 1331MiB / 16280MiB, and the model still takes 0.08s to run.
This raises the question: how does TF allocate memory for models at inference time? If I want to load this model 10 times on the same GPU to perform inference in parallel, will that be an issue?
First, let's clarify what happens in tf.Session(config=config). Creating the session submits the default graph definition to the TensorFlow runtime, and the runtime then allocates GPU memory accordingly.
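A minimal sketch of that order of events, assuming TF 1.x (the toy tensors a and b stand in for a real model):

import tensorflow as tf

# The graph is built first, purely as a GraphDef on the Python side.
a = tf.constant(1.0)
b = a * 2.0

# tf.Session() hands the default graph to the runtime; this is the point
# where TensorFlow initializes the GPU device and reserves memory,
# not the later sess.run() call.
with tf.Session() as sess:
    print(sess.run(b))  # 2.0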
TensorFlow will then allocate nearly all of the GPU memory unless you limit it by setting per_process_gpu_memory_fraction, and session creation fails if that amount cannot be allocated. Setting gpu_options.allow_growth = True changes the strategy: instead of reserving everything up front, TF starts with a small allocation and grows it on demand as the model needs more memory. If a fraction is also set, it acts as the upper bound on what the process may allocate.
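For reference, the two options can be combined. A sketch assuming TF 1.x, where the fraction caps the total and allow_growth makes the allocation incremental up to that cap:

import tensorflow as tf

# Cap this process at ~10% of GPU memory, and allocate it lazily
# rather than reserving the whole 10% at session creation.
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=0.1,
    allow_growth=True,
)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    ...  # restore the checkpoint and run inference as above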
So if you run 10 copies of the model and limit each one to less than 1/10 of the GPU memory, it should work. Note that per_process_gpu_memory_fraction is enforced per process, so the cleanest way to isolate 10 instances is to give each its own process, as sketched below.
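A hypothetical sketch of that setup, assuming TF 1.x and the x / y_pred / checkpoint names from the question; build_model() and input_batches are placeholders for your own graph-construction code and input data:

import multiprocessing as mp
import tensorflow as tf

def worker(input_data, results, i):
    # Each process gets its own TF runtime, capped at ~1/10 of the GPU.
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.1

    # Rebuild the graph inside the process, then restore the weights.
    # build_model() is a hypothetical helper returning the input
    # placeholder x and output tensor y_pred from the question.
    x, y_pred = build_model()
    saver = tf.train.Saver()
    with tf.Session(config=config) as sess:
        saver.restore(sess, 'model/model.ckpt')
        results[i] = sess.run(y_pred, feed_dict={x: input_data})

if __name__ == '__main__':
    manager = mp.Manager()
    results = manager.dict()
    procs = [mp.Process(target=worker, args=(batch, results, i))
             for i, batch in enumerate(input_batches)]  # your 10 inputs
    for p in procs:
        p.start()
    for p in procs:
        p.join()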