I've noticed that a recent model warns that it could not allocate 2.37GiB of memory:
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.37GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
But my GPU is operating at nearly 100% utilization (in this case, a small input relative to a large model).
If I am reading this correctly, my model did not fit entirely in GPU memory. Since the GPU is still running at 100%, can I also assume that TensorFlow is intelligently swapping graph elements in and out of GPU memory asynchronously?
I'm just curious to know what's going on under the hood there.
To see what is going on under the hood, add this code to your run function (note the `timeline` import, which the snippet needs, and that `config` is your own `tf.ConfigProto`):

    from tensorflow.python.client import timeline

    run_metadata = tf.RunMetadata()
    sess = tf.Session(config=config)
    sess.run(train_step,
             feed_dict={x: batch_xs,
                        y_: batch_ys},
             options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
             run_metadata=run_metadata)

    trace = timeline.Timeline(step_stats=run_metadata.step_stats)
    with open('timeline.ctf.json', 'w') as trace_file:
        trace_file.write(trace.generate_chrome_trace_format())
Then open the generated timeline.ctf.json in Chrome's chrome://tracing interface and you will see what is going on under the hood.
It is very likely that it is swapping GPU memory.
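If the allocator warning itself bothers you, one common option (a sketch, not part of the original answer) is to change how the BFC allocator claims GPU memory. By default TensorFlow 1.x grabs nearly all GPU memory up front; `allow_growth` makes it allocate on demand, and `per_process_gpu_memory_fraction` caps the total:

    import tensorflow as tf

    # Sketch: let the BFC allocator grow GPU memory on demand
    # instead of pre-allocating almost all of it at startup.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True

    # Alternatively, cap the fraction of GPU memory TensorFlow
    # may use (here, 80% as an illustrative value):
    # config.gpu_options.per_process_gpu_memory_fraction = 0.8

    sess = tf.Session(config=config)

This does not make a too-large model fit, but it makes the allocator's behavior explicit and can change when (or whether) the "Ran out of memory" warning appears.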