TensorFlow always (pre-)allocates all free memory (VRAM) on my graphics card, which is ok since I want my simulations to run as fast as possible on my workstation.
However, I would like to log how much memory (in total) TensorFlow really uses. Additionally, it would be really nice if I could also log how much memory individual tensors use.
This information is important to measure and compare the memory size that different ML/AI architectures need.
Any tips?
If you just want a quick manual check of overall VRAM usage on Windows, Task Manager shows it: right-click the taskbar and select "Task Manager" (or press Ctrl + Shift + Esc), then click "More details" if the GPU details aren't visible.
Limiting GPU memory growth: in some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as it is needed by the process. To restrict TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method.
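As a rough sketch with the TF 1.x session API used in the rest of this answer (in TF 2.x the equivalent knob is tf.config.experimental.set_memory_growth):

import tensorflow as tf

config = tf.ConfigProto()
# Option 1: grow GPU memory on demand instead of pre-allocating all of it
config.gpu_options.allow_growth = True
# Option 2: cap the process at a fixed fraction of the card's memory
# config.gpu_options.per_process_gpu_memory_fraction = 0.5

sess = tf.Session(config=config)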
If a TensorFlow operation has both CPU and GPU implementations, TensorFlow will run the operation on a GPU device by default. If you have more than one GPU, the GPU with the lowest ID will be selected. However, TensorFlow does not spread operations across multiple GPUs automatically.
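If you need an op somewhere other than the default lowest-ID GPU, you can pin it explicitly. A minimal sketch (this assumes a machine with at least two GPUs; otherwise TensorFlow raises a placement error unless allow_soft_placement=True):

import tensorflow as tf

# Pin the computation to the second GPU instead of the default GPU 0
with tf.device("/gpu:1"):
    x = tf.ones((4, 4))
    y = tf.matmul(x, x)

# log_device_placement=True prints where each op actually ended up
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(y))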
Update: you can use TensorFlow ops to query the allocator:
# maximum across all sessions and .run calls so far
sess.run(tf.contrib.memory_stats.MaxBytesInUse())
# current usage
sess.run(tf.contrib.memory_stats.BytesInUse())
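Note (an assumption worth verifying on your TensorFlow version): these ops report the allocator of the device they are placed on, so to log GPU usage you can pin them explicitly, e.g.:

# Peak bytes allocated so far by the GPU 0 allocator (assumes the existing sess)
with tf.device("/gpu:0"):
    gpu_max_bytes = tf.contrib.memory_stats.MaxBytesInUse()
print(sess.run(gpu_max_bytes))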
You can also get detailed information about a session.run call, including all memory allocations made during the run, by looking at RunMetadata, i.e. something like this:
run_metadata = tf.RunMetadata()
sess.run(c, options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE, output_partition_graphs=True), run_metadata=run_metadata)
Here's an end-to-end example -- take a column vector and a row vector and add them to get a matrix of pairwise sums:
import tensorflow as tf

# Turn off graph optimizations so every allocation shows up in the trace
no_opt = tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0,
                             do_common_subexpression_elimination=False,
                             do_function_inlining=False,
                             do_constant_folding=False)
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=no_opt),
                        log_device_placement=True, allow_soft_placement=False,
                        device_count={"CPU": 3},
                        inter_op_parallelism_threads=3,
                        intra_op_parallelism_threads=1)

# Place each piece of the computation on its own (virtual) CPU device
with tf.device("cpu:0"):
    a = tf.ones((13, 1))
with tf.device("cpu:1"):
    b = tf.ones((1, 13))
with tf.device("cpu:2"):
    c = a + b

sess = tf.Session(config=config)
run_metadata = tf.RunMetadata()
sess.run(c,
         options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE,
                               output_partition_graphs=True),
         run_metadata=run_metadata)
with open("/tmp/run2.txt", "w") as out:
    out.write(str(run_metadata))
If you open /tmp/run2.txt you'll see messages like this:
node_name: "ones"
allocation_description {
requested_bytes: 52
allocator_name: "cpu"
ptr: 4322108320
}
....
node_name: "ones_1"
allocation_description {
requested_bytes: 52
allocator_name: "cpu"
ptr: 4322092992
}
...
node_name: "add"
allocation_description {
requested_bytes: 676
allocator_name: "cpu"
ptr: 4492163840
So here you can see that a and b allocated 52 bytes each (13 * 4), and the result allocated 676 bytes (13 * 13 * 4).
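If you want the totals programmatically instead of reading the dump by eye, you can walk the step stats in the same RunMetadata object. A minimal sketch (the field names below come from the TF 1.x StepStats/AllocationDescription protos; treat them as an assumption to verify against your TensorFlow version):

# Sum the bytes requested for every op output recorded during the traced run
total = 0
for dev_stats in run_metadata.step_stats.dev_stats:
    for node_stats in dev_stats.node_stats:
        for output in node_stats.output:
            alloc = output.tensor_description.allocation_description
            total += alloc.requested_bytes
            print(node_stats.node_name, alloc.requested_bytes, alloc.allocator_name)
print("total requested bytes:", total)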