I've trained 3 models and am now running code that loads each of the 3 checkpoints in sequence and runs predictions using them. I'm using the GPU.
When the first model is loaded it pre-allocates the entire GPU memory (which I want, for working through the first batch of data), but it doesn't release that memory when it's finished. When the second model is loaded, even using both tf.reset_default_graph() and with tf.Graph().as_default(), the GPU memory is still fully consumed by the first model, and the second model is starved of memory.
Is there a way to resolve this, other than using Python subprocesses or multiprocessing to work around the problem (the only solution I've found via Google searches)?
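Roughly, the loading loop looks like this (the checkpoint paths and the meta-graph restore are illustrative, not my exact code):

import tensorflow as tf

checkpoints = ["model_1.ckpt", "model_2.ckpt", "model_3.ckpt"]

for ckpt in checkpoints:
    tf.reset_default_graph()
    with tf.Graph().as_default():
        saver = tf.train.import_meta_graph(ckpt + ".meta")
        with tf.Session() as sess:
            saver.restore(sess, ckpt)
            # ... run predictions with this model ...
# Even after each graph and session goes away, the GPU memory
# claimed by the first session is never returned.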
Limiting GPU memory growth: to limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method. In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as needed by the process.
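For example, with the TF 2.x API (the device index is illustrative):

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Make only the first GPU visible to this process
    tf.config.set_visible_devices(gpus[0], 'GPU')
    # Allocate GPU memory on demand instead of grabbing it all upfront;
    # this must be set before any GPU has been initialized
    tf.config.experimental.set_memory_growth(gpus[0], True)

In TF 1.x, the equivalent is to create the session with tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)). Note that neither setting releases memory that is already allocated; it only stops TensorFlow from reserving everything upfront.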
In PyTorch, a similar "cuda runtime error (2): out of memory" can occur when GPU memory is exhausted. Because PyTorch typically manages large amounts of data, failing to release even small allocations can eventually crash your program once no GPU memory remains available.
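If you hit this in PyTorch, the usual mitigation is to drop the Python references and clear the caching allocator; a minimal sketch (the model here is a placeholder):

import torch

# Placeholder model, just to put some tensors on the GPU
model = torch.nn.Linear(1000, 1000).cuda()
out = model(torch.randn(64, 1000, device="cuda"))

del model, out            # drop the Python references
torch.cuda.empty_cache()  # hand cached blocks back to the driver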
A GitHub issue from June 2016 (https://github.com/tensorflow/tensorflow/issues/1727) describes the underlying problem:
currently the Allocator in the GPUDevice belongs to the ProcessState, which is essentially a global singleton. The first session using GPU initializes it, and frees itself when the process shuts down.
Thus the only workaround would be to use processes and shut them down after the computation.
Example Code:
import tensorflow as tf
import multiprocessing
import numpy as np

def run_tensorflow():
    n_input = 10000
    n_classes = 1000

    # Create model: a single linear layer (no activation is applied)
    def multilayer_perceptron(x, weight):
        layer_1 = tf.matmul(x, weight)
        return layer_1

    # Store layer weights
    weights = tf.Variable(tf.random_normal([n_input, n_classes]))

    x = tf.placeholder("float", [None, n_input])
    y = tf.placeholder("float", [None, n_classes])
    pred = multilayer_perceptron(x, weights)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        for i in range(100):
            batch_x = np.random.rand(10, 10000)
            batch_y = np.random.rand(10, 1000)
            sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})

    print("finished doing stuff with tensorflow!")

if __name__ == "__main__":
    # option 1: execute code in an extra process
    p = multiprocessing.Process(target=run_tensorflow)
    p.start()
    p.join()

    # wait until user presses enter key (check nvidia-smi here)
    input()

    # option 2: just execute the function in this process
    run_tensorflow()

    # wait until user presses enter key
    input()
So if you call run_tensorflow() within a process you created and then shut that process down (option 1), the memory is freed. If you just run run_tensorflow() in the current process (option 2), the memory is not freed after the function call returns.
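Applied to the question above, each checkpoint's prediction run can be wrapped in its own process, so its GPU memory is returned to the system as soon as that process exits. A minimal sketch (the function body and checkpoint paths are placeholders):

import multiprocessing

def predict_with_checkpoint(ckpt_path):
    # Build the graph, restore ckpt_path, and run predictions here;
    # everything allocated on the GPU dies with this process.
    print("predicting with", ckpt_path)

if __name__ == "__main__":
    for ckpt in ["model_1.ckpt", "model_2.ckpt", "model_3.ckpt"]:
        p = multiprocessing.Process(target=predict_with_checkpoint, args=(ckpt,))
        p.start()
        p.join()  # block until this model's predictions finish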
You can use the numba library to release all of the GPU memory:

pip install numba

from numba import cuda

device = cuda.get_current_device()
device.reset()

This will release all of the memory. Note that device.reset() destroys the current CUDA context, so any framework state in the same process (for example an open TensorFlow session) can no longer use the GPU without reinitializing.