I am developing a Python application that uses TensorFlow together with another model that also runs on the GPU. My PC has three GPUs (3x NVIDIA GTX 1080), and since every model tries to grab all available GPUs I end up with an OUT_OF_MEMORY_ERROR. I have found that you can assign a specific GPU to a Python script with
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
Here I attach a snippet of my FCN class
class FCN:
    def __init__(self):
        os.environ['CUDA_VISIBLE_DEVICES'] = '1'
        self.keep_probability = tf.placeholder(tf.float32, name="keep_probabilty")
        self.image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
        self.annotation = tf.placeholder(tf.int32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 1], name="annotation")

        self.pred_annotation, logits = inference(self.image, self.keep_probability)
        tf.summary.image("input_image", self.image, max_outputs=2)
        tf.summary.image("ground_truth", tf.cast(self.annotation, tf.uint8), max_outputs=2)
        tf.summary.image("pred_annotation", tf.cast(self.pred_annotation, tf.uint8), max_outputs=2)
        self.loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=logits,
            labels=tf.squeeze(self.annotation, squeeze_dims=[3]),
            name="entropy"))
        tf.summary.scalar("entropy", self.loss)
        ...
Inside the same file, FCN.py, I have a small main which uses the class, and when TensorFlow prints its output I can see that only 1 GPU is used, as I expect.
if __name__ == "__main__":
    fcn = FCN()
    fcn.train_model()
    images_dir = '/home/super/datasets/MeterDataset/full-dataset-gas-images/'
    for img_file in os.listdir(images_dir):
        fcn.segment(os.path.join(images_dir, img_file))
Output:
2018-01-09 11:31:57.351029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:09:00.0
Total memory: 7.92GiB
Free memory: 7.60GiB
2018-01-09 11:31:57.351047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2018-01-09 11:31:57.351051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2018-01-09 11:31:57.351057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:09:00.0)
The problem arises when I try to instantiate the FCN object from another script.
def main(args):
    start_time = datetime.now()
    font = cv2.FONT_HERSHEY_SIMPLEX
    results_file = "../results.txt"
    if os.path.exists(results_file):
        os.remove(results_file)
    results_file = open(results_file, "a")
    fcn = FCN()
Here the creation of the object always uses all 3 GPUs instead of using only the one assigned in the __init__() method.
Here is the undesired output:
2018-01-09 11:41:02.537548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 1 2
2018-01-09 11:41:02.537555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y Y Y
2018-01-09 11:41:02.537558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 1: Y Y Y
2018-01-09 11:41:02.537561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 2: Y Y Y
2018-01-09 11:41:02.537567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0b:00.0)
2018-01-09 11:41:02.537571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:09:00.0)
2018-01-09 11:41:02.537574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
Here's what you can do:
Run your script with the CUDA_VISIBLE_DEVICES environment variable already set:
CUDA_VISIBLE_DEVICES=1 python another_script.py
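If you would rather keep everything in Python, a minimal sketch of the same idea (my own illustration, assuming nothing has initialized CUDA yet in the process) is to set the variable at the very top of the calling script, before any session is created:

# another_script.py -- hypothetical sketch: restrict the visible GPUs as early
# as possible, before any library creates a CUDA context or a Session.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'   # only GPU 1 is exposed to this process

import tensorflow as tf   # imported after the variable is set
from FCN import FCN       # the class from the question

fcn = FCN()               # any Session created from here on sees a single GPU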
Provide an explicit configuration to the Session constructor:
config = tf.ConfigProto(device_count={'GPU': 1})
sess = tf.Session(config=config)
... to force TensorFlow to use only one GPU, no matter how many are available. You can also set a fine-grained list of devices via visible_device_list (see config.proto for the details).
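As a rough sketch of that second option (using the TF 1.x GPUOptions API; the index string refers to the devices the process can see):

# Restrict the Session to GPU 1 only. The indices refer to the GPUs
# visible to the process, i.e. after any CUDA_VISIBLE_DEVICES filtering.
gpu_options = tf.GPUOptions(visible_device_list='1')
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.Session(config=config)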