
Limit GPU devices in Tensorflow

I am developing a Python application that uses TensorFlow alongside another GPU-based model. My PC has several GPUs (3x NVIDIA GTX 1080), and because every model tries to use all the available GPUs I get an OUT_OF_MEMORY_ERROR. I have found that you can assign a specific GPU to a Python script with

os.environ['CUDA_VISIBLE_DEVICES'] = '1'
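
For example, placed at the very top of a script (a minimal sketch; the device ID is just for illustration):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # expose only GPU 1 to this process (IDs start at 0)

import tensorflow as tf  # TensorFlow will now enumerate a single GPU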

Here is a snippet of my FCN class:

import os
import tensorflow as tf

class FCN:
  def __init__(self):
    os.environ['CUDA_VISIBLE_DEVICES'] = '1'
    self.keep_probability = tf.placeholder(tf.float32, name="keep_probabilty")
    self.image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
    self.annotation = tf.placeholder(tf.int32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 1], name="annotation")

    self.pred_annotation, logits = inference(self.image, self.keep_probability)
    tf.summary.image("input_image", self.image, max_outputs=2)
    tf.summary.image("ground_truth", tf.cast(self.annotation, tf.uint8), max_outputs=2)
    tf.summary.image("pred_annotation", tf.cast(self.pred_annotation, tf.uint8), max_outputs=2)
    self.loss = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                               labels=tf.squeeze(self.annotation,
                                                                                                 squeeze_dims=[3]),
                                                                               name="entropy")))
    tf.summary.scalar("entropy", self.loss)

...

Inside the same file, FCN.py, I have a small main which uses the class, and when TensorFlow prints its output I can see that only one GPU is used, as I expect.

if __name__ == "__main__":
  fcn = FCN()
  fcn.train_model()

  images_dir = '/home/super/datasets/MeterDataset/full-dataset-gas-images/'
  for img_file in os.listdir(images_dir):
    fcn.segment(os.path.join(images_dir, img_file))

Output:

2018-01-09 11:31:57.351029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:09:00.0
Total memory: 7.92GiB
Free memory: 7.60GiB
2018-01-09 11:31:57.351047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2018-01-09 11:31:57.351051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2018-01-09 11:31:57.351057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:09:00.0)

The problem arises when I try to instantiate the FCN object from another script.

import os
from datetime import datetime

import cv2

from FCN import FCN  # hypothetical import path; FCN.py must be importable from here

def main(args):
  start_time = datetime.now()

  font = cv2.FONT_HERSHEY_SIMPLEX

  results_file = "../results.txt"
  if os.path.exists(results_file):
    os.remove(results_file)

  results_file = open(results_file, "a")

  fcn = FCN()

Here, creating the object always uses all 3 GPUs instead of only the one assigned in the __init__() method.

Here is the undesired output:

2018-01-09 11:41:02.537548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 1 2 
2018-01-09 11:41:02.537555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y Y Y 
2018-01-09 11:41:02.537558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 1:   Y Y Y 
2018-01-09 11:41:02.537561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 2:   Y Y Y 
2018-01-09 11:41:02.537567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0b:00.0)
2018-01-09 11:41:02.537571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:09:00.0)
2018-01-09 11:41:02.537574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)

asked Jan 09 '18 by caleale90



1 Answer

Here's what you can do:

  • Run your script with the CUDA_VISIBLE_DEVICES environment variable already set, as discussed here:

    CUDA_VISIBLE_DEVICES=1 python another_script.py
    
  • Provide an explicit configuration to the Session constructor:

    config = tf.ConfigProto(device_count={'GPU': 1})  # cap the number of GPUs TensorFlow may use
    sess = tf.Session(config=config)
    

    ... to force TensorFlow to use only one GPU, no matter how many are available. You can also set a fine-grained list of devices via visible_device_list (see config.proto for the details), as in the sketch below.
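
For example, a minimal sketch combining visible_device_list with on-demand memory allocation via allow_growth (an addition not mentioned above, but a common companion setting; the device ID is just for illustration):

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.visible_device_list = '1'  # expose only GPU 1 to this session
config.gpu_options.allow_growth = True        # grab GPU memory on demand instead of all upfront
sess = tf.Session(config=config)

Unlike CUDA_VISIBLE_DEVICES, this is applied inside the process when the session is created, so nothing has to be set before the script is launched.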

answered by Maxim