tensorflow using 2 GPUs at the same time

First, I'm still a newbie in tensorflow. I'm using v0.9 and trying to use the 2 GPUs installed in the machine we have. Here is what's happening:

  1. When I launch a training script on the machine, it works only on one of the 2 GPUs. It takes the first one, /gpu:0, by default.
  2. When I launch another training script to run on the second GPU (after doing the changes needed, i.e. with tf.device...) while keeping the first process running on the first GPU, tensorflow kills the first process and uses only the second GPU to run the second process. So it seems only one process at a time is allowed by tensorflow?

What I need is to be able to launch two separate training scripts for two different models on the 2 different GPUs installed on the same machine. Am I missing something in this case? Is this the expected behavior? Should I go through distributed tensorflow on a local machine to do so?

asked May 23 '17 by Maystro

2 Answers

Tensorflow tries to allocate memory on every GPU it sees.

To work around this, make Tensorflow see a single (and different) GPU for every script. To do that, use the environment variable CUDA_VISIBLE_DEVICES like this:

CUDA_VISIBLE_DEVICES=0 python script_one.py
CUDA_VISIBLE_DEVICES=1 python script_two.py
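If you'd rather not prefix the command in the shell, a roughly equivalent sketch is to set the variable from inside each script before TensorFlow initializes CUDA (safest is before the import); the "0"/"1" values here are just the example indices from above:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # use "1" in the second script

import tensorflow as tf   # from here on, TensorFlow only sees that one GPU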

In both script_one.py and script_two.py, use tf.device("/gpu:0") to place the operations on the only GPU the process sees.
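As a minimal sketch of what such a script can look like (graph-mode API as in TF 0.x/1.x; the tensors and shapes below are only illustrative):

import tensorflow as tf

# With CUDA_VISIBLE_DEVICES set, the single visible GPU is always "/gpu:0"
# inside the process, whatever its physical index is.
with tf.device("/gpu:0"):
    a = tf.random_normal([1000, 1000])
    b = tf.random_normal([1000, 1000])
    c = tf.matmul(a, b)

# allow_soft_placement lets ops without a GPU kernel fall back to the CPU
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(c)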

answered by nessuno

So it seems only one process at a time is allowed by tensorflow?

Nope. I mean, there is no such limit.

Is this the expected behavior? Should I go through distributed tensorflow on a local machine to do so?

It is not the expected behavior; there may be a problem somewhere, since what you want to do is perfectly possible (I'm currently running it).


First, CUDA uses an environment variable, CUDA_VISIBLE_DEVICES, that, as you can guess, sets which GPUs are visible to the process.

This means that if you want to run two processes on different GPUs, the easiest way is to open two consoles and do:

single GPU process (#1):

export CUDA_VISIBLE_DEVICES=0
./train.py

single GPU process (#2):

export CUDA_VISIBLE_DEVICES=1
./train.py

My guess is that your CUDA_VISIBLE_DEVICES is somehow set to 0 (or 1), which would indeed cause problems.

If you want to use both GPUs in one process, you can run (a placement sketch follows below):

Dual-GPU process:

export CUDA_VISIBLE_DEVICES=0,1
./train.py

or even:

CPU Process (disable GPU):

export CUDA_VISIBLE_DEVICES=
./train.py
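Back to the dual-GPU case: inside that single process you then place the ops on the two visible devices explicitly. A rough sketch (again graph-mode API; names and shapes are only illustrative):

import tensorflow as tf

with tf.device("/gpu:0"):
    a = tf.random_normal([1000, 1000])
    c0 = tf.matmul(a, a)

with tf.device("/gpu:1"):
    b = tf.random_normal([1000, 1000])
    c1 = tf.matmul(b, b)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run([c0, c1])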

Hope it helps
pltrdy
