First, I'm still a newbie in TensorFlow. I'm using v0.9 and trying to use the 2 GPUs installed in the machine we have. So, here is what's happening:
When I launch a training script on the machine, it runs on only one of the 2 GPUs; it takes the first one, gpu:0, by default.
When I launch a second training script on the second GPU (after making the needed changes, i.e. with tf.device..
) while keeping the first process running on the first GPU, TensorFlow kills the first process and uses only the second GPU to run the second process. So it seems only one process at a time is allowed by TensorFlow? What I need is: to be able to launch two separate training scripts for 2 different models on the 2 different GPUs installed on the same machine. Am I missing something in this case? Is this the expected behavior? Should I go through distributed TensorFlow on a local machine to do so?
TensorFlow tries to allocate some memory on every GPU it sees.
To work around this, make TensorFlow see a single (and different) GPU for each script. To do that, use the environment variable CUDA_VISIBLE_DEVICES
in this way:
CUDA_VISIBLE_DEVICES=0 python script_one.py
CUDA_VISIBLE_DEVICES=1 python script_two.py
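Note that the VAR=value command form above sets the variable only for that single process, so the two scripts cannot interfere with each other. A minimal sketch of that semantics:

```shell
# The per-command form passes the variable to the child process only;
# the parent shell's environment is left untouched.
child_value=$(CUDA_VISIBLE_DEVICES=0 sh -c 'echo "$CUDA_VISIBLE_DEVICES"')
echo "child saw: $child_value"
echo "parent has: ${CUDA_VISIBLE_DEVICES-unset}"
```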
In both script_one.py
and script_two.py
use tf.device("/gpu:0")
to place the operations on the only GPU the process sees.
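The same restriction can also be applied from inside the script itself, as long as it happens before TensorFlow is imported (CUDA reads the variable when it initializes). A minimal sketch; the GPU index here is just an example:

```python
import os

# Pin this process to physical GPU 1. This must happen BEFORE importing
# TensorFlow, because the CUDA runtime reads the variable at initialization.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import tensorflow as tf
# with tf.device("/gpu:0"):   # the single visible GPU is always /gpu:0
#     ...                     # build the model here

print(os.environ["CUDA_VISIBLE_DEVICES"])
```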
So it seems only one process at a time is allowed by tensorflow?
Nope, there is no such limit.
Is this the expected behavior? Should I go through distributed tensorflow on a local machine to do so?
It is not the expected behavior; there may be a problem on your side, since what you want to do is perfectly possible (I'm currently running such a setup).
First, CUDA
uses an environment variable, CUDA_VISIBLE_DEVICES
that, as you can guess, sets the GPUs visible to the process.
This means that if you want to run two processes on different GPUs, the easiest way is to open two consoles and run, in the first:
export CUDA_VISIBLE_DEVICES=0
./train.py
and in the second:
export CUDA_VISIBLE_DEVICES=1
./train.py
My guess is that your CUDA_VISIBLE_DEVICES
is somehow set to 0 (or 1), which would indeed cause this problem.
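A quick way to check that guess is to print the variable, distinguishing "unset" from an empty value. A minimal shell sketch; the helper name show_visible_gpus is made up for illustration:

```shell
# show_visible_gpus is a hypothetical helper: it prints the current value
# of CUDA_VISIBLE_DEVICES, or "unset" if the variable is not defined at all.
show_visible_gpus() {
    echo "${CUDA_VISIBLE_DEVICES-unset}"
}

show_visible_gpus
```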
If you want one process to use both GPUs, you can run:
export CUDA_VISIBLE_DEVICES=0,1
./train.py
or unset the variable entirely so that every GPU is visible:
unset CUDA_VISIBLE_DEVICES
./train.py
(Careful: an empty value, export CUDA_VISIBLE_DEVICES=, does the opposite and hides all GPUs from the process.)
Hope it helps!
pltrdy