I have a workstation with 2 GPUs and I am trying to run multiple tensorflow jobs at the same time, so I can train more than one model at once, etc.
For example, I've tried to assign each session to a different device via the python API. In script1.py:
with tf.device("/gpu:0"): # do stuff
in script2.py:
with tf.device("/gpu:1"): # do stuff
in script3.py:
with tf.device("/cpu:0"): # do stuff
If I run each script by itself I can see that it is using the specified device. (Also, each model fits comfortably into a single GPU and doesn't use the other one even when both are available.)
However, if one script is running and I try to run another, I always get this error:
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:909] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 0 with properties:
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 187.65MiB
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:909] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 1 with properties:
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:04:00.0
Total memory: 4.00GiB
Free memory: 221.64MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0:   Y Y
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 1:   Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 980, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 187.40MiB bytes.
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 187.40M (196505600 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
F tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Check failed: gpu_mem != nullptr Could not allocate GPU device memory for device 0. Tried to allocate 187.40MiB
Aborted (core dumped)
It seems each tensorflow process tries to grab memory on all of the GPUs on the machine when it loads, even if not all devices will be used to run the model.
I see there is an option to limit the amount of GPU memory each process uses:
tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
...I haven't tried it, but this seems like it would make two processes share 50% of each GPU instead of giving each process a separate GPU...
Does anyone know how to configure tensorflow to use only one GPU and leave the other available for another tensorflow process?
By default it takes the first GPU (/gpu:0). When I launch a second training script to run on the second GPU (after making the necessary changes, i.e. with tf.device(...)) while keeping the first process running on the first GPU, tensorflow kills the first process and uses only the second GPU to run the second process.
To run multiple instances of a single-GPU application on different GPUs you can use the CUDA environment variable CUDA_VISIBLE_DEVICES. The variable restricts execution to a specific set of devices. To use it, just set CUDA_VISIBLE_DEVICES to a comma-separated list of GPU IDs.
TensorFlow will attempt to use (an equal fraction of the memory of) all GPU devices that are visible to it. If you want to run different sessions on different GPUs, you should do the following.
Start each process with a different value for the CUDA_VISIBLE_DEVICES
environment variable. For example, if your script is called my_script.py
and you have 4 GPUs, you could run the following:
$ CUDA_VISIBLE_DEVICES=0 python my_script.py  # Uses GPU 0.
$ CUDA_VISIBLE_DEVICES=1 python my_script.py  # Uses GPU 1.
$ CUDA_VISIBLE_DEVICES=2,3 python my_script.py  # Uses GPUs 2 and 3.
Note that the GPU devices in TensorFlow will still be numbered from zero (i.e. "/gpu:0" etc.), but they will correspond to the devices that you have made visible with CUDA_VISIBLE_DEVICES.
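The same restriction can also be applied from inside the script itself, as long as it happens before TensorFlow initializes. A minimal sketch (the GPU index "1" here is just an example):

```python
import os

# Hide every GPU except physical device 1 from this process.
# This must run before tensorflow is imported, because device
# discovery happens when the library initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import tensorflow as tf  # import only *after* setting the variable;
# the remaining device then appears as "/gpu:0" inside TensorFlow.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

With this in place, the with tf.device("/gpu:0") blocks from the question need no per-script changes: each process simply sees a different physical GPU as device 0.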