Does TensorFlow by default use all available GPUs in the machine?

I have 3 GTX Titan GPUs in my machine. I ran the CIFAR-10 example, cifar10_train.py, and got the following output:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:60] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:60] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0:   Y N 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 1:   N Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:694] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:694] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN, pci bus id: 0000:84:00.0)

It looks to me as though TensorFlow is initializing itself on two devices (/gpu:0 and /gpu:1).

My question is: why does it do that on only two of the three devices, and is there any way to prevent it? (I want it to run as if there were only a single GPU.)

412

asked Jan 17 '16 03:01

Zk1001

1 Answer

See: Using GPUs

Manual device placement

If you would like a particular operation to run on a device of your choice instead of what's automatically selected for you, you can use with tf.device to create a device context such that all the operations within that context will have the same device assignment.

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

# Creates a graph; a and b are pinned to the CPU.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
# MatMul is outside the device context, so the runtime picks its device.
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

You will see that now a and b are assigned to cpu:0. Since a device was not explicitly specified for the MatMul operation, the TensorFlow runtime will choose one based on the operation and available devices (gpu:0 in this example) and automatically copy tensors between devices if required.

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]

Earlier Answer 2.

See: Using GPUs

Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default. If you would like to run on a different GPU, you will need to specify the preference explicitly:

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

# Creates a graph; every op in this context is pinned to the third GPU.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Earlier Answer 1.

From CUDA_VISIBLE_DEVICES – Masking GPUs

Does your CUDA application need to target a specific GPU? If you are writing GPU enabled code, you would typically use a device query to select the desired GPUs. However, a quick and easy solution for testing is to use the environment variable CUDA_VISIBLE_DEVICES to restrict the devices that your CUDA application sees. This can be useful if you are attempting to share resources on a node or you want your GPU enabled executable to target a specific GPU.

Environment Variable Syntax      Results
CUDA_VISIBLE_DEVICES=1           Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1         Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"       Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3       Devices 0, 2, 3 will be visible; device 1 is masked
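
For a script like the one in the question, the variable can be set on the command line (for example, CUDA_VISIBLE_DEVICES=0 python cifar10_train.py) or from Python before the framework is imported. Here is a minimal sketch of the second option. The helper name restrict_visible_gpus is made up for illustration, and the sketch assumes nothing in the process has initialized CUDA yet.

```python
import os


def restrict_visible_gpus(device_ids):
    """Set CUDA_VISIBLE_DEVICES so only the given physical GPU IDs are exposed.

    Hypothetical helper, not part of TensorFlow or CUDA. It should run before
    any library initializes CUDA, because the variable is read at that point.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only physical GPU 0, then import the CUDA-using library afterwards.
restrict_visible_gpus([0])
# import tensorflow as tf  # assumed to come after the call above
```

Setting the variable inside the script keeps the restriction next to the code that depends on it. Setting it on the command line leaves the script unchanged and is easier to vary per run.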

CUDA will enumerate the visible devices starting at zero. In the last case, devices 0, 2, 3 will appear as devices 0, 1, 2. If you change the order of the string to “2,3,0”, devices 2,3,0 will be enumerated as 0,1,2 respectively. If CUDA_VISIBLE_DEVICES is set to a device that does not exist, all devices will be masked. You can specify a mix of valid and invalid device numbers. All devices before the invalid value will be enumerated, while all devices after the invalid value will be masked.
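
The renumbering and masking rules in the paragraph above can be modeled in a few lines of Python. This is only an illustrative sketch of the behavior as described, not NVIDIA's implementation. The function name enumerate_visible and the four-GPU machine are assumptions, and only integer device IDs are handled.

```python
def enumerate_visible(spec, physical_count):
    """Return physical device IDs in the order the application would see them.

    Illustrative model only: the position in the returned list is the logical
    ID seen by the application. Enumeration stops at the first entry that is
    not a valid physical device ID.
    """
    visible = []
    for token in spec.strip('"').split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            break  # invalid entry: it and everything after it are masked
        visible.append(int(token))
    return visible


# Assumed example machine with four physical GPUs (IDs 0 to 3).
print(enumerate_visible("0,2,3", 4))  # [0, 2, 3], seen as logical 0, 1, 2
print(enumerate_visible("2,3,0", 4))  # [2, 3, 0], seen as logical 0, 1, 2
print(enumerate_visible("0,7,1", 4))  # [0], device 7 does not exist
print(enumerate_visible("9", 4))      # [], all devices masked
```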

To determine the device ID for the available hardware in your system, you can run NVIDIA’s deviceQuery executable included in the CUDA SDK. Happy programming!

Chris Mason

190

answered Sep 27 '22 18:09

Guy Coder

Tags: machine-learning, tensorflow, computer-vision, gpu
