I would like to use MirroredStrategy to use multiple GPUs in the same machine. I tried one of the examples: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/distribute/python/examples/simple_tfkeras_example.py
The result is: ValueError: Op type not registered 'NcclAllReduce' in binary running on RAID. Make sure the Op and Kernel are registered in the binary running in this process. while building NodeDef 'NcclAllReduce'
I am using Windows, therefore Nccl is not available. Is it possible to force TensorFlow not to use this library?
MirroredStrategy is a method for synchronous distributed training across multiple GPUs on one machine. With it, each of your model's variables is replicated ("mirrored") across the GPUs and kept in sync.
tf.distribute.Strategy is the TensorFlow API for distributing training across multiple GPUs, multiple machines, or TPUs; it lets you distribute existing models and training code with minimal changes. You enter tf.distribute.Strategy.scope() to specify that a strategy should be used when building and executing your model. (This puts you in the "cross-replica context" for that strategy, which means the strategy is in control of things like variable placement.)
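As a minimal sketch of what the scope does (assuming TF 2.x; with no GPUs available, MirroredStrategy falls back to the CPU, so this runs anywhere):

```python
import tensorflow as tf

# Minimal sketch (TF 2.x): variables created inside strategy.scope()
# are placed by the strategy and mirrored across its replica devices.
# With no GPUs available, MirroredStrategy falls back to the CPU.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    v = tf.Variable(1.0)  # created under the strategy's control

print(type(v).__name__)  # a mirrored/distributed variable wrapper
```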
There are some NCCL binaries for Windows, but they can be quite annoying to deal with.
As an alternative, TensorFlow gives you three other reduction options in MirroredStrategy that work natively on Windows: Hierarchical Copy, Reduce to First GPU, and Reduce to CPU. Hierarchical Copy is most likely what you want, but you can benchmark each of them to see which gives the best result.
If you are using a TensorFlow version older than 2.0, use tf.contrib.distribute:
# Hierarchical Copy
cross_tower_ops = tf.contrib.distribute.AllReduceCrossTowerOps(
    'hierarchical_copy', num_packs=number_of_gpus)
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)
# Reduce to First GPU
cross_tower_ops = tf.contrib.distribute.ReductionToOneDeviceCrossTowerOps()
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)
# Reduce to CPU
cross_tower_ops = tf.contrib.distribute.ReductionToOneDeviceCrossTowerOps(
    reduce_to_device="/device:CPU:0")
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)
After 2.0, you only need tf.distribute. Here is an example that sets up an Xception model on 2 GPUs:
strategy = tf.distribute.MirroredStrategy(
    devices=["/gpu:0", "/gpu:1"],
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

with strategy.scope():
    parallel_model = Xception(weights=None,
                              input_shape=(299, 299, 3),
                              classes=number_of_classes)
    parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
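For completeness, the other two Windows-friendly reductions also have TF 2.x equivalents: tf.distribute.ReductionToOneDevice replaces the contrib ReductionToOneDeviceCrossTowerOps. A sketch (assuming TF 2.x; the device strings are the standard TensorFlow device names):

```python
import tensorflow as tf

# Reduce to first GPU (TF 2.x): gradients are gathered and reduced
# on a single device instead of using an all-reduce.
strategy_gpu = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice(
        reduce_to_device="/gpu:0"))

# Reduce to CPU (TF 2.x): same idea, but the reduction happens on the CPU.
strategy_cpu = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice(
        reduce_to_device="/device:CPU:0"))
```

Either strategy can then be used exactly like the Xception example above: build and compile the model inside strategy.scope().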