I would like to use MirroredStrategy to use multiple GPUs in the same machine. I tried one of the examples: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/distribute/python/examples/simple_tfkeras_example.py
The result is: ValueError: Op type not registered 'NcclAllReduce' in binary running on RAID. Make sure the Op and Kernel are registered in the binary running in this process. while building NodeDef 'NcclAllReduce'
I am using Windows, therefore Nccl is not available. Is it possible to force TensorFlow not to use this library?
MirroredStrategy is a method for synchronous distributed training across multiple GPUs on one machine. With it, each of your model's variables is replicated ("mirrored") across the GPUs and kept in sync.
tf.distribute.Strategy is the TensorFlow API for distributing training across multiple GPUs, multiple machines, or TPUs; it lets you distribute existing models and training code with minimal changes. You enter tf.distribute.Strategy.scope() to specify that a strategy should be used when building and executing your model. (This puts you in the "cross-replica context" for that strategy, which means the strategy is in control of things like variable placement.)
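As a minimal sketch of what the scope does (assuming TF 2.x; with no GPUs available, MirroredStrategy falls back to the CPU, so this runs anywhere):

```python
import tensorflow as tf

# Minimal sketch (TF 2.x): variables created inside strategy.scope()
# are placed by the strategy and mirrored across its replica devices.
# With no GPUs available, MirroredStrategy falls back to the CPU.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    v = tf.Variable(1.0)  # created under the strategy's control

print(type(v).__name__)  # a mirrored/distributed variable wrapper
```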
There are some NCCL binaries for Windows, but they can be quite annoying to deal with.
As an alternative, TensorFlow gives you three other reduction options in MirroredStrategy that work natively on Windows: Hierarchical Copy, Reduce to First GPU, and Reduce to CPU. Hierarchical Copy is most likely what you want, but you can benchmark each of them to see which gives the best result.
If you are using a TensorFlow version older than 2.0, use tf.contrib.distribute:
# Hierarchical Copy
cross_tower_ops = tf.contrib.distribute.AllReduceCrossTowerOps(
    'hierarchical_copy', num_packs=number_of_gpus)
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)
# Reduce to First GPU
cross_tower_ops = tf.contrib.distribute.ReductionToOneDeviceCrossTowerOps()
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)
# Reduce to CPU
cross_tower_ops = tf.contrib.distribute.ReductionToOneDeviceCrossTowerOps(
    reduce_to_device="/device:CPU:0")
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)
After 2.0, you only need tf.distribute. Here is an example that sets up an Xception model on 2 GPUs:
strategy = tf.distribute.MirroredStrategy(
    devices=["/gpu:0", "/gpu:1"],
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

with strategy.scope():
    parallel_model = Xception(weights=None,
                              input_shape=(299, 299, 3),
                              classes=number_of_classes)
    parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
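For completeness, the other two Windows-friendly reductions also have TF 2.x equivalents: tf.distribute.ReductionToOneDevice replaces the contrib ReductionToOneDeviceCrossTowerOps. A sketch (assuming TF 2.x; the device strings are the standard TensorFlow device names):

```python
import tensorflow as tf

# Reduce to first GPU (TF 2.x): gradients are gathered and reduced
# on a single device instead of using an all-reduce.
strategy_gpu = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice(
        reduce_to_device="/gpu:0"))

# Reduce to CPU (TF 2.x): same idea, but the reduction happens on the CPU.
strategy_cpu = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice(
        reduce_to_device="/device:CPU:0"))
```

Either strategy can then be used exactly like the Xception example above: build and compile the model inside strategy.scope().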