 

Difference between MirroredStrategy and CentralStorageStrategy

I read the documentation of both CentralStorageStrategy and MirroredStrategy, but I cannot understand the essence of the difference between them.

In MirroredStrategy:

Each variable in the model is mirrored across all the replicas.

In CentralStorageStrategy:

Variables are not mirrored, instead they are placed on the CPU and operations are replicated across all local GPUs.

Source: https://www.tensorflow.org/guide/distributed_training

What does this mean in practice? What are the use cases for CentralStorageStrategy, and how does training work if variables are placed on the CPU in this strategy?

asked Dec 11 '19 by Victor


People also ask

What is MirroredStrategy?

tf.distribute.MirroredStrategy supports synchronous distributed training on multiple GPUs on one machine. It creates one replica per GPU device. Each variable in the model is mirrored across all the replicas, and these variables are kept in sync with each other by applying identical updates.

Can TensorFlow use multiple GPUs?

tf.distribute.Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs. Using this API, you can distribute your existing models and training code with minimal code changes.
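
For instance, switching an existing Keras model to MirroredStrategy is mostly a matter of building it inside strategy.scope(). A minimal sketch (the layer sizes and random data here are arbitrary placeholders):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU

    with strategy.scope():
        # Variables (layer weights, optimizer slots) created here are
        # distributed according to the strategy.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # Outside the scope, training looks exactly like the single-device case.
    x = tf.random.normal((256, 10))
    y = tf.random.normal((256, 1))
    model.fit(x, y, epochs=1, batch_size=32)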

How does TensorFlow parallelize?

The TensorFlow runtime parallelizes graph execution across many different dimensions: the individual ops have parallel implementations that use multiple cores on a CPU or many threads on a GPU.
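
If you want to control that op-level parallelism on CPU, TensorFlow exposes thread-pool knobs (the counts below are arbitrary, and they must be set before any ops execute):

    import tensorflow as tf

    # Threads used *inside* a single op (e.g. one big matmul).
    tf.config.threading.set_intra_op_parallelism_threads(8)
    # Independent ops that may run concurrently with each other.
    tf.config.threading.set_inter_op_parallelism_threads(2)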


1 Answer

Consider one particular variable (call it "my_var") in your usual, single-GPU, non-distributed use case (e.g. a weight matrix of a convolutional layer).

If you use 4 GPUs, MirroredStrategy creates 4 copies of "my_var", one on each GPU. Every copy always holds the same value, because identical updates are applied to all of them, so the variable updates happen in sync on all the GPUs.
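
You can verify the placement yourself: strategy.experimental_local_results() returns the per-replica components of a distributed variable. A minimal sketch (the variable shape is arbitrary):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()

    with strategy.scope():
        my_var = tf.Variable(tf.zeros((3, 3)), name="my_var")

    # With 4 GPUs this prints 4 components, one placed on each GPU;
    # on a machine with a single device it prints just one.
    for component in strategy.experimental_local_results(my_var):
        print(component.device)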

With CentralStorageStrategy, only one copy of "my_var" is created, in host (CPU) memory. During training, each GPU runs the replicated operations on its slice of the batch using the current value read from the CPU, and the resulting gradients are aggregated and applied to that single CPU copy, so the update only happens in one place.
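
The same check under CentralStorageStrategy (note it lives under tf.distribute.experimental in TF 2.x) should report a CPU device instead. One caveat: per the TF docs, if exactly one GPU is visible, this strategy places everything on that GPU, so the CPU placement only shows up with multiple GPUs (or none):

    import tensorflow as tf

    strategy = tf.distribute.experimental.CentralStorageStrategy()

    with strategy.scope():
        my_var = tf.Variable(tf.zeros((3, 3)), name="my_var")

    # With multiple GPUs, expect something like .../device:CPU:0 here.
    print(my_var.device)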

Which one is faster probably depends on the machine's topology and on how fast CPU-GPU communication is compared with GPU-GPU communication. If the GPUs can communicate quickly with each other (for example, over NVLink), MirroredStrategy may be more efficient. But I'd benchmark it to be sure.
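
A rough way to run that benchmark is to time an identical fit() under each strategy. This is a sketch only: the model, data, and sizes are placeholders, and you'd want your real model and input pipeline for a meaningful number:

    import time
    import tensorflow as tf

    def benchmark(strategy, steps=100, batch=64):
        with strategy.scope():
            model = tf.keras.Sequential([
                tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
                tf.keras.layers.Dense(10),
            ])
            model.compile(
                optimizer="sgd",
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            )
        x = tf.random.normal((batch * steps, 784))
        y = tf.random.uniform((batch * steps,), maxval=10, dtype=tf.int32)
        model.fit(x, y, batch_size=batch, epochs=1, verbose=0)  # warm-up pass
        start = time.time()
        model.fit(x, y, batch_size=batch, epochs=1, verbose=0)  # timed pass
        return time.time() - start

    print("MirroredStrategy:      ", benchmark(tf.distribute.MirroredStrategy()))
    print("CentralStorageStrategy:", benchmark(
        tf.distribute.experimental.CentralStorageStrategy()))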

answered Oct 16 '22 by isarandi