I read the documentation of both CentralStorageStrategy and MirroredStrategy, but cannot understand the essential difference between them.
In MirroredStrategy: Each variable in the model is mirrored across all the replicas.
In CentralStorageStrategy: Variables are not mirrored, instead they are placed on the CPU and operations are replicated across all local GPUs.
Source: https://www.tensorflow.org/guide/distributed_training
What does it mean in practice? What are the use cases for CentralStorageStrategy, and how does the training work if variables are placed on the CPU in this strategy?
tf.distribute.MirroredStrategy supports synchronous distributed training on multiple GPUs on one server. It creates one replica per GPU device. Each variable in the model is mirrored across all the replicas. These variables are kept in sync with each other by applying identical updates.
tf.distribute.Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs. Using this API, you can distribute your existing models and training code with minimal code changes.
The TensorFlow runtime parallelizes graph execution across many different dimensions: The individual ops have parallel implementations, using multiple cores in a CPU, or multiple threads in a GPU.
Consider one particular variable (call it "my_var") in your usual, single-GPU, non-distributed use case (e.g. a weight matrix of a convolutional layer).
If you use 4 GPUs, MirroredStrategy will create 4 variables in place of the single "my_var" variable, one on each GPU. However, each copy will have the same value, because they are always updated in the same way. So the variable updates happen in sync on all the GPUs.
In the case of CentralStorageStrategy, only one variable is created for "my_var", in the host (CPU) memory. The updates happen in only one place.
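To make the difference concrete, here is a toy, framework-free sketch of one update step under each scheme. It is not how TensorFlow implements either strategy: the function names are made up for illustration, and averaging the per-GPU gradients is an assumption standing in for whatever aggregation the real strategy performs.

```python
# Toy simulation only. mirrored_step and central_storage_step are
# hypothetical names, not TensorFlow APIs.

def mirrored_step(copies, per_gpu_grads, lr):
    """One copy of the variable per GPU; every copy receives the same update."""
    # Stand-in for GPU-to-GPU aggregation: each GPU ends up with the
    # same averaged gradient (averaging is an assumption of this sketch).
    avg_grad = sum(per_gpu_grads) / len(per_gpu_grads)
    return [value - lr * avg_grad for value in copies]


def central_storage_step(cpu_value, per_gpu_grads, lr):
    """A single copy in host memory; gradients are aggregated and applied once."""
    avg_grad = sum(per_gpu_grads) / len(per_gpu_grads)
    return cpu_value - lr * avg_grad


# One made-up gradient for "my_var" from each of 4 GPUs.
grads = [0.5, 1.5, 1.0, 1.0]

mirrored = mirrored_step([2.0] * 4, grads, lr=0.5)   # 4 copies of "my_var"
central = central_storage_step(2.0, grads, lr=0.5)   # 1 copy of "my_var"

print(mirrored)
print(central)
```

Both schemes arrive at the same value for "my_var". What differs is how many copies exist and where the aggregation traffic flows: between GPUs in the mirrored case, between the GPUs and the host in the central-storage case.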
Which one is better probably depends on the computer's topology and how fast CPU-GPU communication is compared with GPU-GPU. If the GPUs can communicate fast with each other, MirroredStrategy may be more efficient. But I'd benchmark it to be sure.
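For benchmarking, switching between the two is mostly a matter of which strategy object you construct. A minimal sketch, assuming TensorFlow 2.x, where CentralStorageStrategy lives under tf.distribute.experimental; the make_strategy helper is a hypothetical name of mine, not part of TensorFlow:

```python
import tensorflow as tf


def make_strategy(name):
    """Hypothetical helper: pick a strategy by name so both can be benchmarked."""
    if name == "mirrored":
        return tf.distribute.MirroredStrategy()
    return tf.distribute.experimental.CentralStorageStrategy()


strategy = make_strategy("mirrored")  # or make_strategy("central")

# Variables created inside the scope are placed according to the strategy:
# one copy per replica for MirroredStrategy, a single copy for
# CentralStorageStrategy.
with strategy.scope():
    my_var = tf.Variable(1.0, name="my_var")

print("replicas in sync:", strategy.num_replicas_in_sync)
```

The model-building and training code inside the scope stays the same for both strategies, so you can time a few training steps under each one on your own hardware.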