How does asynchronous training work in distributed TensorFlow?

I've read the Distributed TensorFlow documentation, which says that in asynchronous training,

"each replica of the graph has an independent training loop that executes without coordination."

From what I understand, in a parameter-server architecture with data parallelism, this means each worker computes gradients and updates its own weights without regard to the other workers' updates. However, since all the weights are shared on the parameter server (PS), I think the PS still has to coordinate (or aggregate) the weight updates from all workers in some way. How does this aggregation work in asynchronous training? Or, more generally, how does asynchronous training work in distributed TensorFlow?

309

asked Mar 31 '17 18:03

Ruofan Kong

1 Answer

When you train asynchronously in Distributed TensorFlow, a particular worker does the following:

1. The worker reads all of the shared model parameters in parallel from the PS task(s), and copies them to the worker task. These reads are uncoordinated with any concurrent writes, and no locks are acquired: in particular, the worker may see partial updates from one or more other workers (e.g. a subset of the updates from another worker may have been applied, or a subset of the elements in a variable may have been updated).
2. The worker computes gradients locally, based on a batch of input data and the parameter values that it read in step 1.
3. The worker sends the gradients for each variable to the appropriate PS task, and applies the gradients to their respective variables, using an update rule that is determined by the optimization algorithm (e.g. SGD, SGD with Momentum, Adagrad, Adam, etc.). The update rules typically use (approximately) commutative operations, so they may be applied independently to the updates from each worker, and the state of each variable will be a running aggregate of the sequence of updates received.
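
To make the three steps concrete, here is a minimal sketch that imitates them with plain Python threads instead of a real TensorFlow cluster. Everything in it is an assumption made for illustration: the `ParameterServer` class, `worker_loop`, `run_async_training`, the toy quadratic loss 0.5 * ||w - target||^2, and the fact that every worker uses the same target (real workers would see different input batches).

```python
import threading


class ParameterServer:
    """Hypothetical stand-in for a PS task that holds one shared variable."""

    def __init__(self, initial_values):
        self.values = list(initial_values)

    def read(self):
        # No lock is taken. In a real cluster this copy could mix values
        # from before and after another worker's update.
        return list(self.values)

    def apply_gradients(self, grads, learning_rate):
        # Plain SGD update rule, applied element by element without locking.
        for i, g in enumerate(grads):
            self.values[i] -= learning_rate * g


def worker_loop(ps, target, steps, learning_rate):
    for _ in range(steps):
        params = ps.read()                               # step 1: read parameters
        grads = [p - t for p, t in zip(params, target)]  # step 2: local gradients
        ps.apply_gradients(grads, learning_rate)         # step 3: send and apply


def run_async_training(num_workers=4, steps=200, learning_rate=0.05):
    target = [1.0, -2.0, 3.0]  # minimizer of the toy loss
    ps = ParameterServer([0.0, 0.0, 0.0])
    threads = [
        threading.Thread(target=worker_loop, args=(ps, target, steps, learning_rate))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ps.values, target


values, target = run_async_training()
print("final parameters:", values)
```

Even though no worker waits for any other, the shared values should end up close to the target, because each SGD update is just a subtraction folded into the running state of the variable.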

In asynchronous training, each update from the worker is applied concurrently, and the updates may be somewhat coordinated if the optional use_locking=True flag was set when the respective optimizer (e.g. tf.train.GradientDescentOptimizer) was initialized. Note however that the locking here only provides mutual exclusion for two concurrent updates, and (as noted above) reads do not acquire locks; the locking does not provide atomicity across the entire set of updates.
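
As a rough illustration of what that kind of locking does and does not cover, the sketch below serializes updates to a toy variable while leaving reads unlocked. It is not TensorFlow code: `LockingVariable` and `run_updates` are hypothetical names, and the `use_locking` argument here only mimics the spirit of the optimizer flag.

```python
import threading
from contextlib import nullcontext


class LockingVariable:
    """Hypothetical shared variable whose updates can optionally be serialized."""

    def __init__(self, value, use_locking):
        self.value = value
        # A real lock when use_locking is True, otherwise a no-op context manager.
        self._guard = threading.Lock() if use_locking else nullcontext()

    def read(self):
        # Reads never take the lock, so they may interleave with updates.
        return self.value

    def apply_gradient(self, grad, learning_rate):
        # With locking, only one read-modify-write runs at a time.
        with self._guard:
            self.value = self.value - learning_rate * grad


def run_updates(use_locking, num_workers=8, updates_per_worker=1000):
    var = LockingVariable(0.0, use_locking)

    def worker():
        for _ in range(updates_per_worker):
            var.apply_gradient(1.0, 0.5)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return var.value


print("with locking:   ", run_updates(True))
print("without locking:", run_updates(False))
```

With locking, every one of the 8 * 1000 updates is applied, so the result is exactly -4000.0. Without it, two concurrent read-modify-write sequences can overwrite each other, so the result may be less negative, although on a given run it may also happen to match.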

(By contrast, in synchronous training, a utility like tf.train.SyncReplicasOptimizer will ensure that all of the workers read the same, up-to-date values for each model parameter; and that all of the updates for a synchronous step are aggregated before they are applied to the underlying variables. To do this, the workers are synchronized by a barrier, which they enter after sending their gradient update, and leave after the aggregated update has been applied to all variables.)
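
The barrier behaviour can be sketched with `threading.Barrier` from the Python standard library. This is only a simulation under stated assumptions, not how `tf.train.SyncReplicasOptimizer` is implemented: `run_sync_training` is a hypothetical helper, each worker has a toy loss 0.5 * (w - target)^2, and the gradients are aggregated by averaging.

```python
import threading


def run_sync_training(worker_targets, steps=50, learning_rate=0.1):
    num_workers = len(worker_targets)
    state = {"w": 0.0}                         # the shared model parameter
    grad_slots = [0.0] * num_workers           # one gradient slot per worker
    reads = [[] for _ in range(num_workers)]   # what each worker saw per step

    def apply_aggregated_update():
        # Runs in one thread, after all workers have arrived at the barrier
        # and before any of them is released.
        mean_grad = sum(grad_slots) / num_workers
        state["w"] -= learning_rate * mean_grad

    barrier = threading.Barrier(num_workers, action=apply_aggregated_update)

    def worker(index, target):
        for _ in range(steps):
            w = state["w"]                  # every worker reads the same value
            reads[index].append(w)
            grad_slots[index] = w - target  # send this worker's gradient
            barrier.wait()                  # leave once the update is applied

    threads = [
        threading.Thread(target=worker, args=(i, t))
        for i, t in enumerate(worker_targets)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["w"], reads


final_w, reads = run_sync_training([1.0, 2.0, 3.0, 6.0])
print("final parameter:", final_w)
```

Because the update runs as the barrier action, no worker can start its next read until the aggregated gradient has been applied, which is what keeps the values each worker sees identical within a step.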

172

answered Oct 14 '22 11:10

mrry

Tags: python, asynchronous, neural-network, tensorflow, distributed
