Suppose I have 10 machines with 2 GPUs each and I want to run a distributed TensorFlow cluster. How many parameter servers should I allocate vs. masters?
A good heuristic is to allocate the smallest number of parameter servers so that network bandwidth does not become a bottleneck.
For instance, suppose you have 10 million parameters and each computation step takes 1 second. At 4 bytes per float32 parameter, each worker sends a 40 MB parameter update vector and receives a parameter vector of the same size every second, so each worker needs 320 Mbps of bandwidth in each direction. With 10 workers and a single parameter server, that server must sustain 3.2 Gbps in each direction.
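Here is that back-of-the-envelope arithmetic as a small Python snippet, assuming float32 parameters; the variable names are just illustrative:

```python
num_params = 10_000_000   # 10 million parameters
bytes_per_param = 4       # float32
step_time_s = 1.0         # one computation step per second
num_workers = 10

update_bytes = num_params * bytes_per_param              # 40 MB per step
per_worker_mbps = update_bytes * 8 / step_time_s / 1e6   # 320 Mbps each way
single_ps_gbps = per_worker_mbps * num_workers / 1e3     # 3.2 Gbps

print(f"Per worker: {per_worker_mbps:.0f} Mbps in each direction")
print(f"Single PS:  {single_ps_gbps:.1f} Gbps in each direction")
```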
Now suppose your network cards are capable of 1 Gbps full-duplex. Since 3.2 Gbps / 1 Gbps = 3.2, you will need at least 4 parameter server tasks to avoid saturating the Ethernet cards, leaving each one carrying about 800 Mbps.
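As a minimal sketch of how you might then describe the cluster with the TF 1.x parameter-server API (the `machineN` hostnames and ports are hypothetical; in your setup, four of the ten machines would double as parameter servers while all ten run worker tasks):

```python
import tensorflow as tf

# Hypothetical host names; replace with your own machines.
cluster = tf.train.ClusterSpec({
    "ps":     ["machine%d:2222" % i for i in range(4)],
    "worker": ["machine%d:2223" % i for i in range(10)],
})

# Each process starts one server for its role, e.g. the first PS task:
server = tf.train.Server(cluster, job_name="ps", task_index=0)
server.join()  # parameter servers block here and just serve variables
```

Worker processes would start the same way with `job_name="worker"` and their own `task_index`, then run the training loop.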