 

What has to be inside tf.distribute.Strategy.scope()?

I am currently playing around with Distribution Strategies in TensorFlow 2.0 as described here https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/distribute/Strategy

I am wondering what has to go inside a with ...scope() block and what is "optional".

Specifically, which of the following operations do I have to put inside a with ...scope() block for distribution to work?

  • Optimizer creation
  • Dataset creation
  • Dataset experimental_distribute_dataset
  • apply_gradients call
  • Dataset iteration for loop
  • experimental_run_v2

I have toyed around a little and my code seems to work even when I don't use with ...scope at all. I am unsure whether this has side effects I am just not seeing right now.

Code without scope:

strat = tf.distribute.MirroredStrategy()

BATCH_SIZE_PER_REPLICA = 5

print('Replicas: ', strat.num_replicas_in_sync)

global_batch_size = (BATCH_SIZE_PER_REPLICA * strat.num_replicas_in_sync)

dataset = tf.data.Dataset.from_tensors(tf.random.normal([100])).repeat(1000).batch(
    global_batch_size)

g = Model('m', 10, 10, 1, 3)  # custom model class, definition not shown

dist_dataset = strat.experimental_distribute_dataset(dataset)

@tf.function
def train_step(dist_inputs):
  def step_fn(inputs):
    print([(v.name, v.device) for v in g.trainable_variables])
    return g(inputs)

  out = strat.experimental_run_v2(step_fn, args=(dist_inputs,))

for inputs in dist_dataset:
    train_step(inputs)
    break

Code with scope:

strat = tf.distribute.MirroredStrategy()

BATCH_SIZE_PER_REPLICA = 5

print('Replicas: ', strat.num_replicas_in_sync)

global_batch_size = (BATCH_SIZE_PER_REPLICA * strat.num_replicas_in_sync)

with strat.scope():
    dataset = tf.data.Dataset.from_tensors(tf.random.normal([100])).repeat(1000).batch(
        global_batch_size)

    g = Model('m', 10, 10, 1, 3)  # custom model class, definition not shown

    dist_dataset = strat.experimental_distribute_dataset(dataset)

    @tf.function
    def train_step(dist_inputs):
        def step_fn(inputs):
            print([(v.name, v.device) for v in g.trainable_variables])
            return g(inputs)

        out = strat.experimental_run_v2(step_fn, args=(dist_inputs,))

    for inputs in dist_dataset:
        train_step(inputs)
        break

Edit: It seems that strat.experimental_run_v2 automatically enters the scope of strat. So why does with strat.scope() exist?

asked Jun 11 '19 by dparted

1 Answer

You do not need to put the Dataset creation, the dataset iteration loop, etc. inside scope(). You just need to define your model and its compilation inside it, so that the variables they create are mirrored across devices. Something like this:

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
  model = tf.keras.Sequential()
  model.add(tf.keras.layers.Embedding(vocab_size, 64))  # vocab_size defined elsewhere
  model.add(tf.keras.layers.Bidirectional(
      tf.keras.layers.LSTM(64, activation='tanh', recurrent_activation='sigmoid',
                           recurrent_dropout=0, unroll=False, use_bias=True)))
  # One or more dense layers.
  # Edit the list in the `for` line to experiment with layer sizes.
  for units in [64, 64]:
    model.add(tf.keras.layers.Dense(units, activation='relu'))
  # Output layer. The first argument is the number of labels.
  model.add(tf.keras.layers.Dense(3, activation='softmax'))
  model.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])
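
Training itself can then happen outside the scope block. For example (assuming a dataset of (text, label) batches called train_dataset, which is not part of the original answer):

model.fit(train_dataset, epochs=3)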

What this does is create a replica of the model and its parameters on each GPU, and those replicas are trained in sync. The batch size you define is the global batch size: it is divided by the number of GPUs available and each sub-batch is sent to one GPU. For example, if you have batch_size = 64 and two GPUs, then each GPU gets batches of size 32. You can read more here.
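
The question, though, uses a custom training loop rather than Keras compile/fit. The general rule there is the same: anything that creates variables (the model, the optimizer, metrics) should be created inside strategy.scope(), while distributing the dataset, iterating over it, and calling experimental_run_v2 can live outside the scope. Below is a minimal sketch of that split; the Dense model, SGD optimizer, and mean-squared-error loss are just illustrative assumptions, not part of the original post:

import tensorflow as tf

strat = tf.distribute.MirroredStrategy()
global_batch_size = 5 * strat.num_replicas_in_sync

with strat.scope():
    # Variable creation happens here, so variables are mirrored on every replica.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.SGD(0.01)

# Dataset creation and distribution do not need the scope.
dataset = tf.data.Dataset.from_tensors(
    (tf.random.normal([100]), tf.random.normal([1]))).repeat(1000).batch(global_batch_size)
dist_dataset = strat.experimental_distribute_dataset(dataset)

@tf.function
def train_step(dist_inputs):
    def step_fn(inputs):
        x, y = inputs
        with tf.GradientTape() as tape:
            pred = model(x)
            # Scale the per-example loss by the *global* batch size.
            loss = tf.nn.compute_average_loss(
                tf.keras.losses.mse(y, pred), global_batch_size=global_batch_size)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    per_replica_losses = strat.experimental_run_v2(step_fn, args=(dist_inputs,))
    return strat.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

for inputs in dist_dataset:
    print(train_step(inputs))
    break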

answered Sep 17 '22 by Rishabh Sahrawat