 

What has to be inside tf.distribute.Strategy.scope()?

I am currently playing around with Distribution Strategies in TensorFlow 2.0 as described here https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/distribute/Strategy

I am wondering what has to go inside a with ...scope() block and what is "optional".

Specifically, which of the following operations do I have to put inside a with ...scope() block for distribution to work?

  • Optimizer creation
  • Dataset creation
  • Dataset experimental_distribute_dataset
  • apply_gradients call
  • Dataset iteration for loop
  • experimental_run_v2

I have toyed around a little and my code seems to work even when I don't use with ...scope at all. I am unsure whether this has side effects I am just not seeing right now.

Code without scope:

strat = tf.distribute.MirroredStrategy()

BATCH_SIZE_PER_REPLICA = 5

print('Replicas: ', strat.num_replicas_in_sync)

global_batch_size = (BATCH_SIZE_PER_REPLICA * strat.num_replicas_in_sync)

dataset = tf.data.Dataset.from_tensors(tf.random.normal([100])).repeat(1000).batch(
    global_batch_size)

g = Model('m', 10, 10, 1, 3)  # custom model class, definition not shown

dist_dataset = strat.experimental_distribute_dataset(dataset)

@tf.function
def train_step(dist_inputs):
  def step_fn(inputs):
    print([(v.name, v.device) for v in g.trainable_variables])
    return g(inputs)

  out = strat.experimental_run_v2(step_fn, args=(dist_inputs,))

for inputs in dist_dataset:
    train_step(inputs)
    break

Code with scope:

strat = tf.distribute.MirroredStrategy()

BATCH_SIZE_PER_REPLICA = 5

print('Replicas: ', strat.num_replicas_in_sync)

global_batch_size = (BATCH_SIZE_PER_REPLICA * strat.num_replicas_in_sync)

with strat.scope():
    dataset = tf.data.Dataset.from_tensors(tf.random.normal([100])).repeat(1000).batch(
        global_batch_size)

    g = Model('m', 10, 10, 1, 3)  # custom model class, definition not shown

    dist_dataset = strat.experimental_distribute_dataset(dataset)

    @tf.function
    def train_step(dist_inputs):
        def step_fn(inputs):
            print([(v.name, v.device) for v in g.trainable_variables])
            return g(inputs)

        out = strat.experimental_run_v2(step_fn, args=(dist_inputs,))

    for inputs in dist_dataset:
        train_step(inputs)
        break

Edit: It seems that strat.experimental_run_v2 automatically enters the scope of strat. So why does with strat.scope() exist?

asked Jun 11 '19 by dparted

1 Answer

You do not need to put the Dataset creation, the dataset iteration loop, etc. inside scope(). You just need to define your model and its compilation inside it, so that the variables they create are mirrored across devices. Something like this:

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
  model = tf.keras.Sequential()
  model.add(tf.keras.layers.Embedding(vocab_size, 64))  # vocab_size defined elsewhere
  model.add(tf.keras.layers.Bidirectional(
      tf.keras.layers.LSTM(64, activation='tanh', recurrent_activation='sigmoid',
                           recurrent_dropout=0, unroll=False, use_bias=True)))
  # One or more dense layers.
  # Edit the list in the `for` line to experiment with layer sizes.
  for units in [64, 64]:
    model.add(tf.keras.layers.Dense(units, activation='relu'))
  # Output layer. The first argument is the number of labels.
  model.add(tf.keras.layers.Dense(3, activation='softmax'))
  model.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])
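
Training itself can then happen outside the scope block. For example (assuming a dataset of (text, label) batches called train_dataset, which is not part of the original answer):

model.fit(train_dataset, epochs=3)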

What this does is create a replica of the model and its parameters on each GPU, and those replicas are trained in sync. The batch size you define is the global batch size: it is divided by the number of GPUs available and each sub-batch is sent to one GPU. For example, if you have batch_size = 64 and two GPUs, then each GPU gets batches of size 32. You can read more here.
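
The question, though, uses a custom training loop rather than Keras compile/fit. The general rule there is the same: anything that creates variables (the model, the optimizer, metrics) should be created inside strategy.scope(), while distributing the dataset, iterating over it, and calling experimental_run_v2 can live outside the scope. Below is a minimal sketch of that split; the Dense model, SGD optimizer, and mean-squared-error loss are just illustrative assumptions, not part of the original post:

import tensorflow as tf

strat = tf.distribute.MirroredStrategy()
global_batch_size = 5 * strat.num_replicas_in_sync

with strat.scope():
    # Variable creation happens here, so variables are mirrored on every replica.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.SGD(0.01)

# Dataset creation and distribution do not need the scope.
dataset = tf.data.Dataset.from_tensors(
    (tf.random.normal([100]), tf.random.normal([1]))).repeat(1000).batch(global_batch_size)
dist_dataset = strat.experimental_distribute_dataset(dataset)

@tf.function
def train_step(dist_inputs):
    def step_fn(inputs):
        x, y = inputs
        with tf.GradientTape() as tape:
            pred = model(x)
            # Scale the per-example loss by the *global* batch size.
            loss = tf.nn.compute_average_loss(
                tf.keras.losses.mse(y, pred), global_batch_size=global_batch_size)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    per_replica_losses = strat.experimental_run_v2(step_fn, args=(dist_inputs,))
    return strat.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

for inputs in dist_dataset:
    print(train_step(inputs))
    break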

answered Sep 17 '22 by Rishabh Sahrawat