I am currently playing around with Distribution Strategies in TensorFlow 2.0 as described here: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/distribute/Strategy
I am wondering what has to go inside a with ...scope()
block and what is "optional".
Specifically: do I have to put the operations below inside a with ...scope()
block for distribution to work?
I have toyed around a little, and my code seems to work even when I use no with ...scope
at all. I am unsure whether this has side effects that I am just not seeing right now.
Code without scope:
strat = tf.distribute.MirroredStrategy()
BATCH_SIZE_PER_REPLICA = 5
print('Replicas: ', strat.num_replicas_in_sync)
global_batch_size = (BATCH_SIZE_PER_REPLICA * strat.num_replicas_in_sync)

dataset = tf.data.Dataset.from_tensors(tf.random.normal([100])).repeat(1000).batch(
    global_batch_size)
g = Model('m', 10, 10, 1, 3)
dist_dataset = strat.experimental_distribute_dataset(dataset)

@tf.function
def train_step(dist_inputs):
    def step_fn(inputs):
        print([(v.name, v.device) for v in g.trainable_variables])
        return g(inputs)
    out = strat.experimental_run_v2(step_fn, args=(dist_inputs,))

for inputs in dist_dataset:
    train_step(inputs)
    break
Code with scope:
strat = tf.distribute.MirroredStrategy()
BATCH_SIZE_PER_REPLICA = 5
print('Replicas: ', strat.num_replicas_in_sync)
global_batch_size = (BATCH_SIZE_PER_REPLICA * strat.num_replicas_in_sync)

with strat.scope():
    dataset = tf.data.Dataset.from_tensors(tf.random.normal([100])).repeat(1000).batch(
        global_batch_size)
    g = Model('m', 10, 10, 1, 3)
    dist_dataset = strat.experimental_distribute_dataset(dataset)

    @tf.function
    def train_step(dist_inputs):
        def step_fn(inputs):
            print([(v.name, v.device) for v in g.trainable_variables])
            return g(inputs)
        out = strat.experimental_run_v2(step_fn, args=(dist_inputs,))

    for inputs in dist_dataset:
        train_step(inputs)
        break
Edit: It seems that strat.experimental_run_v2
automatically enters the scope of strat
. So why does with strat.scope()
exist?
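For reference, the kind of check I have in mind is roughly the following (a hypothetical sketch on my side; the exact variable type names may differ between TF versions), simply comparing a variable created inside the scope with one created outside it:

import tensorflow as tf

strat = tf.distribute.MirroredStrategy()

# Created outside the scope: an ordinary variable placed on a single device.
v_plain = tf.Variable(1.0)

# Created inside the scope: expected to be mirrored, i.e. one copy per replica.
with strat.scope():
    v_scoped = tf.Variable(1.0)

print(type(v_plain).__name__)   # a plain (resource) variable
print(type(v_scoped).__name__)  # expected: a mirrored/distributed variable type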
You do not need to put the Dataset creation or the dataset iteration loop inside scope()
. You only need to define your model (here a Sequential model) and its compilation inside the scope, something like this:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Embedding(vocab_size, 64))
    model.add(tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, activation='tanh', recurrent_activation='sigmoid',
                             recurrent_dropout=0, unroll=False, use_bias=True)))
    # One or more dense layers.
    # Edit the list in the `for` line to experiment with layer sizes.
    for units in [64, 64]:
        model.add(tf.keras.layers.Dense(units, activation='relu'))
    # Output layer. The first argument is the number of labels.
    model.add(tf.keras.layers.Dense(3, activation='softmax'))

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
What this does is create a replica of the model and its parameters on each GPU, and those replicas are trained jointly during training. The batch size you define is the global batch size: it is divided by the number of available GPUs, and each GPU receives its share of every batch. For example, if you have batch_size = 64
and two GPUs, then each GPU gets batches of 32. You can read more here.
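For illustration, this split can be inspected directly. The following is only a rough sketch (it uses a synthetic dataset and whatever devices MirroredStrategy happens to find); strategy.experimental_local_results unpacks the per-replica batches so their shapes can be printed:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
global_batch_size = 64

# Batch with the GLOBAL batch size; the strategy splits each batch across replicas,
# so with two GPUs every replica should receive 32 examples.
dataset = tf.data.Dataset.from_tensors(tf.random.normal([10])).repeat(256).batch(global_batch_size)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

for batch in dist_dataset:
    for i, local_batch in enumerate(strategy.experimental_local_results(batch)):
        print('replica', i, 'batch shape:', local_batch.shape)
    break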