I am fairly new to the internals of TensorFlow. While trying to understand TensorFlow's implementation of AdamOptimizer, I checked the corresponding subgraph in TensorBoard. There seems to be a duplicate subgraph named name + '_1', where name='Adam' by default.
The following MWE produces the graph below. (Note that I have expanded the x node!)
import tensorflow as tf

tf.reset_default_graph()
x = tf.Variable(1.0, name='x')
train_step = tf.train.AdamOptimizer(1e-1, name='MyAdam').minimize(x)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    with tf.summary.FileWriter('./logs/mwe') as writer:
        writer.add_graph(sess.graph)
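For reference, the same duplication shows up without TensorBoard if one lists the graph's operation names right after building the MWE (this snippet is illustrative only, not part of the MWE itself):

# Illustrative check: print every op whose name mentions 'MyAdam'
# in the default graph built by the MWE above.
for op in tf.get_default_graph().get_operations():
    if 'MyAdam' in op.name:
        print(op.name)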
I am confused because I would expect the above code to produce just a single namespace inside the graph. Even after examining the relevant source files (namely adam.py, optimizer.py and training_ops.cc), it's not clear to me how/why/where the duplicate is created.
Question: What is the source of the duplicate AdamOptimizer subgraph?
I can think of the following possibilities:
Due to some initial confusion, I had cluttered my original question with detailed instructions for setting up a reproducible TensorFlow/TensorBoard environment that reproduces this graph. I have now replaced all of that with the clarification about expanding the x node.
This is not a bug, just a perhaps questionable way for the optimizer to leak outside of its own scope.
First, not a bug: the Adam optimizer is not duplicated. As can be seen in your graph, there is a single /MyAdam scope, not two. No problem here.
However, there are two subscopes, MyAdam and MyAdam_1, added to your variable's scope. They correspond respectively to the m and v variables (and their initialization operations) that the Adam optimizer keeps for this variable.
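A quick way to see this is to list the global variables. Here is a minimal, self-contained check, assuming the same setup as your MWE but keeping a handle to the optimizer in a variable (here called opt); the slot names x/MyAdam and x/MyAdam_1 are what I observe, while the exact names of Adam's beta-power accumulators may vary across TF versions:

import tensorflow as tf

tf.reset_default_graph()
x = tf.Variable(1.0, name='x')
opt = tf.train.AdamOptimizer(1e-1, name='MyAdam')
train_step = opt.minimize(x)

# The two slot variables live under x's own scope, next to x itself.
for v in tf.global_variables():
    print(v.name)
# Prints something like:
#   x:0
#   x/MyAdam:0      <- the 'm' accumulator for x
#   x/MyAdam_1:0    <- the 'v' accumulator for x
#   beta1_power:0   (name may differ by TF version)
#   beta2_power:0   (name may differ by TF version)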
This is where the optimizer's design choice is debatable. You could reasonably expect the Adam optimizer's operations and variables to be defined strictly within its assigned scope. Instead, it creeps into the scope of each optimized variable to place the corresponding statistics variables.
So, a debatable choice to say the least, but not a bug, in the sense that the Adam optimizer is indeed not duplicated.
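The optimizer's slot API makes the mapping explicit. A small self-contained check (same setup as above, again keeping the optimizer object as opt):

import tensorflow as tf

tf.reset_default_graph()
x = tf.Variable(1.0, name='x')
opt = tf.train.AdamOptimizer(1e-1, name='MyAdam')
opt.minimize(x)

# Which slots exist, and which subscope each one ended up in:
print(opt.get_slot_names())          # ['m', 'v']
print(opt.get_slot(x, 'm').op.name)  # x/MyAdam
print(opt.get_slot(x, 'v').op.name)  # x/MyAdam_1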
EDIT
Note that this way of locating variables is common across optimizers -- you can observe the same effect with a MomentumOptimizer, for example. Indeed, this is the standard way of creating slots for optimizers -- see here:
# Scope the slot name in the namespace of the primary variable.
# Set "primary.op.name + '/' + name" as default name, so the scope name of
# optimizer can be shared when reuse is True. Meanwhile when reuse is False
# and the same name has been previously used, the scope name will add '_N'
# as suffix for unique identifications.
So as I understand it, they chose to locate a variable's statistics within a subscope of that variable's own scope, so that if the variable is shared/reused, its statistics are also shared/reused and do not need to be recomputed. This is indeed a reasonable thing to do, even if, again, creeping outside of your scope is somewhat unsettling.
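To illustrate the naming mechanics described in that comment, here is a minimal sketch. It is not the actual slot-creation code, just the same variable-scope pattern with a hypothetical helper: both slots of x ask for the default scope name x/MyAdam, and because reuse is False the second request gets uniquified with a '_1' suffix, which is exactly where MyAdam and MyAdam_1 come from.

import tensorflow as tf

tf.reset_default_graph()

def make_slot_scope(primary_name, optimizer_name):
    # Hypothetical helper mimicking the quoted pattern: open a scope whose
    # *default* name is "<primary>/<optimizer>". With reuse=False, asking for
    # the same default name twice yields a '_N'-suffixed scope the second time.
    with tf.variable_scope(None, default_name=primary_name + '/' + optimizer_name) as scope:
        tf.get_variable('slot', shape=[], trainable=False,
                        initializer=tf.zeros_initializer())
    return scope.name

print(make_slot_scope('x', 'MyAdam'))  # x/MyAdam    (where 'm' would live)
print(make_slot_scope('x', 'MyAdam'))  # x/MyAdam_1  (where 'v' would live)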