When calling the tf.get_variable() function in TensorFlow, the behavior of the "reuse" flag under eager execution is defined in the TensorFlow API doc as follows:
reuse: True, None, or tf.AUTO_REUSE; ... When eager execution is enabled, this argument is always forced to be tf.AUTO_REUSE.
However, when I actually run the demo code suggested on that page:
import tensorflow as tf

tf.enable_eager_execution()

def foo():
    with tf.variable_scope("foo", reuse=tf.AUTO_REUSE):
        v = tf.get_variable("v", [1])
    return v

v1 = foo()  # Creates v.
v2 = foo()  # Gets the same, existing v.
assert v1 == v2
The assertion fails. (It passes if the tf.enable_eager_execution() line is removed, as expected.)
So how do I reuse a variable in eager mode? Is this a bug, or am I missing something?
In eager mode, things are simpler... except for people who (like me) have been brain-damaged by using graph models for too long.
Eager works like ordinary Python: variables live only while something references them. If you stop referencing them, they are gone.
To do variable sharing, you do the same thing you would naturally do if you were using numpy (or really anything else) for the computation: you store variables in an object, and you reuse that object.
This is why eager has so much affinity with the Keras API: Keras deals mostly in objects.
So look at your function again in numpy terms, for example (a useful exercise for those of us recovering from graphs): would you expect two calls to foo to return the same array object? Of course not.
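As a minimal sketch (the Foo class here is purely illustrative, not part of any API), the same foo logic with the variable stored in an object and simply handed back on every call might look like this:

import tensorflow as tf

tf.enable_eager_execution()

class Foo(object):
    def __init__(self):
        # The variable is created once and lives as long as this object is referenced.
        self.v = tf.Variable(tf.zeros([1]), name="v")

    def __call__(self):
        # Always hand back the same variable object.
        return self.v

foo = Foo()
v1 = foo()  # Creates v.
v2 = foo()  # Returns the same, existing v.
assert v1 is v2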
I found it easiest to reuse variables in Eager Execution by simply passing a reference to the same variable around:
import tensorflow as tf
import numpy as np

tf.enable_eager_execution()


class MyLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(MyLayer, self).__init__()

    def build(self, input_shape):
        # bias specific to each layer
        self.B = self.add_variable('B', [1])

    def call(self, input, A):
        # some function involving the input, the shared weights, and the layer-specific bias
        return tf.matmul(input, A) + self.B


class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()

    def build(self, input_shape):
        # common vector of weights
        self.A = self.add_variable('A', [int(input_shape[-1]), 1])
        # layers which will share A
        self.layer1 = MyLayer()
        self.layer2 = MyLayer()

    def call(self, input):
        result1 = self.layer1(input, self.A)
        result2 = self.layer2(input, self.A)
        return result1 + result2


if __name__ == "__main__":
    data = np.random.normal(size=(1000, 3))
    model = MyModel()
    predictions = model(data)

    print('\n\n')
    model.summary()
    print('\n\n')
    print([v.name for v in model.trainable_variables])
The output shows a shared weight parameter my_model/A of dimension 3 and two bias parameters, my_model/my_layer/B and my_model/my_layer_1/B, of dimension 1 each, for a total of 5 trainable parameters. The code runs on its own, so feel free to play around with it.
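As a rough usage sketch (the targets, the squared-error loss, and the optimizer choice below are made up for illustration, and the MyModel class from above is assumed), training with tf.GradientTape shows that the shared A receives gradient contributions from both layers:

import numpy as np
import tensorflow as tf

data = np.random.normal(size=(1000, 3)).astype(np.float32)
targets = np.random.normal(size=(1000, 1)).astype(np.float32)

model = MyModel()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

with tf.GradientTape() as tape:
    predictions = model(data)
    loss = tf.reduce_mean(tf.square(predictions - targets))

# Gradients for A, my_layer/B, and my_layer_1/B; A accumulates contributions from both layers.
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))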