I would like to know why this function:
@tf.function
def train(self, TargetNet, epsilon):
    if len(self.experience['s']) < self.min_experiences:
        return 0
    ids = np.random.randint(low=0, high=len(self.replay_buffer['s']), size=self.batch_size)
    states = np.asarray([self.experience['s'][i] for i in ids])
    actions = np.asarray([self.experience['a'][i] for i in ids])
    rewards = np.asarray([self.experience['r'][i] for i in ids])
    next_states = np.asarray([self.experience['s1'][i] for i in ids])
    dones = np.asarray([self.experience['done'][i] for i in ids])
    q_next_actions = self.get_action(next_states, epsilon)
    q_value_next = TargetNet.predict(next_states)
    q_value_next = tf.gather_nd(q_value_next, tf.stack((tf.range(self.batch_size), q_next_actions), axis=1))
    targets = tf.where(dones, rewards, rewards + self.gamma * q_value_next)
    with tf.GradientTape() as tape:
        estimates = tf.math.reduce_sum(self.predict(states) * tf.one_hot(actions, self.num_actions), axis=1)
        loss = tf.math.reduce_sum(tf.square(estimates - targets))
    variables = self.model.trainable_variables
    gradients = tape.gradient(loss, variables)
    self.optimizer.apply_gradients(zip(gradients, variables))
gives ValueError: Creating variables on a non-first call to a function decorated with tf.function, whereas this very similar code:
@tf.function
def train(self, TargetNet):
    if len(self.experience['s']) < self.min_experiences:
        return 0
    ids = np.random.randint(low=0, high=len(self.experience['s']), size=self.batch_size)
    states = np.asarray([self.experience['s'][i] for i in ids])
    actions = np.asarray([self.experience['a'][i] for i in ids])
    rewards = np.asarray([self.experience['r'][i] for i in ids])
    states_next = np.asarray([self.experience['s2'][i] for i in ids])
    dones = np.asarray([self.experience['done'][i] for i in ids])
    value_next = np.max(TargetNet.predict(states_next), axis=1)
    actual_values = np.where(dones, rewards, rewards + self.gamma * value_next)
    with tf.GradientTape() as tape:
        selected_action_values = tf.math.reduce_sum(
            self.predict(states) * tf.one_hot(actions, self.num_actions), axis=1)
        loss = tf.math.reduce_sum(tf.square(actual_values - selected_action_values))
    variables = self.model.trainable_variables
    gradients = tape.gradient(loss, variables)
    self.optimizer.apply_gradients(zip(gradients, variables))
does not throw an error. Please help me understand why.
EDIT: I removed the parameter epsilon from the function and it works. Is it because the @tf.function decorator is only valid for single-argument functions?
Using tf.function, you're converting the content of the decorated function: TensorFlow will try to compile your eager code into its graph representation.
Variables, however, are special objects. In TensorFlow 1.x (graph mode), you defined the variables only once and then used/updated them.
In TensorFlow 2.0, under pure eager execution, you can declare and re-create the same variable more than once, since a tf.Variable in eager mode is just a plain Python object that gets destroyed as soon as the function ends and the variable goes out of scope.
To let TensorFlow correctly convert a function that creates state (that is, one that uses variables), you have to break the function scope and declare the variables outside the function.
In short, if you have a function that works correctly in eager mode, like:
def f():
    a = tf.constant([[10,10],[11.,1.]])
    x = tf.constant([[1.,0.],[0.,1.]])
    b = tf.Variable(12.)
    y = tf.matmul(a, x) + b
    return y
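For context, here is a runnable sketch (the name f_broken is mine) showing that decorating this function directly with tf.function reproduces exactly the error from the question, because each trace of the function body tries to create a brand-new tf.Variable:

```python
import tensorflow as tf

# Sketch: moving the eager function under tf.function without restructuring it
# fails, because the traced function creates a tf.Variable in its body.
@tf.function
def f_broken():
    a = tf.constant([[10., 10.], [11., 1.]])
    x = tf.constant([[1., 0.], [0., 1.]])
    b = tf.Variable(12.)  # state created inside the traced function
    return tf.matmul(a, x) + b

error = None
try:
    f_broken()
    f_broken()  # by the second call at the latest, TF raises ValueError
except ValueError as e:
    error = e
print(type(error).__name__)
```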
You have to change its structure to something like:
b = None

@tf.function
def f():
    a = tf.constant([[10, 10], [11., 1.]])
    x = tf.constant([[1., 0.], [0., 1.]])
    global b
    if b is None:
        b = tf.Variable(12.)
    y = tf.matmul(a, x) + b
    print("PRINT: ", y)
    tf.print("TF-PRINT: ", y)
    return y

f()
in order to make it work correctly with the tf.function decorator.
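An alternative that avoids the global (a sketch of my own, not from the answer above; class and names are hypothetical) is to keep the variable as object state, created exactly once in __init__ before tf.function ever traces the method:

```python
import tensorflow as tf

# Sketch: the tf.Variable lives on the object and is created once in __init__,
# so tracing the decorated method never attempts to create new variables.
class StatefulF(tf.Module):
    def __init__(self):
        super().__init__()
        self.b = tf.Variable(12.)  # created before any tracing happens

    @tf.function
    def __call__(self):
        a = tf.constant([[10., 10.], [11., 1.]])
        x = tf.constant([[1., 0.], [0., 1.]])
        return tf.matmul(a, x) + self.b

f2 = StatefulF()
y = f2()  # repeated calls reuse self.b; no ValueError
```

This is the same pattern Keras layers and models follow: the variables are created once and then reused on every call.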
I covered this (and other) scenarios in several blog posts: the first part analyzes this behavior in the section "Handling states breaking the function scope" (though I suggest reading it from the beginning, and reading parts 2 and 3 as well).