Is there an easy way to implement an Optimizer.Maximize() function in TensorFlow?

There are several experiments that rely on gradient ascent rather than gradient descent. I have looked into some approaches that use a "cost" and the minimize function to simulate a "maximize" function, but I am still not certain how to properly implement a maximize() function. Also, I would say most of these cases are closer to unsupervised learning. So, given this code concept for a cost function:

cost = tf.square(Yexpected - Ycalculated)
train_step = tf.train.AdamOptimizer(0.5).minimize(cost)

I would like to write something where I am following the positive gradient, and where there may not be a Yexpected value:

maxMe = Function(Ycalculated)
train_step = tf.train.AdamOptimizer(0.5).maximize(maxMe)

A good example of this need is Recurrent Reinforcement Learning, as in "http://cs229.stanford.edu/proj2009/LvDuZhai.pdf".

I have read a few papers and references stating that changing the sign will flip the direction of movement so that it follows the increasing gradient, but given TensorFlow's internal calculation of the gradient, I am not sure this will actually maximize, because I don't know of a way to validate the results:

maxMe = Function(Ycalculated)
train_step = tf.train.AdamOptimizer(0.5).minimize( -1 * maxMe )

The intuition is simple: the minimize() function keeps pushing the given value down. For example, if you start at 5, then on each iteration (depending on the learning rate) the value might become 4, then 3, 2, 1, 0, and so on, as far down as it can go. Now if you pass the negated value instead, it starts at -5 and the optimizer again pushes it down: -5, -6, -7, -8, and so on. But the quantity you actually care about is the un-negated one, which meanwhile goes 5, 6, 7, 8, ..., i.e., it is increasing. In other words, in the latter case the gradient updates change the parameters of the neural network in a way that maximizes the original function instead of minimizing it.
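To make the intuition concrete, here is a minimal, framework-free Python sketch. It is not TensorFlow code: `minimize` is a hypothetical plain gradient-descent routine using a finite-difference gradient, and `f(w) = 5 - (w - 2)^2` is a hypothetical objective whose maximum (5) is at w = 2. Handing `-f` to the minimizer should drive `w` toward the maximizer of `f`.

```python
def numerical_grad(fn, w, eps=1e-6):
    """Central finite-difference estimate of d fn / dw (hypothetical helper)."""
    return (fn(w + eps) - fn(w - eps)) / (2 * eps)


def minimize(fn, w0, lr=0.1, steps=200):
    """Hypothetical plain gradient-descent minimizer, standing in for optimizer.minimize."""
    w = w0
    for _ in range(steps):
        w -= lr * numerical_grad(fn, w)  # step against the gradient
    return w


def maximize(fn, w0, **kwargs):
    """Gradient ascent on fn, implemented as gradient descent on -fn."""
    return minimize(lambda w: -fn(w), w0, **kwargs)


def f(w):
    # Toy objective chosen for illustration: maximum value 5 at w = 2.
    return 5.0 - (w - 2.0) ** 2


w_star = maximize(f, w0=-3.0)
print(round(w_star, 4), round(f(w_star), 4))
```

Starting from w = -3, each descent step on `-f` moves w closer to 2, so `f(w_star)` ends up at (numerically) its maximum.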

Toy example with arbitrary numbers:

The input x = 1.5, the weight parameter at time t is w_t = 0.1,
the observed response y = 3.0, and the learning rate lr = 0.1.

x * w = 0.15 (this is y predicted for the current w)

loss function = (3.0 - 0.15)^2 = 8.1225 ≈ 8.1

Applying gradient descent:

w_(t+1) = w_t - lr * (derivative of loss function with respect to w), where the derivative of (y - x * w)^2 with respect to w is x * 2(x * w - y)

w_(t+1) = 0.1 - (0.1 * [1.5 * 2(0.15 - 3.0)]) = 0.1 - (-0.855) = 0.955

If we use the new w_(t+1) we will have:

1.5 * 0.955 = 1.4325 (which is closer to the correct answer 3.0)

and the new loss is: (3.0 - 1.4325)^2 ≈ 2.46 (smaller error).

If we keep iterating, we will adjust w to a value that gives us the minimum cost possible.
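The arithmetic of this toy example can be checked with a few lines of plain Python (no TensorFlow; the names `loss` and `grad` are illustrative). It performs the same single update and then keeps iterating, which should drive w toward y / x = 2.0, where the squared error is zero.

```python
x, y, lr = 1.5, 3.0, 0.1  # input, observed response, learning rate


def loss(w):
    """Squared error between the observed y and the prediction x * w."""
    return (y - x * w) ** 2


def grad(w):
    """Derivative of (y - x*w)^2 with respect to w: 2 * x * (x*w - y)."""
    return 2 * x * (x * w - y)


w = 0.1
w_next = w - lr * grad(w)  # one gradient-descent step
print(round(w_next, 4), round(x * w_next, 4), round(loss(w_next), 4))

# Keep iterating from w = 0.1: each step shrinks the error and w approaches y / x.
for _ in range(100):
    w -= lr * grad(w)
print(round(w, 6))
```

The first step gives w = 0.955, a prediction of 1.4325 and a loss of about 2.46, down from about 8.12.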

Now let's repeat the same experiment, but with the sign of the loss flipped to negative:

loss function = -(3.0 - 0.15)^2 = -8.1225 ≈ -8.1

Applying gradient descent:

w_(t+1) = w_t - lr * (derivative of loss function with respect to w), where the derivative of -(y - x * w)^2 with respect to w is x * -2(x * w - y)

w_(t+1) = 0.1 - (0.1 * [1.5 * -2(0.15 - 3.0)]) = 0.1 - 0.855 = −0.755

If we apply the new w_(t+1) we will have:

1.5 * −0.755 = −1.1325, and the original squared error is now (3.0 - (-1.1325))^2 ≈ 17.08, up from about 8.1.

(Minimizing the negated loss has maximized the original loss!)
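The same check for the negated loss, again in plain Python with illustrative names: "minimizing" the negated loss pushes w away from 2.0, so the original squared error grows on every step.

```python
x, y, lr = 1.5, 3.0, 0.1  # same toy numbers as in the descent example


def loss(w):
    """Original squared error (y - x*w)^2."""
    return (y - x * w) ** 2


def neg_loss_grad(w):
    """Derivative of the negated loss -(y - x*w)^2: -2 * x * (x*w - y)."""
    return -2 * x * (x * w - y)


w_next = 0.1 - lr * neg_loss_grad(0.1)  # the single step worked out above
print(round(w_next, 4), round(loss(w_next), 4))

w = 0.1
history = [loss(w)]
for _ in range(5):
    w -= lr * neg_loss_grad(w)  # gradient descent on the negated loss
    history.append(loss(w))
print([round(v, 4) for v in history])
```

Every entry of `history` is larger than the one before it, which is exactly gradient ascent on the original loss.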

The same reasoning applies to any differentiable function; this is just a simple, naive example to demonstrate the idea.
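The worry in the question was whether TensorFlow's internal gradient computation might treat the sign flip differently. Differentiation is linear, so d(-f)/dw = -df/dw for any differentiable f, and a correct automatic-differentiation system inherits that property. The sketch below illustrates this with a tiny forward-mode autodiff based on dual numbers, written from scratch for illustration; it is not how TensorFlow is implemented internally, and the objective `f` is arbitrary.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """Dual number val + der*eps with eps**2 == 0; der carries d/dw."""
    val: float
    der: float

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # Product rule: (a*b)' = a'*b + a*b'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __neg__(self):
        # Negation flips the sign of both the value and the derivative.
        return Dual(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-other)


def sin(d):
    """sin for dual numbers (chain rule: cos(v) * v')."""
    return Dual(math.sin(d.val), math.cos(d.val) * d.der)


def derivative(fn, w):
    """Evaluate fn at w while carrying d/dw alongside the value."""
    return fn(Dual(w, 1.0)).der


def f(w):
    # Arbitrary differentiable objective chosen for illustration.
    return 3 * w * w - sin(w) + 1


for w in (-1.0, 0.0, 0.5, 2.0):
    d_f = derivative(f, w)
    d_neg_f = derivative(lambda v: -f(v), w)
    print(w, d_f, d_neg_f)  # d_neg_f is always -d_f
```

Because the gradient of `-f` is exactly the negated gradient of `f`, a descent step `w - lr * d(-f)/dw` is the same as an ascent step `w + lr * df/dw`.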

So you can do, as you already suggested:

optimizer.minimize( -1 * value)

Or, if you like, create a wrapper function (strictly speaking unnecessary, but worth mentioning):

def maximize(optimizer, value, **kwargs):
  # Gradient ascent on `value` is gradient descent on `-value`.
  return optimizer.minimize(-value, **kwargs)

Tags:

machine-learning

tensorflow

reinforcement-learning

mazecreator

Yahya
