I'm trying to use TensorFlow's GradientDescentOptimizer to minimize the 2-dimensional Rosenbrock function, but when I run the program, the optimizer sometimes heads off towards infinity. At other times, without changing anything, it finds the right neighborhood but never pinpoints the optimal solution.
My code is as follows:
import tensorflow as tf

x1_data = tf.Variable(initial_value=tf.random_uniform([1], -10, 10), name='x1')
x2_data = tf.Variable(initial_value=tf.random_uniform([1], -10, 10), name='x2')

# Loss function: y = (1 - x1)^2 + 100 * (x2 - x1^2)^2
y = tf.add(tf.pow(tf.sub(1.0, x1_data), 2.0),
           tf.mul(100.0, tf.pow(tf.sub(x2_data, tf.pow(x1_data, 2.0)), 2.0)),
           'y')

opt = tf.train.GradientDescentOptimizer(0.0035)
train = opt.minimize(y)

sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)

for step in xrange(200):
    sess.run(train)
    if step % 10 == 0:
        print(step, sess.run(x1_data), sess.run(x2_data), sess.run(y))
The Rosenbrock problem is defined as y = (1 - x1)^2 + 100 * (x2 - x1^2)^2, with the optimal solution at x1 = x2 = 1.
What am I doing wrong here? Or have I completely misunderstood how to use TensorFlow?
An optimizer is an algorithm used to minimize a loss function with respect to a model's trainable parameters. The most straightforward optimization technique is gradient descent, which iteratively updates the parameters by taking a step in the direction of steepest descent, i.e. along the negative of the loss function's gradient. Gradient descent is the most basic yet most widely used optimization algorithm: it appears throughout linear regression and classification, and backpropagation in neural networks is built on it.
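To make the update rule concrete, here is gradient descent written out by hand for this exact loss in plain Python (no TensorFlow), using the analytic Rosenbrock partial derivatives. The starting point and learning rate below are arbitrary choices for illustration, not values from the question:

```python
# Hand-rolled gradient descent on the 2-D Rosenbrock function,
# mirroring what GradientDescentOptimizer does at each step.

def rosenbrock(x1, x2):
    # y = (1 - x1)^2 + 100 * (x2 - x1^2)^2, minimum at (1, 1)
    return (1.0 - x1) ** 2 + 100.0 * (x2 - x1 ** 2) ** 2

def rosenbrock_grad(x1, x2):
    # Analytic partial derivatives of the loss above.
    dx1 = -2.0 * (1.0 - x1) - 400.0 * x1 * (x2 - x1 ** 2)
    dx2 = 200.0 * (x2 - x1 ** 2)
    return dx1, dx2

def descend(x1, x2, lr=0.001, steps=20000):
    # The core update: move each parameter against its gradient.
    for _ in range(steps):
        dx1, dx2 = rosenbrock_grad(x1, x2)
        x1 -= lr * dx1
        x2 -= lr * dx2
    return x1, x2

x1, x2 = descend(0.0, 0.0)
print(round(x1, 2), round(x2, 2))  # both end up very close to 1.0
```

Note how many iterations this takes even from a benign starting point: the Rosenbrock valley is the classic stress test for plain gradient descent.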
If you reduce the range of the initial x1/x2 (e.g. use -3/3 instead of -10/10) and decrease the learning rate by a factor of 10, it shouldn't blow up as often. Decreasing the learning rate when you see things diverging is often a good thing to try.
Also, the function you're optimizing is designed to make the global minimum hard to find, so it's no surprise that it finds the valley but not the global optimum ;)
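To see this concretely, here is a small plain-Python experiment (no TensorFlow; the gradients are the analytic Rosenbrock partials) comparing the question's far-out start and learning rate with the smaller values suggested above:

```python
import math

def rosenbrock_grad(x1, x2):
    # Partial derivatives of y = (1 - x1)^2 + 100 * (x2 - x1^2)^2.
    # x1 * x1 is used instead of ** so overflow yields inf rather than an error.
    dx1 = -2.0 * (1.0 - x1) - 400.0 * x1 * (x2 - x1 * x1)
    dx2 = 200.0 * (x2 - x1 * x1)
    return dx1, dx2

def run(x1, x2, lr, steps=100):
    # Plain gradient descent: the same update GradientDescentOptimizer applies.
    for _ in range(steps):
        dx1, dx2 = rosenbrock_grad(x1, x2)
        x1, x2 = x1 - lr * dx1, x2 - lr * dx2
    return x1, x2

far = run(-10.0, 10.0, lr=0.0035)     # original start and learning rate
near = run(-3.0, 3.0, lr=0.00035)     # tighter start, lr divided by 10

# The far run overflows to inf/nan; the near run stays finite in the valley.
print(all(math.isfinite(v) for v in far), all(math.isfinite(v) for v in near))
```

The first few gradients at (-10, 10) are on the order of 10^5, so a 0.0035 step immediately flings x1 more than a thousand units away, after which each step roughly cubes the error.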
Yes, as @etarion says, this is an optimization issue; your TensorFlow code is fine.
One way to make sure the gradients never explode is to clip them to the range [-10., 10.], for instance:
opt = tf.train.GradientDescentOptimizer(0.0001)
grads_and_vars = opt.compute_gradients(y, [x1_data, x2_data])
clipped_grads_and_vars = [(tf.clip_by_value(g, -10., 10.), v) for g, v in grads_and_vars]
train = opt.apply_gradients(clipped_grads_and_vars)
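The snippet above uses TensorFlow's compute_gradients/clip_by_value/apply_gradients pipeline. As a plain-Python sketch of the same idea (no TensorFlow; the gradients are the analytic Rosenbrock partials, and the clip helper is a hypothetical scalar stand-in for tf.clip_by_value):

```python
def rosenbrock_grad(x1, x2):
    # Partial derivatives of y = (1 - x1)^2 + 100 * (x2 - x1^2)^2.
    dx1 = -2.0 * (1.0 - x1) - 400.0 * x1 * (x2 - x1 * x1)
    dx2 = 200.0 * (x2 - x1 * x1)
    return dx1, dx2

def clip(g, lo=-10.0, hi=10.0):
    # Scalar analogue of tf.clip_by_value(g, -10., 10.).
    return max(lo, min(hi, g))

lr = 0.0035
x1, x2 = -10.0, 10.0            # a far-out start that diverges without clipping
for _ in range(100):
    dx1, dx2 = rosenbrock_grad(x1, x2)
    x1 -= lr * clip(dx1)        # each update is now at most lr * 10 in size
    x2 -= lr * clip(dx2)
print(x1, x2)                   # ~ (-6.5, 13.5): bounded steps, no blow-up
```

With clipping, every raw gradient here saturates at -10, so each parameter creeps back toward the valley in steps of lr * 10 = 0.035 instead of being launched to infinity.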