I'm trying to use TensorFlow's GradientDescentOptimizer to minimize the 2-dimensional Rosenbrock function, but when I run the program, the optimizer sometimes heads off towards infinity. At other times, without changing anything, it finds the right neighborhood but never pinpoints the optimal solution.
My code is as follows:
import tensorflow as tf

x1_data = tf.Variable(initial_value=tf.random_uniform([1], -10, 10), name='x1')
x2_data = tf.Variable(initial_value=tf.random_uniform([1], -10, 10), name='x2')

# Loss function: y = (1 - x1)^2 + 100 * (x2 - x1^2)^2
y = tf.add(tf.pow(tf.sub(1.0, x1_data), 2.0),
           tf.mul(100.0, tf.pow(tf.sub(x2_data, tf.pow(x1_data, 2.0)), 2.0)),
           'y')

opt = tf.train.GradientDescentOptimizer(0.0035)
train = opt.minimize(y)

sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)

for step in xrange(200):
    sess.run(train)
    if step % 10 == 0:
        print(step, sess.run(x1_data), sess.run(x2_data), sess.run(y))
The Rosenbrock problem is defined as y = (1 - x1)^2 + 100 * (x2 - x1^2)^2, with the optimal solution at x1 = x2 = 1.
What am I doing wrong here? Or have I completely misunderstood how to use TensorFlow?
An optimizer is an algorithm used to minimize a loss function with respect to a model's trainable parameters. The most straightforward optimization technique is gradient descent, which iteratively updates the parameters by taking a step in the direction of steepest descent, i.e. along the negative of the loss function's gradient. Gradient descent is the most basic yet most widely used optimization algorithm: it appears throughout linear regression and classification, and backpropagation in neural networks is built on it.
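To make the update rule concrete, here is gradient descent written out by hand for this exact loss in plain Python (no TensorFlow), using the analytic Rosenbrock partial derivatives. The starting point and learning rate below are arbitrary choices for illustration, not values from the question:

```python
# Hand-rolled gradient descent on the 2-D Rosenbrock function,
# mirroring what GradientDescentOptimizer does at each step.

def rosenbrock(x1, x2):
    # y = (1 - x1)^2 + 100 * (x2 - x1^2)^2, minimum at (1, 1)
    return (1.0 - x1) ** 2 + 100.0 * (x2 - x1 ** 2) ** 2

def rosenbrock_grad(x1, x2):
    # Analytic partial derivatives of the loss above.
    dx1 = -2.0 * (1.0 - x1) - 400.0 * x1 * (x2 - x1 ** 2)
    dx2 = 200.0 * (x2 - x1 ** 2)
    return dx1, dx2

def descend(x1, x2, lr=0.001, steps=20000):
    # The core update: move each parameter against its gradient.
    for _ in range(steps):
        dx1, dx2 = rosenbrock_grad(x1, x2)
        x1 -= lr * dx1
        x2 -= lr * dx2
    return x1, x2

x1, x2 = descend(0.0, 0.0)
print(round(x1, 2), round(x2, 2))  # both end up very close to 1.0
```

Note how many iterations this takes even from a benign starting point: the Rosenbrock valley is the classic stress test for plain gradient descent.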
If you reduce the range of the initial x1/x2 (e.g. use -3/3 instead of -10/10) and decrease the learning rate by a factor of 10, it shouldn't blow up as often. Decreasing the learning rate when you see things diverging is often a good thing to try.
Also, the function you're optimizing is designed to make the global minimum hard to find, so it's no surprise that it finds the valley but not the global optimum ;)
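To see this concretely, here is a small plain-Python experiment (no TensorFlow; the gradients are the analytic Rosenbrock partials) comparing the question's far-out start and learning rate with the smaller values suggested above:

```python
import math

def rosenbrock_grad(x1, x2):
    # Partial derivatives of y = (1 - x1)^2 + 100 * (x2 - x1^2)^2.
    # x1 * x1 is used instead of ** so overflow yields inf rather than an error.
    dx1 = -2.0 * (1.0 - x1) - 400.0 * x1 * (x2 - x1 * x1)
    dx2 = 200.0 * (x2 - x1 * x1)
    return dx1, dx2

def run(x1, x2, lr, steps=100):
    # Plain gradient descent: the same update GradientDescentOptimizer applies.
    for _ in range(steps):
        dx1, dx2 = rosenbrock_grad(x1, x2)
        x1, x2 = x1 - lr * dx1, x2 - lr * dx2
    return x1, x2

far = run(-10.0, 10.0, lr=0.0035)     # original start and learning rate
near = run(-3.0, 3.0, lr=0.00035)     # tighter start, lr divided by 10

# The far run overflows to inf/nan; the near run stays finite in the valley.
print(all(math.isfinite(v) for v in far), all(math.isfinite(v) for v in near))
```

The first few gradients at (-10, 10) are on the order of 10^5, so a 0.0035 step immediately flings x1 more than a thousand units away, after which each step roughly cubes the error.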
Yes, as @etarion says, this is an optimization issue; your TensorFlow code is fine.
One way to make sure the gradients never explode is to clip them to the range [-10., 10.], for instance:
opt = tf.train.GradientDescentOptimizer(0.0001)
grads_and_vars = opt.compute_gradients(y, [x1_data, x2_data])
clipped_grads_and_vars = [(tf.clip_by_value(g, -10., 10.), v) for g, v in grads_and_vars]
train = opt.apply_gradients(clipped_grads_and_vars)
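The snippet above uses TensorFlow's compute_gradients/clip_by_value/apply_gradients pipeline. As a plain-Python sketch of the same idea (no TensorFlow; the gradients are the analytic Rosenbrock partials, and the clip helper is a hypothetical scalar stand-in for tf.clip_by_value):

```python
def rosenbrock_grad(x1, x2):
    # Partial derivatives of y = (1 - x1)^2 + 100 * (x2 - x1^2)^2.
    dx1 = -2.0 * (1.0 - x1) - 400.0 * x1 * (x2 - x1 * x1)
    dx2 = 200.0 * (x2 - x1 * x1)
    return dx1, dx2

def clip(g, lo=-10.0, hi=10.0):
    # Scalar analogue of tf.clip_by_value(g, -10., 10.).
    return max(lo, min(hi, g))

lr = 0.0035
x1, x2 = -10.0, 10.0            # a far-out start that diverges without clipping
for _ in range(100):
    dx1, dx2 = rosenbrock_grad(x1, x2)
    x1 -= lr * clip(dx1)        # each update is now at most lr * 10 in size
    x2 -= lr * clip(dx2)
print(x1, x2)                   # ~ (-6.5, 13.5): bounded steps, no blow-up
```

With clipping, every raw gradient here saturates at -10, so each parameter creeps back toward the valley in steps of lr * 10 = 0.035 instead of being launched to infinity.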