Difference between GradientDescentOptimizer and AdamOptimizer in tensorflow?

When using GradientDescentOptimizer instead of AdamOptimizer, the model doesn't seem to converge. On the other hand, AdamOptimizer seems to work fine. Is there something wrong with the GradientDescentOptimizer from TensorFlow?

import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

def randomSample(size=100):
    """
    y = 2 * x -3
    """
    x = np.random.randint(500, size=size)
    y = x * 2  - 3 - np.random.randint(-20, 20, size=size)    

    return x, y

def plotAll(_x, _y, w, b):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(_x, _y)

    x = np.random.randint(500, size=20)
    y = w * x + b
    ax.plot(x, y,'r')
    plt.show()

def lr(_x, _y):

    w = tf.Variable(2, dtype=tf.float32)
    b = tf.Variable(3, dtype=tf.float32)

    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)

    linear_model = w * x + b
    loss = tf.reduce_sum(tf.square(linear_model - y))
    optimizer = tf.train.AdamOptimizer(0.0003) # swapping in GradientDescentOptimizer here doesn't converge
    train = optimizer.minimize(loss)

    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    for i in range(10000):
        sess.run(train, {x : _x, y: _y})
    cw, cb, closs = sess.run([w, b, loss], {x:_x, y:_y})
    print(closs)
    print(cw,cb)

    return cw, cb

x,y = randomSample()
w,b = lr(x,y)
plotAll(x,y, w, b)
1 Answer

I had a similar problem once and it took me a long time to find the real cause: with gradient descent my loss function was actually growing instead of getting smaller.

It turned out that my learning rate was too high. If you take too big a step with gradient descent you can end up jumping over the minimum. And if you are really unlucky, like I was, you jump so far past it that your error actually increases.
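To see the overshooting concretely, here is a minimal sketch (my own illustration, not from the original post) of gradient descent on the one-dimensional function f(w) = w**2. Below the stability limit the iterate shrinks toward the minimum; above it, every step lands farther away than the last:

def gradient_descent(learning_rate, steps=10, w=5.0):
    # Minimize f(w) = w**2, whose gradient is 2*w.
    # The update is stable only for learning_rate < 1.0 here
    # (in general, lr < 2 / curvature); above that it diverges.
    history = [w]
    for _ in range(steps):
        grad = 2 * w                     # f'(w)
        w = w - learning_rate * grad
        history.append(w)
    return history

print(gradient_descent(0.1))   # |w| shrinks every step: converges
print(gradient_descent(1.1))   # |w| grows every step: the loss increases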

Lowering the learning rate should make the model converge, but it could take a long time.
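For the code in the question, one option is to shrink the gradients themselves rather than fight them with a microscopic learning rate: scale x into [0, 1] and average the loss instead of summing it. The sketch below keeps the TF 1.x API of the question; the scaling factor, learning rate and iteration count are my own choices, not something from the original post:

import tensorflow as tf
import numpy as np

def lr_gradient_descent(_x, _y):
    # Same linear model as in the question, but with the inputs scaled
    # into [0, 1] and the loss averaged instead of summed, so plain
    # gradient descent stays stable at an ordinary learning rate.
    x_scaled = _x / 500.0

    w = tf.Variable(2, dtype=tf.float32)
    b = tf.Variable(3, dtype=tf.float32)
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)

    linear_model = w * x + b
    loss = tf.reduce_mean(tf.square(linear_model - y))
    train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    for _ in range(10000):
        sess.run(train, {x: x_scaled, y: _y})
    cw, cb, closs = sess.run([w, b, loss], {x: x_scaled, y: _y})
    print(closs, cw, cb)

    # The learned slope is in scaled units; divide by 500 to map it
    # back to the original x scale before plotting.
    return cw / 500.0, cb

With that change, GradientDescentOptimizer should land roughly on w ≈ 2 (after rescaling) and b ≈ -3, the same solution Adam finds on the raw data.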

The Adam optimizer has momentum: it doesn't just follow the instantaneous gradient, it keeps track of the direction it was going before with a sort of velocity. This way, if the raw gradient makes you bounce back and forth, the momentum term slows you down in that direction. This helps a lot! Adam has a few more tweaks besides momentum that make it the preferred deep learning optimizer.
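For intuition, here is what the momentum part looks like in isolation. This is classic momentum, a hypothetical sketch of the idea rather than Adam's exact update:

def momentum_step(w, velocity, grad, learning_rate=0.01, beta=0.9):
    # The velocity is an exponentially decaying sum of past gradients:
    # components that flip sign from step to step largely cancel out,
    # while components that consistently point the same way build up speed.
    velocity = beta * velocity + grad
    w = w - learning_rate * velocity
    return w, velocity

Adam keeps a similar running mean of the gradients plus a running mean of their squares, and divides one by the square root of the other, which gives it a sensible per-parameter step size even when the raw gradients are large, as they are with the question's un-normalized data.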

If you want to read more about optimizers, this blog post is very informative: http://ruder.io/optimizing-gradient-descent/

Guilherme de Lazari