Difference between GradientDescentOptimizer and AdamOptimizer in tensorflow?

When using GradientDescentOptimizer instead of AdamOptimizer, the model doesn't seem to converge. On the other hand, AdamOptimizer seems to work fine. Is there something wrong with the GradientDescentOptimizer from TensorFlow?

import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

def randomSample(size=100):
    """
    y = 2 * x -3
    """
    x = np.random.randint(500, size=size)
    y = x * 2  - 3 - np.random.randint(-20, 20, size=size)    

    return x, y

def plotAll(_x, _y, w, b):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(_x, _y)

    x = np.random.randint(500, size=20)
    y = w * x + b
    ax.plot(x, y,'r')
    plt.show()

def lr(_x, _y):

    w = tf.Variable(2, dtype=tf.float32)
    b = tf.Variable(3, dtype=tf.float32)

    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)

    linear_model = w * x + b
    loss = tf.reduce_sum(tf.square(linear_model - y))
    optimizer = tf.train.AdamOptimizer(0.0003) # swapping in GradientDescentOptimizer here doesn't converge
    train = optimizer.minimize(loss)

    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    for i in range(10000):
        sess.run(train, {x : _x, y: _y})
    cw, cb, closs = sess.run([w, b, loss], {x:_x, y:_y})
    print(closs)
    print(cw,cb)

    return cw, cb

x,y = randomSample()
w,b = lr(x,y)
plotAll(x,y, w, b)
1 Answer

I had a similar problem once and it took me a long time to find the real cause: with gradient descent my loss function was actually growing instead of getting smaller.

It turned out that my learning rate was too high. If you take too big a step with gradient descent you can end up jumping over the minimum. And if you are really unlucky, like I was, you jump so far past it that your error actually increases.
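To see the overshooting concretely, here is a minimal sketch (my own illustration, not from the original post) of gradient descent on the one-dimensional function f(w) = w**2. Below the stability limit the iterate shrinks toward the minimum; above it, every step lands farther away than the last:

def gradient_descent(learning_rate, steps=10, w=5.0):
    # Minimize f(w) = w**2, whose gradient is 2*w.
    # The update is stable only for learning_rate < 1.0 here
    # (in general, lr < 2 / curvature); above that it diverges.
    history = [w]
    for _ in range(steps):
        grad = 2 * w                     # f'(w)
        w = w - learning_rate * grad
        history.append(w)
    return history

print(gradient_descent(0.1))   # |w| shrinks every step: converges
print(gradient_descent(1.1))   # |w| grows every step: the loss increases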

Lowering the learning rate should make the model converge, but it could take a long time.
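For the code in the question, one option is to shrink the gradients themselves rather than fight them with a microscopic learning rate: scale x into [0, 1] and average the loss instead of summing it. The sketch below keeps the TF 1.x API of the question; the scaling factor, learning rate and iteration count are my own choices, not something from the original post:

import tensorflow as tf
import numpy as np

def lr_gradient_descent(_x, _y):
    # Same linear model as in the question, but with the inputs scaled
    # into [0, 1] and the loss averaged instead of summed, so plain
    # gradient descent stays stable at an ordinary learning rate.
    x_scaled = _x / 500.0

    w = tf.Variable(2, dtype=tf.float32)
    b = tf.Variable(3, dtype=tf.float32)
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)

    linear_model = w * x + b
    loss = tf.reduce_mean(tf.square(linear_model - y))
    train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    for _ in range(10000):
        sess.run(train, {x: x_scaled, y: _y})
    cw, cb, closs = sess.run([w, b, loss], {x: x_scaled, y: _y})
    print(closs, cw, cb)

    # The learned slope is in scaled units; divide by 500 to map it
    # back to the original x scale before plotting.
    return cw / 500.0, cb

With that change, GradientDescentOptimizer should land roughly on w ≈ 2 (after rescaling) and b ≈ -3, the same solution Adam finds on the raw data.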

The Adam optimizer has momentum: it doesn't just follow the instantaneous gradient, it keeps track of the direction it was going before with a sort of velocity. This way, if the raw gradient makes you bounce back and forth, the momentum term slows you down in that direction. This helps a lot! Adam has a few more tweaks besides momentum that make it the preferred deep learning optimizer.
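For intuition, here is what the momentum part looks like in isolation. This is classic momentum, a hypothetical sketch of the idea rather than Adam's exact update:

def momentum_step(w, velocity, grad, learning_rate=0.01, beta=0.9):
    # The velocity is an exponentially decaying sum of past gradients:
    # components that flip sign from step to step largely cancel out,
    # while components that consistently point the same way build up speed.
    velocity = beta * velocity + grad
    w = w - learning_rate * velocity
    return w, velocity

Adam keeps a similar running mean of the gradients plus a running mean of their squares, and divides one by the square root of the other, which gives it a sensible per-parameter step size even when the raw gradients are large, as they are with the question's un-normalized data.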

If you want to read more about optimizers, this blog post is very informative: http://ruder.io/optimizing-gradient-descent/

Guilherme de Lazari