
Function approximation with TensorFlow

I am trying to create a neural network in TensorFlow that approximates a sine function. I have found some examples of universal function approximators, but I do not fully understand the code and, since I am quite new to TensorFlow, I would like to code it myself to understand every step.

This is my code:

import tensorflow as tf
import numpy as np
import math, random
import matplotlib.pyplot as plt


# Create the arrays x and y that contain the inputs and the outputs of the function to approximate
x = np.arange(0, 2*np.pi, 2*np.pi/1000).reshape((1000,1))
y = np.sin(x)
# plt.plot(x,y)
# plt.show()

# Define the number of nodes
n_nodes_hl1 = 100
n_nodes_hl2 = 100

# Define the number of outputs and the learn rate
n_classes = 1
learn_rate = 0.1

# Define input / output placeholders
x_ph = tf.placeholder('float', [None, 1])
y_ph = tf.placeholder('float')


# Routine to compute the neural network (2 hidden layers)
def neural_network_model(data):
    hidden_1_layer = {'weights': tf.Variable(tf.random_normal([1, n_nodes_hl1])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}

    hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl2]))}

    output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_classes])),
                      'biases': tf.Variable(tf.random_normal([n_classes]))}


    # (input_data * weights) + biases
    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)

    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)

    output = tf.add(tf.matmul(l2, output_layer['weights']), output_layer['biases'])

    return output


# Routine to train the neural network
def train_neural_network(x_ph):
    prediction = neural_network_model(x_ph)
    cost = tf.reduce_mean(tf.square(prediction - y_ph))
    optimizer = tf.train.GradientDescentOptimizer(learn_rate).minimize(cost)

    # cycles feed forward + backprop
    hm_epochs = 10

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Train in each epoch with the whole data
        for epoch in range(hm_epochs):
            epoch_loss = 0
            _, c = sess.run([optimizer, cost], feed_dict = {x_ph: x, y_ph: y})
            epoch_loss += c
            print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)

        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y_ph, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:', accuracy.eval({x_ph: x, y_ph: y}))


# Train network
train_neural_network(x_ph)

If you run this program you will see that the loss diverges, and I don't understand why it behaves like that. Could anyone help me?

Thank you!

asked Feb 12 '18 by Emm


2 Answers

@AIdream is right that the initial learning rate causes a convergence problem in general. But even with learn_rate = 1.0e-9 and 10000 epochs the error is still large, which means something else is also wrong.


Debugging the problem

Running the above code gives:

Epoch 0 completed out of 10 loss: 61437.30859375
Epoch 1 completed out of 10 loss: 1.2855042406744022e+21
Epoch 2 completed out of 10 loss: inf
Epoch 3 completed out of 10 loss: nan

The above code tries to approximate a sin function over the range (0, 2*pi). Since the labels (outputs) lie in (-1, 1), such a large error indicates that the weights are being initialized with values that are far too big. Initializing them with a much smaller spread (stddev=0.01, see the sketch after the trace below) leads to:

Epoch 0 completed out of 10 loss: 0.5000443458557129
Epoch 1 completed out of 10 loss: 0.4999848008155823
Epoch 2 completed out of 10 loss: 0.49993154406547546
Epoch 3 completed out of 10 loss: 0.4998819828033447
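
For reference, here is a minimal sketch of that initialization change, reusing the layer dictionaries from the question. Only the stddev argument passed to tf.random_normal is new, and applying it to the biases as well is my own choice rather than something stated in the original answer:

# Smaller initial weights: pass an explicit stddev to tf.random_normal
hidden_1_layer = {'weights': tf.Variable(tf.random_normal([1, n_nodes_hl1], stddev=0.01)),
                  'biases': tf.Variable(tf.random_normal([n_nodes_hl1], stddev=0.01))}

hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2], stddev=0.01)),
                  'biases': tf.Variable(tf.random_normal([n_nodes_hl2], stddev=0.01))}

output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_classes], stddev=0.01)),
                'biases': tf.Variable(tf.random_normal([n_classes], stddev=0.01))}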

The loss converges very fast, but checking the prediction shows that all inputs are being mapped to zero.

(Plot from the original answer: the predicted values are essentially flat at zero across the whole input range.)

The remaining problem is that the inputs in the above code are fed as a single batch rather than as mini-batches. Full-batch gradient descent can get stuck in a poor local minimum, and once it reaches one it cannot escape. Mini-batches avoid this problem, because the gradients computed on small batches are noisy and can push the optimization out of local minima. Training with mini-batches instead (a sketch follows the plot below) leads to:

Epoch 0 completed out of 100 loss: 456.28773515997455
Epoch 10 completed out of 100 loss: 6.713319106237066
Epoch 20 completed out of 100 loss: 0.24847120749460316
Epoch 30 completed out of 100 loss: 0.09907744570556076

(Plot from the original answer: the network's prediction of the sine curve after training with mini-batches.)
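
A minimal sketch of such a mini-batch training loop, reusing x, y, the placeholders, cost and optimizer from the question. The batch size of 32 and the per-epoch shuffling are my own assumptions, not values taken from the original answer:

hm_epochs = 100
batch_size = 32

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(hm_epochs):
        epoch_loss = 0
        idx = np.random.permutation(len(x))              # shuffle the samples each epoch
        for start in range(0, len(x), batch_size):
            batch = idx[start:start + batch_size]        # indices of the current mini-batch
            _, c = sess.run([optimizer, cost],
                            feed_dict={x_ph: x[batch], y_ph: y[batch]})
            epoch_loss += c
        if epoch % 10 == 0:
            print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)

Accumulating the per-batch losses over an epoch gives the kind of epoch loss shown in the trace above.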


The above steps can be reproduced by downloading the source from here.

answered Sep 18 '22 by vijay m


Your initial learning rate for the gradient descent is simply too large for it to converge toward a minimum (see for instance this other thread about gradient descent and learning rate values: "Gradient descent explodes if learning rate is too large").

Just replace its value with e.g. learn_rate = 1.0e-9 here and your network will converge.
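
In the code from the question, that is just the constant near the top of the script (the trace below also runs for 10000 epochs, i.e. hm_epochs = 10000 instead of 10):

learn_rate = 1.0e-9    # was 0.1, which makes full-batch gradient descent diverge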

Trace:

Epoch 0 completed out of 10000 loss: 8512.4736328125
Epoch 1 completed out of 10000 loss: 8508.4677734375
...
Epoch 201 completed out of 10000 loss: 7743.56396484375
Epoch 202 completed out of 10000 loss: 7739.92431640625
...
Epoch 7000 completed out of 10000 loss: 382.22601318359375
Epoch 7001 completed out of 10000 loss: 382.08026123046875
...
Epoch 9998 completed out of 10000 loss: 147.459716796875
Epoch 9999 completed out of 10000 loss: 147.4239501953125
answered Sep 21 '22 by benjaminplanche