Backpropagation with Momentum

I'm following this tutorial for implementing the Backpropagation algorithm. However, I am stuck at implementing momentum for this algorithm.

Without Momentum, this is the code for weight update method:

def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

And below is my implementation:

def updateWeights(network, row, l_rate, momentum=0.5):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i-1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                previous_weight = neuron['weights'][j]
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j] + momentum * previous_weight
            previous_weight = neuron['weights'][-1]
            neuron['weights'][-1] += l_rate * neuron['delta'] + momentum * previous_weight

This gives me a math overflow error (Python's OverflowError) because the weights grow exponentially over multiple epochs. I believe my previous_weight logic in the update is wrong.

Jaswanth Kumar asked Nov 09 '17


People also ask

Why is momentum used along with gradient while back propagating errors in a neural network?

Introduction of the momentum rate allows the attenuation of oscillations in the gradient descent.

What does momentum do in neural network?

Neural network momentum is a simple technique that often improves both training speed and accuracy. Training a neural network is the process of finding values for the weights and biases so that for a given set of input values, the computed output values closely match the known, correct, target values.

What is momentum in convolutional neural network?

Momentum in neural networks is a variant of the stochastic gradient descent. It replaces the gradient with a momentum which is an aggregate of gradients as very well explained here. It is also the common name given to the momentum factor, as in your case.

What is gradient descent with momentum?

Gradient Descent with Momentum takes small steps in directions where the gradients oscillate and take large steps along the direction where the past gradients have the same direction(same sign).
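As a toy illustration of that last point (hypothetical numbers, using the classic formulation v = momentum * v - l_rate * grad), feeding in gradients of the same sign makes the step size grow from one update to the next:

```python
# Classic momentum update: v accumulates past gradients,
# so steps grow while the gradients keep the same sign.
momentum, l_rate = 0.9, 0.1
w, v = 1.0, 0.0
for grad in [2.0, 2.0, 2.0]:  # same sign every step
    v = momentum * v - l_rate * grad
    w += v
# Step sizes: 0.2, then 0.38, then 0.542 -- each larger than the last.
```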


1 Answer

I'll give you a hint. In your implementation you're multiplying momentum by previous_weight, which is the current value of the weight itself, not the previous update. Compounding each weight by a factor on every step obviously blows up quickly.

What you should do instead is remember the whole update vector, l_rate * neuron['delta'] * inputs[j], on the previous backpropagation step and add it up. It might look something like this:

velocity[j] = l_rate * neuron['delta'] * inputs[j] + momentum * velocity[j]
neuron['weights'][j] += velocity[j]

... where velocity is an array with one entry per weight (the same shape as the weights, not the network), defined in a scope that persists across calls to updateWeights and initialized with zeros. See this post for details.
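Putting the hint together, one way to carry the velocity across calls is to store it on each neuron dict. This is a sketch, not the tutorial's exact code: the 'velocity' key is my own addition, lazily initialized to zeros alongside the weights.

```python
def updateWeights(network, row, l_rate, momentum=0.5):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            # One velocity entry per weight (bias included); created on first use
            # so it persists between backpropagation steps.
            if 'velocity' not in neuron:
                neuron['velocity'] = [0.0] * len(neuron['weights'])
            for j in range(len(inputs)):
                neuron['velocity'][j] = (l_rate * neuron['delta'] * inputs[j]
                                         + momentum * neuron['velocity'][j])
                neuron['weights'][j] += neuron['velocity'][j]
            # Bias weight: same update with input fixed at 1.
            neuron['velocity'][-1] = (l_rate * neuron['delta']
                                      + momentum * neuron['velocity'][-1])
            neuron['weights'][-1] += neuron['velocity'][-1]
```

With momentum=0.0 this reduces exactly to the tutorial's plain update, which is a quick sanity check.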

Maxim answered Oct 14 '22