I'm following this tutorial for implementing the Backpropagation algorithm. However, I am stuck at implementing momentum for this algorithm.
Without Momentum, this is the code for weight update method:
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']
And below is my implementation:
def updateWeights(network, row, l_rate, momentum=0.5):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                previous_weight = neuron['weights'][j]
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j] + momentum * previous_weight
            previous_weight = neuron['weights'][-1]
            neuron['weights'][-1] += l_rate * neuron['delta'] + momentum * previous_weight
This gives me a math overflow error (Python's OverflowError), since the weights grow exponentially large over multiple epochs. I believe my previous_weight logic for the update is wrong.
Introducing a momentum term attenuates the oscillations of gradient descent.
Neural-network momentum is a simple technique that often improves both training speed and accuracy. Training a neural network is the process of finding values for the weights and biases such that, for a given set of input values, the computed outputs closely match the known, correct target values.
Momentum in neural networks is a variant of stochastic gradient descent: it replaces the raw gradient with an aggregate of past gradients, as very well explained here. "Momentum" is also the common name for the momentum factor, as in your case.
Gradient descent with momentum takes small steps in directions where the gradient oscillates, and large steps in directions where past gradients agree in sign.
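To make the update rule concrete, here is a minimal, self-contained sketch of gradient descent with momentum on the one-dimensional function f(w) = w². All names here are illustrative, not from the tutorial; note also that the tutorial's delta already points in the descent direction, so the signs there are flipped relative to this textbook form:

# Gradient descent with momentum on f(w) = w^2, whose gradient is 2w.
def gd_momentum(grad, w0, l_rate=0.1, momentum=0.5, steps=50):
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - l_rate * grad(w)  # aggregate of past gradients
        w += v                               # step along the velocity, not the raw gradient
    return w

print(gd_momentum(lambda w: 2.0 * w, w0=5.0))  # converges toward 0.0, the minimum

The velocity v keeps a decaying memory of past steps: where successive gradients agree in sign it accumulates and the steps grow; where they flip sign it partially cancels and the oscillation is damped.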
I'll give you a hint. In your implementation you're multiplying momentum by previous_weight, which is another parameter of the network at the same step. This obviously blows up quickly.
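To see why it diverges, drop the gradient term for a moment: your update then reads w ← (1 + momentum) · w, which is geometric growth of the weight itself. A quick illustration with toy numbers (not from your network):

# Your buggy update, with the gradient term omitted for clarity:
w = 0.1
for step in range(30):
    w += 0.5 * w        # momentum * previous_weight
print(w)                 # 0.1 * 1.5**30, roughly 1.9e4 after only 30 steps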
What you should do instead is remember the whole update vector, l_rate * neuron['delta'] * inputs[j], from the previous backpropagation step and add it in. It might look something like this:
velocity[j] = l_rate * neuron['delta'] * inputs[j] + momentum * velocity[j]
neuron['weights'][j] += velocity[j]
... where velocity is an array of zeros mirroring the weight structure (one entry per weight of each neuron, not one per layer), defined in a scope outside updateWeights so it persists between calls. See this post for details.
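Putting it together against the tutorial's neuron-dict structure, the fixed update might look like the sketch below. Storing the velocities inside each neuron dict is my own choice here (the 'velocity' key is not part of the tutorial); it is one simple way to give them a lifetime longer than a single call:

def update_weights_momentum(network, row, l_rate, momentum=0.5):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            # One velocity slot per weight (bias included), created lazily
            # on the first call and reused on every later call.
            if 'velocity' not in neuron:
                neuron['velocity'] = [0.0] * len(neuron['weights'])
            for j in range(len(inputs)):
                # New update = current step plus a decayed copy of the previous update.
                neuron['velocity'][j] = (l_rate * neuron['delta'] * inputs[j]
                                         + momentum * neuron['velocity'][j])
                neuron['weights'][j] += neuron['velocity'][j]
            # Bias weight: same rule, with an implicit input of 1.
            neuron['velocity'][-1] = (l_rate * neuron['delta']
                                      + momentum * neuron['velocity'][-1])
            neuron['weights'][-1] += neuron['velocity'][-1]

On the first call all velocities are zero, so this reduces to the tutorial's plain update_weights step; from then on each weight carries an exponentially decaying memory of its past updates instead of a multiple of its own value.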