Backpropagation with Rectified Linear Units

I have written some code to implement backpropagation in a deep neural network with the logistic activation function and softmax output.

def backprop_deep(node_values, targets, weight_matrices):
    # Output layer: with softmax output and cross-entropy loss, the error
    # signal is simply (output - target).
    delta_nodes = node_values[-1] - targets
    delta_weights = delta_nodes.T.dot(node_values[-2])
    weight_updates = [delta_weights]
    # Walk backwards through the hidden layers; [:, :-1] drops the bias column.
    for i in xrange(-2, -len(weight_matrices) - 1, -1):
        delta_nodes = dsigmoid(node_values[i][:,:-1]) * delta_nodes.dot(weight_matrices[i+1])[:,:-1]
        delta_weights = delta_nodes.T.dot(node_values[i-1])
        weight_updates.insert(0, delta_weights)
    return weight_updates
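
For reference, dsigmoid is defined elsewhere in my code; assuming node_values holds the post-activation outputs of each layer, a minimal version would be:

import numpy as np

def dsigmoid(activations):
    # Derivative of the logistic function written in terms of its output:
    # sigma'(x) = sigma(x) * (1 - sigma(x)).
    return activations * (1.0 - activations)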

The code works well, but when I switched to ReLU as the activation function it stopped working. In the backprop routine I only changed the derivative of the activation function:

def backprop_relu(node_values, targets, weight_matrices):
    # Output layer is unchanged: softmax output with cross-entropy error.
    delta_nodes = node_values[-1] - targets
    delta_weights = delta_nodes.T.dot(node_values[-2])
    weight_updates = [delta_weights]
    for i in xrange(-2, -len(weight_matrices) - 1, -1):
        # ReLU derivative: 1 where the node is active, 0 otherwise.
        delta_nodes = (node_values[i]>0)[:,:-1] * delta_nodes.dot(weight_matrices[i+1])[:,:-1]
        delta_weights = delta_nodes.T.dot(node_values[i-1])
        weight_updates.insert(0, delta_weights)
    return weight_updates
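
For context, node_values comes from a forward pass in which each hidden layer's activations carry an appended bias column of ones (hence the [:, :-1] slicing). A simplified sketch of that layout, not my exact network code, would be:

import numpy as np

def relu_forward(inputs, weight_matrices):
    # Hidden layers: ReLU activations with a column of ones appended for the
    # bias, which is why backprop strips the last column with [:, :-1].
    ones = np.ones((inputs.shape[0], 1))
    node_values = [np.hstack((inputs, ones))]
    for W in weight_matrices[:-1]:
        hidden = np.maximum(node_values[-1].dot(W.T), 0.0)
        node_values.append(np.hstack((hidden, ones)))
    # Output layer: softmax probabilities, no bias column, so the result can
    # be compared directly with the one-hot targets.
    logits = node_values[-1].dot(weight_matrices[-1].T)
    exps = np.exp(logits - logits.max(axis=1, keepdims=True))
    node_values.append(exps / exps.sum(axis=1, keepdims=True))
    return node_values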

However, the network no longer learns, and the weights quickly go to zero and stay there. I am totally stumped.

1 Answer

Although I have determined the source of the problem, I'm going to leave this up in case it might be of benefit to someone else.

The problem was that I did not adjust the scale of the initial weights when I changed activation functions. While logistic networks learn very well when node inputs are near zero and the logistic function is approximately linear, ReLU networks learn well for moderately large inputs to nodes. The small weight initialization used in logistic networks is therefore not necessary, and in fact harmful. The behavior I was seeing was the ReLU network ignoring the features and attempting to learn the bias of the training set exclusively.

I am currently using initial weights distributed uniformly between -0.5 and 0.5 on the MNIST dataset, and the network is learning very quickly.
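
A minimal sketch of that initialization (the layer sizes below are just an example, not my exact architecture):

import numpy as np

def init_weights(layer_sizes, low=-0.5, high=0.5):
    # Uniform weights in [-0.5, 0.5]; each matrix maps one layer plus its
    # bias unit (hence the +1) to the next layer.
    return [np.random.uniform(low, high, size=(n_out, n_in + 1))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# Example: a 784-300-10 network for MNIST.
weight_matrices = init_weights([784, 300, 10])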
