I have written some code to implement backpropagation in a deep neural network with the logistic activation function and softmax output.
def backprop_deep(node_values, targets, weight_matrices):
delta_nodes = node_values[-1] - targets
delta_weights = delta_nodes.T.dot(node_values[-2])
weight_updates = [delta_weights]
for i in xrange(-2, -len(weight_matrices)- 1, -1):
delta_nodes = dsigmoid(node_values[i][:,:-1]) * delta_nodes.dot(weight_matrices[i+1])[:,:-1]
delta_weights = delta_nodes.T.dot(node_values[i-1])
weight_updates.insert(0, delta_weights)
return weight_updates
The code works well, but when I switched to ReLU as the activation function it stopped working. In the backprop routine I only change the derivative of the activation function:
def backprop_relu(node_values, targets, weight_matrices):
delta_nodes = node_values[-1] - targets
delta_weights = delta_nodes.T.dot(node_values[-2])
weight_updates = [delta_weights]
for i in xrange(-2, -len(weight_matrices)- 1, -1):
delta_nodes = (node_values[i]>0)[:,:-1] * delta_nodes.dot(weight_matrices[i+1])[:,:-1]
delta_weights = delta_nodes.T.dot(node_values[i-1])
weight_updates.insert(0, delta_weights)
return weight_updates
However, the network no longer learns, and the weights quickly go to zero and stay there. I am totally stumped.
Although I have determined the source of the problem, I'm going to leave this up in case it might be of benefit to someone else.
The problem was that I did not adjust the scale of the initial weights when I changed activation functions. While logistic networks learn very well when node inputs are near zero and the logistic function is approximately linear, ReLU networks learn well for moderately large inputs to nodes. The small weight initialization used in logistic networks is therefore not necessary, and in fact harmful. The behavior I was seeing was the ReLU network ignoring the features and attempting to learn the bias of the training set exclusively.
I am currently using initial weights distributed uniformly from -.5 to .5 on the MNIST dataset, and it is learning very quickly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With