
How to initialize weights when using the ReLU activation function

I want to build a convolutional network that uses the ReLU activation function. Can someone point me to the correct way to initialize the weights? (I'm using Theano.)


Giovanni Crescencio asked Oct 20 '15 06:10


1 Answer

I'm not sure there is a hard and fast best way to initialize weights and bias for a ReLU layer.

Some claim that (a slightly modified version of) Xavier initialization works well with ReLUs. Others prefer small Gaussian random weights plus bias=1, which makes it more likely that the weighted sum of positive inputs stays positive, so units do not start out in the ReLU's zero region.

In Theano, these can be achieved like this (assuming weights post-multiply the input):

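import numpy
import theano

# Option 1: small Gaussian weights with bias = 1
w = theano.shared((numpy.random.randn(in_size, out_size) * 0.1).astype(theano.config.floatX))
b = theano.shared(numpy.ones(out_size, dtype=theano.config.floatX))

# Option 2: Xavier-style scaling with bias = 0
w = theano.shared((numpy.random.randn(in_size, out_size) * numpy.sqrt(2.0 / (in_size + out_size))).astype(theano.config.floatX))
b = theano.shared(numpy.zeros(out_size, dtype=theano.config.floatX))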
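
Since the question is about a convolutional network, here is a rough sketch of how the same two schemes might be applied to a 4D filter tensor. It uses NumPy only. The helper name init_conv_filters, the scheme labels, and the filter layout (out_channels, in_channels, rows, cols) are assumptions made for illustration, as is the common convention of counting fan-in and fan-out over the filter's receptive field.

```python
import numpy


def init_conv_filters(filter_shape, scheme="xavier", rng=None, dtype="float32"):
    """Return (W, b) NumPy arrays for a conv layer followed by a ReLU.

    filter_shape is assumed to be (out_channels, in_channels, rows, cols).
    This helper is hypothetical and not part of Theano.
    """
    rng = numpy.random if rng is None else rng
    out_channels, in_channels, rows, cols = filter_shape

    # Each output unit reads in_channels * rows * cols input values.
    fan_in = in_channels * rows * cols
    # Each input value feeds out_channels * rows * cols output units.
    fan_out = out_channels * rows * cols

    if scheme == "xavier":
        # Xavier-style scaling with zero bias, as in the second snippet above.
        std = numpy.sqrt(2.0 / (fan_in + fan_out))
        b = numpy.zeros(out_channels, dtype=dtype)
    elif scheme == "small_gaussian":
        # Small Gaussian weights with bias = 1, as in the first snippet above.
        std = 0.1
        b = numpy.ones(out_channels, dtype=dtype)
    else:
        raise ValueError("unknown scheme: %r" % (scheme,))

    W = (rng.randn(*filter_shape) * std).astype(dtype)
    return W, b


# Example: 32 filters of size 5x5 over a 3-channel input.
W, b = init_conv_filters((32, 3, 5, 5))
```

The returned arrays could then be wrapped with theano.shared(W) and theano.shared(b), after casting to theano.config.floatX if it differs from the dtype used here.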
Daniel Renshaw answered Sep 23 '22 23:09