
What does it mean to "break symmetry" in the context of neural network programming? [duplicate]

I have heard a lot about "breaking the symmetry" in the context of neural network programming and initialization. Can somebody please explain what this means? As far as I can tell, it has something to do with neurons behaving identically during forward and backward propagation when the weight matrix is filled with identical values at initialization. Asymmetrical behavior would instead be achieved with random initialization, i.e., not using identical values throughout the matrix.

asked Jan 08 '20 by Jeff Austin

People also ask

What is symmetry-breaking in neural networks?

Symmetry breaking refers to a requirement when initializing machine learning models such as neural networks. When a model's weights are all initialized to the same value, it can be difficult or impossible for the weights to differ as the model is trained. This is the "symmetry".

What will happen if all the weights of a neural network are initialized with same value?

Now imagine that you initialize all weights to the same value (e.g. zero or one). In this case, each hidden unit will get exactly the same signal. For example, if all weights are initialized to 1, each unit gets a signal equal to the sum of the inputs (and outputs sigmoid(sum(inputs))).
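To make that concrete, here is a minimal sketch, assuming a single hidden layer with a sigmoid activation and NumPy (neither of which is specified above): with every weight set to 1, all hidden units receive the same signal and produce the same output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # arbitrary 3-feature input
W = np.ones((4, 3))              # 4 hidden units, every weight set to 1

hidden = sigmoid(W @ x)          # every row of W is identical...
print(hidden)                    # ...so all four outputs equal sigmoid(sum(x))
print(sigmoid(x.sum()))          # same value as each entry above
```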

What happens during backpropagation in a neural network?

Backpropagation is a process involved in training a neural network. It involves taking the error from a forward pass and feeding this loss backward through the network's layers to fine-tune the weights. Backpropagation is the essence of neural net training.
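As a rough illustration of that loop, here is a toy single-example backward pass in NumPy for a hypothetical two-layer network with a sigmoid hidden layer and a squared-error loss (these choices are assumptions for the sketch, not something stated above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0      # one training example, scalar target

W1 = rng.normal(size=(4, 3))        # input -> hidden
W2 = rng.normal(size=(1, 4))        # hidden -> output
lr = 0.1

# Forward pass
h = sigmoid(W1 @ x)                 # hidden activations
y_hat = W2 @ h                      # linear output
loss = 0.5 * (y_hat - y) ** 2

# Backward pass: feed the error back through the layers
d_out = y_hat - y                   # dL/dy_hat
dW2 = np.outer(d_out, h)            # gradient for the output layer
d_h = (W2.T @ d_out) * h * (1 - h)  # error pushed back through the sigmoid
dW1 = np.outer(d_h, x)              # gradient for the hidden layer

# Gradient-descent update ("fine-tune the weights")
W2 -= lr * dW2
W1 -= lr * dW1
```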

What will happen if we initialize all the weights to 0 in neural networks?

Initializing all the weights with zeros leads the neurons to learn the same features during training. In fact, any constant initialization scheme will perform very poorly.


1 Answer

Your understanding is correct.

When all initial values are identical, for example when every weight is initialized to 0, backpropagation gives all the weights in a layer the same gradient, and hence the same update. This is what is referred to as the symmetry.

Intuitively, that means all nodes will learn the same thing, and we don't want that, because we want the network to learn different kinds of features. This is achieved by random initialization, since then the gradients will be different, and each node will grow to be more distinct from other nodes, enabling diverse feature extraction. This is what is referred to as breaking the symmetry.
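Here is a small sketch of both situations, assuming a one-hidden-layer network with a sigmoid activation and squared-error loss (details not given in the answer): with a constant initialization every hidden unit receives an identical gradient, while a random initialization breaks that symmetry from the first step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_gradients(W1, w2, x, y):
    """Return dL/dW1 for a tiny sigmoid-hidden, linear-output network."""
    h = sigmoid(W1 @ x)
    y_hat = w2 @ h
    d_h = (y_hat - y) * w2 * h * (1 - h)      # error reaching each hidden unit
    return np.outer(d_h, x)                   # one gradient row per hidden unit

x, y = np.array([0.5, -1.2, 3.0]), 1.0
w2 = np.ones(4)

# Symmetric start: every hidden unit has the same incoming weights...
W_same = np.zeros((4, 3))
print(hidden_gradients(W_same, w2, x, y))     # all rows identical -> identical updates

# Random start: the symmetry is broken from step one.
W_rand = np.random.default_rng(0).normal(scale=0.1, size=(4, 3))
print(hidden_gradients(W_rand, w2, x, y))     # rows differ -> units diverge
```

Running it shows every gradient row identical in the first print, so the hidden units stay clones of each other, while the second print gives distinct rows and the units can specialize.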

answered Sep 30 '22 by justhalf