What is the difference between the He normal and Xavier normal initializers in Keras? Both seem to initialize weights based on the variance of the input data. Is there an intuitive explanation for the difference between the two?
Xavier (Glorot) initialization is an attempt to improve the initialization of neural network weights so as to avoid some long-standing training problems, in particular vanishing and exploding gradients. The weights are drawn at a scale chosen so that the signal neither shrinks toward zero nor blows up as it passes through successive layers.
The he_normal initializer draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 / fan_in), where fan_in is the number of input units in the weight tensor.
The goal of Xavier initialization is to initialize the weights such that the variance of the activations is the same across every layer. This constant variance helps prevent the gradient from exploding or vanishing.
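To see that intuition numerically, here is a quick NumPy sketch of my own (not from the Keras docs; the layer count and sizes are arbitrary). It pushes data through a stack of tanh layers and prints the activation standard deviation per layer, once with Xavier-scaled weights and once with a much smaller stddev, where the activations collapse toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = fan_out = 256
x = rng.standard_normal((1000, fan_in))

def activation_stds(weight_std, n_layers=10):
    """Propagate x through n_layers tanh layers and record each layer's activation stddev."""
    h = x
    stds = []
    for _ in range(n_layers):
        w = rng.normal(0.0, weight_std, size=(fan_in, fan_out))
        h = np.tanh(h @ w)
        stds.append(round(float(h.std()), 3))
    return stds

xavier_std = np.sqrt(2.0 / (fan_in + fan_out))  # Glorot/Xavier scaling
print("xavier  :", activation_stds(xavier_std))  # stays at a healthy scale
print("std=0.01:", activation_stds(0.01))        # shrinks toward 0 layer by layer
```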
The glorot_normal initializer (initializers.glorot_normal) draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 / (fan_in + fan_out)), where fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor.
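In Keras both initializers are built in; a minimal sketch, assuming TensorFlow 2.x / tf.keras (the layer sizes and seed are arbitrary):

```python
from tensorflow.keras import initializers, layers

# Pass the initializer by its string name ...
relu_layer = layers.Dense(128, activation="relu", kernel_initializer="he_normal")

# ... or construct it explicitly, which also lets you fix a seed.
sigmoid_layer = layers.Dense(128, activation="sigmoid",
                             kernel_initializer=initializers.GlorotNormal(seed=42))

# For a weight tensor with fan_in inputs and fan_out outputs, the truncated-normal
# stddevs used by these initializers are:
#   he_normal:     sqrt(2 / fan_in)
#   glorot_normal: sqrt(2 / (fan_in + fan_out))
```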
See this discussion on Stats.SE:
In summary, the main difference for machine learning practitioners is the following (a small Keras sketch follows the list):
- He initialization works better for layers with ReLU activation.
- Xavier initialization works better for layers with sigmoid activation.
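Applied to a model, that rule of thumb might look like this hypothetical binary classifier (my own illustration, assuming tf.keras; the input size and layer widths are made up): He initialization for the ReLU hidden layers, Xavier (Glorot) for the sigmoid output layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # ReLU hidden layers: He normal initialization
    layers.Dense(64, activation="relu", kernel_initializer="he_normal",
                 input_shape=(20,)),
    layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
    # Sigmoid output layer: Xavier/Glorot normal initialization
    layers.Dense(1, activation="sigmoid", kernel_initializer="glorot_normal"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```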