Why do I need to pass the previous layer's number of channels to the batchnorm? The batchnorm should normalize over each datapoint in the batch, so why does it need the number of channels?
Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
BatchNorm1d normalises data to zero mean and unit variance for 2- or 3-dimensional input of shape (N, C) or (N, C, L), with the statistics computed per channel over the N (or N and L) dimensions; BatchNorm2d does the same thing for 4-dimensional input of shape (N, C, H, W), with the statistics computed per channel over the N, H and W dimensions.
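As a quick sketch (the sizes here are made up purely for illustration), the two layers expect these shapes, and after normalisation each channel comes out roughly zero-mean and unit-variance:

import torch
import torch.nn as nn

x1d = torch.randn(8, 3, 20)        # (N, C, L) input for BatchNorm1d
x2d = torch.randn(8, 3, 32, 32)    # (N, C, H, W) input for BatchNorm2d

bn1d = nn.BatchNorm1d(3)           # C = 3 channels
bn2d = nn.BatchNorm2d(3)

y1d = bn1d(x1d)                    # statistics per channel over (N, L)
y2d = bn2d(x2d)                    # statistics per channel over (N, H, W)

print(y2d.mean(dim=(0, 2, 3)))     # ~0 for each of the 3 channels
print(y2d.std(dim=(0, 2, 3)))      # ~1 for each of the 3 channels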
Batch normalization addresses a problem called internal covariate shift. It keeps the distribution of the data flowing between intermediate layers of the neural network more stable, which means you can use a higher learning rate. It also has a regularizing effect, which means you can often remove dropout.
nn.BatchNorm2d() takes as its argument the number of channels output by the previous layer and fed into the batch norm layer. nn.Dropout() is a dropout unit in a neural network. torch.flatten() flattens its input by reshaping it into a one-dimensional tensor.
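For illustration only (the layer sizes are made up), here is how the channel count is threaded from a convolution into the batch norm layer, with dropout and flattening afterwards:

import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),      # must match the 16 channels the conv outputs
    nn.ReLU(),
    nn.Dropout(p=0.25),
    nn.Flatten(),            # same effect as torch.flatten(x, start_dim=1)
)

x = torch.randn(4, 3, 8, 8)  # (N, C, H, W)
print(block(x).shape)        # torch.Size([4, 1024])  (16 * 8 * 8)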
Batch normalisation has learnable parameters, because it includes an affine transformation.
From the documentation of nn.BatchNorm2d:
The mean and standard-deviation are calculated per-dimension over the mini-batches and γ and β are learnable parameter vectors of size C (where C is the input size). By default, the elements of γ are set to 1 and the elements of β are set to 0.
Since the statistics are calculated per channel, the parameters γ and β are vectors of size num_channels (one element per channel), which results in an individual scale and shift per channel. As with any other learnable parameter in PyTorch, they need to be created with a fixed size, hence you need to specify the number of channels.
import torch.nn as nn

batch_norm = nn.BatchNorm2d(10)
# γ
batch_norm.weight.size()
# => torch.Size([10])
# β
batch_norm.bias.size()
# => torch.Size([10])
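To make the per-channel scale and shift concrete, here is a rough sketch (training mode only, ignoring the running statistics used at evaluation time) that reproduces the layer's output by hand:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(10)
x = torch.randn(4, 10, 7, 7)

mean = x.mean(dim=(0, 2, 3), keepdim=True)                # one mean per channel
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # one variance per channel
x_hat = (x - mean) / torch.sqrt(var + bn.eps)

# γ (weight) and β (bias) are broadcast so each channel gets its own scale and shift
manual = bn.weight.view(1, -1, 1, 1) * x_hat + bn.bias.view(1, -1, 1, 1)
print(torch.allclose(bn(x), manual, atol=1e-5))           # True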
Note: With affine=False no learnable parameters are created, so the number of channels wouldn't strictly be needed, but it is still required in order to keep the interface consistent.
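As a quick check of that behaviour (again just a sketch): with affine=False the parameters simply don't exist, yet the constructor still takes the channel count.

import torch.nn as nn

bn_no_affine = nn.BatchNorm2d(10, affine=False)
print(bn_no_affine.weight, bn_no_affine.bias)  # None None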