 

Batchnorm2d Pytorch - Why pass number of channels to batchnorm?

Why do I need to pass the previous number of channels to the batchnorm? The batchnorm should normalize over each datapoint in the batch, so why does it need the number of channels?

asked May 27 '20 by TheBenimeni


People also ask

What does BatchNorm2d do in Pytorch?

Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift .
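As a minimal sketch (the shapes below are arbitrary examples), applying it to such a 4D input looks like this:

```python
import torch
import torch.nn as nn

# a mini-batch of 8 images with 3 channels, 32x32 pixels: shape (N, C, H, W)
x = torch.randn(8, 3, 32, 32)

bn = nn.BatchNorm2d(num_features=3)  # num_features must match the channel dimension C
y = bn(x)
print(y.shape)  # torch.Size([8, 3, 32, 32]) - the shape is unchanged
```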

What is the difference between BatchNorm1d and BatchNorm2d?

BatchNorm1d normalises data to 0 mean and unit variance for 2/3-dimensional data (N, C) or (N, C, L) , computed over the channel dimension at each (N, L) or (N,) slice; while BatchNorm2d does the same thing for 4 dimensions (N, C, H, W) , computed over the channel dimension at each (N, H, W) slice.
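A short sketch of the accepted input shapes (example sizes chosen arbitrarily):

```python
import torch
import torch.nn as nn

# BatchNorm1d: 2D input (N, C) or 3D input (N, C, L)
bn1d = nn.BatchNorm1d(4)
print(bn1d(torch.randn(10, 4)).shape)      # torch.Size([10, 4])
print(bn1d(torch.randn(10, 4, 20)).shape)  # torch.Size([10, 4, 20])

# BatchNorm2d: 4D input (N, C, H, W)
bn2d = nn.BatchNorm2d(4)
print(bn2d(torch.randn(10, 4, 8, 8)).shape)  # torch.Size([10, 4, 8, 8])
```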

Why is batch normalization important?

Batch normalization solves a major problem called internal covariate shift. It helps by keeping the distributions of the data flowing between the intermediate layers of the neural network more stable, which means you can use a higher learning rate. It also has a regularizing effect, which means you can often remove dropout.

What is batch norm 2D?

nn.BatchNorm2d() takes the number of channels produced by the previous layer and coming into the batch-norm layer. nn.Dropout() is used as a dropout unit in a neural network. torch.flatten() flattens the input by reshaping it into a one-dimensional tensor.
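A hypothetical small block (the layer sizes are arbitrary) showing where these calls typically appear:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # produces 16 channels
bn = nn.BatchNorm2d(16)                             # must match the 16 output channels
drop = nn.Dropout(p=0.5)

x = torch.randn(2, 3, 8, 8)                         # (N, C, H, W)
out = drop(torch.relu(bn(conv(x))))
out = torch.flatten(out, start_dim=1)               # reshape (N, 16, 8, 8) -> (N, 1024)
print(out.shape)                                    # torch.Size([2, 1024])
```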


1 Answer

Batch normalisation has learnable parameters, because it includes an affine transformation.

From the documentation of nn.BatchNorm2d:

y = (x − E[x]) / √(Var[x] + ε) * γ + β

The mean and standard-deviation are calculated per-dimension over the mini-batches and γ and β are learnable parameter vectors of size C (where C is the input size). By default, the elements of γ are set to 1 and the elements of β are set to 0.

Since the normalisation is calculated per channel, the parameters γ and β are vectors of size num_channels (one element per channel), which results in an individual scale and shift per channel. As with any other learnable parameter in PyTorch, they need to be created with a fixed size, hence you need to specify the number of channels:

import torch.nn as nn

batch_norm = nn.BatchNorm2d(10)

# γ (scale), one element per channel
batch_norm.weight.size()
# => torch.Size([10])

# β (shift), one element per channel
batch_norm.bias.size()
# => torch.Size([10])
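To make the per-channel computation concrete, here is a small sketch (shapes chosen arbitrarily) that reproduces the module's output in training mode by normalising each channel over its (N, H, W) slice:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 10, 5, 5)        # (N, C, H, W) with C = 10 channels
bn = nn.BatchNorm2d(10)

# statistics are computed over the (N, H, W) slice of each channel
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)   # γ = 1, β = 0 by default

print(torch.allclose(bn(x), manual, atol=1e-6))  # True (in training mode)
```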

Note: With affine=False no learnable parameters are used, so the number of channels would not strictly be needed, but it is still required in order to keep a consistent interface.
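A quick check of that behaviour (assuming nn is imported as above):

```python
bn_no_affine = nn.BatchNorm2d(10, affine=False)  # channel count is still required
print(bn_no_affine.weight, bn_no_affine.bias)    # None None
```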

answered Nov 15 '22 by Michael Jungo