In the fastai "Cutting Edge Deep Learning for Coders" course, lecture 7:

self.conv1 = nn.Conv2d(3, 10, kernel_size=5, stride=1, padding=2)

Does the 10 there mean the number of filters, or the number of activations the filter will give?
PyTorch's Conv2d applies a two-dimensional convolution over an input supplied by the user, where the input's shape is given as (batch size, channels, height, width).
Conv2d expands on this. groups controls the connections between inputs and outputs; in_channels and out_channels must both be divisible by groups. For example, at groups=1, all inputs are convolved to all outputs.
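As a quick illustration (my own sketch, not from the original post), the weight shape reflects the grouping: with groups=g, each filter only sees in_channels / g of the input channels.

```python
import torch.nn as nn

# groups=1: every filter spans all 4 input channels
full = nn.Conv2d(4, 8, kernel_size=3, groups=1)
print(full.weight.shape)  # torch.Size([8, 4, 3, 3])

# groups=2: inputs and outputs are split into 2 groups,
# so each filter only sees 4 / 2 = 2 input channels
grouped = nn.Conv2d(4, 8, kernel_size=3, groups=2)
print(grouped.weight.shape)  # torch.Size([8, 2, 3, 3])
```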
Applies a 2D convolution over an input image composed of several input planes. This operator supports TensorFloat32.
padding controls the amount of padding applied to the input. It can be either a string from {'valid', 'same'} or a tuple of ints giving the amount of implicit padding applied on both sides. dilation controls the spacing between the kernel points; this is also known as the à trous algorithm.
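Here is a small sketch (mine, not from the answer) of how the padding strings and dilation affect the output size:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)

# padding='same' keeps the spatial size (requires stride=1)
same = nn.Conv2d(3, 10, kernel_size=5, padding='same')
print(same(x).shape)  # torch.Size([1, 10, 28, 28])

# padding='valid' means no padding: 28 - 5 + 1 = 24
valid = nn.Conv2d(3, 10, kernel_size=5, padding='valid')
print(valid(x).shape)  # torch.Size([1, 10, 24, 24])

# dilation=2 spreads the 5x5 kernel over a 9x9 window: 28 - 9 + 1 = 20
dilated = nn.Conv2d(3, 10, kernel_size=5, dilation=2)
print(dilated(x).shape)  # torch.Size([1, 10, 20, 20])
```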
Here is what you may find
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
And this URL has a helpful visualization of the process.
So the in_channels at the beginning is 3 for images with 3 channels (colored images). For black-and-white images it should be 1, and some satellite images have 4.
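To illustrate (my own example), the in_channels of the layer must match the channel dimension of the input tensor; the output channel count is the same either way:

```python
import torch
import torch.nn as nn

rgb = nn.Conv2d(3, 10, kernel_size=5, padding=2)   # colored images, 3 channels
gray = nn.Conv2d(1, 10, kernel_size=5, padding=2)  # black-and-white, 1 channel

print(rgb(torch.randn(1, 3, 28, 28)).shape)   # torch.Size([1, 10, 28, 28])
print(gray(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10, 28, 28])
```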
The out_channels is what the convolution will produce, so this is the number of filters.
Let's create an example to "prove" that.
import torch
import torch.nn as nn

c = nn.Conv2d(1, 3, stride=1, kernel_size=(4, 5))
print(c.weight.shape)
print(c.weight)
Out:

torch.Size([3, 1, 4, 5])
Parameter containing:
tensor([[[[ 0.1571,  0.0723,  0.0900,  0.1573,  0.0537],
          [-0.1213,  0.0579,  0.0009, -0.1750,  0.1616],
          [-0.0427,  0.1968,  0.1861, -0.1787, -0.2035],
          [-0.0796,  0.1741, -0.2231,  0.2020, -0.1762]]],

        [[[ 0.1811,  0.0660,  0.1653,  0.0605,  0.0417],
          [ 0.1885, -0.0440, -0.1638,  0.1429, -0.0606],
          [-0.1395, -0.1202,  0.0498,  0.0432, -0.1132],
          [-0.2073,  0.1480, -0.1296, -0.1661, -0.0633]]],

        [[[ 0.0435, -0.2017,  0.0676, -0.0711, -0.1972],
          [ 0.0968, -0.1157,  0.1012,  0.0863, -0.1844],
          [-0.2080, -0.1355, -0.1842, -0.0017, -0.2123],
          [-0.1495, -0.2196,  0.1811,  0.1672, -0.1817]]]],
       requires_grad=True)
If we alter the number of out_channels,

c = nn.Conv2d(1, 5, stride=1, kernel_size=(4, 5))
print(c.weight.shape)  # torch.Size([5, 1, 4, 5])
we get 5 filters, each 4x5, as this is our kernel size. If we instead set 2 input channels (some images may have only 2 channels),

c = nn.Conv2d(2, 5, stride=1, kernel_size=(4, 5))
print(c.weight.shape)  # torch.Size([5, 2, 4, 5])

each filter will have 2 channels.
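Following that logic (my own addition), the weight count is out_channels x in_channels x kernel height x kernel width, plus one bias term per filter:

```python
import torch.nn as nn

c = nn.Conv2d(2, 5, stride=1, kernel_size=(4, 5))
print(c.weight.nelement())  # 200 weights: 5 * 2 * 4 * 5
print(c.bias.nelement())    # 5, one bias per filter
```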
I think they take their terminology from this book, and since it doesn't call them filters, they haven't used that term either.
So you are right: the filters are what the conv layer is learning, and the number of filters is the number of out channels. They are initialized randomly at the start.
The number of activations is calculated based on bs (batch size) and the image dimensions:
bs = 16
x = torch.randn(bs, 3, 28, 28)
c = nn.Conv2d(3, 10, kernel_size=5, stride=1, padding=2)
out = c(x)
print(out.nelement())  # 125440, the number of activations
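The 125440 comes straight from the output shape. A sketch of the general size formula (my own helper, not part of the answer):

```python
import math

def conv2d_out_size(size, kernel, stride=1, padding=0, dilation=1):
    # standard Conv2d output-size formula for one spatial dimension
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

h = conv2d_out_size(28, kernel=5, stride=1, padding=2)  # 28, size preserved
print(16 * 10 * h * h)  # 125440 activations: bs * out_channels * H * W
```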
Checking the docs (https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d): you have 3 in_channels and 10 out_channels, so those 10 out_channels are the filters @thefifthjack005 asked about, also known as features.