Surprisingly, I have not found an answer to this question after searching around the internet. I am specifically interested in a 3D tensor. From my own experiments, I have found that when I create a tensor:
import torch
h = torch.randn(5, 12, 5)
And then put a convolutional layer on it defined as follows:
conv = torch.nn.Conv1d(12, 48, 3, padding=1)
The output is a (5, 48, 5) tensor. So am I correct in assuming that, for a 3D tensor in PyTorch, the middle number represents the number of channels?
Edit: It seems that when running a Conv2d, the input dimension is the first entry in the tensor, and I need to make it a 4D tensor, e.g. (1, 48, 5, 5). Now I am very confused...
Any help is much appreciated!
in_channels denotes the number of channels in the input image, while out_channels denotes the number of channels produced by the convolution. For image data, the most common cases are grayscale images, which have a single channel, and color images, which have three channels: red, green, and blue.
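As a minimal sketch (the image sizes and out_channels values below are just illustrative), in_channels has to match the channel dimension of the input tensor:

```python
import torch
import torch.nn as nn

# A batch of one RGB image (3 channels) and one grayscale image (1 channel).
rgb = torch.randn(1, 3, 32, 32)    # (N, C, H, W) with C=3
gray = torch.randn(1, 1, 32, 32)   # (N, C, H, W) with C=1

conv_rgb = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
conv_gray = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)

print(conv_rgb(rgb).shape)    # torch.Size([1, 16, 32, 32])
print(conv_gray(gray).shape)  # torch.Size([1, 16, 32, 32])
```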
PyTorch uses the channels-first memory format by default, but it allows you to convert both the input and the model parameters to channels-last as described here, which can be beneficial for mixed-precision training using Tensor Cores.
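A rough sketch of that conversion (sizes are arbitrary); note that the logical shape stays (N, C, H, W), only the underlying memory layout changes:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(8, 3, 32, 32)

# Convert model parameters and input to the channels-last memory format.
model = model.to(memory_format=torch.channels_last)
x = x.contiguous(memory_format=torch.channels_last)

out = model(x)
print(out.shape)  # torch.Size([8, 16, 32, 32]) -- logical shape is unchanged
```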
A PyTorch Tensor is basically the same as a NumPy array: it does not know anything about deep learning, computational graphs, or gradients; it is just a generic n-dimensional array used for arbitrary numeric computation.
For Conv2d, the input should be in (N, C, H, W) format, where N is the number of samples (batch size), C is the number of channels, and H and W are the height and width, respectively.
See shape documentation at https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d
For Conv1d, the input should be in (N, C, L) format, where L is the sequence length; see the documentation at https://pytorch.org/docs/stable/nn.html#conv1d
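Putting the two shape rules next to your original example (a sketch, with the out_channels values picked arbitrarily): the middle dimension of your (5, 12, 5) tensor is indeed the channel dimension for Conv1d, while Conv2d additionally expects a leading batch dimension:

```python
import torch
import torch.nn as nn

# Conv1d: input is (N, C, L) -- 5 samples, 12 channels, sequence length 5.
h = torch.randn(5, 12, 5)
conv1 = nn.Conv1d(in_channels=12, out_channels=48, kernel_size=3, padding=1)
print(conv1(h).shape)  # torch.Size([5, 48, 5])

# Conv2d: input is (N, C, H, W) -- one sample, 48 channels, 5x5 spatial size.
x = torch.randn(1, 48, 5, 5)
conv2 = nn.Conv2d(in_channels=48, out_channels=64, kernel_size=3, padding=1)
print(conv2(x).shape)  # torch.Size([1, 64, 5, 5])
```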