
Understanding input shape to PyTorch conv1D?

This seems to be one of the common questions on here (1, 2, 3), but I am still struggling to define the right input shape for PyTorch's Conv1d.

I have text sequences of length 512 (number of tokens per sequence) with each token being represented by a vector of length 768 (embedding). The batch size I am using is 6.

So my input tensor to conv1D is of shape [6, 512, 768].

input = torch.randn(6, 512, 768) 

Now, I want to convolve over the length of my sequence (512) with a kernel size of 2 using the conv1D layer from PyTorch.

Understanding 1:

I assumed that "in_channels" of the Conv1d layer is the embedding dimension. If so, the layer would be defined as follows, where

in_channels = embedding dimension (768)
out_channels = 100 (arbitrary number)
kernel = 2

convolution_layer = nn.Conv1d(768, 100, 2)
feature_map = convolution_layer(input)

But with this assumption, I get the following error:

RuntimeError: Given groups=1, weight of size 100 768 2, expected input `[4, 512, 768]` to have 768 channels, but got 512 channels instead

Understanding 2:

Then I assumed that "in_channels" is the length of the input sequence. If so, the Conv1d layer would be defined as follows, where

in_channels = sequence length (512)
out_channels = 100 (arbitrary number)
kernel = 2

convolution_layer = nn.Conv1d(512, 100, 2)
feature_map = convolution_layer(input)

This works fine and I get an output feature map of dimension [batch_size, 100, 767]. However, I am confused. Shouldn't the convolutional layer convolve over the sequence length of 512 and output a feature map of dimension [batch_size, 100, 511]?

I will be really grateful for your help.

Anjani asked Jun 14 '20 13:06

2 Answers

In PyTorch, nn.Conv1d expects input of shape [batch, channels, length], so your input of shape [6, 512, 768] should actually be [6, 768, 512]: the embedding (feature) dimension is the channel dimension and the sequence length is the length dimension. You can then define your Conv1d with in/out channels of 768 and 100 respectively to get an output of shape [6, 100, 511].

Given an input of shape [6, 512, 768] you can convert it to the correct shape with Tensor.transpose.

input = input.transpose(1, 2).contiguous()

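transpose returns a view over the same memory with swapped strides, so its result is not contiguous. Calling .contiguous() copies it into a contiguous layout, which avoids issues with later operations that require contiguous memory (for example, .view()).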
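As a rough check of the shapes described above, here is a minimal sketch. The variable names (x, x_t, x_c, conv) are illustrative only, and the random tensor stands in for real embeddings.

```python
import torch
import torch.nn as nn

batch_size, seq_len, embed_dim = 6, 512, 768

# Stand-in for token embeddings laid out as [batch, sequence, embedding].
x = torch.randn(batch_size, seq_len, embed_dim)

# nn.Conv1d expects [batch, channels, length], so swap the last two dims.
x_t = x.transpose(1, 2)
print(x_t.shape)            # torch.Size([6, 768, 512])
print(x_t.is_contiguous())  # False: transpose returns a strided view

# Copy into a contiguous memory layout.
x_c = x_t.contiguous()
print(x_c.is_contiguous())  # True

# in_channels is the embedding size; the kernel slides over the 512 tokens.
conv = nn.Conv1d(in_channels=embed_dim, out_channels=100, kernel_size=2)
feature_map = conv(x_c)
print(feature_map.shape)    # torch.Size([6, 100, 511])
```

The convolution itself also accepts the non-contiguous view; the copy matters mainly for later operations that assume contiguous memory.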

jodag answered Oct 09 '22 02:10


I found an answer to it (source).

So, usually, BERT outputs vectors of shape

[batch_size, sequence_length, embedding_dim].


sequence_length = number of words or tokens in a sequence (the maximum sequence length BERT can handle is 512)
embedding_dim = length of the vector describing each token (768 in the case of BERT).

thus, input = torch.randn(batch_size, 512, 768)

Now, we want to convolve over the text sequence of length 512 using a kernel size of 2.

So, we define a PyTorch conv1D layer as follows,

convolution_layer = nn.Conv1d(in_channels, out_channels, kernel_size)


in_channels = embedding_dim
out_channels = arbitrary int
kernel_size = 2 (I want bigrams)

thus, convolution_layer = nn.Conv1d(768, 100, 2)

Now we need a connecting link between the expected input by convolution_layer and the actual input.

For this, we need to swap the last two dimensions of the input:

current input shape: [batch_size, 512, 768]
expected input shape: [batch_size, 768, 512]

To achieve this expected input shape, we need to use the transpose function from PyTorch.

input_transposed = input.transpose(1, 2)
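To see why the transposed input gives a length of 511 while the untransposed one gave 767, the output-length formula from the nn.Conv1d documentation can be worked through directly. This is only a sketch; conv1d_output_length is a hypothetical helper name, not a PyTorch function.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution, following the nn.Conv1d docs formula."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Input laid out as [batch, 768, 512]: the kernel slides over the 512 tokens.
print(conv1d_output_length(512, kernel_size=2))  # 511

# Input left as [batch, 512, 768] with in_channels=512: the kernel slides over
# the 768 embedding values, which is where the 767 in the question came from.
print(conv1d_output_length(768, kernel_size=2))  # 767
```

In both cases the last dimension of the input is the one being convolved over, so the transpose decides whether that dimension is the token axis or the embedding axis.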
Anjani answered Oct 09 '22 03:10