This seems to be one of the common questions here (1, 2, 3), but I am still struggling to define the right input shape for PyTorch's Conv1d.
I have text sequences of length 512 (number of tokens per sequence) with each token being represented by a vector of length 768 (embedding). The batch size I am using is 6.
So my input tensor to Conv1d is of shape [6, 512, 768]:

import torch
import torch.nn as nn

input = torch.randn(6, 512, 768)  # [batch_size, seq_len, embedding_dim]
Now, I want to convolve over the length of my sequence (512) with a kernel size of 2, using PyTorch's Conv1d layer.
Understanding 1:
I assumed that "in_channels" is the embedding dimension of the Conv1d layer. If so, a Conv1d layer would be defined this way, where
in_channels = embedding dimension (768)
out_channels = 100 (arbitrary number)
kernel = 2
convolution_layer = nn.Conv1d(768, 100, 2)
feature_map = convolution_layer(input)
But with this assumption, I get the following error:
RuntimeError: Given groups=1, weight of size [100, 768, 2], expected input[6, 512, 768] to have 768 channels, but got 512 channels instead
Understanding 2:
Then I assumed that "in_channels" is the sequence length of the input. If so, a Conv1d layer would be defined this way, where
in_channels = sequence length (512)
out_channels = 100 (arbitrary number)
kernel = 2
convolution_layer = nn.Conv1d(512, 100, 2)
feature_map = convolution_layer(input)
This works fine, and I get an output feature map of dimension [batch_size, 100, 767]. However, I am confused: shouldn't the convolutional layer convolve over the sequence length of 512 and output a feature map of dimension [batch_size, 100, 511]?
I will be really grateful for your help.
From the Conv1d — PyTorch 1.9.0 documentation:

class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

Applies a 1D convolution over an input signal composed of several input planes. The expected input shape is (N, Cin, Lin) or (Cin, Lin), with (N, Cin, Lin) being the common case. The output shape is (N, Cout, Lout) or (Cout, Lout), where Cout is given by the out_channels parameter. Other important parameters include kernel_size (int or tuple; size of the convolving kernel) and stride (int or tuple, optional; stride of the convolution, default 1).
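As a sanity check, the documented output-length formula, with the defaults stride=1, padding=0, dilation=1, reduces to:

Lout = floor((Lin + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1) = Lin - kernel_size + 1

so for Lin = 512 and kernel_size = 2, Lout = 512 - 2 + 1 = 511.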
In PyTorch, your input of shape [6, 512, 768] should actually be [6, 768, 512], where the feature (embedding) length is represented by the channel dimension and the sequence length by the length dimension. Then you can define your Conv1d with in/out channels of 768 and 100, respectively, to get an output of [6, 100, 511].
Given an input of shape [6, 512, 768], you can convert it to the correct shape with Tensor.transpose:
input = input.transpose(1, 2).contiguous()
The .contiguous() call ensures the tensor's memory is stored contiguously, which helps avoid potential issues during processing.
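As a quick check, here is a minimal runnable sketch using the shapes from the question:

import torch
import torch.nn as nn

x = torch.randn(6, 512, 768)        # [batch, seq_len, embedding_dim]
x = x.transpose(1, 2).contiguous()  # -> [6, 768, 512]
conv = nn.Conv1d(in_channels=768, out_channels=100, kernel_size=2)
print(conv(x).shape)                # torch.Size([6, 100, 511])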
I found an answer to it (source).
So, usually, BERT outputs vectors of shape [batch_size, sequence_length, embedding_dim], where:

sequence_length = number of words or tokens in a sequence (the maximum sequence length BERT can handle is 512)
embedding_dim = the length of the vector representing each token (768 in the case of BERT).
thus, input = torch.randn(batch_size, 512, 768)
Now, we want to convolve over the text sequence of length 512 using a kernel size of 2.
So, we define a PyTorch Conv1d layer as follows,
convolution_layer = nn.Conv1d(in_channels, out_channels, kernel_size)
where,
in_channels = embedding_dim
out_channels = arbitrary int
kernel_size = 2 (I want bigrams)
thus, convolution_layer = nn.Conv1d(768, 100, 2)
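As a side note, this layer's weight has shape [out_channels, in_channels, kernel_size], which is exactly the weight size reported in the question's error message:

print(convolution_layer.weight.shape)  # torch.Size([100, 768, 2])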
Now we need a connecting link between the input expected by convolution_layer and the actual input:

current input shape: [batch_size, 512, 768]
expected input shape: [batch_size, 768, 512]
To achieve this expected input shape, we need to use the transpose function from PyTorch.
input_transposed = input.transpose(1, 2)
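Feeding the transposed input through the layer then gives the bigram feature map with the expected shape (a quick check; batch_size is whatever integer you chose above):

feature_map = convolution_layer(input_transposed)
print(feature_map.shape)  # torch.Size([batch_size, 100, 511])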