
Flatten Tensor in Pytorch Convolutional Neural Network (size mismatch error)

I made a reproducible example with random pixels. I'm trying to flatten the output of the convolutional layers so that it can be fed to the dense layers. The problem is at the boundary between the two: I don't know how many input features the first dense layer should have.

tl;dr: I'm looking for the manual equivalent of keras.layers.Flatten(), which I couldn't find in PyTorch.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader

x = np.random.rand(100, 3, 100, 100)
y = np.random.randint(0, 2, 100)

# Convert to tensors in every case, and move them to the GPU only if one is available
x = torch.from_numpy(x.astype('float32'))
y = torch.from_numpy(y.astype('float32'))
if torch.cuda.is_available():
    x, y = x.cuda(), y.cuda()

class ConvNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(128, 1024) # 128 is wrong here
        self.fc2 = nn.Linear(1024, 1)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

net = ConvNet()
if torch.cuda.is_available():
    net.cuda()
optimizer = optim.Adam(net.parameters(), lr=0.03)
loss_function = nn.BCELoss()

class Train:

    def __init__(self):
        self.len = x.shape[0]
        self.x_train = x
        self.y_train = y

    def __getitem__(self, index):
        return self.x_train[index], self.y_train[index].unsqueeze(0)

    def __len__(self):
        return self.len

train = Train()
train_loader = DataLoader(dataset=train, batch_size=64, shuffle=True)

epochs = 1
train_losses = list()
for e in range(epochs):
    running_loss = 0
    for images, labels in train_loader:
        optimizer.zero_grad()
        preds = net(images)  # sigmoid outputs are probabilities, not log-probabilities
        loss = loss_function(preds, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print('It\'s working.')

Nicolas Gervais asked Nov 29 '19 17:11



2 Answers

You must be getting a size mismatch error, right?

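That is because the output of the convolutional layers has shape [B, 128, 10, 10]: each unpadded 3x3 convolution trims 2 pixels and each 2x2 max pool halves the size, rounding down, so the width goes 100 → 98 → 49 → 47 → 23 → 21 → 10. Flattening that output gives shape [B, 128*10*10], so the first linear layer needs an input size of 12800. That should fix the problem.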

So, just change

self.fc1 = nn.Linear(128, 1024) # 128 is wrong here

to

self.fc1 = nn.Linear(12800, 1024)
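
The 12800 figure can be checked with a few lines of plain Python. This is only a minimal sketch: conv_out is a hypothetical helper (not a PyTorch function) that applies the standard output-size formula floor((size + 2*padding - kernel) / stride) + 1, assuming square inputs and no dilation.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output width of a convolution; the same formula applies to pooling."""
    return (size + 2 * padding - kernel) // stride + 1

size = 100  # input images are 100 x 100
for _ in range(3):  # three conv + pool stages
    size = conv_out(size, kernel=3)            # Conv2d(..., 3): stride 1, no padding
    size = conv_out(size, kernel=2, stride=2)  # max_pool2d(..., (2, 2)): stride defaults to the kernel size

flat_features = 128 * size * size  # 128 channels come out of conv3
print(size, flat_features)  # 10 12800
```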

Usually, to find the right size, you can either compute the output shapes on paper or add a print(x.shape) debug statement in the forward function just before the flatten step.
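
A variation on the print(x.shape) idea is to let the network measure itself: push one dummy sample through the convolutional part inside __init__ and read off the flattened size. The sketch below assumes the 3 x 100 x 100 input from the question; the input_shape argument and the _features helper are names made up for this illustration, not part of the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):
    def __init__(self, input_shape=(3, 100, 100)):  # input_shape is a hypothetical argument
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)
        # Run one dummy sample through the conv stack to measure its flattened size
        with torch.no_grad():
            n_flat = self._features(torch.zeros(1, *input_shape)).shape[1]
        self.fc1 = nn.Linear(n_flat, 1024)
        self.fc2 = nn.Linear(1024, 1)

    def _features(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))
        return torch.flatten(x, 1)  # keep the batch dimension, flatten the rest

    def forward(self, x):
        x = F.relu(self.fc1(self._features(x)))
        return torch.sigmoid(self.fc2(x))

net = ConvNet()
print(net.fc1.in_features)                    # 12800
print(net(torch.rand(4, 3, 100, 100)).shape)  # torch.Size([4, 1])
```

The dummy pass costs one small forward call at construction time, and the size stays correct if the kernel sizes or the input resolution change later.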

Umang Gupta answered Sep 20 '22 04:09


Here's a function I made to automatically fit the right number of neurons while flattening a convolutional tensor:

def flatten(w, k=3, s=1, p=0, m=True):
    """
    Returns the width of a square feature map after one convolution,
        optionally followed by 2x2 max pooling
    :param w: width of the (square) input
    :param k: kernel size
    :param s: stride
    :param p: padding
    :param m: whether 2x2 max pooling follows the convolution (bool)
    :return: (new width, k, s, p, m), so that calls can be chained;
        the flattened size is width * width * previous_out_channels

    Example:
    r = flatten(*flatten(*flatten(w=100, k=3, s=1, p=0, m=True)))[0]
    self.fc1 = nn.Linear(r*r*128, 1024)
    """
    out = (w - k + 2 * p) // s + 1  # width after the convolution
    if m:
        out //= 2  # 2x2 max pooling halves the width, rounding down
    return out, k, s, p, m

In your case:

def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(3, 32, 3)
    self.conv2 = nn.Conv2d(32, 64, 3)
    self.conv3 = nn.Conv2d(64, 128, 3)

    r = flatten(*flatten(*flatten(w=100, k=3, s=1, p=0, m=True)))[0]

    self.fc1 = nn.Linear(r*r*128, 1024)
    self.fc2 = nn.Linear(1024, 1)

def forward(self, x): ...
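
As an alternative to computing the size by hand, reasonably recent PyTorch releases ship an nn.Flatten module (the closest counterpart to keras.layers.Flatten()) and an nn.LazyLinear layer that infers its input size on the first forward pass. The following is a sketch under that version assumption, rewriting the question's network as an nn.Sequential purely for illustration; it is not the code from either answer.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),         # [B, 128, 10, 10] -> [B, 12800]
    nn.LazyLinear(1024),  # in_features is inferred on the first forward pass
    nn.ReLU(),
    nn.Linear(1024, 1),
    nn.Sigmoid(),
)

# One dry run materializes the lazy layer; do it before building the optimizer
out = model(torch.rand(2, 3, 100, 100))
print(out.shape)              # torch.Size([2, 1])
print(model[10].in_features)  # 12800
```

The lazy layer removes the need for any size arithmetic, at the price of requiring one forward pass before the parameters exist.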

Nicolas Gervais answered Sep 20 '22 04:09