I am training a CNN model and I am facing an issue while running the training iterations for my model. The code is below:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        #convo layers
        self.conv1 = nn.Conv2d(3,32,3)
        self.conv2 = nn.Conv2d(32,64,3)
        self.conv3 = nn.Conv2d(64,128,3)
        self.conv4 = nn.Conv2d(128,256,3)
        self.conv5 = nn.Conv2d(256,512,3)
        #pooling layer
        self.pool = nn.MaxPool2d(2,2)
        #linear layers
        self.fc1 = nn.Linear(512*5*5,2048)
        self.fc2 = nn.Linear(2048,1024)
        self.fc3 = nn.Linear(1024,133)
        #dropout layer
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        #first layer
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool(x)
        #x = self.dropout(x)
        #second layer
        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool(x)
        #x = self.dropout(x)
        #third layer
        x = self.conv3(x)
        x = F.relu(x)
        x = self.pool(x)
        #x = self.dropout(x)
        #fourth layer
        x = self.conv4(x)
        x = F.relu(x)
        x = self.pool(x)
        #fifth layer
        x = self.conv5(x)
        x = F.relu(x)
        x = self.pool(x)
        #x = self.dropout(x)
        #reshape tensor
        x = x.view(-1,512*5*5)
        #last layer
        x = self.dropout(x)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        return x
net = Net()

#loss func
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr = 0.0001)
#criterion = nn.CrossEntropyLoss()
#optimizer = optim.SGD(net.parameters(), lr = 0.05)

def train(n_epochs,model,loader,optimizer,criterion,save_path):
    for epoch in range(n_epochs):
        train_loss = 0
        valid_loss = 0
        #training
        model.train()
        for batch, (data,target) in enumerate(loader['train']):
            optimizer.zero_grad()
            outputs = model(data)
            #print(outputs.shape)
            loss = criterion(outputs,target)
            loss.backward()
            optimizer.step()
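The x.view(-1,512*5*5) line only works if the last pooling layer produces a 5x5 feature map. A minimal sketch to check this, assuming 224x224 RGB inputs (the image size is not stated above) and using a hypothetical Features module that mirrors only the five conv/ReLU/pool stages of Net:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the convolutional part of Net:
# five Conv2d(kernel_size=3, no padding) -> ReLU -> MaxPool2d(2, 2) stages.
class Features(nn.Module):
    def __init__(self):
        super().__init__()
        channels = [3, 32, 64, 128, 256, 512]
        self.convs = nn.ModuleList(
            [nn.Conv2d(c_in, c_out, 3) for c_in, c_out in zip(channels[:-1], channels[1:])]
        )
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        for conv in self.convs:
            x = self.pool(F.relu(conv(x)))
        return x

features = Features()
# Assumed input: a batch of 2 RGB images of size 224x224.
x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    out = features(x)

# 224 -> 222 -> 111 -> 109 -> 54 -> 52 -> 26 -> 24 -> 12 -> 10 -> 5
print(out.shape)  # torch.Size([2, 512, 5, 5])
```

With a different input size the spatial size changes, and view(-1,512*5*5) would silently fold spatial positions into the batch dimension instead of failing cleanly.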
When I use the CrossEntropyLoss function and the SGD optimizer, I am able to train the model with no error. When I use the MSELoss function and the Adam optimizer, I get the following error:
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-2223dd9058dd> in <module>
      1 #train the model
      2 n_epochs = 2
----> 3 train(n_epochs,net,loaders,optimizer,criterion,'saved_model/dog_model.pt')

<ipython-input-19-a93d145ef9f7> in train(n_epochs, model, loader, optimizer, criterion, save_path)
     22 
     23         #calculate loss
---> 24         loss = criterion(outputs,target)
     25 
     26         #backward prop

RuntimeError: The size of tensor a (133) must match the size of tensor b (10) at non-singleton dimension 1.
Do the selected loss function and optimizer affect the training of the model? Can anyone please help with this?
Well, the error occurs because nn.MSELoss() and nn.CrossEntropyLoss() expect different input/target combinations. You cannot simply change the criterion function without changing the inputs and targets appropriately. From the docs:
nn.CrossEntropyLoss:
- Input:
  - (N, C) where C = number of classes, or
  - (N, C, d_1, d_2, ..., d_K) with K >= 1 in the case of K-dimensional loss.
- Target:
  - (N) where each value is in range [0, C-1], or
  - (N, d_1, d_2, ..., d_K) with K >= 1 in the case of K-dimensional loss.
nn.MSELoss:
- Input:
  - (N, ∗) where ∗ means any number of additional dimensions.
- Target:
  - (N, ∗), same shape as the input.
As you can see, MSELoss expects the target to have the same shape as the input, while CrossEntropyLoss expects a target without the C dimension (one class index per sample). You cannot use MSELoss as a drop-in replacement for CrossEntropyLoss.
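To make the shape difference concrete, here is a small sketch with an assumed batch of 10 samples and 133 classes. Note that the one-hot target for MSELoss is a swapped-in illustration of "a target with the same shape as the input", not something this answer prescribes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, C = 10, 133                       # assumed batch size and number of classes
logits = torch.randn(N, C)           # model output: (N, C)
labels = torch.randint(0, C, (N,))   # class indices: (N,), values in [0, C-1]

# CrossEntropyLoss: (N, C) logits with (N,) integer class indices.
ce = nn.CrossEntropyLoss()(logits, labels)

# MSELoss: the target must have the same shape as the input, e.g. one-hot labels.
one_hot = F.one_hot(labels, num_classes=C).float()  # (N, C), float like the logits
mse = nn.MSELoss()(logits, one_hot)

print(ce.shape, mse.shape)  # both losses are scalars
```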
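The error message clearly shows that the error occurred at the line loss = criterion(outputs,target), where you are computing the mean squared error between the input and the target (see the line criterion = nn.MSELoss()). Because outputs has shape [10, 133] and target has shape [10], PyTorch tries to broadcast them, comparing trailing dimensions first; 133 and 10 do not match, hence "The size of tensor a (133) must match the size of tensor b (10) at non-singleton dimension 1".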
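The error can be reproduced without the model or the DataLoader. A minimal sketch, assuming a batch size of 10 and 133 classes as in the question; the target is cast to float here only so that the shape mismatch is the sole problem being exercised:

```python
import torch
import torch.nn as nn

outputs = torch.randn(10, 133)                 # (batch_size, 133), like fc3's output
target = torch.randint(0, 133, (10,)).float()  # (batch_size,) class labels

message = None
try:
    nn.MSELoss()(outputs, target)
except RuntimeError as err:
    message = str(err)

print(message)
```

PyTorch may also emit a UserWarning about the target size differing from the input size before raising the RuntimeError.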
I think you should modify the line where you compute the loss for the (output, target) pair, i.e. loss = criterion(outputs,target), to something like:
loss = criterion(outputs,target.view(1, -1))
Here, you are trying to make the shape of target match the shape of outputs, which comes from the model on the line outputs = net(data).
One more thing to notice here: the output of the net model, i.e. outputs, has shape batch_size X output_channels. batch_size is the first dimension of the input, because during training you get batches of images, so the tensor in the forward method has an additional batch dimension at dim0: [batch_size, channels, height, width]. output_channels is the number of output features from the last linear layer in the net model.
The target labels, on the other hand, have shape batch_size, which is 10 in your case (check the batch_size you passed to torch.utils.data.DataLoader()). Therefore, reshaping them with view(1, -1) converts them into shape 1 X batch_size, i.e. 1 X 10.
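The shapes described above can be checked directly. A minimal sketch with stand-in tensors, assuming batch_size = 10 and 133 output features:

```python
import torch

batch_size, num_classes = 10, 133  # assumed values from the question
outputs = torch.randn(batch_size, num_classes)          # stand-in for net(data)
target = torch.randint(0, num_classes, (batch_size,))   # stand-in for the labels

print(outputs.shape)             # torch.Size([10, 133])
print(target.shape)              # torch.Size([10])
print(target.view(1, -1).shape)  # torch.Size([1, 10]), still not [10, 133]
print(target.view(-1, 1).shape)  # torch.Size([10, 1])
```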
That's why, with this change, you get the error:
RuntimeError: input and target shapes do not match: input [10 x 133], target [1 x 10]
So, a workaround is to replace loss = criterion(outputs,target.view(1, -1)) with loss = criterion(outputs,target.view(-1, 1)) and change the output_channels of the last linear layer to 1 instead of 133. This way, outputs and target both have shape batch_size X 1, and the MSE value can be computed. Since MSELoss works on floating-point values, it is also safest to cast the integer labels, e.g. target.view(-1, 1).float().
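A minimal sketch of this workaround for one training step, assuming a batch of 10. The hypothetical fc3 stands in for the modified last layer, and random features stand in for the fc2 activations:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size = 10
fc3 = nn.Linear(1024, 1)                       # last layer changed from 133 to 1 output
features = torch.randn(batch_size, 1024)       # stand-in for the fc2 activations
target = torch.randint(0, 133, (batch_size,))  # integer class labels, shape (10,)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(fc3.parameters(), lr=0.0001)

optimizer.zero_grad()
outputs = fc3(features)                                # (10, 1)
loss = criterion(outputs, target.view(-1, 1).float())  # target reshaped to (10, 1)
loss.backward()
optimizer.step()

print(outputs.shape, loss.item())
```

Keep in mind that this setup regresses the class index as a number; the shapes now agree, which is what MSELoss requires.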
You can learn more about the PyTorch MSELoss function in the official PyTorch documentation.