Given input features as such, just raw numbers:
tensor([0.2153, 0.2190, 0.0685, 0.2127, 0.2145, 0.1260, 0.1480, 0.1483, 0.1489,
0.1400, 0.1906, 0.1876, 0.1900, 0.1925, 0.0149, 0.1857, 0.1871, 0.2715,
0.1887, 0.1804, 0.1656, 0.1665, 0.1137, 0.1668, 0.1168, 0.0278, 0.1170,
0.1189, 0.1163, 0.2337, 0.2319, 0.2315, 0.2325, 0.0519, 0.0594, 0.0603,
0.0586, 0.0067, 0.0624, 0.2691, 0.0617, 0.2790, 0.2805, 0.2848, 0.2454,
0.1268, 0.2483, 0.2454, 0.2475], device='cuda:0')
And the expected output is a single real number output, e.g.
tensor(-34.8500, device='cuda:0')
Full code on https://www.kaggle.com/alvations/pytorch-mlp-regression
I've tried creating a simple 2 layer network with:
class MLP(nn.Module):
def __init__(self, input_size, output_size, hidden_size):
super(MLP, self).__init__()
self.linear = nn.Linear(input_size, hidden_size)
self.classifier = nn.Linear(hidden_size, output_size)
def forward(self, inputs, hidden=None, dropout=0.5):
inputs = F.dropout(inputs, dropout) # Drop-in.
# First Layer.
output = F.relu(self.linear(inputs))
# Matrix manipulation magic.
batch_size, sequence_len, hidden_size = output.shape
# Technically, linear layer takes a 2-D matrix as input, so more manipulation...
output = output.contiguous().view(batch_size * sequence_len, hidden_size)
# Apply dropout.
output = F.dropout(output, dropout)
# Put it through the classifier
# And reshape it to [batch_size x sequence_len x vocab_size]
output = self.classifier(output).view(batch_size, sequence_len, -1)
return output
And training as such:
# Training routine.
def train(num_epochs, dataloader, valid_dataset, model, criterion, optimizer):
losses = []
valid_losses = []
learning_rates = []
plt.ion()
x_valid, y_valid = valid_dataset
for _e in range(num_epochs):
for batch in tqdm(dataloader):
# Zero gradient.
optimizer.zero_grad()
#print(batch)
this_x = torch.tensor(batch['x'].view(len(batch['x']), 1, -1)).to(device)
this_y = torch.tensor(batch['y'].view(len(batch['y']), 1, 1)).to(device)
# Feed forward.
output = model(this_x)
prediction, _ = torch.max(output, dim=1)
loss = criterion(prediction, this_y.view(len(batch['y']), -1))
loss.backward()
optimizer.step()
losses.append(torch.sqrt(loss.float()).data)
with torch.no_grad():
# Zero gradient.
optimizer.zero_grad()
output = model(x_valid.view(len(x_valid), 1, -1))
prediction, _ = torch.max(output, dim=1)
loss = criterion(prediction, y_valid.view(len(y_valid), -1))
valid_losses.append(torch.sqrt(loss.float()).data)
clear_output(wait=True)
plt.plot(losses, label='Train')
plt.plot(valid_losses, label='Valid')
plt.legend()
plt.pause(0.05)
Tuning several hyperparameters, it looks like the model doesn't train well, the validation loss doesn't move at all e.g.
hyperparams = Hyperparams(input_size=train_dataset.x.shape[1],
output_size=1,
hidden_size=150,
loss_func=nn.MSELoss,
learning_rate=1e-8,
optimizer=optim.Adam,
batch_size=500)
And it's loss curve:
Any idea what's wrong with the network?
Am I training the regression model with the wrong loss? Or I've just not yet found the right hyperparameters?
Or am I validating the model wrongly?
From the code you provided, it is tough to say why the validation loss is constant but I see several problems in your code.
for _e in range(num_epochs):
for batch in tqdm(train_dataloader):
# training code
with torch.no_grad():
for batch in tqdm(valid_dataloader):
# validation code
# plot your loss values
Also, you can plot after each epoch, not after each mini-batch training.
Did you check whether the model parameters are getting updated after optimizer.step()
during training? How many validation examples do you have? Why don't you use mini-batch computation during validation?
Why do you do: optimizer.zero_grad()
during validation? It doesn't make sense because, during validation, you are not going to do anything related to optimization.
You should use model.eval()
during validation to turn off the dropouts. See PyTorch documentation to learn about .train()
and .eval()
methods.
The learning rate is set to 1e-8, isn't it too small? Why don't you use the default learning rate for Adam (1e-3)?
The following requires some reasoning.
Why are you using such a large batch size? What is your training dataset size?
You can directly plot the MSELoss, instead of taking the square root.
My suggestion would be: use some existing resources for MLP in PyTorch. Don't do it from scratch if you do not have sufficient knowledge at this point. It would make you suffer a lot.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With