 

LSTM autoencoder always returns the average of the input sequence

I'm trying to build a very simple LSTM autoencoder with PyTorch. I always train it with the same data:

x = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])

I built my model following this Keras example:

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)

decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

My code runs with no errors, but y_pred converges to:

tensor([[[0.2]],
        [[0.2]],
        [[0.2]],
        [[0.2]],
        [[0.2]]], grad_fn=<StackBackward>)

Here is my code:

import torch
import torch.nn as nn
import torch.optim as optim


class LSTM(nn.Module):

    def __init__(self, input_dim, latent_dim, batch_size, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.batch_size = batch_size
        self.num_layers = num_layers

        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)

        self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def init_hidden_encoder(self):
        return (torch.zeros(self.num_layers, self.batch_size, self.latent_dim),
                torch.zeros(self.num_layers, self.batch_size, self.latent_dim))

    def init_hidden_decoder(self):
        return (torch.zeros(self.num_layers, self.batch_size, self.input_dim),
                torch.zeros(self.num_layers, self.batch_size, self.input_dim))

    def forward(self, input):
        # Reset hidden layer
        self.hidden_encoder = self.init_hidden_encoder()
        self.hidden_decoder = self.init_hidden_decoder()

        # Reshape input
        input = input.view(len(input), self.batch_size, -1)

        # Encode
        encoded, self.hidden = self.encoder(input, self.hidden_encoder)
        encoded = encoded[-1].repeat(5, 1, 1)

        # Decode
        y, self.hidden = self.decoder(encoded, self.hidden_decoder)
        return y


model = LSTM(input_dim=1, latent_dim=20, batch_size=1, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

x = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, x)
    loss.backward()
    optimizer.step()
    print(y_pred)

asked Jan 28 '19 by Neabfi



1 Answer

1. Initializing hidden states

In your source code, you use the init_hidden_encoder and init_hidden_decoder functions to zero the hidden states of both recurrent units on every forward pass.

In PyTorch you don't have to do that: if no initial hidden state is passed to a recurrent cell (be it LSTM, GRU, or RNN from the ones currently available by default in PyTorch), it is implicitly initialized with zeros.
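
A minimal check of that behaviour (the sizes are chosen to match this question; nothing here is specific to the autoencoder):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=20, num_layers=1)
x = torch.randn(5, 1, 1)  # (seq_len, batch, input_size)
zeros = (torch.zeros(1, 1, 20), torch.zeros(1, 1, 20))

out_explicit, _ = lstm(x, zeros)  # explicitly passing zero states
out_implicit, _ = lstm(x)         # passing nothing: zeros are used implicitly
print(torch.allclose(out_explicit, out_implicit))  # True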

So, to keep the same behaviour as your initial solution (which simplifies the next parts), I will scrap the unneeded code, which leaves us with the model below:

class LSTM(nn.Module):
    def __init__(self, input_dim, latent_dim, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.num_layers = num_layers

        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)

        self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def forward(self, input):
        # Encode
        _, (last_hidden, _) = self.encoder(input)
        encoded = last_hidden.repeat(5, 1, 1)

        # Decode
        y, _ = self.decoder(encoded)
        return torch.squeeze(y)

Addition of torch.squeeze

We don't need any superfluous dimensions (like the 1 in [5, 1, 1]). Actually, that extra dimension is the clue to why your results all equal 0.2 (see part 3).
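
As a tiny illustrative check of what torch.squeeze does to the decoder output in this setup:

import torch

y = torch.zeros(5, 1, 1)       # (seq_len, batch, input_dim) with batch = input_dim = 1
print(torch.squeeze(y).shape)  # torch.Size([5]) -- every size-1 dimension is removed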

Furthermore, I moved the input reshaping out of the network (in my opinion, the network should be fed input that is ready to be processed), to keep the two tasks strictly separate (input preparation and the model itself).

This approach gives us the following setup code and training loop:

model = LSTM(input_dim=1, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

y = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])
# Sequence x batch x dimension
x = y.view(len(y), 1, -1)

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)

The whole network is identical to yours (for now), except that it is more succinct and readable.

2. What we want, describing network changes

As your provided Keras code indicates, what we want to do (and you are actually doing it correctly) is to obtain the last hidden state from the encoder (it encodes our entire sequence) and decode the sequence from that state to reconstruct the original one.

BTW, this approach is called sequence to sequence, or seq2seq for short (often used in tasks like language translation). Well, maybe a variation of that approach, but I would classify it as such anyway.

PyTorch's recurrent layers return the last hidden state as a separate variable, so I would advise against your encoded[-1]. The reason is the bidirectional and multilayer case. Say you wanted to sum the bidirectional outputs; it would mean code along these lines:

# batch_size and hidden_size should be inferred, cluttering the code further
encoded[-1].view(batch_size, 2, hidden_size).sum(dim=1)

And that's why the line _, (last_hidden, _) = self.encoder(input) was used.
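
For illustration only (the model in this question is unidirectional), here is how the two return values differ once the encoder is bidirectional, and why last_hidden is the cleaner thing to work with:

import torch
import torch.nn as nn

# Same sizes as in this question, but with a bidirectional encoder
encoder = nn.LSTM(input_size=1, hidden_size=20, num_layers=1, bidirectional=True)
x = torch.randn(5, 1, 1)  # (seq_len, batch, input_dim)

output, (h_n, c_n) = encoder(x)
print(output.shape)          # torch.Size([5, 1, 40]) -- output[-1] mixes both directions
print(h_n.shape)             # torch.Size([2, 1, 20]) -- one clean final state per direction
print(h_n.sum(dim=0).shape)  # torch.Size([1, 20])    -- e.g. summing both directions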

3. Why does the output converge to 0.2?

Actually, the mistake was on your side, and only in the last part.

Output shapes of your predictions and targets:

# Your output
torch.Size([5, 1, 1])
# Your target
torch.Size([5, 1])

If those shapes are provided, MSELoss broadcasts the target against the output and, by default, averages the result (the old size_average=True argument, now reduction='mean'). In effect, every output element is compared against every target element, so the loss is minimized when each prediction equals the average of your targets, which is 0.2.

So the network converges correctly, but the targets you pass to the loss are wrong (the shapes don't match).
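
Here is a small, purely illustrative check of that effect (PyTorch also warns about the size mismatch): because of broadcasting, every prediction is compared against every target, and a constant 0.2 actually scores better than reproducing the exact sequence:

import torch
import torch.nn as nn

loss = nn.MSELoss()
target = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])  # shape (5, 1)

constant = torch.full((5, 1, 1), 0.2)  # every prediction equals the target mean
exact = target.view(5, 1, 1)           # predictions equal to the true sequence

print((constant - target).shape)  # torch.Size([5, 5, 1]) -- broadcasting at work
print(loss(constant, target))     # tensor(0.0200)
print(loss(exact, target))        # tensor(0.0400) -- the "perfect" output loses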

3.1 First and wrong solution

Provide MSELoss with the argument reduction="sum". This is really only a temporary fix that happens to work by accident: the network will at first try to get all of the outputs to be equal to the sum (0 + 0.1 + 0.2 + 0.3 + 0.4 = 1.0), starting from semi-random outputs; after a while it will converge to what you want, but not for the reasons you want!

The identity function is the easiest choice for the network here, even with summation (as your input data is really simple).

3.2 Second and correct solution

Just pass appropriately shaped tensors to the loss function, e.g. batch x outputs; in your case, the final part would look like this:

model = LSTM(input_dim=1, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())

y = torch.Tensor([0.0, 0.1, 0.2, 0.3, 0.4])
x = y.view(len(y), 1, -1)

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)

Your target is one-dimensional (as the batch is of size 1), and so is your output (after squeezing the unnecessary dimensions).

I changed Adam's parameters to the defaults, as it converges faster that way.

4. Final working code

For brevity, here is the code and results:

import torch
import torch.nn as nn
import torch.optim as optim


class LSTM(nn.Module):
    def __init__(self, input_dim, latent_dim, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.num_layers = num_layers

        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)

        self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def forward(self, input):
        # Encode
        _, (last_hidden, _) = self.encoder(input)
        # Repeating the entire input shape is more general than hard-coding 5
        # (with batch size 1 and input_dim 1 this gives shape [seq_len, 1, latent_dim])
        encoded = last_hidden.repeat(input.shape)

        # Decode
        y, _ = self.decoder(encoded)
        return torch.squeeze(y)


model = LSTM(input_dim=1, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())

y = torch.Tensor([0.0, 0.1, 0.2, 0.3, 0.4])
x = y.view(len(y), 1, -1)

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)

And here are the results after ~60k steps (it actually gets stuck after ~20k steps; you may want to improve your optimization and play around with the hidden size for better results):

step=59682                       
tensor([0.0260, 0.0886, 0.1976, 0.3079, 0.3962], grad_fn=<SqueezeBackward0>)

Additionally, L1Loss (a.k.a. Mean Absolute Error) may give better results in this case:

step=10645                        
tensor([0.0405, 0.1049, 0.1986, 0.3098, 0.4027], grad_fn=<SqueezeBackward0>)

Tuning and correct batching of this network are left to you; I hope you'll have some fun now and that you get the idea. :)

PS. I repeat the entire shape of the input sequence instead of hard-coding the length 5, which is a more general approach; note, though, that Tensor.repeat multiplies each existing dimension, so this shortcut only matches repeat(len(input), 1, 1) as long as the batch and feature dimensions are 1.
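
A quick, purely illustrative shape check of that trick (the sizes mirror this answer's model):

import torch

last_hidden = torch.zeros(1, 1, 20)  # (num_layers, batch, latent_dim)
input_shape = torch.Size([5, 1, 1])  # (seq_len, batch, input_dim)

# Tensor.repeat multiplies each existing dimension by the given factor, so with
# batch == input_dim == 1 this yields the desired (seq_len, batch, latent_dim).
print(last_hidden.repeat(input_shape).shape)  # torch.Size([5, 1, 20])
print(last_hidden.repeat(5, 1, 1).shape)      # same, with the length hard-coded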

answered Oct 13 '22 by Szymon Maszke