I'm new to PyTorch. I came across this GitHub repository (link to full code example) containing various examples.
There is also an example about LSTMs; this is the network class:
import torch
import torch.nn as nn
from torch.autograd import Variable

# RNN Model (Many-to-One)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Set initial hidden and cell states to zeros
        h0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))
        c0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))
        # Forward propagate the LSTM
        out, _ = self.lstm(x, (h0, c0))
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out
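For context, in that tutorial each 28x28 MNIST image is treated as a sequence of 28 rows with 28 features each, so x has shape (batch, seq_len, input_size) because of batch_first=True. A minimal usage sketch (the hyperparameter values below are my assumption, not necessarily what the repository uses):

# Hypothetical instantiation; sizes chosen to match reading MNIST row by row.
model = RNN(input_size=28, hidden_size=128, num_layers=2, num_classes=10)

images = Variable(torch.randn(100, 28, 28))  # (batch, seq_len, input_size)
outputs = model(images)                      # forward() builds fresh h0/c0 for this batch
print(outputs.size())                        # torch.Size([100, 10])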
So my question is about the following lines:
h0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))
c0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))
As far as I understand it, forward()
is called for every training example. But that would mean the hidden state and cell state get reset, i.e. replaced with matrices of zeros, on every training example.
The names h0
and c0
indicate that this is only the hidden/cell state at t=0, but why are these zero matrices then handed to the LSTM with every training example?
Even if they are just ignored after the first call, it would not be a very nice solution.
When I test the code, it reports an accuracy of 97% on the MNIST test set, so it seems to work this way, but it doesn't make sense to me.
Hope someone can help me out with this.
Thanks in advance!
An LSTM cell takes the following inputs: input, (h_0, c_0). Here input is a tensor of shape (batch, input_size) containing the features for a single time step, where input_size is the value we declared when creating the LSTM cell; h_0 is a tensor containing the initial hidden state for each element in the batch, of shape (batch, hidden_size); and c_0 is the initial cell state, of the same shape.
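A minimal sketch of those shapes for nn.LSTMCell (the sizes are invented for illustration):

import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=10, hidden_size=20)  # processes one time step at a time

x_t = torch.randn(5, 10)   # (batch, input_size) for a single time step
h_0 = torch.zeros(5, 20)   # (batch, hidden_size)
c_0 = torch.zeros(5, 20)   # (batch, hidden_size)

h_1, c_1 = cell(x_t, (h_0, c_0))  # next hidden and cell state
print(h_1.size())                 # torch.Size([5, 20])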
The output of the PyTorch LSTM layer is a tuple with two elements: the output features for every time step, and a tuple (h_n, c_n) holding the final hidden and cell states.
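Concretely, for an nn.LSTM layer like the one in the question (sizes again invented for illustration):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x  = torch.randn(5, 7, 10)   # (batch, seq_len, input_size)
h0 = torch.zeros(2, 5, 20)   # (num_layers, batch, hidden_size)
c0 = torch.zeros(2, 5, 20)   # (num_layers, batch, hidden_size)

output, (h_n, c_n) = lstm(x, (h0, c0))
print(output.size())  # torch.Size([5, 7, 20]) -> top-layer hidden state at every step
print(h_n.size())     # torch.Size([2, 5, 20]) -> final hidden state for each layer

The out[:, -1, :] in the question's forward() selects the last time step of that output, i.e. the top layer's final hidden state, which is then fed to the linear classifier.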
LSTM explained: long short-term memory (LSTM) is an artificial neural network used in the fields of artificial intelligence and deep learning. It is a variety of recurrent neural network (RNN) capable of learning long-term dependencies, especially in sequence prediction problems. Unlike standard feedforward neural networks, an LSTM has feedback connections, i.e., it can process entire sequences of data, not just single data points such as images.
Obviously I was on the wrong track with this. I was confusing the hidden units with the hidden/cell state. Only the weights of the LSTM are learned during training; the hidden state and cell state are reset at the beginning of every sequence. So it makes sense that it is programmed this way.
Sorry for this.
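For what it's worth, the per-sequence zero initialization is also what PyTorch does by default: if no (h0, c0) tuple is passed, nn.LSTM initializes both states to zeros itself (and since PyTorch 0.4 the Variable wrapper is no longer needed). A quick check:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
x = torch.randn(5, 7, 10)

h0 = torch.zeros(2, 5, 20)
c0 = torch.zeros(2, 5, 20)

out_explicit, _ = lstm(x, (h0, c0))  # explicit zero states, as in the question
out_default, _ = lstm(x)             # no states passed: PyTorch defaults to zeros
print(torch.equal(out_explicit, out_default))  # True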