I was going through a tutorial about sentiment analysis using an LSTM network. The code below is said to stack up the LSTM output, but I don't understand how that works.
lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
The output of the PyTorch LSTM layer is a tuple with two elements.
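For reference, a minimal sketch of how that tuple is unpacked (not the tutorial's model; the sizes here are made up for illustration):

import torch
import torch.nn as nn

# nn.LSTM returns (output, (h_n, c_n)): the per-timestep outputs plus
# the final hidden and cell states.
lstm = nn.LSTM(input_size=10, hidden_size=4)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([5, 3, 4]) -> one output per time step
print(h_n.shape)     # torch.Size([1, 3, 4]) -> final hidden state
print(c_n.shape)     # torch.Size([1, 3, 4]) -> final cell state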
To stack LSTM layers, we need to change the configuration of the prior LSTM layer so that it outputs a 3D array as input for the subsequent layer. In Keras (this argument does not exist in PyTorch), we can do this by setting the return_sequences argument on the layer to True (it defaults to False). This will return one output for each input time step and provide a 3D array, as sketched below.
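A minimal Keras sketch of that idea (the layer sizes and input shape are illustrative assumptions, not taken from the question):

from tensorflow import keras

model = keras.Sequential([
    # return_sequences=True makes this LSTM emit a 3D array
    # (batch, timesteps, units) so the next LSTM can consume it.
    keras.layers.LSTM(64, return_sequences=True, input_shape=(10, 8)),
    keras.layers.LSTM(32),  # the final LSTM returns only its last output
])
model.summary()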
Here, the hidden_size of the LSTM layer would be 512, since there are 512 units in each LSTM cell, and num_layers would be 2; num_layers is the number of LSTM layers stacked on top of each other.
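In PyTorch terms, a minimal sketch of that configuration (the input_size of 400 is an assumption for illustration):

import torch.nn as nn

# Two LSTM layers stacked on top of each other, each with 512 units.
lstm = nn.LSTM(input_size=400, hidden_size=512, num_layers=2, batch_first=True)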
It does indeed stack the output; the comment by kHarshit is misleading here!
To visualize this, let us review the output of the previous line in the tutorial (accessed May 1st, 2019):
lstm_out, hidden = self.lstm(embeds, hidden)
The output dimension of this will be [sequence_length, batch_size, hidden_size*2], as per the documentation. Here, the factor of two in the last dimension comes from having a bidirectional LSTM. Therefore, the first half of the last dimension will always be the forward output, followed by the backward output (I'm not entirely sure about the direction of that, but it seems to me that it is already in the right direction).
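A minimal sketch (the sizes are illustrative assumptions) of where that hidden_size*2 comes from and how the two halves split:

import torch
import torch.nn as nn

hidden_size = 4
lstm = nn.LSTM(input_size=10, hidden_size=hidden_size, bidirectional=True)
embeds = torch.randn(7, 3, 10)               # (seq_len, batch, input_size)
lstm_out, hidden = lstm(embeds)
print(lstm_out.shape)                        # torch.Size([7, 3, 8]) == hidden_size * 2
forward_out  = lstm_out[:, :, :hidden_size]  # first half of the last dimension
backward_out = lstm_out[:, :, hidden_size:]  # second half of the last dimension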
Then, the actual line that you are concerned about:
lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
We're ignoring the specifics of .contiguous() here, but you can read up on it in this excellent answer on Stack Overflow. In summary, it basically makes sure that your torch.Tensor is laid out contiguously in memory.
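A minimal sketch of why that matters: some operations (like .transpose()) return a non-contiguous view, and .view() then refuses to reshape it until you call .contiguous():

import torch

t = torch.arange(12).reshape(3, 4).transpose(0, 1)  # non-contiguous view
print(t.is_contiguous())        # False
# t.view(-1)                    # would raise a RuntimeError here
flat = t.contiguous().view(-1)  # works after copying into contiguous memory
print(flat.shape)               # torch.Size([12])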
Lastly, .view() allows you to reshape a tensor in a specific way. Here, we're aiming for a shape that has two dimensions (as defined by the number of arguments passed to .view()). Specifically, the second dimension is supposed to have the size hidden_dim. The -1 for the first dimension simply means that we redistribute the elements over that dimension without caring about its exact size, as long as the other dimension's requirement is satisfied.
So, if you have a vector of, say, length 40, and want to reshape it into a 2D tensor of (-1, 10), then the resulting tensor would have shape (4, 10).
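A minimal sketch of that reshape:

import torch

v = torch.arange(40)  # a vector of length 40
m = v.view(-1, 10)    # the -1 is inferred as 40 / 10 = 4
print(m.shape)        # torch.Size([4, 10])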
As we've said above, the first half of the vector (of length hidden_dim) is the forward output and the latter half is the backward output, so splitting it into a tensor of shape (-1, hidden_dim) results in a tensor of shape (2, hidden_dim), where the first row contains the forward output, "stacked" on top of the second row, which equals the reverse layer's output.
Visual example:
lstm_out, hidden = self.lstm(embeds, hidden)
print(lstm_out)  # imagine a sample output like [1, 0, 2, 0]
                 # forward out: [1, 0] | backward out: [2, 0]
stacked = lstm_out.contiguous().view(-1, hidden_dim)  # hidden_dim = 2
print(stacked)   # tensor([[1, 0],   <- forward out
                 #         [2, 0]])  <- backward out