 

Pytorch LSTM vs LSTMCell

What is the difference between LSTM and LSTMCell in PyTorch (currently version 1.1)? It seems that LSTMCell is a special case of LSTM (i.e. with only one layer, unidirectional, and no dropout).

Then, what's the purpose of having both implementations? Unless I'm missing something, it's trivial to use an LSTM object as an LSTMCell (or, alternatively, it's pretty easy to chain multiple LSTMCells to build the equivalent of an LSTM).

asked Jul 15 '19 by dkv


1 Answer

Yes, you can emulate one with the other; the reason for keeping them separate is efficiency.

LSTMCell is a single cell that takes as arguments:

  • An input of shape batch × input dimension;
  • A tuple of LSTM hidden and cell states, each of shape batch × hidden dimension.

It is a straightforward implementation of the equations.
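A minimal sketch of one LSTMCell step (the dimensions 10 and 20 and the batch size 3 are made-up values for illustration):

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=10, hidden_size=20)

x = torch.randn(3, 10)   # one time step: batch × input dimension
h = torch.zeros(3, 20)   # hidden state: batch × hidden dimension
c = torch.zeros(3, 20)   # cell state:   batch × hidden dimension

# A single application of the LSTM equations; returns the updated states.
h, c = cell(x, (h, c))
print(h.shape)  # torch.Size([3, 20])
```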

LSTM is a layer that applies an LSTM cell (or multiple LSTM cells, for stacked layers) in a "for loop", but the loop is heavily optimized using cuDNN. Its input is

  • A three-dimensional tensor of inputs of shape batch × input length × input dimension (with batch_first=True; the default layout is input length × batch × input dimension);
  • Optionally, an initial state of the LSTM, i.e. a tuple of hidden and cell states, each of shape (num layers × num directions) × batch × hidden dimension.
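The same computation over a whole sequence with nn.LSTM (again with made-up dimensions; note the extra leading num-layers axis on the states):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x = torch.randn(3, 5, 10)   # batch × input length × input dimension
h0 = torch.zeros(2, 3, 20)  # num layers × batch × hidden dimension
c0 = torch.zeros(2, 3, 20)

# Processes all 5 time steps in one optimized call.
out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape)  # torch.Size([3, 5, 20]) — hidden state at every step
print(hn.shape)   # torch.Size([2, 3, 20]) — final state of each layer
```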

You might often want to use the LSTM cell in a different context than applying it over a sequence, e.g. to build an LSTM that operates over a tree-like structure. Similarly, when you write a decoder in sequence-to-sequence models, you call the cell in a loop yourself and stop the loop when the end-of-sequence symbol is decoded.
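To make the relationship concrete, here is the kind of explicit Python loop over LSTMCell that nn.LSTM replaces with a fused kernel (a sketch with made-up dimensions; a real decoder would instead feed back its own predictions and break on an end-of-sequence symbol):

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=10, hidden_size=20)

x = torch.randn(3, 5, 10)  # batch × sequence length × input dimension
h = torch.zeros(3, 20)
c = torch.zeros(3, 20)

outputs = []
for t in range(x.size(1)):          # the "for loop" that nn.LSTM optimizes away
    h, c = cell(x[:, t], (h, c))    # one time step
    outputs.append(h)

out = torch.stack(outputs, dim=1)   # batch × sequence length × hidden dimension
print(out.shape)  # torch.Size([3, 5, 20])
```

This produces the same output shape as a single-layer nn.LSTM call, but each iteration launches separate kernels, which is why the fused layer is preferred when you simply need to run over a fixed sequence.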

answered Nov 11 '22 by Jindřich