
Change Tanh activation in LSTM to ReLU

Tags: lstm, pytorch

The default non-linear activation function in the LSTM class is tanh. I wish to use ReLU for my project. Browsing through the documentation and other resources, I'm unable to find a simple way to do this. The only way I could find was to define my own custom LSTMCell, but here the author says that custom LSTMCells don't support GPU acceleration capabilities (or has that changed since the article was published?). I need to use CUDA to speed up my training. Any help would be appreciated.

asked Feb 28 '18 by Venkat


People also ask

How do I change the activation function in LSTM?

In Keras, the activation is an argument of the layer itself, with defaults activation="tanh" and recurrent_activation="sigmoid"; for example, model = LSTM(100, return_sequences=True, input_shape=(timesteps, n_features)), and further stacked layers such as LSTM(50, return_sequences=True) take the same arguments. So you should pass another activation function explicitly if you want a different one.
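A minimal sketch of that Keras route, assuming TensorFlow's Keras API; the layer width is illustrative:

    from tensorflow.keras.layers import LSTM

    # Pass the desired activation directly to the layer constructor;
    # activation="tanh" and recurrent_activation="sigmoid" are the defaults.
    layer = LSTM(100, activation="relu", return_sequences=True)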

Can you use ReLU in LSTM?

Traditionally, LSTMs use the tanh activation function for the activation of the cell state and the sigmoid activation function for the node output. Given their careful design, ReLUs were thought not to be appropriate by default for Recurrent Neural Networks (RNNs) such as the Long Short-Term Memory network (LSTM).
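Note that in PyTorch the built-in nn.LSTM exposes no activation argument, but the plain nn.RNN does accept nonlinearity="relu", so a ReLU recurrent layer is available out of the box when the LSTM gating is not strictly required (sizes below are illustrative):

    import torch.nn as nn

    # A GPU-capable ReLU recurrent layer; note this is a vanilla RNN,
    # not an LSTM, so it has no gates or cell state.
    relu_rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2,
                      nonlinearity="relu")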

Why tanh activation function is used in LSTM?

In an LSTM network, the tanh activation function is used to determine the candidate cell state (internal state) values ( \tilde{C}_{t} ) and to update the hidden state ( h_{t} ).
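For reference, tanh appears in exactly two places in the standard LSTM update (notation as in the answer above):

    \tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C), \qquad
    C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad
    h_t = o_t \odot \tanh(C_t)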

Which activation function is better for LSTM?

The proposed comb-H-sine activation function outperforms the traditional functions in LSTMs, with accuracies of 98.83%, 93.49% and 78.38% on the MNIST, IMDB and UCI HAR datasets respectively.


1 Answer

"Custom LSTMCells don't support GPU acceleration capabilities": this statement probably means that GPU acceleration becomes limited when you use LSTMCells, not that it is unavailable. You can certainly write your own LSTM implementation, but you need to sacrifice runtime.

For example, I once implemented an LSTM cell built from linear layers, shown below; it took about 2-3 times longer than PyTorch's built-in LSTM when used as part of a deep neural model.

import torch
import torch.nn as nn


class LSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size, nlayers, dropout):
        """Constructor of the class"""
        super(LSTMCell, self).__init__()

        self.nlayers = nlayers
        self.dropout = nn.Dropout(p=dropout)

        ih, hh = [], []
        for i in range(nlayers):
            # Layers after the first receive the previous layer's hidden
            # state, so their input width is hidden_size, not input_size.
            in_features = input_size if i == 0 else hidden_size
            ih.append(nn.Linear(in_features, 4 * hidden_size))
            hh.append(nn.Linear(hidden_size, 4 * hidden_size))
        self.w_ih = nn.ModuleList(ih)
        self.w_hh = nn.ModuleList(hh)

    def forward(self, input, hidden):
        """Defines the forward computation of the LSTMCell"""
        hy, cy = [], []
        for i in range(self.nlayers):
            hx, cx = hidden[0][i], hidden[1][i]
            # One linear map per direction yields all four gates at once.
            gates = self.w_ih[i](input) + self.w_hh[i](hx)
            i_gate, f_gate, c_gate, o_gate = gates.chunk(4, 1)

            i_gate = torch.sigmoid(i_gate)
            f_gate = torch.sigmoid(f_gate)
            c_gate = torch.tanh(c_gate)    # swap in torch.relu here for a ReLU cell
            o_gate = torch.sigmoid(o_gate)

            ncx = (f_gate * cx) + (i_gate * c_gate)
            nhx = o_gate * torch.tanh(ncx)  # and here
            cy.append(ncx)
            hy.append(nhx)
            input = self.dropout(nhx)

        hy, cy = torch.stack(hy, 0), torch.stack(cy, 0)
        return hy, cy
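Because the cell is built entirely from standard nn.Linear modules, it moves to the GPU like any other module, which addresses the CUDA concern in the question. A hypothetical usage sketch (all sizes are illustrative):

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    cell = LSTMCell(input_size=10, hidden_size=20, nlayers=2, dropout=0.1).to(device)

    x = torch.randn(5, 10, device=device)      # one time step: (batch, input_size)
    h0 = torch.zeros(2, 5, 20, device=device)  # (nlayers, batch, hidden_size)
    c0 = torch.zeros(2, 5, 20, device=device)

    hy, cy = cell(x, (h0, c0))                 # hy, cy: (nlayers, batch, hidden_size)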

I would be happy to know if the runtime of a custom LSTM implementation can be improved!
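One route that may help, hedged as a sketch rather than a definitive fix: PyTorch's own custom-RNN benchmarks compile the cell with TorchScript so the JIT can fuse the pointwise gate operations, which recovers much of the gap to the built-in LSTM on CUDA. A minimal single-layer step function (the name and shapes are illustrative), with ReLU substituted for tanh as the question asks:

    import torch
    from torch import Tensor
    from typing import Tuple

    @torch.jit.script
    def relu_lstm_step(x: Tensor, hx: Tensor, cx: Tensor,
                       w_ih: Tensor, w_hh: Tensor,
                       b_ih: Tensor, b_hh: Tensor) -> Tuple[Tensor, Tensor]:
        # All four gates in one matmul per direction, as in the module above;
        # the pointwise ops below are candidates for JIT fusion on CUDA.
        gates = torch.mm(x, w_ih.t()) + b_ih + torch.mm(hx, w_hh.t()) + b_hh
        i_gate, f_gate, c_gate, o_gate = gates.chunk(4, 1)
        i_gate = torch.sigmoid(i_gate)
        f_gate = torch.sigmoid(f_gate)
        c_gate = torch.relu(c_gate)          # ReLU where tanh normally sits
        o_gate = torch.sigmoid(o_gate)
        cy = f_gate * cx + i_gate * c_gate
        hy = o_gate * torch.relu(cy)         # and here
        return hy, cy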

answered Oct 12 '22 by Wasi Ahmad