The default non-linear activation function in the LSTM class is tanh. I wish to use ReLU for my project. Browsing through the documentation and other resources, I'm unable to find a simple way to do this. The only way I could find was to define my own custom LSTMCell, but here the author says that custom LSTMCells don't support GPU acceleration capabilities (or has that changed since the article was published?). I need to use CUDA to speed up my training. Any help would be appreciated.
For example, a stacked LSTM in Keras might look like this:

model = LSTM(100, return_sequences=True, input_shape=(timesteps, n_features))
model = LSTM(50, return_sequences=True)(model)
...

The LSTM layer's defaults are activation="tanh" and recurrent_activation="sigmoid", so you should pass a different activation function if you want another one.
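To make the override concrete, here is a minimal Keras sketch of my own (the shapes are hypothetical) that sets activation="relu" on each LSTM layer. As far as I know, the fast cuDNN kernel in TensorFlow/Keras is only used with the default tanh/sigmoid pair, so a non-default activation still runs on the GPU but through a slower generic implementation:

import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

timesteps, n_features = 30, 8          # hypothetical shapes

inputs = Input(shape=(timesteps, n_features))
x = LSTM(100, return_sequences=True, activation="relu")(inputs)  # override the default tanh
x = LSTM(50, return_sequences=True, activation="relu")(x)
model = Model(inputs, x)

model.predict(np.zeros((1, timesteps, n_features)))  # quick smoke test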
Traditionally, LSTMs use the tanh activation function for the cell state and the sigmoid activation function for the gates and node output. Given their careful design, ReLU was thought not to be appropriate by default for Recurrent Neural Networks (RNNs) such as the Long Short-Term Memory (LSTM) network.
In an LSTM network, the tanh activation function is used to compute the candidate cell state (internal state) values ( \tilde{C}_{t} ) and to update the hidden state ( h_{t} ).
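For reference, these are the standard LSTM update equations in which tanh appears (textbook notation, added here for clarity; f_{t}, i_{t} and o_{t} are the forget, input and output gates):

\tilde{C}_{t} = \tanh(W_{C} x_{t} + U_{C} h_{t-1} + b_{C})
C_{t} = f_{t} \odot C_{t-1} + i_{t} \odot \tilde{C}_{t}
h_{t} = o_{t} \odot \tanh(C_{t})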
The proposed comb-H-sine activation function outperforms the traditional functions in LSTMs, with accuracies of 98.83%, 93.49% and 78.38% on the MNIST, IMDB and UCI HAR datasets, respectively.
"Custom LSTMCells don't support GPU acceleration capabilities" - this statement probably means that GPU acceleration becomes limited if you use a custom LSTMCell. You can certainly write your own implementation of an LSTM, but you have to sacrifice runtime.
For example, I once implemented an LSTM based on linear layers, shown below, which took roughly 2-3 times longer than the LSTM provided by PyTorch when used as part of a deep neural model.
import torch
import torch.nn as nn

class LSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size, nlayers, dropout):
        """Constructor of the class"""
        super(LSTMCell, self).__init__()
        self.nlayers = nlayers
        self.dropout = nn.Dropout(p=dropout)
        ih, hh = [], []
        for i in range(nlayers):
            # layers above the first receive the previous layer's hidden state as input
            ih.append(nn.Linear(input_size if i == 0 else hidden_size, 4 * hidden_size))
            hh.append(nn.Linear(hidden_size, 4 * hidden_size))
        self.w_ih = nn.ModuleList(ih)
        self.w_hh = nn.ModuleList(hh)

    def forward(self, input, hidden):
        """Defines the forward computation of the LSTMCell"""
        hy, cy = [], []
        for i in range(self.nlayers):
            hx, cx = hidden[0][i], hidden[1][i]
            gates = self.w_ih[i](input) + self.w_hh[i](hx)
            i_gate, f_gate, c_gate, o_gate = gates.chunk(4, 1)
            i_gate = torch.sigmoid(i_gate)
            f_gate = torch.sigmoid(f_gate)
            c_gate = torch.tanh(c_gate)
            o_gate = torch.sigmoid(o_gate)
            ncx = (f_gate * cx) + (i_gate * c_gate)  # new cell state
            nhx = o_gate * torch.tanh(ncx)           # new hidden state
            cy.append(ncx)
            hy.append(nhx)
            input = self.dropout(nhx)                # input to the next layer
        hy, cy = torch.stack(hy, 0), torch.stack(cy, 0)
        return hy, cy
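For context, this is roughly how I would drive the cell over a sequence (a usage sketch I am adding here, not part of the original code; it assumes hidden is passed as a pair of (nlayers, batch, hidden_size) tensors and the sizes are hypothetical):

import torch

batch, seq_len, input_size, hidden_size, nlayers = 4, 10, 16, 32, 2
cell = LSTMCell(input_size, hidden_size, nlayers, dropout=0.1)

x = torch.randn(batch, seq_len, input_size)
h = torch.zeros(nlayers, batch, hidden_size)
c = torch.zeros(nlayers, batch, hidden_size)

outputs = []
for t in range(seq_len):
    h, c = cell(x[:, t], (h, c))       # one time step through all layers
    outputs.append(h[-1])              # top layer's hidden state at step t
outputs = torch.stack(outputs, dim=1)  # (batch, seq_len, hidden_size)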
I would be happy to know if the runtime of a custom LSTM implementation can be improved!
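On improving the runtime: one approach I have seen (my own suggestion, not part of the original answer) is to write the cell as a TorchScript module via torch.jit.script, which lets PyTorch fuse the pointwise gate operations on the GPU. The sketch below also swaps tanh for ReLU, which is what the question originally asked for; the actual speed-up depends on the PyTorch version and hardware.

import torch
import torch.nn as nn

class ReLULSTMCell(nn.Module):
    """Single-layer LSTM cell with ReLU in place of tanh (a sketch, not a drop-in for nn.LSTM)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.w_ih = nn.Linear(input_size, 4 * hidden_size)
        self.w_hh = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, input: torch.Tensor, hx: torch.Tensor, cx: torch.Tensor):
        gates = self.w_ih(input) + self.w_hh(hx)
        i_gate, f_gate, c_gate, o_gate = gates.chunk(4, 1)
        i_gate = torch.sigmoid(i_gate)
        f_gate = torch.sigmoid(f_gate)
        c_gate = torch.relu(c_gate)        # ReLU instead of tanh for the candidate state
        o_gate = torch.sigmoid(o_gate)
        cy = f_gate * cx + i_gate * c_gate
        hy = o_gate * torch.relu(cy)       # ReLU instead of tanh on the cell output
        return hy, cy

device = "cuda" if torch.cuda.is_available() else "cpu"
cell = torch.jit.script(ReLULSTMCell(32, 64).to(device))  # compile the cell with TorchScript

x = torch.randn(8, 32, device=device)
h = torch.zeros(8, 64, device=device)
c = torch.zeros(8, 64, device=device)
h, c = cell(x, h, c)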