I have:
def __init__(self, feature_dim=15, hidden_size=5, num_layers=2):
    super(BaselineModel, self).__init__()
    self.num_layers = num_layers
    self.hidden_size = hidden_size
    self.lstm = nn.LSTM(input_size=feature_dim,
                        hidden_size=hidden_size,
                        num_layers=num_layers)
and then I get an error:
RuntimeError: The size of tensor a (5) must match the size of tensor b (15) at non-singleton dimension 2
If I set the two sizes to be the same, the error goes away. But if my input_size
is some large number, say 15, and I want to reduce the number of hidden features to 5, why shouldn't that work?
It should work; the error probably comes from elsewhere. This works, for example:
import numpy as np
import torch
import torch.nn as nn

feature_dim = 15
hidden_size = 5
num_layers = 2
seq_len = 5
batch_size = 3

lstm = nn.LSTM(input_size=feature_dim,
               hidden_size=hidden_size,
               num_layers=num_layers)
t1 = torch.from_numpy(np.random.uniform(0, 1, size=(seq_len, batch_size, feature_dim))).float()
output, states = lstm(t1)
hidden_state, cell_state = states
print("output: ", output.size())
print("hidden_state: ", hidden_state.size())
print("cell_state: ", cell_state.size())
and returns:
output: torch.Size([5, 3, 5])
hidden_state: torch.Size([2, 3, 5])
cell_state: torch.Size([2, 3, 5])
Are you using the output somewhere after the LSTM? Did you notice that it has a size equal to the hidden dim, i.e. 5, on the last dimension? It looks like you're using it afterwards as if it had a size of 15 instead.
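To illustrate that guess (the exact source of your error is an assumption here), treating the LSTM output as if it still had feature_dim features reproduces the exact RuntimeError from the question:

```python
import torch
import torch.nn as nn

feature_dim, hidden_size, num_layers = 15, 5, 2
seq_len, batch_size = 5, 3

lstm = nn.LSTM(input_size=feature_dim,
               hidden_size=hidden_size, num_layers=num_layers)
t1 = torch.rand(seq_len, batch_size, feature_dim)
output, _ = lstm(t1)

# output has hidden_size (5) features on the last dim, not feature_dim (15):
print(output.shape)  # torch.Size([5, 3, 5])

# Combining it with a 15-feature tensor fails with the error from the question:
try:
    output + t1  # shapes (5, 3, 5) and (5, 3, 15) don't broadcast
except RuntimeError as e:
    print(e)  # "The size of tensor a (5) must match the size of tensor b (15) ..."
```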
The short answer is: Yes, input_size can be different from hidden_size.
For an elaborated answer, take a look at the LSTM formulae in the PyTorch documentation, for instance:
i_t = σ(W_ii x_t + b_ii + W_hi h_(t-1) + b_hi)
This is the formula to compute i_t, the input gate activation at the t-th time step, for one layer. Here the matrix W_ii has the shape (hidden_size x input_size). Similarly, in the other formulae, the matrices W_if, W_ig, and W_io all have the same shape. These matrices project the input tensor into the same space as the hidden states, so that they can be added together.
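You can verify those shapes directly on the module: PyTorch stacks the four input-projection matrices (W_ii, W_if, W_ig, W_io) into a single weight_ih_l0 parameter, and the four hidden-projection matrices into weight_hh_l0:

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=15, hidden_size=5, num_layers=2)

# weight_ih_l0 stacks W_ii, W_if, W_ig, W_io -> (4 * hidden_size, input_size)
print(lstm.weight_ih_l0.shape)  # torch.Size([20, 15])
# weight_hh_l0 stacks W_hi, W_hf, W_hg, W_ho -> (4 * hidden_size, hidden_size)
print(lstm.weight_hh_l0.shape)  # torch.Size([20, 5])
```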
Back to your specific problem: as the other answer pointed out, it's probably an error in another part of your code. Without looking at your forward implementation, it's hard to say exactly what the problem is.
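If the rest of your pipeline does expect 15 features per step, one common fix is a linear head that projects the hidden features back to feature_dim. This is only a sketch of what such a forward might look like (the head and its placement are assumptions, since your actual forward wasn't shown):

```python
import torch
import torch.nn as nn

class BaselineModel(nn.Module):
    def __init__(self, feature_dim=15, hidden_size=5, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=feature_dim,
                            hidden_size=hidden_size, num_layers=num_layers)
        # Hypothetical head: map hidden_size features back up to feature_dim
        self.head = nn.Linear(hidden_size, feature_dim)

    def forward(self, x):      # x: (seq_len, batch, feature_dim)
        out, _ = self.lstm(x)  # out: (seq_len, batch, hidden_size)
        return self.head(out)  # (seq_len, batch, feature_dim)

model = BaselineModel()
y = model(torch.rand(5, 3, 15))
print(y.shape)  # torch.Size([5, 3, 15])
```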