With a PyTorch LSTM, can I have a different hidden_size than input_size?

I have:

    import torch.nn as nn

    class BaselineModel(nn.Module):
        def __init__(self, feature_dim=15, hidden_size=5, num_layers=2):
            super(BaselineModel, self).__init__()
            self.num_layers = num_layers
            self.hidden_size = hidden_size

            self.lstm = nn.LSTM(input_size=feature_dim,
                                hidden_size=hidden_size, num_layers=num_layers)

and then I get an error:

RuntimeError: The size of tensor a (5) must match the size of tensor b (15) at non-singleton dimension 2

If I set the two sizes to be the same, then the error goes away. But I'm wondering: if my input_size is some large number, say 15, and I want to reduce the number of hidden features to 5, why shouldn't that work?

Shamoon asked Mar 02 '20 15:03

People also ask

What is hidden_size in LSTM?

hidden_size – The number of features in the hidden state h. num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM , with the second LSTM taking in outputs of the first LSTM and computing the final results.
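For instance, here is a minimal sketch (the sizes are illustrative) of a 2-layer stacked LSTM whose hidden_size differs from its input_size:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=15, hidden_size=5, num_layers=2)

    x = torch.randn(7, 3, 15)        # (seq_len, batch, input_size)
    output, (h_n, c_n) = lstm(x)

    print(output.shape)  # torch.Size([7, 3, 5]) -- per-step outputs of the last layer
    print(h_n.shape)     # torch.Size([2, 3, 5]) -- one final hidden state per layer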

How does LSTM work in pytorch?

An LSTM cell takes the following inputs: input, (h_0, c_0). input : a tensor of inputs of shape (batch, input_size) , where we declared input_size in the creation of the LSTM cell. h_0 : a tensor containing the initial hidden state for each element in the batch, of shape (batch, hidden_size).
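As a rough sketch of those shapes using nn.LSTMCell (batch size 3, input_size 15, and hidden_size 5 are arbitrary choices here):

    import torch
    import torch.nn as nn

    cell = nn.LSTMCell(input_size=15, hidden_size=5)

    x   = torch.randn(3, 15)   # input:  (batch, input_size)
    h_0 = torch.zeros(3, 5)    # hidden: (batch, hidden_size)
    c_0 = torch.zeros(3, 5)    # cell:   (batch, hidden_size)

    h_1, c_1 = cell(x, (h_0, c_0))
    print(h_1.shape, c_1.shape)  # torch.Size([3, 5]) torch.Size([3, 5])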

What is hidden size in LSTM Pytorch?

If each LSTM cell has 512 units, then the hidden_size of the LSTM layer would be 512. The num_layers is the number of LSTM layers stacked on top of each other.

What are the defaults for the LSTM arguments?

bidirectional – If True, becomes a bidirectional LSTM. Default: False. proj_size – If > 0, will use LSTM with projections of the corresponding size. Default: 0. The input is of shape (batch, seq, feature) when batch_first=True, containing the features of the input sequence; the input can also be a packed variable-length sequence.

What is LSTM (long short term memory)?

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. For each element in the input sequence, each layer computes the LSTM gate equations, where i_t, f_t, g_t, and o_t are the input, forget, cell, and output gates, respectively.

What is the default dropout probability in LSTM?

dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0.
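Written out explicitly, a constructor call with these defaults spelled out (the sizes are illustrative) is equivalent to omitting them:

    import torch.nn as nn

    lstm = nn.LSTM(
        input_size=15,
        hidden_size=5,
        num_layers=2,
        dropout=0.0,          # default: 0 (no Dropout between stacked layers)
        bidirectional=False,  # default: False
        proj_size=0,          # default: 0 (no projection of the hidden state)
    )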


2 Answers

It should work; the error probably comes from somewhere else. This works, for example:

    import numpy as np
    import torch
    import torch.nn as nn

    feature_dim = 15
    hidden_size = 5
    num_layers = 2
    seq_len = 5
    batch_size = 3

    lstm = nn.LSTM(input_size=feature_dim,
                   hidden_size=hidden_size, num_layers=num_layers)

    # (seq_len, batch, feature_dim) input with values drawn uniformly from [0, 1)
    t1 = torch.from_numpy(np.random.uniform(0, 1, size=(seq_len, batch_size, feature_dim))).float()
    output, states = lstm(t1)
    hidden_state, cell_state = states
    print("output: ", output.size())
    print("hidden_state: ", hidden_state.size())
    print("cell_state: ", cell_state.size())

and it returns:

    output:  torch.Size([5, 3, 5])
    hidden_state:  torch.Size([2, 3, 5])
    cell_state:  torch.Size([2, 3, 5])

Are you using the output somewhere after the LSTM? Did you notice that its last dimension equals the hidden size, i.e. 5? It looks like you're using it afterwards as if it still had a size of 15.
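If that is the case, one possible fix (a sketch only, since your forward isn't shown; BaselineModel and the fc layer name are just illustrative) is to project the LSTM output back to feature_dim with a linear layer:

    import torch
    import torch.nn as nn

    class BaselineModel(nn.Module):
        def __init__(self, feature_dim=15, hidden_size=5, num_layers=2):
            super(BaselineModel, self).__init__()
            self.lstm = nn.LSTM(input_size=feature_dim,
                                hidden_size=hidden_size, num_layers=num_layers)
            # Map the hidden_size outputs back to feature_dim so downstream code
            # (e.g. a loss against 15-dimensional targets) sees matching sizes.
            self.fc = nn.Linear(hidden_size, feature_dim)

        def forward(self, x):            # x: (seq_len, batch, feature_dim)
            output, _ = self.lstm(x)     # output: (seq_len, batch, hidden_size)
            return self.fc(output)       # (seq_len, batch, feature_dim)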

ThomaS answered Sep 28 '22 18:09


The short answer is: Yes, input_size can be different from hidden_size.

For a more detailed answer, take a look at the LSTM formulae in the PyTorch documentation, for instance the formula for i_t:

    i_t = σ(W_ii x_t + b_ii + W_hi h_(t-1) + b_hi)

This is the formula to compute i_t, the input gate activation at the t-th time step, for one layer. Here the matrix W_ii has shape (hidden_size x input_size). Similarly, in the other formulae, the matrices W_if, W_ig, and W_io all have the same shape. These matrices project the input tensor into the same space as the hidden states, so that they can be added together.
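You can check those shapes directly on the module; PyTorch stores the four input-projection matrices stacked along the first dimension, so weight_ih_l0 has shape (4 * hidden_size, input_size):

    import torch.nn as nn

    lstm = nn.LSTM(input_size=15, hidden_size=5, num_layers=2)

    # W_ii, W_if, W_ig, W_io stacked row-wise: (4 * hidden_size, input_size)
    print(lstm.weight_ih_l0.shape)  # torch.Size([20, 15])

    # For layer 1 the "input" is layer 0's hidden state, so the second dim is hidden_size
    print(lstm.weight_ih_l1.shape)  # torch.Size([20, 5])

    # Hidden-to-hidden weights are always (4 * hidden_size, hidden_size)
    print(lstm.weight_hh_l0.shape)  # torch.Size([20, 5])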

Back to your specific problem: as the other answer pointed out, the error probably comes from another part of your code. Without looking at your forward implementation, it's hard to say exactly what the problem is.

Zecong Hu answered Sep 28 '22 19:09