
Hidden size vs input size in RNN

Premise 1:

Regarding neurons in an RNN layer: it is my understanding that at "each time step, every neuron receives both the input vector x(t) and the output vector from the previous time step y(t−1)" [1]:

https://github.com/ebolotin6/ebolotin6.github.io/blob/master/images/rnn.png
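In equation form (paraphrasing the book's simple recurrent layer, not the GRU's gated update), each time step computes roughly

y(t) = φ(Wx · x(t) + Wy · y(t−1) + b)

where Wx weights the current input and Wy weights the previous output.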

Premise 2:

It is also my understanding that in PyTorch's GRU layer [2], input_size and hidden_size mean the following:

  • input_size – The number of expected features in the input x
  • hidden_size – The number of features in the hidden state h

So naturally, hidden_size should represent the number of neurons in a GRU layer.
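To sanity-check premise 2, one can print the layer's parameter shapes (a quick inspection; parameter names follow the PyTorch docs, and a single-layer, unidirectional GRU is assumed):

import torch.nn as nn

gru = nn.GRU(input_size=3, hidden_size=3)
for name, param in gru.named_parameters():
    print(name, tuple(param.shape))
# weight_ih_l0 (9, 3)  -> (3*hidden_size, input_size)
# weight_hh_l0 (9, 3)  -> (3*hidden_size, hidden_size)
# bias_ih_l0 (9,)
# bias_hh_l0 (9,)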

My question:

Given the following GRU layer:

# assume that hidden_size = 3

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, hidden_size):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(src_dictionary_size, hidden_size)
        self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size)

Assuming a hidden_size of 3, my understanding is that the GRU layer above would have 3 neurons, each of which receives the same size-3 input vector at every timestep.

My question is: why do the arguments to hidden_size and input_size have to be equal? That is, why can't each of the 3 neurons accept, say, an input vector of size 5?

Case in point: both of the following produce a size mismatch:

self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size-1)
self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size+1)

[1] Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn and TensorFlow (p. 388). O'Reilly Media. Kindle Edition.

[2] https://pytorch.org/docs/stable/nn.html#torch.nn.GRU


Adding full code for reproducibility:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(src_dictionary_size, hidden_size)
        self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size-1)

    def forward(self, pad_seqs, seq_lengths, hidden):
        """
        Args:
          pad_seqs of shape (max_seq_length, batch_size, 1): Padded source sequences.
          seq_lengths: List of sequence lengths.
          hidden of shape (1, batch_size, hidden_size): Initial states of the GRU.

        Returns:
          outputs of shape (max_seq_length, batch_size, hidden_size): Padded outputs of GRU at every step.
          hidden of shape (1, batch_size, hidden_size): Updated states of the GRU.
        """
        embedded_sqs = self.embedding(pad_seqs).squeeze(2)
        packed_sqs = pack_padded_sequence(embedded_sqs, seq_lengths)
        packed_output, h_n = self.gru(packed_sqs, hidden)
        output, input_sizes = pad_packed_sequence(packed_output)

        return output, h_n

    def init_hidden(self, batch_size=1):
        return torch.zeros(1, batch_size, self.hidden_size)

def test_Encoder_shapes():
    hidden_size = 5
    encoder = Encoder(src_dictionary_size=5, hidden_size=hidden_size)

    # maximum word count
    max_seq_length = 4

    # num sentences
    batch_size = 2
    hidden = encoder.init_hidden(batch_size=batch_size)

    # padded sequences of word indices: a batch of 2 sentences, each padded to a maximum of 4 words
    pad_seqs = torch.tensor([
        [1, 2],
        [2, 3],
        [3, 0],
        [4, 0]
    ]).view(max_seq_length, batch_size, 1)

    outputs, new_hidden = encoder.forward(pad_seqs=pad_seqs, seq_lengths=[4, 2], hidden=hidden)
    assert outputs.shape == torch.Size([4, batch_size, hidden_size]), f"Bad outputs.shape: {outputs.shape}"
    assert new_hidden.shape == torch.Size([1, batch_size, hidden_size]), f"Bad new_hidden.shape: {new_hidden.shape}"
    print('Success')

test_Encoder_shapes()
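(Running this as-is fails inside the GRU's hidden-state check, with a message along the lines of "RuntimeError: Expected hidden size (1, 2, 4), got [1, 2, 5]".)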
asked Dec 04 '19 by DSH


1 Answer

I just resolved this and the mistake was self-inflicted.

Conclusion: input_size and hidden_size can differ in size and there is no inherent problem with this. The premises in the question are correctly stated.
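A minimal standalone sketch makes this concrete (a bare nn.GRU with arbitrary illustrative sizes, separate from the Encoder below):

import torch
import torch.nn as nn

# input_size and hidden_size differ; this works as long as the
# initial hidden state is shaped with hidden_size, not input_size.
gru = nn.GRU(input_size=5, hidden_size=4)
x = torch.randn(3, 2, 5)    # (seq_len, batch, input_size)
h0 = torch.zeros(1, 2, 4)   # (num_layers, batch, hidden_size)
out, h_n = gru(x, h0)
print(out.shape)   # torch.Size([3, 2, 4])
print(h_n.shape)   # torch.Size([1, 2, 4])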

The problem with the (full) code above was that the initial hidden state passed to the GRU did not have the correct dimensions: the initial hidden state must have the same shape as the subsequent hidden states. In my case, it had shape (1, 2, 5) instead of (1, 2, 4), where 5 is the dimensionality of the embedding vector and 4 is the hidden_size (number of neurons) of the GRU. The corrected code is below:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(src_dictionary_size, input_size)
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size)

    def forward(self, pad_seqs, seq_lengths, hidden):
        """
        Args:
          pad_seqs of shape (max_seq_length, batch_size, 1): Padded source sequences.
          seq_lengths: List of sequence lengths.
          hidden of shape (1, batch_size, hidden_size): Initial states of the GRU.

        Returns:
          outputs of shape (max_seq_length, batch_size, hidden_size): Padded outputs of GRU at every step.
          hidden of shape (1, batch_size, hidden_size): Updated states of the GRU.
        """
        embedded_sqs = self.embedding(pad_seqs).squeeze(2)
        packed_sqs = pack_padded_sequence(embedded_sqs, seq_lengths)
        packed_output, h_n = self.gru(packed_sqs, hidden)
        output, input_sizes = pad_packed_sequence(packed_output)

        return output, h_n

    def init_hidden(self, batch_size=1):
        return torch.zeros(1, batch_size, self.hidden_size)

def test_Encoder_shapes():
    hidden_size = 4
    embedding_size = 5
    encoder = Encoder(src_dictionary_size=5, input_size=embedding_size, hidden_size=hidden_size)
    print(encoder)

    max_seq_length = 4
    batch_size = 2
    hidden = encoder.init_hidden(batch_size=batch_size)
    pad_seqs = torch.tensor([
        [1, 2],
        [2, 3],
        [3, 0],
        [4, 0]
    ]).view(max_seq_length, batch_size, 1)

    outputs, new_hidden = encoder.forward(pad_seqs=pad_seqs, seq_lengths=[4, 2], hidden=hidden)
    assert outputs.shape == torch.Size([4, batch_size, hidden_size]), f"Bad outputs.shape: {outputs.shape}"
    assert new_hidden.shape == torch.Size([1, batch_size, hidden_size]), f"Bad new_hidden.shape: {new_hidden.shape}"
    print('Success')

test_Encoder_shapes()
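Note that init_hidden builds the initial state from self.hidden_size, so it now matches the (1, batch_size, hidden_size) shape the GRU expects, while the embedding is free to produce vectors of a different size (input_size).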
answered Oct 28 '22 by DSH