implementing RNN with numpy

I'm trying to implement a recurrent neural network with numpy.

My current input and output designs are as follows:

x: (sequence length, batch size, input dimension)

h: (number of layers, number of directions, batch size, hidden size)

initial weight w0: (number of directions, 2 * hidden size, input size + hidden size)

weight w: (number of layers - 1, number of directions, hidden size, number of directions * hidden size + hidden size)

bias b: (number of layers, number of directions, hidden size)

I used the PyTorch RNN API as a reference (https://pytorch.org/docs/stable/nn.html?highlight=rnn#torch.nn.RNN), but changed it slightly so that the initial weights are passed in as explicit arguments (the output shapes are supposedly the same as in PyTorch).
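For reference, the PyTorch behaviour I am trying to match looks roughly like this (the sizes here are just example numbers):

import torch

torch_rnn = torch.nn.RNN(input_size=3, hidden_size=6, num_layers=2, nonlinearity='tanh')
x = torch.randn(5, 4, 3)    # (seq_len, batch, input_size)
h0 = torch.zeros(2, 4, 6)   # (num_layers * num_directions, batch, hidden_size)
out, hn = torch_rnn(x, h0)
print(out.shape)            # torch.Size([5, 4, 6])
print(hn.shape)             # torch.Size([2, 4, 6])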

It runs, but I cannot tell whether it is behaving correctly, since I am only feeding it randomly generated numbers.

In particular, I am not so certain whether my input shapes are designed correctly.

Could any expert give me some guidance?

import numpy as np

def rnn(xs, h, w0, w=None, b=None, num_layers=2, nonlinearity='tanh',
        dropout=0.0, bidirectional=False, training=True):
    num_directions = 2 if bidirectional else 1
    batch_size = xs.shape[1]
    input_size = xs.shape[2]
    hidden_size = h.shape[3]
    hn = []
    y = [None] * len(xs)

    for l in range(num_layers):
        for d in range(num_directions):
            if l == 0 and d == 0:
                # first layer: slice the input-to-hidden and hidden-to-hidden blocks out of w0
                wi = w0[d, :hidden_size, :input_size].T
                wh = w0[d, hidden_size:, input_size:].T
                wi = np.reshape(wi, (1,) + wi.shape)
                wh = np.reshape(wh, (1,) + wh.shape)
            else:
                # deeper layers: slice the corresponding blocks out of w
                wi = w[max(l - 1, 0), d, :, :hidden_size].T
                wh = w[max(l - 1, 0), d, :, hidden_size:].T
            for i, x in enumerate(xs):
                if l == 0 and d == 0:
                    # combine the current input with the stored hidden state h[l, d]
                    ht = np.tanh(np.dot(x, wi) + np.dot(h[l, d], wh) + b[l, d][np.newaxis])
                    ht = np.reshape(ht, (batch_size, hidden_size))  # otherwise shape is (bs, 1, hs)
                else:
                    # deeper layers use the previous layer's output y[i] as their input
                    ht = np.tanh(np.dot(y[i], wi) + np.dot(h[l, d], wh) + b[l, d][np.newaxis])
                y[i] = ht
            hn.append(ht)
    y = np.asarray(y)
    y = np.reshape(y, y.shape + (1,))
    return np.asarray(y), np.asarray(hn)
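
A minimal call that exercises these shapes (the sizes are arbitrary and the data is random) looks like this:

seq_len, batch_size, input_size = 5, 4, 3
hidden_size, num_layers, num_directions = 6, 2, 1

xs = np.random.randn(seq_len, batch_size, input_size)
h0 = np.zeros((num_layers, num_directions, batch_size, hidden_size))
w0 = np.random.randn(num_directions, 2 * hidden_size, input_size + hidden_size)
w  = np.random.randn(num_layers - 1, num_directions, hidden_size,
                     num_directions * hidden_size + hidden_size)
b  = np.zeros((num_layers, num_directions, hidden_size))

y, hn = rnn(xs, h0, w0, w=w, b=b, num_layers=num_layers)
print(y.shape)   # (5, 4, 6, 1)
print(hn.shape)  # (2, 4, 6)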
asked Jul 22 '18 by ytrewq


1 Answer

Regarding the shape, it probably makes sense if that's how PyTorch does it, but the TensorFlow way is a bit more intuitive: (batch_size, seq_length, input_size), i.e. batch_size sequences of length seq_length where each element has size input_size. Both approaches can work, so I guess it's a matter of preference.
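Switching between the two layouts is just a transpose of the first two axes, e.g.:

import numpy as np

x_time_major = np.random.randn(5, 4, 3)           # PyTorch-style: (seq_len, batch, input_size)
x_batch_major = x_time_major.transpose(1, 0, 2)   # TF-style: (batch, seq_len, input_size)
print(x_batch_major.shape)                        # (4, 5, 3)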

To see whether your rnn is behaving appropriately, I'd just print the hidden state at each time step, run it on some small random data (e.g. 5 vectors, 3 elements each), and compare the results with your manual calculations.
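For example, a reference loop for a single layer and a single direction (toy sizes, and a flat weight layout rather than your packed w0) might look like this; you can pack the same numbers into your w0/b format and check that your rnn produces the same hidden states:

import numpy as np

np.random.seed(0)
xs = np.random.randn(5, 1, 3)    # 5 time steps, batch of 1, 3 input features
w_ih = np.random.randn(4, 3)     # input-to-hidden weights, hidden_size = 4
w_hh = np.random.randn(4, 4)     # hidden-to-hidden weights
b_h = np.random.randn(4)

h = np.zeros((1, 4))
for t, x in enumerate(xs):
    # standard Elman step: h_t = tanh(x_t @ W_ih.T + h_{t-1} @ W_hh.T + b)
    h = np.tanh(x @ w_ih.T + h @ w_hh.T + b_h)
    print("step", t, h)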

Looking at your code, I'm unsure if it does what it's supposed to, but instead of designing this on your own around an existing API, I'd recommend you read and try to replicate the awesome tutorial from wildml (in part 2 there's a pure numpy implementation).

answered Sep 21 '22 by Dzjkb