How to build a simple RNN with a cycle in the graph in TensorFlow?

I've just started playing with TensorFlow and I'm trying to implement a very simple RNN. The RNN takes x as input, produces y as output, and consists of just a single layer that takes x and its own previous output as input. Here's a picture of the sort of thing I have in mind:

[Diagram: a simple RNN whose single layer feeds its output back into itself]

The problem is, I can't see any way through the TensorFlow API to construct a graph with a cycle in it. Whenever I define a Tensor I have to specify what its inputs are, which means I have to have already defined its inputs. So there's a chicken-and-egg problem.

I don't even know if it makes sense to want to define a graph with a cycle. (What gets computed first? Would I have to define an initial value for the softmax node?) I played with the idea of using a variable to represent the previous output, then manually taking the value of y and storing it in the variable after feeding each training sample through. But that would be very slow unless there's a way to represent this procedure in the graph itself.
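Concretely, that variable idea would look something like this (a minimal sketch, assuming a TF 1.x-style graph API; all names and sizes here are hypothetical):

```python
import tensorflow as tf  # assuming a TF 1.x-style graph API

n_in, n_hidden = 8, 16  # hypothetical sizes

x = tf.placeholder(tf.float32, [1, n_in])  # one sample at a time
# variable that holds the previous output between session runs
prev_y = tf.Variable(tf.zeros([1, n_hidden]), trainable=False)

W_x = tf.Variable(tf.random_normal([n_in, n_hidden]))
W_y = tf.Variable(tf.random_normal([n_hidden, n_hidden]))
b = tf.Variable(tf.zeros([n_hidden]))

# y depends on the *variable*, not on itself, so the graph stays acyclic
y = tf.tanh(tf.matmul(x, W_x) + tf.matmul(prev_y, W_y) + b)

# explicit op that writes y back into the variable after each sample
store_y = tf.assign(prev_y, y)
```

Every training step would then have to run store_y through the session for each sample, which is exactly the round-tripping that I suspect would be slow.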

I know the TensorFlow tutorials show example implementations of RNNs, but they cheat and pull an LSTM module out of the library which already has the cycle in it. Overall the tutorials are good for stepping you through how to build certain things, but they could do a better job of explaining how this beast really works.

So, TensorFlow experts, is there a way to build this thing? How would I go about doing it?

asked Feb 17 '16 by Shum

1 Answer

As a matter of fact, the forward and backward passes in essentially all machine learning frameworks assume that your computation graph has no cycles. A common way of implementing a recurrent network is to unroll it in time for a fixed number of steps (say, 50), thereby converting a network that has loops into one that does not have any.
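To make the unrolling concrete, here is a minimal sketch of a simple RNN unrolled for num_steps steps (assuming a TF 1.x-style graph API; the sizes and a plain tanh recurrence are assumptions for illustration):

```python
import tensorflow as tf  # assuming a TF 1.x-style graph API

num_steps, batch, n_in, n_hidden = 50, 32, 8, 16  # hypothetical sizes

# one input tensor per unrolled time step
xs = [tf.placeholder(tf.float32, [batch, n_in]) for _ in range(num_steps)]

# a single set of weights, shared by every time step
W_x = tf.Variable(tf.random_normal([n_in, n_hidden]))
W_h = tf.Variable(tf.random_normal([n_hidden, n_hidden]))
b = tf.Variable(tf.zeros([n_hidden]))

h = tf.zeros([batch, n_hidden])  # initial state
outputs = []
for x_t in xs:
    # each loop iteration adds *new* ops to the graph; h refers to the
    # tensor produced at the previous step, so the result is a chain of
    # num_steps copies of the layer, not a cycle
    h = tf.tanh(tf.matmul(x_t, W_x) + tf.matmul(h, W_h) + b)
    outputs.append(h)
```

The Python loop runs at graph-construction time, not at execution time, which is why no cycle ever appears in the graph itself.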

For instance, in the docs you are referring to:

https://www.tensorflow.org/versions/r0.7/tutorials/recurrent/index.html

They mention:

In order to make the learning process tractable, it is a common practice to truncate the gradients for backpropagation to a fixed number (num_steps) of unrolled steps.

In effect, this means they create num_steps LSTM cells, each of which takes as input the value of x for the current time step and the output of the previous LSTM cell.

The BasicLSTMCell that they use, and that you think has a loop, in fact does not have one. An LSTM cell is just an implementation of a single LSTM step: a block that has two inputs (input and memory) and two outputs (output and memory), and uses gates to compute the outputs from the inputs. It is not the entire LSTM network.
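To make "a single LSTM step" concrete, here is a sketch of the computation such a cell performs (assuming a TF 1.x-style graph API; the weight layout is an assumption for illustration, not BasicLSTMCell's actual source):

```python
import tensorflow as tf  # assuming a TF 1.x-style graph API

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W: [n_in + n_hidden, 4 * n_hidden], b: [4 * n_hidden] (hypothetical layout)
    gates = tf.matmul(tf.concat([x_t, h_prev], axis=1), W) + b
    # input gate, forget gate, output gate, candidate memory
    i, f, o, g = tf.split(gates, 4, axis=1)
    c = tf.sigmoid(f) * c_prev + tf.sigmoid(i) * tf.tanh(g)  # new memory
    h = tf.sigmoid(o) * tf.tanh(c)                           # new output
    return h, c  # the two outputs: output and memory
```

Chaining num_steps calls to a function like this, passing each step's (h, c) into the next, is exactly the unrolling described above.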

answered Oct 26 '22 by Ishamael