I've just started playing with TensorFlow and I'm trying to implement a very simple RNN. The RNN has x as input and y as output, and consists of just a single layer that takes x and its own previous output as input. Here's a picture of the sort of thing I have in mind:

The problem is, I can't see any way through the TensorFlow API to construct a graph with a cycle in it. Whenever I define a Tensor I have to specify what its inputs are, which means I have to have already defined its inputs. So there's a chicken-and-egg problem.
I don't even know if it makes sense to want to define a graph with a cycle (what would get computed first? Would I have to define an initial value for the softmax node?). I toyed with the idea of using a variable to represent the previous output, then manually taking the value of y and storing it in the variable after feeding each training sample through. But that would be very slow unless there's a way to represent this procedure in the graph itself.
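To make that concrete, here's roughly what I was imagining (just a sketch; tf.assign is my guess at the right mechanism, and all the shapes and names are made up; I'm using the TF 1.x-style graph API, which is tf.compat.v1 in newer releases):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

input_dim, output_dim = 8, 16  # invented shapes for illustration

x = tf.placeholder(tf.float32, [1, input_dim])
prev_y = tf.Variable(tf.zeros([1, output_dim]))  # holds the previous output

W_x = tf.Variable(tf.random_normal([input_dim, output_dim]))
W_y = tf.Variable(tf.random_normal([output_dim, output_dim]))

# The layer sees both the current input and the previous output.
y = tf.nn.softmax(tf.matmul(x, W_x) + tf.matmul(prev_y, W_y))

# Write y back into the variable as part of the graph, so that a single
# session.run(store, feed_dict={x: ...}) both computes the output and
# saves it for the next step.
store = tf.assign(prev_y, y)
```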
I know the TensorFlow tutorials show example implementations of RNNs, but they cheat and pull an LSTM module out of the library that already has the cycle in it. Overall the tutorials are good for stepping you through how to build certain things, but they could do a better job of explaining how this beast really works.
So, TensorFlow experts, is there a way to build this thing? How would I go about doing it?
As a matter of fact, both the forward and the backward pass in machine learning frameworks assume that your network does not have cycles. A common way of implementing a recurrent network is to unroll it in time for a fixed number of steps (say, 50), thereby converting a network that has loops into one that does not have any.
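To illustrate, here is a minimal sketch of unrolling a single-layer RNN like the one in your picture (my own names and shapes; written against the TF 1.x-style graph API, available as tf.compat.v1 in newer releases):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

num_steps = 5                       # how far to unroll in time
input_dim, hidden_dim, batch = 8, 16, 32

# One placeholder per unrolled timestep.
xs = [tf.placeholder(tf.float32, [batch, input_dim]) for _ in range(num_steps)]

W_x = tf.Variable(tf.random_normal([input_dim, hidden_dim]))
W_h = tf.Variable(tf.random_normal([hidden_dim, hidden_dim]))
b = tf.Variable(tf.zeros([hidden_dim]))

state = tf.zeros([batch, hidden_dim])   # initial "previous output"
outputs = []
for x_t in xs:
    # The same weights are reused at every step, so this Python loop
    # builds num_steps copies of the cell wired in a chain: an acyclic
    # graph, not a cyclic one.
    state = tf.tanh(tf.matmul(x_t, W_x) + tf.matmul(state, W_h) + b)
    outputs.append(state)
```

Gradients then flow back through those num_steps copies, which gives you truncated backpropagation through time.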
For instance, in the docs you are referring to:
https://www.tensorflow.org/versions/r0.7/tutorials/recurrent/index.html
They mention
In order to make the learning process tractable, it is a common practice to truncate the gradients for backpropagation to a fixed number (num_steps) of unrolled steps.
What it effectively means is that they will create num_steps LSTM cells, where each takes as input the value of x for the current timestep and the output of the previous LSTM cell.
The BasicLSTMCell that they use, and that you think has a loop, in fact does not have a loop. An LSTM cell is just an implementation of a single LSTM step: a block that has two inputs (the input and the memory) and two outputs (the output and the memory), and uses gates to compute the outputs from the inputs. It is not the entire LSTM network.
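Here is a sketch of how such a cell gets chained (again with my own shapes and names; the exact module path for BasicLSTMCell has moved between TensorFlow versions, so treat this as illustrative):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

batch, input_dim, num_units, num_steps = 32, 8, 16, 5

# One LSTM *step*: two inputs (x_t and the previous state, i.e. memory
# plus output) and two outputs (the new output and the new state).
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

xs = [tf.placeholder(tf.float32, [batch, input_dim]) for _ in range(num_steps)]
state = cell.zero_state(batch, tf.float32)

outputs = []
for x_t in xs:
    output, state = cell(x_t, state)  # apply the same cell at each timestep
    outputs.append(output)
```

The loop lives in ordinary Python, not in the graph: after it runs, the graph contains num_steps chained applications of the cell and no cycle anywhere.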