I'm learning about artificial neural networks and have implemented a standard feed-forward net with a couple of hidden layers. Now I'm trying to understand how a recurrent neural network (RNN) works in practice, and I'm having trouble with how activation/propagation flows through the network.
In my feed-forward network, activation is a simple layer-by-layer firing of the neurons. In a recurrent net, neurons connect back to previous layers and sometimes to themselves, so propagation must work differently. The trouble is, I can't seem to find an explanation of exactly how that propagation happens.
How might it occur, say, for a network like this:
Input1 --->Neuron A1 ---------> Neuron B1 ---------------------> Output
              ^                 ^     ^   |
              |                 |     -----
              |                 |
Input2 --->Neuron A2 ---------> Neuron B2
I imagined it would be a rolling activation that gradually dies down as the neurons' thresholds reduce the firing to 0, much like in biology, but it appears there is a much more computationally efficient way to do this through derivatives?
I think I now have a grasp of the basic principle that separates propagating recurrent networks from feed-forward ones: an explicit time step.
In a feed-forward network, propagation happens layer by layer: the Layer 1 neurons fire first, followed by Layers 2, 3, etc., so propagation is one neuron's activation stimulating activation in the neurons that take it as input.
Alternatively, we can think of propagation as: the neurons whose inputs are active at any given point in time are the ones that fire. So if at time t=0 the Layer 1 neurons are active, then at the next time step t=1 the next layer, Layer 2, will activate, since the neurons in Layer 2 take the neurons in Layer 1 as input.
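Here is a minimal sketch (my own, not from any particular library) of that time-step view of an ordinary feed-forward pass; the layer sizes, weights, and sigmoid activation are just made-up examples:

# Time-step view of a feed-forward pass: at time t, the layer whose inputs
# became active at time t-1 computes its activations. All values are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
layer_sizes = [2, 3, 3, 1]                      # input, two hidden layers, output
weights = [rng.standard_normal((n_out, n_in))   # one weight matrix per layer
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

activation = np.array([0.5, -0.2])              # "Layer 1 is active" at t = 0
for t, W in enumerate(weights, start=1):        # t = 1, 2, 3 ...
    activation = sigmoid(W @ activation)        # layer t fires from layer t-1
    print(f"t={t}: {activation}")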
While the difference in thinking may seem like semantics, for me it was crucial in figuring out how to implement recurrent networks. In a feed-forward network the time step is implicit, and the code passes over the neuron layers in turn, activating them like falling dominoes. In a recurrent network, trying the falling-domino style of activation, where every neuron specifies which neuron it activates next, would be a nightmare for large, convoluted networks. Instead, it makes sense to poll every neuron in the network at a given time t, to see if it activates based on its inputs.
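A rough sketch of that polling approach: the whole network is stored as one weight matrix over all neurons (recurrent links are simply more entries in that matrix), and at each discrete time step every neuron recomputes its activation from the previous step's activations, so no firing order has to be worked out. The neuron count, weights, and constant inputs below are assumptions for illustration only:

# "Poll every neuron" propagation for a recurrent net with discrete time steps.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_neurons = 4                                            # e.g. A1, A2, B1, B2
rng = np.random.default_rng(1)
W = rng.standard_normal((n_neurons, n_neurons)) * 0.5    # W[i, j]: weight from neuron j to neuron i
W_in = rng.standard_normal((n_neurons, 2)) * 0.5         # weights from the two external inputs

state = np.zeros(n_neurons)        # all neurons silent at t = 0
inputs = np.array([1.0, 0.0])      # external inputs, held constant here

for t in range(1, 6):
    # every neuron is polled at once: the new state depends only on the old state
    state = sigmoid(W @ state + W_in @ inputs)
    print(f"t={t}: {np.round(state, 3)}")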
There are, of course, many different types of recurrent neural network, but I think this crucial explicit time step is the key to recurrent network propagation.
The differential equations part I was wondering about comes into play if, instead of having discrete time steps t = 0, 1, 2, etc., you want a smoother, more continuous flow through the network by modeling the propagation over very small time increments, like 0.2, 0.1, 0.05, etc.
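A hedged sketch of what that might look like, treating each small increment dt as a simple Euler step that nudges the state part of the way toward its new value; the dynamics dx/dt = -x + sigmoid(W x + W_in u) and all the weights here are illustrative assumptions, not the only way to do it:

# Continuous-time flavour of the same recurrent update: smaller dt gives a
# smoother trajectory between the discrete time steps above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 4)) * 0.5
W_in = rng.standard_normal((4, 2)) * 0.5
inputs = np.array([1.0, 0.0])

for dt in (0.2, 0.1, 0.05):                     # smaller dt -> smoother flow
    state = np.zeros(4)
    for _ in range(int(1.0 / dt)):              # simulate one unit of time
        target = sigmoid(W @ state + W_in @ inputs)
        state = state + dt * (target - state)   # Euler step toward the target
    print(f"dt={dt}: {np.round(state, 3)}")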