I've often read that there are fundamental differences between feed-forward and recurrent neural networks (RNNs), due to the lack of an internal state, and hence of short-term memory, in feed-forward networks. This seemed plausible to me at first sight.
However, if I understand correctly, when a recurrent neural network is trained with the backpropagation-through-time algorithm, it is transformed into an equivalent feed-forward network.
This would imply that there is in fact no fundamental difference. So why do RNNs perform better on certain tasks (speech recognition, time-series prediction, ...) than deep feed-forward networks?
The fact that training is done using some trick does not change the fact that there is a fundamental difference: the recurrent network preserves an internal state, which the feed-forward network lacks.
The "unrolled" feed-forward network is not equivalent to the recurrent network. It is only a Markov approximation, to the order given by the number of unrolled steps. So you merely "simulate" the recurrent network with k-step memory, while an actual recurrent neural network has (in theory) unlimited memory.
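To make the distinction concrete, here is a minimal sketch (my own toy example, not from any particular library): a single-unit linear RNN with update h_t = w·h_{t-1} + x_t. The true recurrent update carries information from arbitrarily far back, while a feed-forward "unrolling" of depth k can only see the last k inputs and must truncate the earlier state.

```python
# Toy illustration of k-step truncation vs. true recurrence.
# The names rnn_state and unrolled_k are hypothetical, chosen for this sketch.

def rnn_state(xs, w=0.5, h0=0.0):
    """True recurrent update over the whole sequence: h = w*h + x."""
    h = h0
    for x in xs:
        h = w * h + x
    return h

def unrolled_k(xs, k, w=0.5):
    """Feed-forward 'unrolled' version: sees only the last k inputs,
    approximating the state before the window as 0 (the Markov truncation)."""
    h = 0.0
    for x in xs[-k:]:
        h = w * h + x
    return h

xs = [1.0] + [0.0] * 10        # a single impulse, 10 steps in the past
true_h = rnn_state(xs)          # still remembers the impulse: 0.5**10
approx_h = unrolled_k(xs, k=5)  # the impulse fell outside the window: 0.0
```

With k unrolled steps, any input older than k steps is invisible to the feed-forward version, which is exactly why the unrolled network is only an approximation of the recurrent one.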