Recently I stumbled across this article, and I was wondering what the difference between the results you would get from a recurrent neural net, like the ones described above, and a simple Markov chain would be.
I don't really understand the linear algebra happening under the hood in an RNN, but it seems that you are basically just designing a super convoluted way of making a statistical model for what the next letter is going to be based on the previous letters, something that a Markov chain does very simply.
Why are RNNs interesting? Is it just because they are a more generalizable solution, or is there something happening that I am missing?
The idea behind an RNN is to save the output of a layer and feed it back into the input at the next time step, so the network can use what it has already seen to predict what comes next. RNNs can be used to build deep learning models that translate text from a source language into a target language without human intervention.
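For concreteness, here is a minimal sketch of that recurrence in NumPy; the layer sizes, weight names, and toy input sequence are all made-up illustrative choices, not something taken from the article.

```python
import numpy as np

# Minimal sketch of a single vanilla RNN step (sizes are arbitrary toy values).
# The previous hidden state h_prev is fed back in alongside the current input x,
# which is the "feed the output back into the input" idea described above.
input_size, hidden_size = 10, 32

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):  # a toy sequence of 5 input vectors
    h = rnn_step(x_t, h)                      # the hidden state carries context forward
```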
RNNs are mainly used for sequence classification (e.g. sentiment classification and video classification), sequence labelling (e.g. part-of-speech tagging and named-entity recognition), and sequence generation (e.g. machine translation and transliteration).
LSTM networks are a type of RNN that uses special units in addition to standard units. LSTM units include a 'memory cell' that can maintain information in memory for long periods of time. This memory cell lets them learn longer-term dependencies.
The main difference between an LSTM unit and a standard RNN unit is that the LSTM unit is more sophisticated: it contains gates that regulate the flow of information into and out of the memory cell.
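A rough sketch of those gates, with made-up parameter names and sizes, might look like the following; it only illustrates the standard forget/input/output gating pattern, not any particular library's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step: gates decide what to forget, what to write, and what to expose.
    `params` is assumed to hold one weight matrix and bias per gate (toy naming)."""
    z = np.concatenate([x, h_prev])                   # combined input for all gates
    f = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate: keep/drop old memory
    i = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate: how much new info to write
    o = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate: how much memory to expose
    g = np.tanh(params["W_g"] @ z + params["b_g"])    # candidate values to write
    c = f * c_prev + i * g                            # memory cell: long-term state
    h = o * np.tanh(c)                                # hidden state: short-term output
    return h, c

# Toy usage with arbitrary sizes:
nx, nh = 8, 16
rng = np.random.default_rng(1)
params = {k: rng.normal(scale=0.1, size=(nh, nx + nh)) for k in ["W_f", "W_i", "W_o", "W_g"]}
params.update({b: np.zeros(nh) for b in ["b_f", "b_i", "b_o", "b_g"]})
h, c = lstm_step(rng.normal(size=nx), np.zeros(nh), np.zeros(nh), params)
```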
A Markov chain assumes the Markov property: it is "memoryless". The probability of the next symbol is calculated from only the k previous symbols. In practice k is limited to low values (say 3-5), because the transition table grows exponentially with k. As a result, sentences generated by such a Markov model are very inconsistent.
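To make the comparison concrete, here is a minimal sketch of an order-k character-level Markov chain; the corpus, the value of k, and the function names are toy choices for illustration only.

```python
import random
from collections import defaultdict, Counter

# Minimal sketch of an order-k character-level Markov chain text generator.
# The transition table maps the last k characters to counts of the next character;
# its size grows roughly exponentially with k, which is why k must stay small.
def train(text, k=3):
    table = defaultdict(Counter)
    for i in range(len(text) - k):
        table[text[i:i + k]][text[i + k]] += 1
    return table

def generate(table, seed, length=100, k=3):
    out = list(seed)
    for _ in range(length):
        context = "".join(out[-k:])
        counts = table.get(context)
        if not counts:
            break  # unseen context: the memoryless model has nothing to go on
        chars, weights = zip(*counts.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "the quick brown fox jumps over the lazy dog. " * 20
model = train(corpus, k=3)
print(generate(model, seed="the", length=80, k=3))
```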
On the other hand, RNNs (e.g. with LSTM units) are not bound by the Markov property: their rich internal state allows them to keep track of long-range dependencies.
Karpathy's blog post shows C source code generated character by character by an RNN. The model impressively captures dependencies such as matching opening and closing brackets.