Why should we use RNNs instead of Markov models?

I recently stumbled across this article, and I was wondering how the results you would get from a recurrent neural net, like the ones described there, would differ from those of a simple Markov chain.

I don't really understand the linear algebra happening under the hood in an RNN, but it seems that you are basically just designing a very convoluted way of building a statistical model of what the next letter will be given the previous letters, something a Markov chain does very simply.

Why are RNNs interesting? Is it just because they are a more generalizable solution, or is there something happening that I am missing?

asked Jul 27 '17 by Justin Sanders

People also ask

Why do we need RNNs?

The logic behind an RNN is to save the output of a layer and feed it back to the input in order to predict the next output. RNNs can be used to build deep learning models that, for example, translate text from a source language into a target language without human intervention.
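To make that feedback loop concrete, here is a minimal sketch of a single vanilla RNN step in NumPy. The weight names (W_xh, W_hh) and sizes are illustrative assumptions, not from any particular library:

```python
import numpy as np

# Minimal sketch of one vanilla RNN step (names and sizes are illustrative).
# The hidden state h carries information from all previous inputs forward.
vocab_size, hidden_size = 27, 64
rng = np.random.default_rng(0)

W_xh = rng.normal(0, 0.01, (hidden_size, vocab_size))   # input -> hidden
W_hh = rng.normal(0, 0.01, (hidden_size, hidden_size))  # hidden -> hidden (the feedback)
b_h = np.zeros(hidden_size)

def rnn_step(x_onehot, h_prev):
    """h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b)."""
    return np.tanh(W_xh @ x_onehot + W_hh @ h_prev + b_h)

# Feed a sequence one symbol at a time; h accumulates context of arbitrary length.
h = np.zeros(hidden_size)
for idx in [7, 4, 11, 11, 14]:          # e.g. letter indices for "hello"
    x = np.zeros(vocab_size)
    x[idx] = 1.0
    h = rnn_step(x, h)
```

The key point is that the same fixed-size state h is reused at every step, so the network is not limited to a fixed context window.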

When would you use an RNN model?

RNNs are mainly used for: sequence classification (sentiment classification and video classification), sequence labelling (part-of-speech tagging and named-entity recognition), and sequence generation (machine translation and transliteration).

Why are LSTMs better than RNNs for sequences?

LSTM networks are a type of RNN that uses special units in addition to standard units. LSTM units include a 'memory cell' that can maintain information in memory for long periods of time. This memory cell lets them learn longer-term dependencies.

What is the main difference between RNNs and LSTMs?

The main difference between an LSTM unit and a standard RNN unit is that the LSTM unit is more sophisticated. More precisely, it includes so-called gates that regulate the flow of information through the unit.
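As a hedged sketch of what those gates look like, here is one LSTM step in NumPy following the standard formulation (forget, input, and output gates plus a memory cell); all names and sizes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One LSTM step (illustrative shapes; z is the concatenated [h_prev, x] input).
hidden_size, input_size = 64, 27
rng = np.random.default_rng(0)
W_f, W_i, W_o, W_c = (rng.normal(0, 0.01, (hidden_size, hidden_size + input_size))
                      for _ in range(4))
b_f = b_i = b_o = b_c = np.zeros(hidden_size)

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ z + b_f)                   # forget gate: how much old memory to keep
    i = sigmoid(W_i @ z + b_i)                   # input gate: how much new info to write
    o = sigmoid(W_o @ z + b_o)                   # output gate: how much memory to expose
    c = f * c_prev + i * np.tanh(W_c @ z + b_c)  # memory cell update
    h = o * np.tanh(c)
    return h, c

h, c = lstm_step(np.zeros(input_size), np.zeros(hidden_size), np.zeros(hidden_size))
```

The memory cell c is updated additively under the control of the gates, which is what lets information survive over many steps.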


1 Answer

A Markov chain assumes the Markov property: it is "memoryless". The probability of the next symbol is calculated from only the k previous symbols. In practice k is limited to low values (say 3-5), because the transition table grows exponentially in k. As a result, sentences generated by such a Markov model are very inconsistent.
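A minimal sketch of a character-level order-k Markov model makes this tangible; the corpus, function names, and parameters here are my illustrative choices, not from the original answer:

```python
import random
from collections import Counter, defaultdict

def train_markov(text, k=3):
    """Count next-character frequencies for every k-character context."""
    model = defaultdict(Counter)
    for i in range(len(text) - k):
        model[text[i:i + k]][text[i + k]] += 1
    return model

def generate(model, seed, k=3, n=200):
    """Sample forward from the counts; stops if a context was never seen."""
    out = seed
    for _ in range(n):
        options = model.get(out[-k:])
        if not options:
            break
        chars, weights = zip(*options.items())
        out += random.choices(chars, weights=weights)[0]
    return out

corpus = "the quick brown fox jumps over the lazy dog. " * 50
model = train_markov(corpus, k=3)
print(generate(model, "the", k=3))

# With an alphabet of size V there can be up to V**k contexts:
# V=26, k=5 already allows 26**5 ≈ 11.9 million contexts.
```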

On the other hand, RNNs (e.g. with LSTM units) are not bound by the Markov property. Their rich internal state allows them to keep track of long-distance dependencies.
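For a rough sense of scale (my numbers, not the answerer's): the Markov table grows as V^k, while a vanilla RNN cell's parameter count is fixed by its hidden size, independent of how far back it can effectively look:

```python
# Illustrative sizes, not from the original answer.
V, k, H = 26, 5, 128               # alphabet size, Markov order, RNN hidden size

markov_entries = V ** (k + 1)      # V**k contexts, each with V next-char counts
rnn_params = H * V + H * H + H     # W_xh + W_hh + bias of one vanilla RNN cell

print(f"order-{k} Markov table entries: {markov_entries:,}")  # 308,915,776
print(f"vanilla RNN cell parameters:    {rnn_params:,}")      # 19,840
```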

Karpathy's blog post shows C source code generated character by character by an RNN. The model impressively captures dependencies such as matching opening and closing brackets.

answered Sep 20 '22 by vodov