Recently I stumbled across this article, and I was wondering what the difference between the results you would get from a recurrent neural net, like the ones described above, and a simple Markov chain would be.
I don't really understand the linear algebra happening under the hood in an RNN, but it seems that you are basically just designing a super convoluted way of making a statistical model for what the next letter is going to be based on the previous letters, something that a Markov chain does very simply.
Why are RNNs interesting? Is it just because they are a more generalizable solution, or is there something happening that I am missing?
The idea behind an RNN is to save the output of a layer and feed it back into the input at the next time step, so the network can use what it has already seen to predict what comes next. RNNs can be used to build deep learning models that translate text from a source language into a target language without human intervention.
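For concreteness, here is a minimal sketch of that recurrence in NumPy; the layer sizes, weight names, and toy input sequence are all made-up illustrative choices, not something taken from the article.

```python
import numpy as np

# Minimal sketch of a single vanilla RNN step (sizes are arbitrary toy values).
# The previous hidden state h_prev is fed back in alongside the current input x,
# which is the "feed the output back into the input" idea described above.
input_size, hidden_size = 10, 32

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):  # a toy sequence of 5 input vectors
    h = rnn_step(x_t, h)                      # the hidden state carries context forward
```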
RNNs are mainly used for sequence classification (e.g. sentiment classification and video classification), sequence labelling (e.g. part-of-speech tagging and named-entity recognition), and sequence generation (e.g. machine translation and transliteration).
LSTM networks are a type of RNN that uses special units in addition to standard units. LSTM units include a 'memory cell' that can maintain information in memory for long periods of time. This memory cell lets them learn longer-term dependencies.
The main difference between an LSTM unit and a standard RNN unit is that the LSTM unit is more sophisticated: it contains gates that regulate the flow of information into and out of the memory cell.
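A rough sketch of those gates, with made-up parameter names and sizes, might look like the following; it only illustrates the standard forget/input/output gating pattern, not any particular library's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step: gates decide what to forget, what to write, and what to expose.
    `params` is assumed to hold one weight matrix and bias per gate (toy naming)."""
    z = np.concatenate([x, h_prev])                   # combined input for all gates
    f = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate: keep/drop old memory
    i = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate: how much new info to write
    o = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate: how much memory to expose
    g = np.tanh(params["W_g"] @ z + params["b_g"])    # candidate values to write
    c = f * c_prev + i * g                            # memory cell: long-term state
    h = o * np.tanh(c)                                # hidden state: short-term output
    return h, c

# Toy usage with arbitrary sizes:
nx, nh = 8, 16
rng = np.random.default_rng(1)
params = {k: rng.normal(scale=0.1, size=(nh, nx + nh)) for k in ["W_f", "W_i", "W_o", "W_g"]}
params.update({b: np.zeros(nh) for b in ["b_f", "b_i", "b_o", "b_g"]})
h, c = lstm_step(rng.normal(size=nx), np.zeros(nh), np.zeros(nh), params)
```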
A Markov chain assumes the Markov property: it is "memoryless". The probability of the next symbol is calculated from only the k previous symbols. In practice k is limited to low values (say 3-5), because the transition table grows exponentially with k. As a result, sentences generated by such a Markov model are very inconsistent.
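To make the comparison concrete, here is a minimal sketch of an order-k character-level Markov chain; the corpus, the value of k, and the function names are toy choices for illustration only.

```python
import random
from collections import defaultdict, Counter

# Minimal sketch of an order-k character-level Markov chain text generator.
# The transition table maps the last k characters to counts of the next character;
# its size grows roughly exponentially with k, which is why k must stay small.
def train(text, k=3):
    table = defaultdict(Counter)
    for i in range(len(text) - k):
        table[text[i:i + k]][text[i + k]] += 1
    return table

def generate(table, seed, length=100, k=3):
    out = list(seed)
    for _ in range(length):
        context = "".join(out[-k:])
        counts = table.get(context)
        if not counts:
            break  # unseen context: the memoryless model has nothing to go on
        chars, weights = zip(*counts.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "the quick brown fox jumps over the lazy dog. " * 20
model = train(corpus, k=3)
print(generate(model, seed="the", length=80, k=3))
```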
On the other hand, RNNs (e.g. with LSTM units) are not bound by the Markov property: their rich internal state allows them to keep track of long-range dependencies.
Karpathy's blog post shows C source code generated character by character by an RNN. The model impressively captures dependencies such as matching opening and closing brackets.