Hidden Markov Model: Is it possible that the accuracy decreases as the number of states increases?

Tags:

I constructed a couple of Hidden Markov Models using the Baum-Welch algorithm for an increasing number of states. I noticed that after 8 states, the validation score goes down for more than 8 states. So I wondered whether it's possible that the accuracy of an Hidden Markov Model can decrease with an increasing number of states, due to some kind of overfitting?

Thanks in advance!

814

asked Jul 07 '15 12:07

Thomas Pattyn

1 Answers

For the sake of clarity, I propose here a very simplified illustration of the phenomenon.

Say you train your HMM with the data sequence (A-B-A-B). Say you use a 2-state HMM. Naturally, state 1 will optimize itself to represent A and state 2 will represent B (or the opposite). Then, you have a new sequence (A-B-A-B). You want to know the likelihood this sequence has with respect to your HMM. A Viterbi algorithm will find that the most probable state sequence is (1-2-1-2), and the Baum-Welch algorithm will give this sequence a high likelihood as the sequence of states and the "values" of the new sequence (if working with continuous data) clearly match your training sequence.

Say now that you train a 3-state HMM with the same training sequence (A-B-A-B). The initial clustering of your data will most probably either assign the 2 first states of the HMM for the representation of symbol A, and the last one to symbol B (or the opposite once again).

So now, the query sequence (A-B-A-B) can be represented as the state sequence (1-3-1-3) or (2-3-2-3) or (1-3-2-3) or (2-3-1-3) ! This means that for this 3-state HMM, two identical sequences (A-B-A-B) can have low similarity for the HMM. That is the reason why for any HMM and any data set, beyond a certain number of states, performance will decrease.

You can estimate the optimal number of states using criteria such as the Bayesian Information Criterion, the Akaike Information Criterion, the Minimum Message Length Criterion, or if you just want to get a blur idea, a k-means clustering combined with the percentage of variance explained. The 3 first criteria are interesting because they include a penalty term that goes with the number of parameters of the model.

Hope it helps! :)

126

answered Oct 10 '22 15:10

Eskapp

Related questions
                            
                                How to generate a paper-like background with OpenCV
                            
                                How to merge multiple sequential models in Keras Python?
                            
                                How to get the filename of a sample from a DataLoader?
                            
                                CNN Pytorch Error : Input type (torch.cuda.ByteTensor) and weight type (torch.cuda.FloatTensor) should be the same
                            
                                How do you edit an existing Tensorboard Training Loss summary?
                            
                                Gensim LDA Coherence Score Nan
                            
                                What are some economically important applications of machine learning?
                            
                                How to implement fact related to false positive vs. false negative balance in neural network?
                            
                                Mallet topic modelling
                            
                                Implementing proximity matrix for clustering
                            
                                Is it OK if the false positive rate in a ROC curve does not end in 1.0?
                            
                                Approaches to age estimation with opencv
                            
                                How to do multiclass classification properly with NLTK?
                            
                                text categorization classifiers
                            
                                How to efficiently serialize a scikit-learn classifier
                            
                                difference between classification and detection
                            
                                How to get scikit learn to find simple non-linear relationship
                            
                                How is scikit-learn GridSearchCV best_score_ calculated?
                            
                                Exporting a Scikit Learn Random Forest for use on Hadoop Platform
                            
                                Elasticsearch: Learning from clicks (Search result ranking)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Hidden Markov Model: Is it possible that the accuracy decreases as the number of states increases?

Tags:

machine-learning

unsupervised-learning

markov-models

hidden-markov-models

markov

Thomas Pattyn

People also ask

1 Answers

Eskapp

Recent Activity

Donate For Us