Issue in training hidden markov model and usage for classification

Tags: machine-learning, matlab, computer-vision, hidden-markov-models

I am having a tough time figuring out how to use Kevin Murphy's HMM toolbox. It would be a great help if anyone who has experience with it could clarify some conceptual questions. I more or less understand the theory behind HMMs, but I am confused about how to actually implement one and how to specify all the parameter settings.

There are 2 classes, so we need 2 HMMs.
Let's say the training vectors are class 1: O1 = {4 3 5 1 2} and class 2: O2 = {1 4 3 2 4}.
Now the system has to classify an unknown sequence O3 = {1 3 2 4 4} as either class 1 or class 2.

1. What goes in obsmat0 and obsmat1?
2. What is the syntax for specifying the transition probabilities transmat0 and transmat1?
3. What is the variable data going to be in this case?
4. Would the number of states be Q=5, since five unique numbers/symbols are used?
5. Is the number of output symbols 5?
6. How do I specify the transition probabilities transmat0 and transmat1?

asked Mar 16 '12 06:03

George Roy

2 Answers

Instead of answering each individual question, let me illustrate how to use the HMM toolbox with an example: the weather example that is commonly used to introduce hidden Markov models.

Basically, the states of the model are the three possible types of weather: sunny, rainy and foggy. On any given day, we assume the weather can be only one of these values. Thus the set of HMM states is:

S = {sunny, rainy, foggy}

However, in this example we can't observe the weather directly (apparently we are locked in the basement!). Instead, the only evidence we have is whether the person who checks on us every day is carrying an umbrella or not. In HMM terminology, these are the discrete observations:

x = {umbrella, no umbrella}

The HMM is characterized by three things:

- The prior probabilities: a vector giving, for each state, the probability that a sequence starts in that state.
- The transition probabilities: a matrix describing the probabilities of going from one state of weather to another.
- The emission probabilities: a matrix describing the probabilities of observing an output (umbrella or not) given a state (weather).

Next, we are either given these probabilities or we have to learn them from a training set. Once that's done, we can do reasoning such as computing the likelihood of an observation sequence with respect to an HMM (or with respect to several models, picking the most likely one).

1) known model parameters

Here is some sample code that shows how to fill in known probabilities to build the model:

Q = 3;    %# number of states (sun,rain,fog)
O = 2;    %# number of discrete observations (umbrella, no umbrella)

%#  prior probabilities
prior = [1 0 0];

%# state transition matrix (1: sun, 2: rain, 3:fog)
A = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5];

%# observation emission matrix (1: umbrella, 2: no umbrella)
B = [0.1 0.9; 0.8 0.2; 0.3 0.7];
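
As a quick sanity check (my own addition, not something the toolbox requires you to call), the sketch below verifies that the prior and every row of A and B are valid probability distributions and that the dimensions agree. It uses plain MATLAB only. The tolerance tol is an arbitrary choice.

```matlab
% Parameters repeated from the text so the snippet runs on its own.
prior = [1 0 0];
A = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5];
B = [0.1 0.9; 0.8 0.2; 0.3 0.7];

tol = 1e-10;   % allowance for floating-point round-off (arbitrary choice)

% The prior is a distribution over states, so it must sum to 1.
assert(abs(sum(prior) - 1) < tol, 'prior must sum to 1');

% Row i of A is the distribution over next states given current state i.
assert(all(abs(sum(A, 2) - 1) < tol), 'each row of A must sum to 1');

% Row i of B is the distribution over output symbols given state i.
assert(all(abs(sum(B, 2) - 1) < tol), 'each row of B must sum to 1');

% Sizes must agree: A is Q-by-Q, B is Q-by-O, prior has Q entries.
[Q, O] = size(B);
assert(isequal(size(A), [Q Q]) && numel(prior) == Q, 'size mismatch');
```

The layout matters more than the check itself: one row per state, with columns indexing the next state (A) or the output symbol (B).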

Then we can sample a bunch of sequences from this model:

num = 20;           %# 20 sequences
T = 10;             %# each of length 10 (days)
[seqs,states] = dhmm_sample(prior, A, B, num, T);

For example, the 5th sequence was:

>> seqs(5,:)        %# observation sequence
ans =
     2     2     1     2     1     1     1     2     2     2

>> states(5,:)      %# hidden states sequence
ans =
     1     1     1     3     2     2     2     1     1     1

We can evaluate the log-likelihood of the sequence:

dhmm_logprob(seqs(5,:), prior, A, B)

dhmm_logprob_path(prior, A, B, states(5,:))
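
To make it clearer what the log-likelihood of an observation sequence means, here is a minimal sketch of the scaled forward algorithm in plain MATLAB, with no toolbox calls. I am assuming the toolbox uses the same standard recursion, in which case the value should agree with dhmm_logprob up to round-off, but I have not checked this against the toolbox source. The names alpha, c and loglik are my own.

```matlab
% Model and the 5th observation sequence, repeated from the text.
prior = [1 0 0];
A = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5];
B = [0.1 0.9; 0.8 0.2; 0.3 0.7];
obs = [2 2 1 2 1 1 1 2 2 2];

% alpha(i) is proportional to P(x_1..x_t, state_t = i).
alpha = prior(:) .* B(:, obs(1));
loglik = 0;
for t = 1:numel(obs)
    if t > 1
        % Propagate one step through A, then weight by the emission of x_t.
        alpha = (A' * alpha) .* B(:, obs(t));
    end
    c = sum(alpha);            % c = P(x_t | x_1..x_{t-1})
    alpha = alpha / c;         % rescale so long sequences do not underflow
    loglik = loglik + log(c);  % log P(x_1..x_T) is the sum of the log(c)
end
disp(loglik)
```

The rescaling step is what makes this usable for long sequences: the unscaled alpha shrinks geometrically with T.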

We can also compute the Viterbi path (the most probable state sequence):

vPath = viterbi_path(prior, A, multinomial_prob(seqs(5,:),B))

[Figure: 5th_example]
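
For reference, this is roughly what a Viterbi decoder does internally. It is a minimal log-domain sketch in plain MATLAB, not the toolbox's implementation, and the names delta, psi and vpath are my own. Working with logs avoids underflow, and log(0) = -Inf correctly rules out impossible states.

```matlab
% Model and the 5th observation sequence, repeated from the text.
prior = [1 0 0];
A = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5];
B = [0.1 0.9; 0.8 0.2; 0.3 0.7];
obs = [2 2 1 2 1 1 1 2 2 2];

Q = numel(prior);
T = numel(obs);
logA = log(A);
delta = zeros(Q, T);   % delta(i,t): best log score of a path ending in state i at time t
psi = zeros(Q, T);     % psi(i,t): predecessor of state i on that best path

delta(:, 1) = log(prior(:)) + log(B(:, obs(1)));
for t = 2:T
    for j = 1:Q
        % Best way to arrive in state j from any state at time t-1.
        [best, psi(j, t)] = max(delta(:, t-1) + logA(:, j));
        delta(j, t) = best + log(B(j, obs(t)));
    end
end

% Backtrack from the best final state.
vpath = zeros(1, T);
[~, vpath(T)] = max(delta(:, T));
for t = T-1:-1:1
    vpath(t) = psi(vpath(t+1), t+1);
end
disp(vpath)
```

Because prior = [1 0 0], the decoded path always starts in state 1 (sunny) here, whatever the first observation is.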

2) unknown model parameters

Training is performed using the EM algorithm, and is best done with a set of observation sequences.

Continuing on the same example, we can use the generated data above to train a new model and compare it to the original:

%# we start with a randomly initialized model
prior_hat = normalise(rand(Q,1));
A_hat = mk_stochastic(rand(Q,Q));
B_hat = mk_stochastic(rand(Q,O));  

%# learn from data by performing many iterations of EM
[LL,prior_hat,A_hat,B_hat] = dhmm_em(seqs, prior_hat,A_hat,B_hat, 'max_iter',50);

%# plot learning curve
plot(LL), xlabel('iterations'), ylabel('log likelihood'), grid on

[Figure: learning curve, log likelihood vs. EM iterations]

Keep in mind that the order of the states does not have to match, since EM has no way of knowing which learned state corresponds to which original one. That's why we need to permute the states before comparing the two models. In this example, the trained model looks close to the original one:

>> p = [2 3 1];              %# states permutation

>> prior, prior_hat(p)
prior =
     1     0     0
ans =
      0.97401
  7.5499e-005
      0.02591

>> A, A_hat(p,p)
A =
          0.8         0.05         0.15
          0.2          0.6          0.2
          0.2          0.3          0.5
ans =
      0.75967      0.05898      0.18135
     0.037482      0.77118      0.19134
      0.22003      0.53381      0.24616

>> B, B_hat(p,[1 2])
B =
          0.1          0.9
          0.8          0.2
          0.3          0.7
ans =
      0.11237      0.88763
      0.72839      0.27161
      0.25889      0.74111
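
The permutation above was picked by eye. With only a few states you can also find it by brute force. The sketch below tries every ordering and keeps the one whose emission matrix is closest to the original. Matching on B alone is a heuristic of mine (you could equally include A and the prior in the distance). B_hat holds the learned values printed above, put back into the order EM produced them.

```matlab
B = [0.1 0.9; 0.8 0.2; 0.3 0.7];      % original emission matrix

% Learned emission matrix in EM's own state order
% (so that B_hat([2 3 1],:) is the matrix printed above).
B_hat = [0.25889 0.74111;
         0.11237 0.88763;
         0.72839 0.27161];

Q = size(B, 1);
candidates = perms(1:Q);              % all Q! orderings, one per row
best_err = Inf;
for k = 1:size(candidates, 1)
    err = norm(B - B_hat(candidates(k, :), :), 'fro');
    if err < best_err
        best_err = err;
        p_best = candidates(k, :);
    end
end
disp(p_best)                          % 2 3 1 for these values
```

This is only practical for a small number of states, since the number of orderings grows as Q!.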

There are more things you can do with hidden Markov models, such as classification or pattern recognition. You would have different sets of observation sequences belonging to different classes. You start by training a model for each set. Then, given a new observation sequence, you classify it by computing its likelihood with respect to each model and predicting the class whose model gives the highest log-likelihood.

argmax[ log P(X|model_i) ] over all model_i
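
Applied to the sequences in the question, that could look like the sketch below. It assumes the HMM toolbox is on the MATLAB path and uses only functions already shown above. The number of hidden states Q = 2 is an arbitrary choice for illustration: it is a design parameter and need not equal the number of symbols, O = 5. One training sequence per class is far too little data for a meaningful model, so treat this purely as a syntax illustration.

```matlab
% Requires Kevin Murphy's HMM toolbox on the path.
O = 5;    % number of output symbols (the values 1..5)
Q = 2;    % number of hidden states (arbitrary choice for this sketch)

data = {[4 3 5 1 2], ...   % class 1 training data, one sequence per row
        [1 4 3 2 4]};      % class 2 training data
unknown = [1 3 2 4 4];     % sequence to classify

models = cell(1, 2);
for c = 1:2
    % Random initial guesses: prior0 is Q-by-1, transmat0 is Q-by-Q, obsmat0 is Q-by-O.
    prior0    = normalise(rand(Q, 1));
    transmat0 = mk_stochastic(rand(Q, Q));
    obsmat0   = mk_stochastic(rand(Q, O));

    % Fit one HMM per class with EM.
    [~, prior, transmat, obsmat] = dhmm_em(data{c}, prior0, transmat0, obsmat0, 'max_iter', 20);
    models{c} = struct('prior', prior, 'transmat', transmat, 'obsmat', obsmat);
end

% Score the unknown sequence under each model and pick the best.
loglik = zeros(1, 2);
for c = 1:2
    m = models{c};
    loglik(c) = dhmm_logprob(unknown, m.prior, m.transmat, m.obsmat);
end
[~, predicted] = max(loglik);
disp(predicted)
```

Because the initial guesses are random, the result can change from run to run. Also note that a symbol never seen in a class's training data ends up with emission probability zero, so a test sequence containing it gets a log-likelihood of -Inf under that model. With real data you would train each model on many sequences per class.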

answered Nov 15 '22 23:11

Amro

I do not use the toolbox that you mention, but I do use HTK. There is a book that describes how HTK works very clearly, and it is available for free:

http://htk.eng.cam.ac.uk/docs/docs.shtml

The introductory chapters might help your understanding.

I can have a quick attempt at answering #4 on your list. The number of emitting states is linked to the length and complexity of your feature vectors. However, it certainly does not have to equal the length of the array of feature vectors, as each emitting state can have a transition probability of going back into itself or even back to a previous state, depending on the architecture. I'm also not sure whether the value that you give includes the non-emitting states at the start and the end of the HMM, but these need to be considered too. Choosing the number of states often comes down to trial and error.
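
To illustrate what "architecture" means here, below is a sketch of a left-to-right transition matrix in plain MATLAB, where each state can only repeat itself or move on to the next one. The number of states Q and the self-transition probability stay are hypothetical values. The sketch uses emitting states only, with none of the non-emitting entry and exit states that HTK adds.

```matlab
Q = 4;          % number of emitting states (hypothetical choice)
stay = 0.6;     % probability of remaining in the same state (hypothetical value)

A_lr = zeros(Q);
for i = 1:Q-1
    A_lr(i, i)   = stay;        % self-loop
    A_lr(i, i+1) = 1 - stay;    % advance to the next state
end
A_lr(Q, Q) = 1;                 % the last state can only repeat

prior_lr = [1 zeros(1, Q-1)];   % always start in the first state
disp(A_lr)
```

Transitions that start at exactly zero stay at zero under EM (Baum-Welch) re-estimation, so the initial matrix fixes the topology and training only adjusts the nonzero entries.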

Good luck!

answered Nov 16 '22 00:11

learnvst
