
How to get transition and emission matrices from multiple sequences for HMM in MATLAB?

I am doing a sequence classification task in MATLAB using an HMM. I have 13 sequences and their corresponding classes. As far as I understand, hmmestimate() returns the transition and emission matrices for a single sequence and its class. But I need the final transition and emission matrices calculated from all 13 sequences. How can I do this?

Nazifa Khan asked Nov 09 '22 18:11


1 Answer

What you should do...

A sincere, totally snark-free suggestion is to write a couple of for loops that tally all the transitions and state-emission pairs present in the sequences, then normalize the rows of the two resulting count matrices (transition and emission) so that each row sums to 1. This is what hmmestimate does in the end, and it is probably how you should do it.
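A minimal sketch of that counting approach, assuming the state paths are known and that states and emissions are coded as integers starting at 1. The toy data and the names tr_counts and em_counts are made up for illustration; this is not hmmestimate's actual source.

```matlab
% Hypothetical toy data: each row is one sequence.
states = [1 1 2 2; 2 2 1 1];   % known state paths
seqs   = [1 1 2 1; 2 2 1 1];   % observed symbols

num_states  = max(states(:));
num_symbols = max(seqs(:));

tr_counts = zeros(num_states, num_states);
em_counts = zeros(num_states, num_symbols);

for i = 1:size(states, 1)
    for t = 1:size(states, 2)
        s = states(i, t);
        % tally the state-emission pair at position t
        em_counts(s, seqs(i, t)) = em_counts(s, seqs(i, t)) + 1;
        % tally the transition out of position t, never across sequences
        if t < size(states, 2)
            s_next = states(i, t + 1);
            tr_counts(s, s_next) = tr_counts(s, s_next) + 1;
        end
    end
end

% normalize each row to sum to 1 (a row with no counts stays all zero)
tr_hat = tr_counts ./ max(sum(tr_counts, 2), 1);
em_hat = em_counts ./ max(sum(em_counts, 2), 1);
```

Because the inner loop stops at the end of each row, no transition is ever counted between the last state of one sequence and the first state of the next, which is exactly the problem the concatenation approach has to work around.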

That said, let's go ahead and force the square peg into the round hole anyway...

...and what you could do

If you concatenate your sequences, the result can be run through hmmestimate. This gives the correct emission matrix, but the spurious transitions at the joins between adjacent sequences distort the transition probabilities. A trick around this is to prepend a new, unique delimiter state (with its own unique emission) to each sequence. All the information about the joins is then relegated to the extra row and column of the output matrices, which you can discard.
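To see why the delimiter helps, here is a small sketch that only counts transitions (using accumarray rather than hmmestimate, so it needs no toolbox). The two toy state paths are hypothetical: one sequence stays in state 1, the other in state 2, so no real 1-to-2 transition exists.

```matlab
% Hypothetical toy state paths, one sequence per row.
states = [1 1 1; 2 2 2];

% Naive concatenation: 1 1 1 2 2 2
flat  = reshape(states.', 1, []);
naive = accumarray([flat(1:end-1).' flat(2:end).'], 1, [2 2]);
% naive(1,2) is 1, although no sequence ever moves from state 1 to state 2

% With delimiter state 3 prepended to each sequence: 3 1 1 1 3 2 2 2
aug        = [3*ones(size(states, 1), 1) states];
flat_aug   = reshape(aug.', 1, []);
counts_aug = accumarray([flat_aug(1:end-1).' flat_aug(2:end).'], 1, [3 3]);

% Every join now passes through state 3, so the 2-by-2 block is clean
clean = counts_aug(1:2, 1:2);
```

The join between the two sequences shows up as a 1-to-3 count instead of a 1-to-2 count. That count sits in the discarded third column, which is why the rows of the kept block have to be renormalized afterwards.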

Example

Let's generate some data, so the input is clear.

% true transitions and emission probabilities
tr = [0.9 0.1; 0.05 0.95];
em = [0.9 0.1; 0.2 0.8];

num_seqs = 100;
seq_len = 100;

seqs = zeros(num_seqs,seq_len);
states = zeros(num_seqs,seq_len);

% generate some sequences
for i = 1:num_seqs
    [seqs(i,:), states(i,:)] = hmmgenerate(seq_len,tr,em);
end

Using hmmestimate to estimate

Note that MATLAB represents states as consecutive integers, so we use the next unused integer for the delimiter state. In this example that is 3, and the same value serves as the delimiter's emission symbol.

% augment the sequences
seqs_aug = [3*ones(num_seqs,1) seqs];
states_aug = [3*ones(num_seqs,1) states];

% concatenate the rows, and estimate
% credit: http://stackoverflow.com/a/2731032/570918
[tr_aug,em_aug] = hmmestimate(reshape(seqs_aug.',1,[]),reshape(states_aug.',1,[]));

% subset the good parts
tr_hat = tr_aug(1:2,1:2);
em_hat = em_aug(1:2,1:2);

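% renormalize the rows: the mass that went to delimiter state 3 was dropped
% (implicit expansion needs R2016b+; use bsxfun(@rdivide,...) on older releases)
tr_hat = tr_hat./sum(tr_hat,2);
% NB: em_hat is already normalized, since states 1 and 2 never emit symbol 3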

Using rng(1) before generating the data above, this gives

tr_hat % [0.9008 0.0992; 0.0490 0.9510]
em_hat % [0.9090 0.0910; 0.1950 0.8050]
merv answered Nov 15 '22 09:11