I am trying to learn HMM-GMM implementation and have created a simple model to detect certain sounds (animal calls, etc.).
I am trying to train an HMM (Hidden Markov Model) with GMMs (Gaussian Mixture Models) in MATLAB.
I have a few questions that I could not find any information about.
1) Should the mhmm_em() function be called in a loop for each HMM state, or is that done automatically?
Such as:
for each state
    Initialize GMMs and get parameters (use mixgauss_init.m)
end
Train HMM with EM (use mhmm_em.m)
2)
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
mhmm_em(MFCCs, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', M);
The last parameter: should it be the number of Gaussians, or number_of_states - 1?
3) If we are looking for the maximum likelihood, where does the Viterbi algorithm come into play?
Say I want to detect a certain type of animal/human call after training my model on the acoustic feature vectors I have extracted; do I still need the Viterbi algorithm in test mode?
This part confuses me a little, and I would highly appreciate an explanation.
Any comments on the code in terms of the HMM-GMM logic would also be appreciated.
Thanks
Here is my MATLAB routine:
O = 21; % Number of coefficients in a feature vector
M = 10; % Number of Gaussian mixtures
Q = 3; % Number of states (left to right)
% MFCC Parameters
Tw = 128; % analysis frame duration (ms)
Ts = 64; % analysis frame shift (ms)
alpha = 0.95; % preemphasis coefficient
R = [ 1 1000 ]; % frequency range to consider
f_bank = 20; % number of filterbank channels
C = 21; % number of cepstral coefficients
L = 22; % cepstral sine lifter parameter(?)
%Training
[speech, fs, nbits ] = wavread('Train.wav');
[MFCCs, FBEs, frames ] = mfcc( speech, fs, Tw, Ts, alpha, @hamming, R, f_bank, C, L ); % window passed as a function handle
cov_type = 'full'; % the covariance type, chosen as 'full' for the Gaussians.
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
[mu0, Sigma0] = mixgauss_init(Q*M, MFCCs, cov_type, 'kmeans'); % cluster the training MFCCs into Q*M Gaussians
mu0 = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);
mixmat0 = mk_stochastic(rand(Q,M));
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
mhmm_em(MFCCs, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', M);
%Testing
for i = 1:length(filelist)
    fprintf('Processing %s\n', filelist(i).name);
    [speech_tst, fs, nbits ] = wavread(filelist(i).name);
    [MFCCs, FBEs, frames ] = ...
        mfcc( speech_tst, fs, Tw, Ts, alpha, @hamming, R, f_bank, C, L );
    loglik(i) = mhmm_logprob( MFCCs, prior1, transmat1, mu1, Sigma1, mixmat1 );
end
[Winner, Winner_idx] = max(loglik);
1) No. EM estimates the model as a whole after you have initialized it with k-means; it does not estimate the states separately.
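For example, the whole training step collapses to one mixgauss_init call and one mhmm_em call, with no loop over states. A minimal sketch reusing your variable names (I am assuming MFCCs is the O-by-T matrix of training features, which is what your mfcc call returns):

% Initialize all Q*M mixture components at once from the pooled training
% frames, then reshape so each of the Q states owns M Gaussians:
[mu0, Sigma0] = mixgauss_init(Q*M, MFCCs, cov_type, 'kmeans');
mu0    = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);

% A single call to mhmm_em then re-estimates priors, transitions, means,
% covariances, and mixture weights for every state jointly:
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
    mhmm_em(MFCCs, prior0, transmat0, mu0, Sigma0, mixmat0);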
2) Neither. The last parameter in your code is the value of 'max_iter', i.e., the number of EM iterations. Usually it is something around 6; it should not be M.
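So the call would look like this (6 iterations here is just a common choice, not something derived from your data):

num_em_iter = 6; % number of EM iterations, NOT the number of mixtures M
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
    mhmm_em(MFCCs, prior0, transmat0, mu0, Sigma0, mixmat0, ...
            'max_iter', num_em_iter);
% LL holds the log-likelihood after each iteration, so plot(LL) is a
% quick convergence check.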
3) Yes, you need Viterbi in test mode.
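To spell that out: mhmm_logprob (which your test loop already calls) scores a test sequence with the forward algorithm, while Viterbi recovers the most likely hidden state sequence, e.g., to see where in the signal each state occurs. A sketch using the same toolbox (mixgauss_prob computes the per-state observation likelihoods that viterbi_path consumes):

% Forward-algorithm log-likelihood: how well the model explains the data.
loglik = mhmm_logprob(MFCCs, prior1, transmat1, mu1, Sigma1, mixmat1);

% Viterbi decoding: the single most likely state sequence.
B    = mixgauss_prob(MFCCs, mu1, Sigma1, mixmat1); % B(q,t) = p(o_t | state q)
path = viterbi_path(prior1, transmat1, B);         % 1-by-T state sequence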