Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to generate MFCC Algorithm's triangular windows and how to use them?

I am implementing MFCC algorithm in Java.

There is a sample code here: http://www.ee.columbia.edu/~dpwe/muscontent/practical/mfcc.m at Matlab. However I have some problems with mel filter banking process. How to generate triangular windows and how to use them?

PS1: An article which has a part that describes MFCC: http://arxiv.org/pdf/1003.4083

PS2: If there is a document about MFCC algorithms steps basically, it will be good.

PS3: My main question is related to that: MFCC with Java Linear and Logarithmic Filters some implementations use both linear and logarithmic filter and some of them not. What is that filters and what is the center frequent concept. I follow that code:MFCC Java , what is the difference of it between that code: MFCC Matlab

like image 773
kamaci Avatar asked May 20 '11 20:05

kamaci


2 Answers

Triangular windows as frequency band filters aren't hard to implement. You basically want to integrate the FFT data within each band (defined as the frequency space between center frequency i-1 and center frequency i+1).

You're basically looking for something like,

for(int bandIdx = 0; bandIdx < numBands; bandIdx++) {
    int startFreqIdx  = centerFreqs[bandIdx-1];
    int centerFreqIdx = centerFreqs[bandIdx];
    int stopFreqIdx   = centerFreqs[bandIdx+1];

    for(int freq = startFreqIdx; i < centerFreqIdx; i++) {
        magnitudeScale = centerFreqIdx-startFreqIdx;
        bandData[bandIdx] += fftData[freq]*(i-startFreqIdx)/magnitudeScale;
    }

    for(int freq = centerFreqIdx; i <= stopFreqIdx; i++) {
        magnitudeScale = centerFreqIdx-stopFreqIdx;
        bandData[bandIdx] += fftData[freq]*(i-stopFreqIdx)/magnitudeScale;
    }
}

If you do not understand the concept of a "center frequency" or a "band" or a "filter," pick up an elementary signals textbook--you shouldn't be implementing this algorithm without understanding what it does.

As for what the exact center frequencies are, it's up to you. Experiment and pick (or find in publications) values that capture the information you want to isolate from the data. The reason that there are no definitive values, or even scale for values, is because this algorithm tries to approximate a human ear, which is a very complicated listening device. Whereas one scale may work better for, say, speech, another may work better for music, etc. It's up to you to choose what is appropriate.

like image 100
Thomas Minor Avatar answered Oct 06 '22 11:10

Thomas Minor


Answer for the second PS: I found this tutorial that really helped me computing the MFCCs.

As for the triangular windows and the filterbanks, from what I understood, they do overlap, they do not extend to negative frequences and the whole process of computing them from the FFT spectrum and applying them back to it goes something like this:

  1. Choose a minimum and a maximum frequency for the filters (for example, min freq = 300Hz - the minimum voice frequency and max frequency = your sample rate / 2. Maybe this is where you should choose the 1000Hz limit you were talking about)
  2. Compute the mel values from the min and max chosen frequences. Formula here.
  3. Compute N equally distanced values between these two mel values. (I've seen examples of different values for N, you can even find a efficiency comparison for different of values in this work, for my tests I've picked 26)
  4. Convert these values back to Hz. (you can find the formula on the same wiki page) => array of N + 2 filter values
  5. Compute a filterbank (filter triangle) for each three consecutive values, either how Thomas suggested above (being careful with the indexes) or like in the turorial recommended at the beginning of this post) => an array of arrays, size NxM, asuming your FFT returned 2*M values and you only use M.
  6. Pass the whole power spectrum (M values obtained from FFT) through each triangular filter to get a "filterbank energy" for each filter (for each filterbank (N loop), multiply each magnitude obtained after FFT to each value in the corresponding filterbank (M loop) and add the M obtained values) => N-sized array of energies.

These are your filterbank energies that you can further apply a log to, apply the DCT and extract the MFCCs...

like image 36
Ioanna Avatar answered Oct 06 '22 11:10

Ioanna