I am implementing the MFCC algorithm in Java.
There is sample code in Matlab (http://www.ee.columbia.edu/~dpwe/muscontent/practical/mfcc.m), but I have some problems with the mel filter bank process. How do I generate the triangular windows, and how do I use them?
PS1: An article with a section that describes MFCC: http://arxiv.org/pdf/1003.4083
PS2: A document that describes the basic steps of the MFCC algorithm would be helpful.
PS3: My main question is related to this one: MFCC with Java Linear and Logarithmic Filters. Some implementations use both linear and logarithmic filters and some do not. What are these filters, and what is the concept of a center frequency? I am following this code: MFCC Java. What is the difference between it and this code: MFCC Matlab?
Triangular windows as frequency band filters aren't hard to implement. You basically want to integrate the FFT data within each band, where band i is defined as the frequency span between center frequency i-1 and center frequency i+1, weighting each FFT bin by how close it is to center frequency i.
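Written out as formulas (a standard formulation, in notation introduced here for illustration): let $f(m)$ be the FFT bin index of the $m$-th center frequency, so band $m$ spans bins $f(m-1)$ to $f(m+1)$, and let $S(k)$ be the FFT data at bin $k$. The triangular weight $H_m(k)$ rises linearly to 1 at the center and falls back to 0 at the neighbouring centers, and the band energy $E_m$ is the weighted sum:

$$
H_m(k) = \begin{cases}
0 & k < f(m-1) \\[4pt]
\dfrac{k - f(m-1)}{f(m) - f(m-1)} & f(m-1) \le k \le f(m) \\[8pt]
\dfrac{f(m+1) - k}{f(m+1) - f(m)} & f(m) \le k \le f(m+1) \\[8pt]
0 & k > f(m+1)
\end{cases}
\qquad
E_m = \sum_{k} H_m(k)\, S(k)
$$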
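You're basically looking for something like this:
// Assumes centerFreqs holds numBands + 2 strictly increasing FFT bin indices:
// the lower edge, the numBands center frequencies, then the upper edge.
static double[] applyTriangularFilters(double[] fftData, int[] centerFreqs, int numBands) {
    double[] bandData = new double[numBands];
    for (int bandIdx = 0; bandIdx < numBands; bandIdx++) {
        int startFreqIdx = centerFreqs[bandIdx];      // center of the band below
        int centerFreqIdx = centerFreqs[bandIdx + 1]; // peak of this triangle
        int stopFreqIdx = centerFreqs[bandIdx + 2];   // center of the band above
        // Rising edge: weight grows linearly from 0 at startFreqIdx toward 1 at centerFreqIdx
        for (int freq = startFreqIdx; freq < centerFreqIdx; freq++) {
            double magnitudeScale = centerFreqIdx - startFreqIdx;
            bandData[bandIdx] += fftData[freq] * (freq - startFreqIdx) / magnitudeScale;
        }
        // Falling edge: weight falls linearly from 1 at centerFreqIdx to 0 at stopFreqIdx
        for (int freq = centerFreqIdx; freq <= stopFreqIdx; freq++) {
            double magnitudeScale = stopFreqIdx - centerFreqIdx;
            bandData[bandIdx] += fftData[freq] * (stopFreqIdx - freq) / magnitudeScale;
        }
    }
    return bandData;
}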
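Because the triangles depend only on the FFT size and the chosen center frequencies, a common variation is to build the triangular windows once as a weight matrix and reuse it for every frame. This is a sketch, not the answerer's code; the class and method names (`TriangularFilterbank`, `buildWeights`, `apply`) are hypothetical, and `edges` is assumed to hold numBands + 2 strictly increasing bin indices (lower edge, the centers, upper edge), all smaller than `numBins`.

```java
import java.util.Arrays;

/** Sketch: precompute triangular filter weights once, then reuse them for every frame. */
public class TriangularFilterbank {

    /**
     * Builds a numBands x numBins weight matrix (the triangular windows).
     * edges: numBands + 2 strictly increasing FFT bin indices, all < numBins.
     */
    static double[][] buildWeights(int[] edges, int numBins) {
        int numBands = edges.length - 2;
        double[][] weights = new double[numBands][numBins];
        for (int m = 0; m < numBands; m++) {
            int start = edges[m], center = edges[m + 1], stop = edges[m + 2];
            for (int k = start; k < center; k++) {
                weights[m][k] = (double) (k - start) / (center - start); // rising edge
            }
            for (int k = center; k <= stop; k++) {
                weights[m][k] = (double) (stop - k) / (stop - center);   // falling edge
            }
        }
        return weights;
    }

    /** Weighted sum of the spectrum under each triangle: one energy per band. */
    static double[] apply(double[][] weights, double[] spectrum) {
        double[] energies = new double[weights.length];
        for (int m = 0; m < weights.length; m++) {
            for (int k = 0; k < spectrum.length; k++) {
                energies[m] += weights[m][k] * spectrum[k];
            }
        }
        return energies;
    }

    public static void main(String[] args) {
        int[] edges = {0, 2, 4, 6, 8};             // 3 overlapping bands on a 9-bin spectrum
        double[][] w = buildWeights(edges, 9);
        System.out.println(Arrays.toString(w[0])); // [0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
    }
}
```

Each row of the matrix is one triangular window; neighbouring rows overlap so that every bin between the outer edges contributes to two bands.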
If you do not understand the concepts of a "center frequency", a "band", or a "filter", pick up an elementary signals textbook; you shouldn't be implementing this algorithm without understanding what it does.
The exact center frequencies are up to you. Experiment and pick (or find in publications) values that capture the information you want to isolate from the data. There are no definitive values, or even a definitive scale for them, because this algorithm tries to approximate the human ear, which is a very complicated listening device. One scale may work better for speech, another for music, and so on; choose what is appropriate for your application.
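For speech, one widely used choice is to space the centers evenly on the mel scale. The sketch below assumes the HTK-style formula mel = 2595 log10(1 + f/700) (other mel formulas exist) and rounds each point to the nearest FFT bin, where bin k lies at k * sampleRate / fftSize Hz. The mel scale is roughly linear below about 1 kHz and logarithmic above it, which is probably why some implementations talk about separate "linear" and "logarithmic" filters: they approximate the same curve piecewise. The class name, method names, and example parameters are illustrative.

```java
import java.util.Arrays;

/** Sketch: equally spaced points on the mel scale, mapped back to FFT bin indices. */
public class MelCenters {

    // HTK-style mel formula (one common variant; others exist)
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /**
     * Returns numBands + 2 bin indices: lower edge, numBands centers, upper edge.
     * Bin k of an fftSize-point FFT sits at k * sampleRate / fftSize Hz.
     */
    static int[] melBinEdges(int numBands, double lowHz, double highHz,
                             int fftSize, double sampleRate) {
        double lowMel = hzToMel(lowHz);
        double highMel = hzToMel(highHz);
        int[] bins = new int[numBands + 2];
        for (int i = 0; i < numBands + 2; i++) {
            // Evenly spaced in mel, so the spacing in Hz widens at high frequencies
            double mel = lowMel + i * (highMel - lowMel) / (numBands + 1);
            bins[i] = (int) Math.round(melToHz(mel) * fftSize / sampleRate);
        }
        return bins;
    }

    public static void main(String[] args) {
        // Example: 26 bands from 0 Hz to 8 kHz, 512-point FFT at 16 kHz
        int[] bins = melBinEdges(26, 0.0, 8000.0, 512, 16000.0);
        System.out.println(Arrays.toString(bins));
    }
}
```

With many bands or a short FFT, neighbouring low-frequency points can round to the same bin, which gives a zero-width triangle; using fewer bands or a longer FFT avoids that.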
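Answer for the second PS: I found this tutorial, which really helped me compute the MFCCs.
As for the triangular windows and the filterbanks, from what I understood, they do overlap and they do not extend to negative frequencies. The whole process of computing them from the FFT spectrum and applying them back to it goes roughly like this: take the FFT of each windowed frame and keep the power spectrum of the non-negative frequency bins, place the triangular filters at their center frequencies, multiply each filter by the spectrum, and sum the products to get one value per filter.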
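The first step, getting the power spectrum of one frame, can be sketched as below. It is an illustration only: it uses a Hamming window and a naive O(N^2) DFT so that it needs no library (in practice you would use an FFT implementation), and dividing |X(k)|^2 by N is one common normalization, not the only one. The class name `PowerSpectrum` is hypothetical.

```java
/** Sketch: Hamming-windowed power spectrum of one frame via a naive O(N^2) DFT. */
public class PowerSpectrum {

    static double[] powerSpectrum(double[] frame) {
        int n = frame.length; // assumes n > 1
        double[] windowed = new double[n];
        for (int i = 0; i < n; i++) {
            // Hamming window tapers the frame edges to reduce spectral leakage
            double w = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (n - 1));
            windowed[i] = frame[i] * w;
        }
        // Keep bins 0..n/2 only: for a real signal the rest mirror them (negative frequencies)
        double[] power = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += windowed[t] * Math.cos(angle);
                im -= windowed[t] * Math.sin(angle);
            }
            power[k] = (re * re + im * im) / n;
        }
        return power;
    }

    public static void main(String[] args) {
        int n = 512;
        double sampleRate = 16000.0;
        double[] frame = new double[n];
        for (int t = 0; t < n; t++) {
            frame[t] = Math.sin(2.0 * Math.PI * 1000.0 * t / sampleRate); // 1 kHz test tone
        }
        double[] p = powerSpectrum(frame);
        int peak = 0;
        for (int k = 1; k < p.length; k++) if (p[k] > p[peak]) peak = k;
        System.out.println("peak bin " + peak + " ~ " + peak * sampleRate / n + " Hz");
    }
}
```

Bin k corresponds to k * sampleRate / N Hz, so the 1 kHz tone in the example peaks at bin 32. The resulting array is what the triangular filters are applied to.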
These are your filterbank energies. Take their log and apply the DCT; the resulting coefficients (usually only the first few are kept) are the MFCCs.
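The last two steps can be sketched as follows, assuming `energies` holds one filterbank energy per band. The sketch floors each energy before the log to avoid log(0) and uses an unnormalized DCT-II; many implementations add an orthonormal scale factor or keep only the first 12 or 13 coefficients, so treat those details as choices rather than requirements. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Sketch: filterbank energies -> log -> DCT-II -> cepstral coefficients (MFCCs). */
public class MfccFromEnergies {

    static double[] mfcc(double[] energies, int numCoeffs) {
        int m = energies.length;
        double[] logE = new double[m];
        for (int i = 0; i < m; i++) {
            // Small floor avoids log(0) for bands that received no energy
            logE[i] = Math.log(Math.max(energies[i], 1e-10));
        }
        double[] c = new double[numCoeffs];
        for (int n = 0; n < numCoeffs; n++) {
            double sum = 0.0;
            for (int i = 0; i < m; i++) {
                // DCT-II basis: cos(pi * n * (i + 0.5) / m)
                sum += logE[i] * Math.cos(Math.PI * n * (i + 0.5) / m);
            }
            c[n] = sum; // unnormalized; some implementations scale by sqrt(2 / m)
        }
        return c;
    }

    public static void main(String[] args) {
        double[] energies = {1.0, 2.0, 4.0, 8.0};
        System.out.println(Arrays.toString(mfcc(energies, 3)));
    }
}
```

A flat log spectrum only produces a nonzero first coefficient; the higher coefficients describe how the spectral envelope varies across the bands.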