SHORT AND SIMPLE: Hi all, very simply, I just want to know the steps involved in getting an MFCC from an FFT.
DETAILED:
Hi all. I am working on a drum application where I want to classify sounds. It's just a matching application: it returns the name of the note that you play on the drum.
It's a simple, big, loud Indian drum. There are only a few notes that one can play on it.
I've implemented the FFT algorithm and successfully obtain a spectrum. I now want to take it one step further and get the MFCC from the FFT.
This is what I understand so far: it's based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
It uses triangular filters to weight the frequencies and produce each coefficient. http://instruct1.cit.cornell.edu/courses/ece576/FinalProjects/f2008/pae26_jsc59/pae26_jsc59/images/melfilt.png
So if the FFT algorithm returns around 1000 values (the spectrum of the sound), you'll ideally end up with around 12 elements (i.e., coefficients). This 12-element vector is used to classify the instrument, including the drum played...
This is exactly what I want.
Could someone please help me with how to do something like this? My programming skills are alright. I'm currently creating an application for the iPhone with openFrameworks.
Any help would be greatly appreciated. Cheers
2.4 Fast Fourier Transform: a signal analysis technique used to extract and compress features of the speech signal without losing relevant information, so that speech processing becomes easier. It represents the given signal in the frequency domain.
The FFT algorithm is used to convert a digital signal (x) of length (N) from the time domain into a signal in the frequency domain (X), so that the amplitude of vibration is recorded as a function of the frequency at which it appears [40].
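As a rough illustration of the time-to-frequency conversion described here, the sketch below transforms a synthetic tone and reads off its strongest frequency. The helper name dominant_frequency, the 8000 Hz sample rate, and the 1 kHz test tone are assumptions made for this example only.

```python
import numpy as np

def dominant_frequency(x, sample_rate):
    """Return the frequency (Hz) of the strongest bin in the spectrum of x."""
    X = np.fft.rfft(x)  # frequency-domain signal, len(x) // 2 + 1 bins
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)  # bin centers in Hz
    return freqs[np.argmax(np.abs(X))]

sample_rate = 8000                   # assumed sampling rate in Hz
t = np.arange(1024) / sample_rate    # N = 1024 samples
x = np.sin(2 * np.pi * 1000 * t)     # 1 kHz test tone (time domain)
peak = dominant_frequency(x, sample_rate)
```

With these values the tone falls exactly on a bin center, so the peak sits at 1000 Hz; a real drum hit would spread its energy over many bins.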
DCT is the last step of the main process of MFCC feature extraction. The basic idea of the DCT is to decorrelate the values of the mel spectrum so as to produce a good representation of the local spectral properties. Conceptually, the DCT plays the same role as an inverse Fourier transform.
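A small sketch of this DCT step, assuming SciPy's scipy.fftpack.dct with the orthonormal DCT-II. The eight band energies are made-up numbers, not measurements.

```python
import numpy as np
from scipy.fftpack import dct, idct

# Hypothetical mel band energies with a smooth, decaying envelope
band_energies = np.array([4.0, 3.5, 3.0, 2.6, 2.2, 2.0, 1.8, 1.7])
log_mel = np.log(band_energies)

coeffs = dct(log_mel, type=2, norm='ortho')     # cepstral coefficients
restored = idct(coeffs, type=2, norm='ortho')   # inverse transform recovers log_mel
```

For a smooth envelope like this one, most of the information ends up in the first few coefficients, which is why only a dozen or so are usually kept.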
FFT analysis is one of the most used techniques when performing signal analysis across several application domains. FFT transforms signals from the time domain to the frequency domain.
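First, split the signal into short frames of 10 to 30 ms, apply a windowing function (Hamming is recommended for sound applications), and compute the Fourier transform of each frame. With the DFT in hand, computing the Mel Frequency Cepstral Coefficients takes these steps: compute the power spectrum (|DFT|^2), apply a triangular mel filter bank, take the logarithm of the filtered spectrum, and apply the discrete cosine transform.
A Python code example: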
import math
import numpy
from scipy.fftpack import dct
from scipy.io import wavfile

numCoefficients = 13  # choose the size of the MFCC array
minHz = 0
maxHz = 22000  # must not exceed half the sample rate

def freqToMel(freq):
    return 1127.01048 * math.log(1 + freq / 700.0)

def melToFreq(mel):
    # numpy.exp so that this also works on arrays of mel values
    return 700 * (numpy.exp(mel / 1127.01048) - 1)

def melFilterBank(blockSize, sampleRate):
    numBands = int(numCoefficients)
    maxMel = int(freqToMel(maxHz))
    minMel = int(freqToMel(minHz))
    # Create a matrix for triangular filters, one row per filter
    filterMatrix = numpy.zeros((numBands, blockSize))
    melRange = numpy.arange(numBands + 2)
    melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1.0) + minMel
    # Each array index represents the center of one triangular filter;
    # convert the mel centers to Hz, then to spectrum bin indices
    aux = melToFreq(melCenterFilters) / (sampleRate / 2.0)
    aux = 0.5 + blockSize * aux
    aux = numpy.floor(aux)  # round down
    centerIndex = numpy.array(aux, int)  # get int values
    for i in range(numBands):
        start, centre, end = centerIndex[i:i + 3]
        k1 = numpy.float32(centre - start)
        k2 = numpy.float32(end - centre)
        up = (numpy.arange(start, centre) - start) / k1    # rising edge
        down = (end - numpy.arange(centre, end)) / k2      # falling edge
        filterMatrix[i][start:centre] = up
        filterMatrix[i][centre:end] = down
    return filterMatrix.transpose()

sampleRate, signal = wavfile.read("file.wav")  # assumes a mono file
complexSpectrum = numpy.fft.rfft(signal)  # bins from 0 Hz to sampleRate / 2
powerSpectrum = abs(complexSpectrum) ** 2
filteredSpectrum = numpy.dot(powerSpectrum, melFilterBank(len(powerSpectrum), sampleRate))
logSpectrum = numpy.log(filteredSpectrum)
dctSpectrum = dct(logSpectrum, type=2)  # MFCC :)
This code is based on the MFCC Vamp example. I hope this helps you!
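The example above transforms the whole file at once. The framing and windowing step mentioned earlier could be sketched roughly as follows. The helper frame_signal, the 25 ms frame length, the 10 ms hop, and the random test signal are all assumptions for illustration, not part of the Vamp example.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop_len:i * hop_len + frame_len]
                       for i in range(num_frames)])
    return frames * window  # window is broadcast over every frame (row)

sample_rate = 16000                     # assumed sampling rate in Hz
signal = np.random.randn(sample_rate)   # one second of noise as a stand-in
frames = frame_signal(signal, sample_rate)
power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # one power spectrum per frame
```

Each row of power can then go through the filter bank, log, and DCT steps, giving one MFCC vector per frame instead of one per file.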