I understand the basic steps of creating an automated speech recognition engine. However, I need a clear-er idea of how segmentation is done and what are frames and samples. I will write down what I know and expect the answer-er to correct me in the places where I'm wrong and guide me further.
The basic steps of Speech Recognition as I know it are:
(I'm assuming the input data is a wav/ogg (or some kind of audio) file)
Although these are clear to me, I am confused if step 3 is correct. If It is correct, In the steps following 3, do I apply that to each frame? Also, after step 6, I think that each frame has their own set of MFCC, am I right?
Thank you in advance!
The MFCC feature extraction technique basically includes windowing the signal, applying the DFT, taking the log of the magnitude, and then warping the frequencies on a Mel scale, followed by applying the inverse DCT.
Framing is the process of dividing the speech signal into small frames typically in the range of 5 to 50 milliseconds. The next step windowing is the process to window each frame to reduce discontinuities and leakage at start and end of each frame [1]. MFCC features are calculated for each frame.
The output after applying MFCC is a matrix having feature vectors extracted from all the frames. In this output matrix the rows represent the corresponding frame numbers and columns represent corresponding feature vector coefficients [1-4]. Finally this output matrix is used for classification process.
Segment the clip into smaller time frames, each segment being like 30msecs long. Further, Each segment will have about 256 Frames and two segments will have a seperation of 100 Frames? (i.e., 30*100/256 msec ?)
Not frames, but samples. Each frame of 30ms at 8khz sample rate is 30/1000 * 8000 = 240 samples. Frames are overlapped and shift between frames is 10ms or 80 samples. Here how it looks on the picture:
Here Q is 80 and K is 240 samples.
If it is correct, in the steps following 3, do I apply that to each frame?
Yes
Also, after step 6, I think that each frame has their own set of MFCC, am I right.
Yes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With