
Speech Recognition for small vocabulary (about 20 words)

I am currently working on a project for my university. The task is to write a speech recognition system that runs in the background on a phone, waiting for a few commands (e.g. "call 0 123 ...").

It's a two-month project, so it does not have to be very accurate. Only a small amount of noise needs to be tolerated, and words will be separated by moments of silence.

So far I can load a sample word encoded as raw 16-bit PCM, split it into chunks (about 50 per second), and run an FFT on each chunk to get its frequency spectrum.
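
For readers following along, the loading and framing step described above might be sketched as follows. This is only an illustration, not the code used in the project: the 8000 Hz sample rate, the 160-sample frames (50 per second at that rate), the Hamming window, the zero-padding to 256 points, and all function names are assumptions.

```python
import cmath
import math
import struct


def pcm16_to_samples(raw):
    """Decode little-endian signed 16-bit mono PCM into floats in [-1, 1)."""
    count = len(raw) // 2
    values = struct.unpack("<%dh" % count, raw[:count * 2])
    return [v / 32768.0 for v in values]


def split_frames(samples, frame_len):
    """Cut the signal into consecutive, non-overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame, n_fft=256):
    """Hamming-window a frame, zero-pad it to n_fft, return bins 0..n_fft/2.

    Assumes 2 <= len(frame) <= n_fft and that n_fft is a power of two.
    """
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    padded = windowed + [0.0] * (n_fft - n)
    return [abs(c) for c in fft(padded)[:n_fft // 2 + 1]]
```

With these assumed settings, bin k of the spectrum corresponds to k * 8000 / 256 Hz, so a 1000 Hz tone should peak at bin 32.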

Things left to solve: 1) going through a longer recording and splitting it into words; 2) finding the best match for each word.

1) I was thinking about checking chunk after chunk and, if I encounter a few consecutive chunks with higher amplitudes at human-voice frequencies, assuming that a word has started. In any case, I am looking for resources that may help with this.
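
A rough sketch of this chunk-by-chunk idea is shown below. It is a simplification: it uses plain time-domain energy per chunk rather than energy restricted to voice frequencies, and the threshold, the minimum word length, and the minimum silence length are made-up parameters that would need tuning on real recordings. The function names are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one chunk of samples."""
    return sum(s * s for s in frame) / len(frame)


def find_words(energies, threshold, min_speech=5, min_silence=10):
    """Return (start, end) chunk index pairs for each detected word.

    energies    -- one energy value per chunk
    threshold   -- energy above which a chunk counts as speech (assumed)
    min_speech  -- shortest run of chunks accepted as a word (assumed)
    min_silence -- quiet chunks needed before a word is considered over
    The end index is exclusive.
    """
    words = []
    start = None   # index where the current word began, if any
    quiet = 0      # consecutive quiet chunks seen inside the current word
    for i, energy in enumerate(energies):
        loud = energy > threshold
        if start is None:
            if loud:
                start = i
                quiet = 0
        elif loud:
            quiet = 0
        else:
            quiet += 1
            if quiet >= min_silence:
                end = i - quiet + 1
                if end - start >= min_speech:
                    words.append((start, end))
                start = None
    if start is not None:
        # The recording ended while a word was still open.
        end = len(energies) - quiet
        if end - start >= min_speech:
            words.append((start, end))
    return words
```

To follow the original idea more closely, the per-chunk energy could instead be the sum of squared FFT magnitudes over the bins that cover the voice band; `find_words` itself would not need to change.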

2) This one seems a bit tougher. Is it necessary to use HMMs for a system like this, or are there simpler methods, given that the vocabulary is so small (20 words)?

Edit: The point of the project is to write the system on my own, so I cannot use ready-made libraries like Sphinx or HTK.

Regards, Karol

Karol Czaradzki asked Oct 20 '22 11:10


1 Answer

In case anybody has the same question in the future, look for two main keywords:

MFCC - Mel-frequency cepstral coefficients, used to calculate a series of coefficients for each word template.

DTW - Dynamic time warping, used to match a captured word against the templates. A good enough description of DTW can be found on Wikipedia.
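
For the MFCC keyword, a minimal sketch of the usual recipe (triangular mel filterbank, log of the filter energies, then a DCT-II) could look like this. It is not the project's code: the 20 filters, 13 coefficients, 256-point FFT, 8000 Hz sample rate, and function names are all assumptions, and the input is taken to be the magnitude spectrum of one chunk.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, one row per filter."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * hz / sample_rate)) for hz in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # peak and falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(magnitudes, bank, n_coeffs=13):
    """MFCCs of one chunk from its magnitude spectrum (bins 0..n_fft/2)."""
    power = [m * m for m in magnitudes]
    # Log energy per filter; the floor avoids log(0) on silent chunks.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
             for row in bank]
    n = len(log_e)
    # Unnormalized DCT-II of the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]
```

A word template would then be the list of per-chunk MFCC vectors for one recording of that word. Coefficient 0 mostly tracks overall loudness, so it is sometimes dropped before matching.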

This approach was good enough to reach around 80% accuracy on a 20-word dictionary and to give a good demo in class.
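
The DTW matching step can be sketched in a few lines. This is the textbook dynamic-programming formulation with a Euclidean local distance, not necessarily the exact variant used here; `dtw_distance`, `recognize`, the length normalization, and the `templates` dictionary (word mapped to a list of feature vectors, for example MFCCs) are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Cost of the cheapest monotonic alignment between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i and first j vectors.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(features, templates):
    """Return the template word whose length-normalized DTW cost is lowest."""
    def score(word):
        ref = templates[word]
        return dtw_distance(features, ref) / (len(features) + len(ref))
    return min(templates, key=score)
```

For example, with toy one-dimensional features, a slowed-down version of a template still matches it at zero cost:

```python
templates = {"yes": [[0], [1], [2], [1], [0]], "no": [[5], [5], [4], [5], [5]]}
spoken = [[0], [0], [1], [1], [2], [2], [1], [0]]
print(recognize(spoken, templates))  # yes
```

The table costs O(n * m) per comparison, which is manageable for 20 short templates.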

Karol Czaradzki answered Oct 28 '22 00:10