 

How do I compare two voice samples on iOS?

First of all, I'd like to state that my question is not, per se, about the "classic" definition of voice recognition.

What we are trying to do is somewhat different, in the sense of:

  1. The user records a voice command.
  2. Later, when the user speaks that pre-recorded command, a certain action occurs.

For example, I record a voice command for calling my mom: I select her contact and say "Mom". Later, when I use the program and say "Mom", it automatically calls her.

How would I perform the comparison of a spoken command to a saved voice sample?

EDIT: We have no need for any "text-to-speech" abilities, solely a comparison of sound signals. Obviously we're looking for some sort of off-the-shelf product or framework.

asked Apr 05 '11 by Ron Rejwan

1 Answer

One way this is done for music recognition is to take a time sequence of frequency spectra (time-windowed FFTs, i.e. an STFT) for the two sounds in question, map the locations of the frequency peaks over the time axis, and cross-correlate the two 2D time-frequency peak mappings for a match. This is far more robust than just cross-correlating the two raw sound samples, as the peaks change far less than all the spectral "cruft" between the spectral peaks. This method will work better if the speaking rate and pitch of the two utterances haven't changed too much.
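As a rough illustration (not a production implementation), here is a minimal Swift sketch of the peak-map idea, assuming each recording has already been turned into a spectrogram (a time-by-frequency matrix of magnitudes). The function names and parameters (`peakMap`, `matchScore`, `peaksPerFrame`, `maxLag`) are hypothetical:

```swift
import Foundation

/// Keep only the strongest local peaks in each time slice, producing a
/// sparse binary "constellation" of time-frequency peaks.
func peakMap(_ spectrogram: [[Float]], peaksPerFrame: Int = 5) -> [[Bool]] {
    return spectrogram.map { frame in
        var map = [Bool](repeating: false, count: frame.count)
        // Indices of the largest magnitudes in this frame.
        let topBins = frame.enumerated()
            .sorted { $0.element > $1.element }
            .prefix(peaksPerFrame)
            .map { $0.offset }
        for bin in topBins { map[bin] = true }
        return map
    }
}

/// Score how well two peak maps line up, trying a range of time offsets
/// (a crude cross-correlation of the 2D peak maps along the time axis).
func matchScore(_ a: [[Bool]], _ b: [[Bool]], maxLag: Int = 20) -> Int {
    var best = 0
    for lag in -maxLag...maxLag {
        var hits = 0
        for t in 0..<a.count {
            let u = t + lag
            guard u >= 0, u < b.count else { continue }
            for f in 0..<min(a[t].count, b[u].count) where a[t][f] && b[u][f] {
                hits += 1
            }
        }
        best = max(best, hits)
    }
    return best
}
```

In use, you would compute `matchScore` between the live utterance's peak map and each stored command's peak map, then pick the command with the highest score above some threshold.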

In iOS 4.x, you can use the Accelerate framework for the FFTs and perhaps the 2D cross-correlations as well.
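For reference, a minimal sketch of one windowed FFT frame using Accelerate's vDSP routines (`vDSP_create_fftsetup`, `vDSP_ctoz`, `vDSP_fft_zrip`, `vDSP_zvmags`). This is shown in Swift purely as an illustration; in the iOS 4.x era you would call the same C API from Objective-C. It assumes the frame length is a power of two:

```swift
import Foundation
import Accelerate

/// Returns the squared-magnitude spectrum of one frame of audio samples,
/// which the peak-mapping step above would consume.
/// `samples.count` must be a power of two (e.g. 1024).
func magnitudeSpectrum(of samples: [Float]) -> [Float] {
    let n = samples.count
    let log2n = vDSP_Length(log2(Double(n)))
    guard let setup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else { return [] }
    defer { vDSP_destroy_fftsetup(setup) }

    var realp = [Float](repeating: 0, count: n / 2)
    var imagp = [Float](repeating: 0, count: n / 2)
    var magnitudes = [Float](repeating: 0, count: n / 2)

    realp.withUnsafeMutableBufferPointer { realPtr in
        imagp.withUnsafeMutableBufferPointer { imagPtr in
            var split = DSPSplitComplex(realp: realPtr.baseAddress!,
                                        imagp: imagPtr.baseAddress!)
            // Pack the real input into split-complex form, run an in-place
            // real-to-complex FFT, then take squared magnitudes per bin.
            samples.withUnsafeBytes { raw in
                let complexPtr = raw.bindMemory(to: DSPComplex.self)
                vDSP_ctoz(complexPtr.baseAddress!, 2, &split, 1, vDSP_Length(n / 2))
            }
            vDSP_fft_zrip(setup, &split, 1, log2n, FFTDirection(kFFTDirection_Forward))
            vDSP_zvmags(&split, 1, &magnitudes, 1, vDSP_Length(n / 2))
        }
    }
    return magnitudes
}
```

Running this on overlapping, windowed frames of each recording gives the spectrogram that the peak mapping and cross-correlation operate on.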

answered Oct 02 '22 by hotpaw2