 

Identify a specific sound on iOS

Tags:

ios

audio

fft

I'd like to be able to recognise a specific sound in an iOS application. I guess it would basically work like speech recognition, in that it's fairly fuzzy, but it would only need to recognise one specific sound.

I've done some quick FFT work to identify specific frequencies above a certain threshold, and only when they're solo (i.e. not surrounded by other frequencies), so I can identify individual tones pretty easily. I'm thinking this is just an extension of that: compare against the FFT data of a recording of the sound, chunk by chunk (say 0.1 second chunks) over the length of the audio. I would also have to account for some variation in amplitude, and a little in pitch and timing.
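Roughly, the tone detection described above (0.1 s chunks, peaks above a threshold with no other strong bins right next to them) could look like this. This is a minimal sketch under stated assumptions, not the poster's actual code: the function names, the `guard` width and the naive DFT are all illustrative choices. On iOS you would compute the spectrum with a real FFT, for example the vDSP routines in Apple's Accelerate framework.

```python
import cmath
import math

def frames(signal, sample_rate, frame_seconds=0.1):
    """Split a signal into consecutive, non-overlapping frames of frame_seconds each."""
    size = int(round(sample_rate * frame_seconds))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..n/2 via a naive O(n^2) DFT (fine for a demo, too slow for real use)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def isolated_peaks(mags, threshold, guard=2):
    """Hypothetical 'solo tone' test: bins above threshold with no other bin
    above threshold within +/- guard bins."""
    peaks = []
    for k, m in enumerate(mags):
        if m < threshold:
            continue
        neighbours = mags[max(0, k - guard):k] + mags[k + 1:k + guard + 1]
        if all(x < threshold for x in neighbours):
            peaks.append(k)
    return peaks

def bin_to_hz(k, frame_len, sample_rate):
    """Centre frequency of DFT bin k."""
    return k * sample_rate / frame_len

# Usage: a 100 Hz sine sampled at 1 kHz, split into 0.1 s chunks.
sr = 1000
signal = [math.sin(2 * math.pi * 100 * t / sr) for t in range(3 * sr // 10)]
for frame in frames(signal, sr):
    peaks = isolated_peaks(magnitude_spectrum(frame), threshold=10.0)
    print([bin_to_hz(k, len(frame), sr) for k in peaks])  # [100.0] for each frame
```

Note that with 0.1 s frames the bin spacing is always 10 Hz (`sample_rate / (0.1 * sample_rate)`), which limits how small a pitch variation you can resolve per chunk.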

Can anyone point me to any pre-existing source code that I could use to speed this process along? I can't seem to find anything usable. Or, failing that, any ideas on how to get started on something like this?

Thanks very much

asked Nov 14 '22 at 22:11 by Max Clarke


1 Answer

From your description it is not entirely clear what you want to do. What is the "specific" sound like? Does it have high background noise? What's the specific recognizable feature (e.g. pitch, inharmonicity, timbre...)? Against which other "sounds" do you want to compare it? Do you simply want to match an arbitrary sound spectrum against a "template sound"? Is your sound percussive, melodic, speech...? Is it long or short? In which frequency range do you expect the best discriminability? Are the features invariant over time?

There is no "general" solution that works for everything. Speech recognition in itself is fairly complex and won't work well for abstract sounds whose discriminating frequencies are not captured by, e.g., the mel bands.
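For context, the mel scale is a perceptual frequency scale; one common form (the one used by HTK) is m = 2595 * log10(1 + f / 700), and other variants exist. Bands spaced evenly in mel are narrow at low frequencies and wide at high frequencies, so fine spectral detail up high gets merged into a few wide bands. The sketch below only illustrates that spacing; the helper names are hypothetical, not from any particular library.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """n_bands + 2 edge frequencies (Hz) spaced evenly on the mel scale,
    as used for overlapping triangular filters (band i spans edges i..i+2)."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

edges = mel_band_edges(0.0, 8000.0, 10)
widths = [b - a for a, b in zip(edges, edges[1:])]
print([round(e) for e in edges])
print(f"lowest gap {widths[0]:.0f} Hz, highest gap {widths[-1]:.0f} Hz")
```

The printed gaps show how much coarser the resolution becomes towards the top of the range.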

So, in short, you are leaving too many open questions to get a useful answer. The only suggestion I can make based on the little information given is the following:

For the template sound:
1) Extract the spectral peak positions from the power spectrum
2) Measure the standard deviation around each peak and construct a Gaussian from it
3) Save the Gaussians for later classification
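A minimal sketch of these three template steps, assuming you already have the template's power spectrum as a list of bin powers with a known bin width in Hz. The answer doesn't pin down what "standard deviation around the peaks" means exactly; here it is assumed to be the power-weighted spread of frequency over the contiguous bins that stay above half the peak's power (`rel_floor`), with a half-bin minimum so a one-bin peak doesn't get zero width. All names and parameters are hypothetical.

```python
import math

def local_peaks(power, min_power):
    """Step 1: indices of bins that are local maxima with at least min_power."""
    return [k for k in range(1, len(power) - 1)
            if power[k] >= min_power and power[k] > power[k - 1] and power[k] >= power[k + 1]]

def peak_gaussian(power, k, bin_hz, rel_floor=0.5):
    """Step 2: (mean_hz, std_hz) of the power around peak bin k.
    Assumption: the peak region is the contiguous run of bins whose power stays
    at or above rel_floor * power[k]; mean and std are power-weighted over it."""
    floor = rel_floor * power[k]
    lo = k
    while lo > 0 and power[lo - 1] >= floor:
        lo -= 1
    hi = k
    while hi < len(power) - 1 and power[hi + 1] >= floor:
        hi += 1
    weights = power[lo:hi + 1]
    freqs = [j * bin_hz for j in range(lo, hi + 1)]
    total = sum(weights)
    mean = sum(w * f for w, f in zip(weights, freqs)) / total
    var = sum(w * (f - mean) ** 2 for w, f in zip(weights, freqs)) / total
    # A single-bin peak has zero spread; fall back to half a bin so z-scores stay finite.
    std = max(math.sqrt(var), bin_hz / 2)
    return mean, std

def build_template(power, bin_hz, min_power):
    """Step 3: one Gaussian per peak, as a list you can store for later."""
    return [peak_gaussian(power, k, bin_hz) for k in local_peaks(power, min_power)]

p = [0, 0, 2, 6, 8, 6, 2, 0, 0, 0, 1, 9, 1, 0]  # toy power spectrum, 10 Hz per bin
print(build_template(p, bin_hz=10.0, min_power=3))
# first peak: mean 40 Hz, std sqrt(60) Hz; second: 110 Hz, 5 Hz (half-bin floor)
```

Averaging the Gaussians over several recordings of the same sound would probably give more realistic widths than a single take.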

For unknown sounds:
1) Extract the spectral peak positions
2) Project those points onto the saved Gaussians, which leaves you with z-scores of the peak positions
3) With the computed z-scores you should be able to decide whether the sound is your template sound
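The matching step could be sketched as follows, assuming the template is the list of Gaussians saved in step 3 of the template procedure, stored as `(mean_hz, std_hz)` pairs, and that the unknown sound's peak positions have already been extracted in Hz. Pairing each Gaussian with the nearest observed peak and accepting at `|z| <= 2` are assumptions for illustration; the decision rule is left open above.

```python
def z_scores(template, observed_hz):
    """For each stored Gaussian (mean_hz, std_hz), the z-score of the nearest
    observed peak, or None if no peaks were observed."""
    scores = []
    for mean, std in template:
        if not observed_hz:
            scores.append(None)
            continue
        nearest = min(observed_hz, key=lambda f: abs(f - mean))
        scores.append((nearest - mean) / std)
    return scores

def matches_template(template, observed_hz, max_abs_z=2.0, min_fraction=1.0):
    """Hypothetical decision rule: accept if at least min_fraction of the template's
    Gaussians have an observed peak within max_abs_z standard deviations."""
    if not template:
        return False
    scores = z_scores(template, observed_hz)
    hits = sum(1 for z in scores if z is not None and abs(z) <= max_abs_z)
    return hits >= min_fraction * len(template)

template = [(440.0, 10.0), (880.0, 20.0)]
print(z_scores(template, [445.0, 900.0]))           # [0.5, 1.0]
print(matches_template(template, [445.0, 900.0]))   # True
print(matches_template(template, [445.0, 1000.0]))  # False: 1000 Hz is 6 std from 880 Hz
```

Because only peak positions are compared, overall loudness does not change the z-scores, although it can change which peaks clear the detection threshold. A stricter version would also stop one observed peak from matching several Gaussians.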

Note: This is a very crude method that discriminates sounds according to their most powerful frequencies. Using the Gaussians leaves room for slight shifts in those frequencies.

answered Jan 06 '23 at 21:01 by pokey909