
Extract human sound from a WAV file using Java

I am working on a project where I have to extract the human sound from an audio .wav file using Java.

The .wav file may contain 3 to 4 sounds, such as a dog, a cat, music and a human. I need to identify the human sound and then extract that part from the file.

I am using FFT.java and Complex.java.

I have written an AudioFileReader class which reads the audio.wav file from the hard drive and converts it to a byte array. I then pass that array to FFT.fft(bytesArray), using the FFT.java and Complex.java mentioned above, which returns a Complex array.
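One thing worth checking at this step: an FFT expects sample amplitudes, not raw file bytes. Below is a minimal sketch, assuming the audio data is 16-bit signed little-endian mono PCM (common for WAV, but an assumption here) and that the header has already been skipped. The class and method names (PcmDecoder, toSamples, padToPowerOfTwo) are hypothetical and not part of FFT.java or Complex.java. The padding helper is included because radix-2 FFT implementations usually require a power-of-two length; whether the FFT.java used here does is also an assumption.

```java
import java.util.Arrays;

final class PcmDecoder {
    private PcmDecoder() {}

    /** Converts 16-bit signed little-endian PCM bytes to samples in [-1, 1). */
    static double[] toSamples(byte[] pcm) {
        int n = pcm.length / 2; // a trailing odd byte is ignored
        double[] samples = new double[n];
        for (int i = 0; i < n; i++) {
            int lo = pcm[2 * i] & 0xFF; // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];    // high byte keeps its sign
            samples[i] = ((hi << 8) | lo) / 32768.0;
        }
        return samples;
    }

    /** Zero-pads to the next power of two (assumed requirement of the FFT). */
    static double[] padToPowerOfTwo(double[] x) {
        int n = 1;
        while (n < x.length) {
            n <<= 1;
        }
        return Arrays.copyOf(x, n); // copyOf fills the extra slots with 0.0
    }
}
```

The resulting double[] would then be wrapped into Complex values (real part = sample, imaginary part = 0) before calling the FFT.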

Now the problem is how to extract the human sound byte pattern from the returned Complex array. Does anyone know how I might be able to achieve this?


Edit: We are assuming a very simple audio.wav file. For example, cat sound then silence, human sound then silence, dog sound then silence etc. No mixture of voices.
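Under that simplified assumption, one possible first step is to cut the recording into separate sounds wherever the signal goes quiet, so that each segment can be examined on its own. The sketch below uses a short-time RMS threshold. The class name SilenceSplitter, the frame size and the threshold are all hypothetical and would need tuning on real recordings.

```java
import java.util.ArrayList;
import java.util.List;

final class SilenceSplitter {
    private SilenceSplitter() {}

    /**
     * Returns {startSample, endSampleExclusive} pairs for runs of frames
     * whose RMS amplitude is at or above the threshold.
     */
    static List<int[]> split(double[] samples, int frameSize, double rmsThreshold) {
        List<int[]> segments = new ArrayList<>();
        int segmentStart = -1; // -1 means "currently in silence"
        for (int start = 0; start < samples.length; start += frameSize) {
            int end = Math.min(start + frameSize, samples.length);
            double sumSquares = 0.0;
            for (int i = start; i < end; i++) {
                sumSquares += samples[i] * samples[i];
            }
            double rms = Math.sqrt(sumSquares / (end - start));
            boolean loud = rms >= rmsThreshold;
            if (loud && segmentStart < 0) {
                segmentStart = start; // silence -> sound
            } else if (!loud && segmentStart >= 0) {
                segments.add(new int[] {segmentStart, start}); // sound -> silence
                segmentStart = -1;
            }
        }
        if (segmentStart >= 0) {
            segments.add(new int[] {segmentStart, samples.length}); // sound runs to end
        }
        return segments;
    }
}
```

Each returned pair marks one candidate sound. Deciding which segment is the human one is a separate classification step.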
Muhammad Ijaz asked Mar 24 '11 07:03


2 Answers

I think the standard way to handle problems like this is to convert the input signals into a cepstrum or mel-cepstrum representation and then use the coefficients as the feature space for input into a classifier. Many research papers discuss solutions to these sorts of problems based on this basic approach, for example:

http://www.ics.forth.gr/netlab/data/J17.pdf
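To make the cepstrum idea concrete, here is a minimal sketch of a plain real cepstrum for one frame: the inverse DFT of the log magnitude spectrum. It is not a mel-cepstrum (that would add a mel filter bank and a DCT), and it uses a naive O(N²) DFT purely to stay self-contained; in practice an FFT would replace the inner loops. The class name RealCepstrum and the 1e-12 floor are illustrative assumptions.

```java
final class RealCepstrum {
    private RealCepstrum() {}

    /** Real cepstrum of one frame: inverse DFT of the log magnitude spectrum. */
    static double[] compute(double[] frame) {
        int n = frame.length;
        double[] logMag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * ((double) k * t) / n;
                re += frame[t] * Math.cos(angle);
                im += frame[t] * Math.sin(angle);
            }
            // Small floor avoids log(0) for empty spectral bins.
            logMag[k] = Math.log(Math.hypot(re, im) + 1e-12);
        }
        // The log magnitude of a real signal is real and even, so its
        // inverse DFT reduces to a cosine sum with a real result.
        double[] cepstrum = new double[n];
        for (int q = 0; q < n; q++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += logMag[k] * Math.cos(2.0 * Math.PI * ((double) k * q) / n);
            }
            cepstrum[q] = sum / n;
        }
        return cepstrum;
    }
}
```

The first few coefficients of each frame (typically excluding index 0, which mostly reflects overall level) could then serve as the feature vector handed to a classifier trained on labelled speech and non-speech examples.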

One possible shortcut you might try would be to put the input signals through a low bit-rate vocoder such as AMBE, then decode, and compare the quality of the original signal to the encoded/decoded signal. These vocoders are designed to highly compress human speech with fair to good quality at the expense of not being able to adequately represent non-speech sounds.

bdk answered Nov 07 '22 23:11


This can be achieved by AI, and by little short of that. You might investigate APIs for speech recognition, but I doubt their ability to cope with signals that have noise in the background.

For example:

  • Is that a cat, or someone saying 'meow'?
  • Is that music, or someone singing 'do, re, mi..'?
  • Who said 'Polly wanna cracker', the human or the parrot?
Andrew Thompson answered Nov 07 '22 23:11