Detecting where vocals start in a song?

What would be the best way to detect where the vocals in a song start? I only need the start time of the vocals. Extreme precision is not necessary; speed is more important.

Any pointers to papers or algorithms (if such exist) are greatly appreciated. I'm also looking for recommendations on which framework or language fits best for this.

asked May 31 '12 at 02:05 by Stpn




1 Answer

* SPOILER: ANSWER IS NOT BELOW *

Since I plan to do something similar, I did a little research on the subject and found that there are some exact numerical techniques that MIGHT be able to do this.

I'll list the references and let you, the reader, decide whether that's the right way to go. It all has to do with vocal audio feature extraction: finding where vocal features are in the audio data.

You can start here. It doesn't really lead anywhere, but it could be useful for seeing what you're getting into :)

http://en.wikipedia.org/wiki/Voice_activity_detection
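As a rough illustration of the simplest kind of voice activity detection, here is a Python/numpy sketch that returns the start of the first frame whose short-time energy comes within a chosen range of the loudest frame. The function name `first_active_time` and all defaults (25 ms frames, 10 ms hop, a 30 dB range) are hypothetical choices, not taken from the linked article. Keep in mind that plain energy only finds the first *sound*: in a song with an instrumental intro it fires on the intro, not on the vocals, which is why the references below move on to spectral features.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def first_active_time(x, sr, frame_ms=25.0, hop_ms=10.0, range_db=30.0):
    """Start time (s) of the first frame whose short-time energy is within
    range_db of the loudest frame. Energy-only VAD: it detects any sound,
    not specifically vocals."""
    x = np.asarray(x, dtype=float)
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    if len(x) < frame_len:  # guarantee at least one full frame
        x = np.pad(x, (0, frame_len - len(x)))
    # Overlapping frames: one row per hop
    frames = sliding_window_view(x, frame_len)[::hop]
    # Short-time energy in dB; the epsilon avoids log10(0) on digital silence
    energy_db = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    active = np.flatnonzero(energy_db >= energy_db.max() - range_db)
    # The loudest frame always qualifies, so `active` is never empty
    return float(active[0] * hop / sr)


sr = 16000
t = np.arange(sr) / sr
song = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
print(first_active_time(song, sr))  # prints 0.98
```

The result lands one hop or two before the true 1.0 s onset because a frame that only partially overlaps the tone already passes the threshold; with a 10 ms hop that is well inside "extreme precision is not necessary".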

Then, some articles about speaker recognition:

Here is a primer on what you need to know about mel-frequency cepstral coefficient (MFCC) feature extraction:

http://www.speaker-recognition.org/navAlg.html
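For orientation, the standard MFCC pipeline (window, power spectrum, triangular mel filterbank, log, DCT-II) can be sketched in plain numpy as below. The defaults (512-point FFT, 26 filters, 13 coefficients, the HTK-style mel formula `2595 * log10(1 + f / 700)`) are common choices assumed here, not values taken from the linked pages, and the helper names are hypothetical. Deciding which frames contain vocals would still need something on top, for example a vocal/non-vocal classifier trained on these coefficients, which this sketch does not include.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters evenly spaced on the mel scale,
    shape (n_filters, n_fft // 2 + 1)."""
    if fmax is None:
        fmax = sr / 2.0
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge, peak 1.0 at center
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(frames, sr, n_fft=512, n_filters=26, n_coeffs=13):
    """MFCCs for pre-framed audio of shape (n_frames, frame_len)."""
    window = np.hamming(frames.shape[1])
    spectrum = np.fft.rfft(frames * window, n=n_fft)  # zero-pads to n_fft
    power = np.abs(spectrum) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(mel_energy + 1e-10), n_coeffs)


frames = np.random.default_rng(0).standard_normal((10, 400))
print(mfcc(frames, 16000).shape)  # prints (10, 13)
```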

Then, for example, this:

http://www.iccce.co.in/Papers/ICCCECE358.pdf

I know that none of them leads directly to a solution to your problem, but at least you'll be able to grasp the size of the monster you'll be dealing with.

EDIT: frameworks

I use C# for something related to this. At first I used a roll-my-own FFT algorithm, then moved to the ILNumerics library, which uses the Intel math library, and later replaced all of that with FFTW.

http://ilnumerics.net/ (hm, it was free at one time)

http://software.intel.com/en-us/articles/intel-mkl/ (Intel Math Kernel Library)

http://www.fftw.org/ (a simple web page, but BRUTAL performance)
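Whichever FFT engine you pick, the per-frame work looks roughly the same. The sketch below uses `numpy.fft` as a stand-in for FFTW/FFTS and applies a crude heuristic that is my own assumption, not something from the references above: report the first frame where more than a given share of the spectral energy falls inside a "voice band" (300 to 3400 Hz, the classic telephone speech band). The function name and all thresholds are hypothetical. Guitars, keyboards and other instruments occupy the same band, so treat this only as a cheap pre-filter before something like MFCC-based classification.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def first_band_onset(x, sr, band=(300.0, 3400.0), ratio_threshold=0.3,
                     frame_len=400, hop=160, n_fft=512):
    """Start time (s) of the first frame where the share of spectral energy
    inside `band` exceeds ratio_threshold, or None if no frame does.
    Crude heuristic: instruments in the same band also trigger it."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    # Hann-windowed overlapping frames, one row per hop
    frames = sliding_window_view(x, frame_len)[::hop] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # bin centre frequencies in Hz
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = power.sum(axis=1) + 1e-12  # silent frames get a ratio of ~0
    ratio = power[:, in_band].sum(axis=1) / total
    hits = np.flatnonzero(ratio > ratio_threshold)
    return float(hits[0] * hop / sr) if hits.size else None


sr = 16000
t = np.arange(2 * sr) / sr
bass = np.sin(2 * np.pi * 100 * t)  # below the band the whole time
voice = np.where(t >= 1.0, np.sin(2 * np.pi * 1000 * t), 0.0)  # enters at 1 s
print(first_band_onset(bass + voice, sr))
```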

EDIT: new FFT engine

While porting some of my code to Android, I had a great experience working with a man who did something thought impossible: an FFT library that is even faster than FFTW, called FFTS. My understanding of his magic is limited, but he uses codelets for various processor architectures and outperforms every other library there is.

answered Oct 12 '22 at 01:10 by Daniel Mošmondor