I'm not entirely sure this is the correct Stack Exchange site to post this question to, but...
I'm looking for an algorithm that I can use to determine, with a decent amount of certainty, whether a given piece of audio is music or not. A boolean result is fine: I don't need to know the key, BPM or anything like that, I just need to be able to tell whether it appears to be music (as opposed to speech). The programming language is irrelevant, but I'll end up converting it to Python.
In a phrase: Fourier analysis. Look at the power of different frequencies over time, and compare a spectrogram of speech with one of violin playing. The former shows dramatic changes with every syllable; the 'flow' is very disjointed, and that could be picked up by an algorithm which takes the derivative of the different frequency bands as a function of time. In paradigmatic music, on the other hand, the transitions are much smoother and the tones are purer (less 'blur' in the graph). See also the 'Spectrogram' Wikipedia page.
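To make the "derivative of the frequency bands over time" idea concrete, here is a minimal sketch, not a finished classifier. It cuts the signal into short frames, takes the magnitude spectrum of each, and averages how much the normalised spectrum changes from one frame to the next (a measure often called spectral flux). The function names, the frame size, the hop and the 8000 Hz sample rate are all assumptions made for illustration, and the two test signals are synthetic stand-ins rather than real speech or music. It uses only the standard library and a naive DFT so it runs anywhere; for real audio you would more likely use an FFT such as `numpy.fft.rfft`.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes (bins 0..N/2) of a Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_size=128, hop=64):
    """List of magnitude spectra, one per overlapping frame."""
    return [magnitude_spectrum(samples[start:start + frame_size])
            for start in range(0, len(samples) - frame_size + 1, hop)]


def mean_spectral_flux(samples, frame_size=128, hop=64):
    """Average frame-to-frame change of the normalised spectrum."""
    normalised = []
    for mags in spectrogram(samples, frame_size, hop):
        total = sum(mags) or 1.0  # avoid dividing by zero on silent frames
        normalised.append([m / total for m in mags])
    diffs = [sum(abs(a - b) for a, b in zip(prev, cur))
             for prev, cur in zip(normalised, normalised[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Synthetic stand-ins (assumed 8000 Hz sample rate): one sustained tone,
# and one signal that jumps abruptly between unrelated frequencies.
rate = 8000
steady = [math.sin(2 * math.pi * 440 * i / rate) for i in range(2048)]
pitches = [300, 1200, 500, 2000]
choppy = [math.sin(2 * math.pi * pitches[(i // 256) % 4] * i / rate)
          for i in range(2048)]

steady_flux = mean_spectral_flux(steady)
choppy_flux = mean_spectral_flux(choppy)
print("steady tone flux:", steady_flux)
print("abruptly changing flux:", choppy_flux)
```

The sustained tone should give a much smaller value than the signal that keeps jumping. Turning that into a music/speech boolean would still need a threshold, which you'd have to pick by measuring labelled recordings of your own; a single feature like this is unlikely to be reliable on its own for every kind of music.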