I'm developing a small tool to classify musical genres. For this I'd like to use a k-NN algorithm (or another one, but k-NN seems good enough), and I'm using python-yaafe for the feature extraction.
My problem is that when I extract a feature from a song (for example MFCC), since my songs are sampled at 44100 Hz, I get back a large number of 12-value arrays (one per analysis window), and I really don't know how to deal with that. Is there an approach to get just one representative value per feature and per song?
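For reference, my extraction step looks roughly like this (the block/step sizes and the file name are just placeholders, not necessarily the values I use):

```python
from yaafelib import FeaturePlan, Engine, AudioFileProcessor

# Declare the features to extract from 44100 Hz audio.
fp = FeaturePlan(sample_rate=44100)
fp.addFeature('mfcc: MFCC blockSize=1024 stepSize=512')

engine = Engine()
engine.load(fp.getDataFlow())

afp = AudioFileProcessor()
afp.processFile(engine, 'song.wav')  # placeholder file name

feats = engine.readAllOutputs()
mfcc = feats['mfcc']  # shape: (number_of_windows, n_coefficients)
```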
One approach would be to take the RMS energy of the signal as a parameter for classification.
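For a mono signal loaded as a NumPy array, that is simply (a minimal sketch; how you load the samples is up to you):

```python
import numpy as np

def rms_energy(samples):
    """Root-mean-square energy of a mono signal array."""
    x = np.asarray(samples, dtype=float)
    return np.sqrt(np.mean(x ** 2))
```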
You should use a segment of the music, rather than the whole file, for classification. In theory, the 30-second part of the song starting after the first 30 seconds is the most representative for genre classification.
So instead of taking the whole array, consider only the part that corresponds to this time window, 30 s to 60 s. Calculate the RMS energy of the signal over that segment, averaged over the whole window, separately for every music file. You can also take other features into account, e.g. MFCC: use the value averaged over all the signal windows of a particular music file. Make a feature vector out of these values, as sketched below.
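A minimal sketch of that step (the function name, the step size, and the assumption that you hold the raw samples and the per-window MFCC array side by side are all mine):

```python
import numpy as np

def song_feature_vector(samples, mfcc_frames, sr=44100, step_size=512,
                        start_s=30, end_s=60):
    """One feature vector per song: RMS energy of the 30-60 s segment
    plus the MFCCs averaged over that segment's windows."""
    # Slice the raw signal to the representative segment.
    segment = np.asarray(samples, dtype=float)[start_s * sr:end_s * sr]
    rms = np.sqrt(np.mean(segment ** 2))

    # Keep the MFCC frames whose windows start inside the segment.
    first = start_s * sr // step_size
    last = end_s * sr // step_size
    mfcc_mean = np.asarray(mfcc_frames)[first:last].mean(axis=0)

    return np.concatenate(([rms], mfcc_mean))
```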
You can then use the distance between these feature vectors (for example the Euclidean distance) as the distance between data points for k-NN classification.
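A sketch with scikit-learn (the data here is random stand-in data; you'd substitute the real feature vectors and genre labels):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in data: in practice X_train holds one feature vector per song
# (e.g. from song_feature_vector above) and y_train the genre labels.
X_train = np.random.rand(100, 13)
y_train = np.random.randint(0, 4, size=100)

# RMS energy and MFCCs live on very different scales, so standardize
# the features before using Euclidean distance.
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn.fit(scaler.transform(X_train), y_train)

X_new = np.random.rand(1, 13)  # feature vector of an unseen song
genre = knn.predict(scaler.transform(X_new))
```

Standardizing matters here because without it the feature with the largest numeric range dominates the Euclidean distance.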