
MFCC feature descriptors for audio classification using librosa

I am trying to obtain single vector feature representations for audio files to use in a machine learning task (specifically, classification using a neural net). I have experience in computer vision and natural language processing, but I need some help getting up to speed with audio files.

There are a variety of feature descriptors for audio files out there, but it seems that MFCCs are used the most for audio classification tasks. My question is this: how do I take the MFCC representation for an audio file, which is usually a matrix (of coefficients, presumably), and turn it into a single feature vector? I am currently using librosa for this.

I have a bunch of audio files, but they all vary in their shape:

import os
import librosa

for filename in os.listdir('data'):
    y, sr = librosa.load(os.path.join('data', filename))
    print(filename, librosa.feature.mfcc(y=y, sr=sr).shape)

213493.ogg (20, 2375)
120093.ogg (20, 7506)
174576.ogg (20, 2482)
194439.ogg (20, 14)
107936.ogg (20, 2259)

What I would do as a CV person is quantize these coefficients with k-means and then use something like scipy.cluster.vq to get vectors of identical shape that I can use as input to my NN. Is this what you would do in the audio case as well, or are there different/better approaches to this problem?
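For reference, the CV-style quantization described above could be sketched like this (numpy only; the random codebook stands in for centroids you would actually get by running k-means over MFCC frames from all files, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
K = 32  # codebook size (illustrative choice)

# stand-in codebook: in practice, fit k-means on MFCC frames pooled
# across the whole training set and use its centroids here
codebook = rng.normal(size=(K, 20))

def bag_of_features(mfcc):
    """Quantize each frame to its nearest codeword; return a normalized histogram."""
    frames = mfcc.T                                    # (n_frames, 20)
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                      # nearest codeword per frame
    hist = np.bincount(labels, minlength=K).astype(float)
    return hist / hist.sum()

# files with very different frame counts map to identical-shape vectors
v1 = bag_of_features(rng.normal(size=(20, 2375)))
v2 = bag_of_features(rng.normal(size=(20, 14)))
assert v1.shape == v2.shape == (K,)
```

This is exactly the "bag of visual words" recipe transplanted to audio: the histogram length depends only on the codebook size, not on the clip duration.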

asked Sep 23 '14 by Doa



2 Answers

It really depends on the task. I would try k-means etc., but there are a lot of cases where that might not be helpful.

There are a few good examples of using dynamic time warping with librosa.

There's also the idea of using a sliding window of a known shape, which might work well too. You could then consider the previous prediction together with a transition probability matrix.
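The sliding-window idea might be sketched like this (numpy only; the window width and hop are arbitrary illustrative choices, not values from the answer):

```python
import numpy as np

def mfcc_windows(mfcc, width=100, hop=50):
    """Cut a (n_mfcc, n_frames) matrix into fixed-shape windows along time."""
    n_frames = mfcc.shape[1]
    starts = range(0, n_frames - width + 1, hop)
    return np.stack([mfcc[:, s:s + width] for s in starts])

rng = np.random.default_rng(0)
wins = mfcc_windows(rng.normal(size=(20, 2375)))  # fake MFCC matrix
# every window has the same (20, 100) shape, so each can be classified
# independently; predictions can then be smoothed with a transition
# probability matrix as the answer suggests
assert wins.shape[1:] == (20, 100)
```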

answered Oct 17 '22 by Shane Walker


Check out scikits.talkbox. It has various functions that help you generate MFCCs from audio files. Specifically, you'd do something like this to generate MFCCs:

import numpy as np
import scipy.io.wavfile
from scikits.talkbox.features import mfcc

sample_rate, X = scipy.io.wavfile.read("path/to/audio_file")
ceps, mspec, spec = mfcc(X)
np.save("cache_file_name", ceps) # cache results so that ML becomes fast

Then while doing ML, do something like:

X = []
ceps = np.load("cache_file_name.npy")  # np.save appends the .npy extension
num_ceps = len(ceps)
# average the middle 80% of frames to get one fixed-length vector per file
X.append(np.mean(ceps[int(num_ceps / 10):int(num_ceps * 9 / 10)], axis=0))
Vx = np.array(X)
# use Vx as the input matrix for a neural net, k-means, etc.
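The trim-and-average pooling above can be isolated into a small numpy-only sketch (synthetic arrays stand in for talkbox's `(n_frames, 13)` cepstral output; the function name is illustrative):

```python
import numpy as np

def pool_mfcc(ceps):
    """Average the middle 80% of frames into one fixed-length vector,
    discarding the first and last 10% (often noisy intro/outro frames)."""
    n = len(ceps)
    return np.mean(ceps[n // 10 : n * 9 // 10], axis=0)

rng = np.random.default_rng(0)
# two "files" with very different frame counts, 13 coefficients each
a = pool_mfcc(rng.normal(size=(2375, 13)))
b = pool_mfcc(rng.normal(size=(14, 13)))
assert a.shape == b.shape == (13,)  # same shape regardless of clip length
```

Mean pooling throws away temporal order, which is fine for tasks like genre classification but a poor fit when timing matters.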

I used this stuff when I was building an audio genre classification tool (genreXpose).

PS: One handy tool for audio conversion that I used was PyDub

answered Oct 17 '22 by jazdev