Understanding the output of mfcc

Tags: python, artificial-intelligence, audio, feature-extraction, mfcc

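from librosa import load
from librosa.feature import mfcc

def extract_mfcc(sound):
    # load() returns the samples and the sampling rate (22050 Hz by default)
    data, frame = load(sound)
    # Keyword arguments: recent librosa releases no longer accept these positionally
    return mfcc(y=data, sr=frame)


# Named "features" so the imported mfcc function is not shadowed
features = extract_mfcc("sound.wav")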

I would like to get the MFCC of the following sound.wav file which is 48 seconds long.

I understand that the data * frame = length of audio.

But when I compute the MFCC as shown above and get its shape, this is the result: (20, 2086)

What do those numbers represent? How can I calculate the time of the audio just by its MFCC?

I'm trying to calculate the average MFCC per ms of audio.

Any help is appreciated! Thank you :)

asked Sep 08 '18 06:09

Eduardo Morales

1 Answer

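The shape is (n_mfcc, n_frames): 20 is the number of coefficients librosa returns by default, and 2086 is the number of frames. There are many frames because mel-frequency cepstral coefficients are computed over a window, i.e. a fixed number of samples. Sound is a wave, and no spectral feature can be derived from a single sample (one number), hence the window.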

To compute MFCCs, a fast Fourier transform (FFT) is used, and that requires a window length. If you check the librosa documentation for mfcc, you won't find it as an explicit parameter. That's because it is implicit, specifically:

length of the FFT window (n_fft): 2048
number of samples between successive frames (hop_length): 512

They are passed through **kwargs to librosa.feature.melspectrogram, where these defaults are defined.

If you now take into account the sampling frequency of your audio and these numbers, you will arrive at the result you have provided.

Since the default sampling rate for librosa is 22050 Hz, the audio length is 48 s, and the hop length is 512 samples, here's what follows:

48 s × 22050 samples/s ÷ 512 samples per frame ≈ 2067 frames
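As a rough sketch of that arithmetic in Python: the function name `estimate_frames` is hypothetical, and the `1 + n // hop_length` form assumes librosa's default centered framing, where the signal is padded by half a window on each side. The unpadded branch is included only for comparison.

```python
def estimate_frames(duration_s, sr=22050, hop_length=512, n_fft=2048, center=True):
    """Estimate how many frames a signal of the given duration produces."""
    n_samples = int(round(duration_s * sr))
    if center:
        # Assumption: the signal is padded by n_fft // 2 on both sides,
        # so every hop position (including sample 0) gets a frame.
        return 1 + n_samples // hop_length
    if n_samples < n_fft:
        return 0
    # Without padding, only windows that fit entirely inside the signal count.
    return 1 + (n_samples - n_fft) // hop_length


print(estimate_frames(48.0))                # centered framing
print(estimate_frames(48.0, center=False))  # no padding
```

Both values land close to the plain division above; the small differences come from how the ends of the signal are treated.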

The number is not exactly 2086, because:

Your audio length isn't exactly 48 seconds.
The actual window length is 2048, with a hop of 512, so the frame count also depends on how the ends of the signal are handled: without padding you "lose" a few frames at the end.
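Going the other way, the duration can be approximated from the MFCC shape alone, since each column advances by one hop. The sketch below assumes the default sampling rate and hop length quoted above; `frames_to_seconds` is a hypothetical helper, and the random matrix only stands in for a real MFCC result with the same shape.

```python
import numpy as np

SR = 22050        # librosa's default sampling rate (assumed, as in the answer)
HOP_LENGTH = 512  # default number of samples between successive frames


def frames_to_seconds(n_frames, sr=SR, hop_length=HOP_LENGTH):
    """Approximate the time spanned by n_frames hops, in seconds."""
    return n_frames * hop_length / sr


# Placeholder data with the shape reported in the question: (n_mfcc, n_frames)
rng = np.random.default_rng(0)
mfcc_matrix = rng.normal(size=(20, 2086))

n_mfcc, n_frames = mfcc_matrix.shape
duration_s = frames_to_seconds(n_frames)  # roughly 48.4 seconds
ms_per_frame = 1000 * HOP_LENGTH / SR     # roughly 23.2 ms between frames
mean_mfcc = mfcc_matrix.mean(axis=1)      # one average value per coefficient

print(f"{duration_s:.2f} s, {ms_per_frame:.2f} ms per frame, mean shape {mean_mfcc.shape}")
```

Because a frame is produced only about every 23 ms, an average "per ms" is not directly available; averaging over the frame axis, as above, gives one value per coefficient for the whole clip. librosa also provides `librosa.frames_to_time` for converting frame indices to timestamps.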

answered Sep 21 '22 23:09

Lukasz Tracewski
