Python Librosa : What is the default frame size used to compute the MFCC features?

Tags: python-3.x, audio, mfcc

Using the Librosa library, I generated the MFCC features of a 1319-second audio file as a 20 x 56829 matrix. The 20 is the number of MFCC coefficients (which I can adjust manually), but I don't know how the audio was segmented into 56829 frames. What frame size does Librosa use to process the audio?

import numpy as np
import matplotlib.pyplot as plt
import librosa

def getPathToGroundtruth(episode):
    """Return path to groundtruth file for episode"""
    pathToGroundtruth = "../../../season01/Audio/" \
                        + "Season01.Episode%02d.en.wav" % episode
    return pathToGroundtruth

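def getduration(episode):
    """Return the duration of the episode's audio in seconds"""
    pathToAudioFile = getPathToGroundtruth(episode)
    y, sr = librosa.load(pathToAudioFile)
    duration = librosa.get_duration(y=y, sr=sr)
    return duration


def getMFCC(episode):
    """Return the MFCC matrix (n_mfcc x n_frames) for the episode"""
    filename = getPathToGroundtruth(episode)
    y, sr = librosa.load(filename)  # y: audio samples, sr: sampling rate (22050 Hz by default)
    data = librosa.feature.mfcc(y=y, sr=sr)
    return data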


data = getMFCC(1)

asked Jun 22 '16 08:06

Rangooski

1 Answer

Short Answer

You can change the output length by changing the parameters used in the STFT calculation. Halving hop_length, as in the following code, roughly doubles the number of columns in your output (about 20 x 113658):

data = librosa.feature.mfcc(y=y, sr=sr, n_fft=1012, hop_length=256, n_mfcc=20)

Long Answer

librosa.feature.mfcc() is essentially a wrapper around librosa.feature.melspectrogram(), which in turn wraps librosa.core.stft and librosa.filters.mel.

All of the parameters that control how the audio signal is segmented, namely the frame (window) length and the hop between frames, are used by the Mel-scaled power spectrogram function (other tunable parameters belong to the nested core functions). You pass these parameters as keyword arguments to librosa.feature.mfcc().

All extra **kwargs parameters are passed to librosa.feature.melspectrogram() and from there to librosa.filters.mel().
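The forwarding chain can be pictured with a minimal sketch. The three functions below are hypothetical stand-ins, not librosa's actual code; their names, bodies, and return values are made up purely to show how extra keyword arguments could travel from the outermost call down to the filterbank step.

```python
# Hypothetical stand-ins for the call chain; illustrative only, these do
# not reproduce librosa's implementation.

def sketch_mel_filter(sr, n_fft, n_mels=128, fmin=0.0, fmax=None):
    """Stand-in for the filterbank step: report the settings it received."""
    if fmax is None:
        fmax = sr / 2.0
    return {"sr": sr, "n_fft": n_fft, "n_mels": n_mels,
            "fmin": fmin, "fmax": fmax}


def sketch_melspectrogram(sr=22050, n_fft=2048, hop_length=512, **kwargs):
    """Stand-in for the spectrogram step: uses the framing parameters and
    forwards every remaining keyword to the filterbank step."""
    settings = sketch_mel_filter(sr, n_fft, **kwargs)
    settings["hop_length"] = hop_length
    return settings


def sketch_mfcc(sr=22050, n_mfcc=20, **kwargs):
    """Stand-in for the outer call: keeps n_mfcc, forwards the rest."""
    settings = sketch_melspectrogram(sr=sr, **kwargs)
    settings["n_mfcc"] = n_mfcc
    return settings


# n_fft and hop_length stop at the spectrogram step; fmax goes one level deeper.
settings = sketch_mfcc(n_mfcc=20, n_fft=1024, hop_length=256, fmax=8000.0)
print(settings)
```

In this sketch a misspelled keyword would fail at the innermost function, which is one practical consequence of this kind of pass-through design.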

By default, the Mel-scaled power spectrogram uses the following window (frame) length and hop length, both in samples:

n_fft=2048

hop_length=512
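To relate these sample counts to time, here is a small sketch in plain Python. It assumes the audio was loaded at 22050 Hz, and the helper name samples_to_ms is made up for this example.

```python
def samples_to_ms(n_samples, sr):
    """Convert a length in samples to milliseconds at sampling rate sr."""
    return 1000.0 * n_samples / sr


sr = 22050          # assumed sampling rate (librosa.load's default)
n_fft = 2048        # default frame (window) length in samples
hop_length = 512    # default hop between successive frames in samples

frame_ms = samples_to_ms(n_fft, sr)
hop_ms = samples_to_ms(hop_length, sr)
overlap = 1.0 - hop_length / n_fft  # fraction shared by neighbouring frames

print(f"frame: {frame_ms:.1f} ms, hop: {hop_ms:.1f} ms, overlap: {overlap:.0%}")
# prints: frame: 92.9 ms, hop: 23.2 ms, overlap: 75%
```

So under these assumptions each frame covers roughly 93 ms of audio and a new frame starts about every 23 ms.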

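So assuming you used the default sample rate (sr=22050), the output of your mfcc function makes sense:

output length = (seconds) * (sample rate) / (hop_length)

(1319) * (22050) / (512) = 56804 frames (rounded down)

This is an estimate: it is slightly below the 56829 columns you observed, most likely because the file is a little longer than the rounded 1319 seconds. The STFT also pads the signal by default so that frames are centered, which adds one frame to the count.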
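As a rough check of this arithmetic, the following sketch computes the frame count in plain Python. It assumes centered framing, in which case the count is 1 + floor(n_samples / hop_length); the helper name expected_frames is made up for this example.

```python
def expected_frames(n_samples, hop_length=512):
    """Frame count under the centered-framing assumption:
    1 + floor(n_samples / hop_length)."""
    return 1 + n_samples // hop_length


sr = 22050
hop_length = 512

# Estimate from the rounded duration quoted in the question.
n_samples_rounded = 1319 * sr
print(expected_frames(n_samples_rounded, hop_length))  # prints: 56805

# Going the other way: the shortest signal that yields 56829 frames.
min_samples = (56829 - 1) * hop_length
print(f"{min_samples / sr:.2f} s")  # prints: 1319.54 s
```

Under that assumption, a file about half a second longer than 1319 s would account for the 56829 columns in the question.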

The parameters you can tune are the following:

Melspectrogram Parameters
-------------------------
y : np.ndarray [shape=(n,)] or None
    audio time-series

sr : number > 0 [scalar]
    sampling rate of `y`

S : np.ndarray [shape=(d, t)]
    power spectrogram

n_fft : int > 0 [scalar]
    length of the FFT window

hop_length : int > 0 [scalar]
    number of samples between successive frames.
    See `librosa.core.stft`

kwargs : additional keyword arguments
    Mel filter bank parameters.
    See `librosa.filters.mel` for details.

If you want to further specify the characteristics of the mel filterbank used to compute the Mel-scaled power spectrogram, you can tune the following:

Mel Frequency Parameters
------------------------
sr        : number > 0 [scalar]
    sampling rate of the incoming signal

n_fft     : int > 0 [scalar]
    number of FFT components

n_mels    : int > 0 [scalar]
    number of Mel bands to generate

fmin      : float >= 0 [scalar]
    lowest frequency (in Hz)

fmax      : float >= 0 [scalar]
    highest frequency (in Hz).
    If `None`, use `fmax = sr / 2.0`

htk       : bool [scalar]
    use HTK formula instead of Slaney
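To give a feel for what n_mels, fmin, fmax and htk control, here is a minimal sketch that places band edges equally on the mel scale using the HTK formula, mel = 2595 * log10(1 + f / 700). This is not librosa's code: the function names are made up, and it assumes the common construction in which n_mels triangular bands are defined by n_mels + 2 edge frequencies. The Slaney formula, which librosa uses unless htk is set, gives different values.

```python
import math


def hz_to_mel_htk(f_hz):
    """HTK mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(m):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels, fmin, fmax):
    """Return n_mels + 2 frequencies (Hz) equally spaced on the mel scale.

    Assumption: each triangular band spans three consecutive edges.
    """
    lo, hi = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz_htk(lo + i * step) for i in range(n_mels + 2)]


sr = 22050
# fmax defaults to sr / 2.0 when left as None, so that value is used here.
edges = mel_band_edges(n_mels=10, fmin=0.0, fmax=sr / 2.0)
print([round(f) for f in edges])
```

The edges are closely spaced at low frequencies and spread out toward fmax, which is why raising n_mels or lowering fmax gives finer resolution in the lower part of the spectrum.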

Documentation for Librosa:

librosa.feature.melspectrogram

librosa.filters.mel

librosa.core.stft

answered Oct 06 '22 13:10

Ryan M
