
Difference between a mel-spectrogram and an MFCC

I'm using the librosa library to convert music segments into mel-spectrograms to use as inputs for my neural network, as shown in the docs here.

How is this different from MFCCs, if at all? Are there any advantages or disadvantages to using either?

monadoboi asked Dec 25 '18 20:12


People also ask

What is the difference between spectrogram and mel spectrogram?

The linear audio spectrogram is ideally suited for applications where all frequencies have equal importance, while mel spectrograms are better suited for applications that need to model human hearing perception. Mel spectrogram data is also suited for use in audio classification applications.
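To make the "human hearing" point concrete: the mel scale is roughly linear at low frequencies and logarithmic at high ones, so a mel spectrogram spends more of its bands on low frequencies than a linear spectrogram does. The sketch below assumes the common HTK-style formula m = 2595 * log10(1 + f / 700). librosa's default (htk=False) uses the slightly different Slaney formula, so its band edges will not match exactly. The helper names and the 10-band, 0 to 8 kHz setup are hypothetical. It shows that bands of equal width in mels cover wider and wider ranges in Hz.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Hz -> mel using the HTK-style formula (an assumption; librosa defaults to Slaney's)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


# Hypothetical setup: 10 bands of equal width in mels between 0 Hz and 8 kHz.
mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 11)
hz_edges = mel_to_hz(mel_edges)
widths_hz = np.diff(hz_edges)

# Every band has the same width in mels, but its width in Hz grows with frequency,
# so low frequencies are resolved more finely than in a linear spectrogram.
for lo, hi, w in zip(hz_edges[:-1], hz_edges[1:], widths_hz):
    print(f"{lo:7.0f} to {hi:7.0f} Hz  (width {w:6.0f} Hz)")
```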

Why do we use Mel-Frequency Cepstral Coefficients (MFCCs) as features for audio data?

Since the mel-frequency bands used in MFCCs are spaced evenly on the mel scale, which approximates how the human auditory system resolves frequency, MFCCs can efficiently characterize speakers. For instance, they can be used to recognize the model of the cell phone used for a recording and, further, details about the speaker.

What are the MFCC features?

MFCC feature extraction basically consists of windowing the signal, applying the DFT, taking the magnitude (or power) spectrum, warping the frequencies onto the mel scale with a bank of triangular filters, taking the log of the filter-bank energies, and finally applying a discrete cosine transform (DCT).
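The MFCC steps can be sketched from scratch in NumPy. This is an illustration under assumptions chosen for clarity (Hann window, HTK mel formula, 2048-sample frames with a hop of 512, 40 mel bands, natural log with a small floor, orthonormal DCT-II), not librosa's implementation, whose defaults differ (Slaney mel scale, dB scaling, centred frames). It applies the mel filter bank to the power spectrum before taking the log, which is the usual ordering. All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel formula (an assumption; librosa defaults to Slaney's).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # (n_fft // 2 + 1,)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, centre, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_ii_ortho(x):
    """Orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(signal, sr, n_fft=2048, hop=512, n_mels=40, n_mfcc=13):
    """Assumes len(signal) >= n_fft; returns an (n_frames, n_mfcc) array."""
    signal = np.asarray(signal, dtype=float)
    # 1. Slice into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # 2. DFT of each frame -> power spectrum, shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Mel filter bank -> mel-spectrogram, shape (n_frames, n_mels).
    mel_spec = power @ mel_filterbank(sr, n_fft, n_mels).T
    # 4. Log compression; the small floor avoids log(0).
    log_mel = np.log(mel_spec + 1e-10)
    # 5. DCT across the mel axis, keeping the first n_mfcc coefficients.
    return dct_ii_ortho(log_mel)[:, :n_mfcc]


sr = 16000
t = np.arange(sr) / sr                # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)  # synthetic 440 Hz test tone
coeffs = mfcc(tone, sr)
print(coeffs.shape)  # (28, 13)
```

For comparison, librosa.feature.mfcc documents its S argument as a log-power mel spectrogram and applies a type-2 DCT (orthonormal by default) across the mel axis.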


2 Answers

To get MFCCs, compute the DCT of the mel-spectrogram. The mel-spectrogram is often log-scaled beforehand.

MFCC is a very compressible representation, often using just 13 or 20 coefficients instead of the 32 to 64 bands of a mel-spectrogram. The MFCCs are also somewhat more decorrelated, which can be beneficial for simpler models such as Gaussian Mixture Models (particularly with diagonal covariances). With lots of data and strong classifiers like Convolutional Neural Networks, the mel-spectrogram often performs better.
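The "compressible" and "decorrelated" claims can be checked on synthetic data. The sketch below assumes, purely for illustration, that log-mel frames behave like random vectors whose neighbouring bands are strongly correlated (correlation rho**|i - j| with rho = 0.9 across 40 bands); real audio will differ. It compares the average absolute correlation between features before and after an orthonormal DCT-II, and reports how much of the variance lands in the first 13 coefficients. The names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mels, n_frames, rho, n_keep = 40, 5000, 0.9, 13

# Hypothetical stand-in for log-mel frames: bands i and j have correlation rho**|i - j|.
band = np.arange(n_mels)
cov = rho ** np.abs(band[:, None] - band[None, :])
log_mel = rng.multivariate_normal(np.zeros(n_mels), cov, size=n_frames)  # (frames, bands)

# Orthonormal DCT-II matrix; row k is the k-th cosine basis vector.
k, i = band[:, None], band[None, :]
dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * i + 1) / (2 * n_mels))
dct[0] /= np.sqrt(2.0)
coeffs = log_mel @ dct.T  # (frames, coefficients): "MFCC-like" features


def mean_abs_offdiag_corr(x):
    """Average |correlation| between distinct feature columns of x."""
    c = np.corrcoef(x, rowvar=False)
    return np.abs(c[~np.eye(c.shape[0], dtype=bool)]).mean()


corr_bands = mean_abs_offdiag_corr(log_mel)
corr_coeffs = mean_abs_offdiag_corr(coeffs)
var = coeffs.var(axis=0)
kept_fraction = var[:n_keep].sum() / var.sum()

print(f"mean |corr| between mel bands:   {corr_bands:.3f}")
print(f"mean |corr| between DCT outputs: {corr_coeffs:.3f}")
print(f"variance kept by first {n_keep} coefficients: {kept_fraction:.1%}")
```

Because the DCT is orthonormal, keeping all 40 coefficients loses nothing; the compression comes from dropping the high-order coefficients, which carry little variance when neighbouring bands are this correlated.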

Jon Nordby answered Sep 19 '22 11:09


I think jonnor's answer is not quite correct. There are two steps:
1. Take logs of Mel spectrogram.
2. Compute DCT on logs.
Moreover, taking the log seems to be "the main part" when training neural networks: https://qr.ae/TWtPLD
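One way to see why the log step matters, assuming an orthonormal DCT-II: the DCT is linear, so without the log a change in recording gain rescales every coefficient, whereas with the log the gain becomes an additive constant across all mel bands, which the DCT puts entirely into coefficient 0. The mel-spectrogram below is random synthetic data, not real audio, and the helper name is hypothetical.

```python
import numpy as np


def dct_ii_ortho(x):
    """Orthonormal DCT-II along the first axis (the mel-band axis here)."""
    n = x.shape[0]
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis @ x


rng = np.random.default_rng(1)
mel = rng.uniform(0.1, 10.0, size=(40, 100))  # synthetic power mel-spectrogram (bands, frames)
gain = 10.0                                   # e.g. the same sound, 10x louder in power

# DCT without the log step.
no_log = dct_ii_ortho(mel)
no_log_loud = dct_ii_ortho(gain * mel)

# Log first, then DCT.
with_log = dct_ii_ortho(np.log(mel))
with_log_loud = dct_ii_ortho(np.log(gain * mel))

# Without the log, every coefficient is scaled by the gain.
scales_everything = np.allclose(no_log_loud, gain * no_log)
# With the log, coefficients 1 and up are unchanged...
only_c0_changes = np.allclose(with_log_loud[1:], with_log[1:])
# ...and coefficient 0 shifts by sqrt(n_mels) * log(gain).
c0_shift_ok = np.allclose(with_log_loud[0] - with_log[0], np.sqrt(mel.shape[0]) * np.log(gain))
print(scales_everything, only_c0_changes, c0_shift_ok)  # True True True
```

This is one reason the log-then-DCT ordering is standard: loudness differences are isolated in the first coefficient, which is often dropped or normalized.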

Mikhail Akulov answered Sep 21 '22 11:09