Why do MFCC extraction libs return different values?

Question

I am extracting the MFCC features using two different libraries:

The python_speech_features lib
The BOB lib

However, the outputs of the two are different, and even the shapes are not the same. Is that normal? Or is there a parameter that I am missing?

The relevant section of my code is the following:

import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank

def bob_extract_features(audio, rate):
    # get MFCC
    rate              = 8000  # sampling rate in Hz (note: this overrides the rate argument)
    win_length_ms     = 30    # The window length of the cepstral analysis in milliseconds
    win_shift_ms      = 10    # The window shift of the cepstral analysis in milliseconds
    n_filters         = 26    # The number of filter bands
    n_ceps            = 13    # The number of cepstral coefficients
    f_min             = 0.    # The minimal frequency of the filter bank
    f_max             = 4000. # The maximal frequency of the filter bank
    delta_win         = 2     # The integer delta value used for computing the first and second order derivatives
    pre_emphasis_coef = 0.97  # The coefficient used for the pre-emphasis
    dct_norm          = True  # A factor by which the cepstral coefficients are multiplied
    mel_scale         = True  # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale

    c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min,
                    f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
    c.with_delta       = False
    c.with_delta_delta = False
    c.with_energy      = False

    signal = np.asarray(audio, dtype=float)    # vector should be in **float** (np.cast was removed in NumPy 2.0)
    example_mfcc = c(signal)                   # MFCCs only: deltas and energy are disabled above
    return example_mfcc


def psf_extract_features(audio, rate):
    signal = np.asarray(audio, dtype=float)  # vector should be in **float**
    mfcc_feature = mfcc(signal, rate, winlen=0.03, winstep=0.01, numcep=13,
                        nfilt=26, nfft=512, appendEnergy=False)

    # mfcc_feature = preprocessing.scale(mfcc_feature)
    deltas       = delta(mfcc_feature, 2)
    fbank_feat   = logfbank(audio, rate)
    combined     = np.hstack((mfcc_feature, deltas))
    return mfcc_feature  # only the MFCCs are returned; deltas, fbank_feat and combined are unused



track = 'test-sample.wav'
rate, audio = read(track)

features1 = psf_extract_features(audio, rate)
features2 = bob_extract_features(audio, rate)

print("--------------------------------------------")
t = (features1 == features2)
print(t)

Nikolay Shmyrev · Accepted Answer

However, the outputs of the two are different, and even the shapes are not the same. Is that normal?

Yes. There are different varieties of the algorithm, and each implementation chooses its own flavor.

or is there a parameter that I am missing?

It is not just about parameters; there are algorithmic differences too, such as the window shape (Hamming vs. Hann), the shape of the mel filters, where the mel filters start, the normalization of the mel filters, liftering, the DCT flavor, and so on.

If you want the same results, just use a single library for extraction; it is pretty hopeless to sync them.
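To illustrate just one of the differences listed above, here is a minimal NumPy sketch (not taken from either library) that computes the log power spectrum of the same 30 ms frame with three different windows. The test signal, the frame length, and the helper name log_power_spectrum are assumptions made up for this example. A real MFCC pipeline adds the mel filter bank, the DCT, and liftering on top, and each of those steps has its own variants.

```python
import numpy as np


def log_power_spectrum(frame, window):
    """Log power spectrum of a single frame after applying a window (hypothetical helper)."""
    spectrum = np.fft.rfft(frame * window)
    return np.log(np.abs(spectrum) ** 2 + 1e-10)  # small floor avoids log(0)


rate = 8000                     # assumed sampling rate in Hz
frame_len = 240                 # 30 ms at 8 kHz
t = np.arange(frame_len) / rate
# Synthetic frame: two sine tones, chosen only for illustration.
frame = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 1200.0 * t)

windows = {
    "rectangular": np.ones(frame_len),
    "hamming": np.hamming(frame_len),
    "hann": np.hanning(frame_len),
}
spectra = {name: log_power_spectrum(frame, w) for name, w in windows.items()}

# Largest per-bin difference between two window choices on identical input.
max_diff_hamming_hann = np.max(np.abs(spectra["hamming"] - spectra["hann"]))
max_diff_rect_hamming = np.max(np.abs(spectra["rectangular"] - spectra["hamming"]))
print("hamming vs hann:       ", max_diff_hamming_hann)
print("rectangular vs hamming:", max_diff_rect_hamming)
```

The input is identical in all three cases, yet the spectra already differ before any mel filtering or DCT is applied, so the final coefficients cannot be expected to match either.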

motjuste · Answer

Have you tried comparing the two with some tolerance? I believe the two MFCCs are arrays of floating-point numbers, so testing for exact equality might not be wise. Try using numpy.testing.assert_allclose with some tolerance, and decide whether that tolerance is good enough.
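A minimal sketch of such a comparison, assuming both feature matrices are NumPy arrays. The helper name compare_features and the tolerance values are placeholders I picked for illustration, not recommendations; it checks the shapes first, because a tolerance test is meaningless when the shapes differ.

```python
import numpy as np


def compare_features(a, b, rtol=1e-3, atol=1e-6):
    """Return (same_shape, all_close) for two feature matrices (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        # Different frame counts or coefficient counts: nothing to compare element-wise.
        return False, False
    return True, bool(np.allclose(a, b, rtol=rtol, atol=atol))


# Toy data standing in for two MFCC matrices (frames x coefficients).
ref = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
noisy = ref + 1e-7

print(compare_features(ref, noisy))      # same shape, within tolerance
print(compare_features(ref, ref + 1.0))  # same shape, outside tolerance
print(compare_features(ref, ref[:1]))    # shapes differ

# The assert-style variant raises AssertionError with a report on mismatch.
np.testing.assert_allclose(noisy, ref, rtol=1e-3, atol=1e-6)
```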

That said, I initially missed that even the shapes mismatch, and I am not experienced enough with bob.ap to comment on that confidently. However, libraries often pad the input with zeros at the beginning or the end of the input array for windowing reasons, and that may be responsible if one of them does it differently.
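To show how padding alone can change the number of frames, here is a small sketch of two generic framing conventions. I am not claiming that either one is what bob.ap or python_speech_features actually does; the function names and the sample count are made up for the example, and the window settings are the ones from the question (30 ms window, 10 ms shift, 8 kHz).

```python
import math


def frames_dropping_tail(n_samples, frame_len, frame_step):
    """Frame count when an incomplete last frame is discarded (hypothetical convention)."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_step


def frames_padding_tail(n_samples, frame_len, frame_step):
    """Frame count when the signal is zero-padded to fill the last frame (hypothetical convention)."""
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


rate = 8000
frame_len = int(0.030 * rate)   # 240 samples
frame_step = int(0.010 * rate)  # 80 samples
n_samples = 8050                # assumed signal length, slightly over one second

print(frames_dropping_tail(n_samples, frame_len, frame_step))  # 98
print(frames_padding_tail(n_samples, frame_len, frame_step))   # 99
```

With the same window settings, the two conventions give feature matrices with a different number of rows, which would explain a shape mismatch in the first dimension.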

Tags: python, speech, voice, voice-recognition, mfcc

SuperKogito
