Which tool can I trust?

I am having trouble determining which tool I can trust.

The tools I have been testing are Librosa and Kaldi, which I use to create a dataset of plot visualizations of the 40 filterbank energies of an audio file.

The filterbank energies are extracted using this configuration in Kaldi:

fbank.conf

--htk-compat=false
--window-type=hamming
--sample-frequency=16000
--num-mel-bins=40
--use-log-fbank=true

The extracted data are plotted using Librosa's plotting function. Librosa uses matplotlib's pcolormesh, so there should not be any difference other than Librosa providing an easier API.

print static.shape
print type(static)
print np.min(static)
print np.max(static)
fig = plt.figure()
librosa.display.specshow(static.T,sr=16000,x_axis='frames',y_axis='mel',hop_length=160,cmap=cm.jet)
#plt.axis('off')
plt.title("log mel power spectrum of " + name)
plt.colorbar(format='%+02.0f dB')
plt.tight_layout()
plt.savefig(plot+"/"+name+"_plot_static_conv.png")
plt.show()

outputs:

(474, 40)
<type 'numpy.ndarray'>
-1.828067
22.70058
Got bus address:  "unix:abstract=/tmp/dbus-aYbBS1JWyw,guid=17dd413abcda54272e1d93d159174cdf" 
Connected to accessibility bus at:  "unix:abstract=/tmp/dbus-aYbBS1JWyw,guid=17dd413abcda54272e1d93d159174cdf" 
Registered DEC:  true 
Registered event listener change listener:  true

[Image: Kaldi log filterbank energies plotted with librosa.display.specshow]

A similar plot is created with Librosa like this:

audio_path="../../../../Dropbox/SI1392.wav"
#audio_path = librosa.util.example_audio_file()
print "Example audio found"
y, sr = librosa.load(audio_path)
print "Example audio loaded"
specto = librosa.feature.melspectrogram(y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
print "Example audio spectogram"
log_specto = librosa.core.logamplitude(specto)

print "min and max"
print np.min(log_specto)
print np.max(log_specto)
print "Example audio log specto"

plt.figure(figsize=(12,4))
librosa.display.specshow(log_specto,sr=sr,x_axis='frames', y_axis='mel', hop_length=160,cmap=cm.jet)

plt.title('mel power spectrogram')

plt.colorbar(format='%+02.0f dB')

plt.tight_layout()
print "See"

print specto.shape

print log_specto.shape
plt.show()

outputs this:

libraries loaded!
Example audio found
Example audio loaded
Example audio spectogram
min and max
-84.6796661558
-4.67966615584
Example audio log specto
See
(40, 657)
(40, 657)

[Image: Librosa log mel spectrogram plotted with librosa.display.specshow]
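
A side note on the shapes printed above: Kaldi produced 474 frames and Librosa 657, presumably for the same file. A plausible cause is that librosa.load resamples to 22050 Hz unless sr is passed explicitly, while the Kaldi config assumes 16000 Hz. The sketch below only does the frame arithmetic. The sample count of 76200 is a made-up value chosen to be consistent with 474 Kaldi frames, and the helper names are hypothetical. It assumes Kaldi's default 25 ms window, 10 ms shift and edge snipping, and a centered STFT on the Librosa side.

```python
import math


def kaldi_num_frames(n_samples, frame_length=400, frame_shift=160):
    """Frame count when only full windows are kept (edge snipping)."""
    if n_samples < frame_length:
        return 0
    return 1 + (n_samples - frame_length) // frame_shift


def centered_stft_num_frames(n_samples, hop_length=160):
    """Frame count of a centered STFT, where the signal is padded at both ends."""
    return 1 + n_samples // hop_length


# Hypothetical length in samples at 16 kHz, not measured from SI1392.wav.
n_samples_16k = 76200
# Length after resampling to 22050 Hz (assumed to round up).
n_samples_22k = int(math.ceil(n_samples_16k * 22050 / 16000.0))

frames_kaldi = kaldi_num_frames(n_samples_16k)
frames_16k = centered_stft_num_frames(n_samples_16k)
frames_22k = centered_stft_num_frames(n_samples_22k)

print(frames_kaldi)
print(frames_16k)
print(frames_22k)
```

With these assumed numbers the sketch gives 474, 477 and 657 frames. If resampling is indeed the cause, loading with librosa.load(audio_path, sr=16000) should bring the frame counts close together; a few frames of difference would remain because of the padding.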

Both show similar plots apart from the colors, but the energy ranges are quite different.

Kaldi has a min/max of -1.828067/22.70058

And Librosa has a min/max of -84.6796661558/-4.67966615584
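
These two ranges are not directly comparable. Kaldi's log filterbank energies are natural logarithms, while logamplitude returns decibels (10 * log10) and, as far as I can tell, clips everything more than top_db=80 below the maximum, which would explain why the Librosa range spans exactly 80. The input scaling probably differs as well: Kaldi works on 16-bit integer sample values, whereas librosa.load returns floats in [-1, 1]. Below is a minimal NumPy sketch of the two conversions on made-up power values. `power_to_db_sketch` is a hypothetical helper written for illustration, not a Librosa function.

```python
import numpy as np


def power_to_db_sketch(power, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert power values to decibels and clip to top_db below the maximum."""
    power = np.asarray(power, dtype=float)
    db = 10.0 * np.log10(np.maximum(amin, power))
    db -= 10.0 * np.log10(max(amin, ref))
    if top_db is not None:
        db = np.maximum(db, db.max() - top_db)
    return db


# Made-up mel energies; the smallest one falls below the clipping floor.
power = np.array([1e-12, 1e-6, 1e-3, 0.34])

db = power_to_db_sketch(power)
natural_log = np.log(np.maximum(power, 1e-10))

# The two scales differ by a constant factor of 10 / ln(10), about 4.34.
db_from_ln = natural_log * 10.0 / np.log(10.0)
db_span = db.max() - db.min()

print(db)
print(natural_log)
print(db_span)
```

Apart from the clipped entry, the decibel values equal the natural-log values times 10 / ln(10), so the shape of the spectrogram is the same in both tools and only the scale and offset differ.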

The problem is that I am trying to store these plots as numpy arrays for further processing, and this seems to create different plots. Using the Librosa data, I create the plot as:

plt.figure()
min_max_scaled_log_specto = min_max_scaler.fit_transform(log_specto)
convert = plt.get_cmap(cm.jet)
numpy_static = convert(min_max_scaled_log_specto)
plt.imshow(np.flipud(log_specto), aspect='auto')
plt.colorbar()
print "Sooo?"
plt.show()

[Image: Librosa log mel spectrogram plotted with plt.imshow]

This is perfect: it resembles the original plot.

But with Kaldi I get this plot from this code:

convert = plt.get_cmap(cm.jet)
numpy_output_static = convert(np.flipud(static.T))
plt.imshow(numpy_output_static,aspect = 'auto')
plt.show()
raw_input("sadas")

[Image: Kaldi features passed through the jet colormap without normalization]
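
A likely reason for a mostly red image: calling a colormap directly on float data expects values already in [0, 1]. Anything above 1 is mapped to the colormap's top color (dark red for jet), and anything below 0 to its bottom color. The Kaldi values range from about -1.8 to 22.7, so most of them saturate. plt.imshow applied to a 2-D array rescales the data itself, which is presumably why the Librosa snippet above looks fine: it passes log_specto to imshow rather than the colormapped array. A small sketch with made-up values:

```python
import numpy as np
from matplotlib import cm

# Made-up values covering roughly the Kaldi range reported above.
values = np.array([-1.8, 0.0, 0.5, 1.0, 5.0, 22.7])

# Calling the colormap returns one RGBA row per input value.
rgba = cm.jet(values)

# Values below 0 get the same color as 0, values above 1 the same color as 1.
below_zero_is_bottom = np.allclose(rgba[0], rgba[1])
above_one_is_top = np.allclose(rgba[4], rgba[3]) and np.allclose(rgba[5], rgba[3])

print(rgba.shape)
print(below_zero_is_bottom)
print(above_one_is_top)
```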

I found from a prior post that the red could be caused by the value ranges, and that normalizing beforehand would help, but that resulted in this:

min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0,1))
convert = plt.get_cmap(cm.jet)
numpy_output_static = convert(min_max_scaler.fit_transform(np.flipud(static.T)))
plt.imshow(numpy_output_static,aspect = 'auto')
plt.show()

[Image: Kaldi features after MinMaxScaler and the jet colormap]

But this in no way resembles the original Kaldi plot. So why does it look like this? Why am I able to plot the energies extracted with Librosa, but not those from Kaldi?

Minimal working example for Librosa:

#
#   Minimal example of a Librosa plot.
#   Made for testing the plot, and to test for accurate
#   conversion between the two parts.
#

import os
import sys
from os import listdir
from os.path import isfile, join
import numpy as np
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import Normalize
import matplotlib
from PIL import Image
import librosa
import colormaps as cmaps
import librosa.display
import ast
from scipy.misc import toimage
from matplotlib import cm
from sklearn import preprocessing

print "libraries loaded!"
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0,1))

audio_path="../../../../Dropbox/SI1392.wav"
#audio_path = librosa.util.example_audio_file()
print "Example audio found"
y, sr = librosa.load(audio_path)
print "Example audio loaded"
specto = librosa.feature.melspectrogram(y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
print "Example audio spectogram"
log_specto = librosa.core.logamplitude(specto)

print "min and max"
print np.min(log_specto)
print np.max(log_specto)
print "Example audio log specto"

plt.figure(figsize=(12,4))
librosa.display.specshow(log_specto,sr=sr,x_axis='frames', y_axis='mel', hop_length=160,cmap=cm.jet)

plt.title('mel power spectrogram')

plt.colorbar(format='%+02.0f dB')

plt.tight_layout()
print "See"
#plt.show()

print specto.shape

print log_specto.shape

plt.figure()
min_max_scaled_log_specto = min_max_scaler.fit_transform(log_specto)
convert = plt.get_cmap(cm.jet)
numpy_static = convert(min_max_scaled_log_specto)
plt.imshow(np.flipud(log_specto), aspect='auto')
plt.colorbar()
print "Sooo?"
plt.show()

Minimal working example with Kaldi (real data):

#
#   Extracted version:
#
#
#

import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from PIL import Image
import librosa
import librosa.display
from matplotlib import cm
from sklearn import preprocessing
import ast
import urllib
import os
import sys
from os import listdir
from os.path import isfile, join

min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0,1))

def make_plot_store_data(name,interweaved,static,delta,delta_delta,isTrain,isTest,isDev):

    print static.shape
    print type(static)
    print np.min(static)
    print np.max(static)
    fig = plt.figure()

    librosa.display.specshow(static.T,sr=16000,x_axis='frames',y_axis='mel',hop_length=160,cmap=cm.jet)
    #plt.axis('off')
    plt.title("log mel power spectrum of " + name)
    plt.colorbar(format='%+02.0f dB')
    plt.tight_layout()
    #plt.show()
    #plt.close()
    #raw_input("asd")

    if isTrain == True:
        plt.figure()
        convert = plt.get_cmap(cm.jet)
        numpy_output_static = convert(min_max_scaler.fit_transform(np.flipud(static.T)))
        plt.imshow(numpy_output_static,aspect = 'auto')
        plt.show()
        raw_input("sadas")

link = "https://gist.githubusercontent.com/Miail/51311b34f5e5333bbddf9cb17c737ea4/raw/786b72477190023e93b9dd0cbbb43284ab59921b/feature.txt"
f = urllib.urlopen(link)

temp_list = []
for line in f:
    entries = 0
    data_splitted = line.split()
    if len(data_splitted) == 2:
        file_name = data_splitted[0]
    else:
        entries = 1+entries
        if data_splitted[-1] == ']':
            temp_list.extend([ast.literal_eval(i) for i in data_splitted[:-1]])
        else:
            temp_list.extend([ast.literal_eval(i) for i in data_splitted])


dimension = 120
entries = len(temp_list)/dimension
data = np.array(temp_list)
interweaved = data.reshape(entries,dimension)
static =interweaved[:,:-80]
delta =interweaved[:,40:-40]
delta_delta =interweaved[:,80:]
plot_interweaved = data.reshape(entries*3,dimension/3)
print static.shape
print delta.shape
print delta_delta.shape
make_plot_store_data(file_name,plot_interweaved,static,delta,delta_delta,True,False,False)

asked May 18 '17 08:05

I am not Fat

1 Answer

I seem to have found the answer in this post. The problem was my normalization. So instead of doing:

numpy_output_static = convert(min_max_scaler.fit_transform(np.flipud(static.T)))

I should have done:

norm_static = matplotlib.colors.Normalize(vmin=static.min(),vmax=static.max())
numpy_output_static = convert(norm_static(np.flipud(static.T)))
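
Why this helps, as far as I understand it: MinMaxScaler scales every column independently. With static.T of shape (40, 474) each column is one time frame, so every frame gets stretched to the full [0, 1] range and the loudness differences between frames are lost. Normalize with a single vmin and vmax rescales the whole array at once and keeps them. Below is a minimal sketch on made-up data, with the per-column scaling written out in NumPy as a stand-in for MinMaxScaler.fit_transform.

```python
import numpy as np
from matplotlib import cm
from matplotlib.colors import Normalize

# Made-up features: rows are mel bins, columns are frames (one quiet, one loud).
features = np.array([[0.0, 10.0],
                     [1.0, 20.0],
                     [2.0, 22.0]])

# Per-column min-max scaling, which is what MinMaxScaler does by default.
col_min = features.min(axis=0)
col_max = features.max(axis=0)
per_column = (features - col_min) / (col_max - col_min)

# Global scaling with one vmin/vmax for the whole array.
norm = Normalize(vmin=features.min(), vmax=features.max())
global_scaled = np.asarray(norm(features))

# The colormap lookup gives an RGBA float array; drop alpha and convert
# to uint8 to get an image-like array that can be stored.
rgba = cm.jet(np.flipud(global_scaled))
rgb_uint8 = (rgba[..., :3] * 255).astype(np.uint8)

print(per_column.max(axis=0))     # every frame peaks at 1
print(global_scaled.max(axis=0))  # the quiet frame stays low
print(rgb_uint8.shape)
```

In the per-column result both frames peak at 1, while in the globally scaled result the quiet frame stays near the bottom of the range, which matches what a spectrogram plot shows.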

answered Sep 24 '22 21:09

I am not Fat

Tags: python, matplotlib, plot, librosa, kaldi
