Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which tool can I trust?

I seem to have to problems determining which tool I can trust...

The tools i've been testing is Librosa and Kaldi in creating dataset for plots visualizations of 40 filterbank energies of an audio file.

The filterbank energies are extracted using these configuration in kaldi.

fbank.conf

--htk-compat=false
--window-type=hamming
--sample-frequency=16000
--num-mel-bins=40
--use-log-fbank=true

The data extracted are plotted using librosa plot. Librosa make use of matplotlib pcolormesh, meaning that there should not be any difference, other than librosa provide an easier API to use.

print static.shape
print type(static)
print np.min(static)
print np.max(static)
fig = plt.figure()
librosa.display.specshow(static.T,sr=16000,x_axis='frames',y_axis='mel',hop_length=160,cmap=cm.jet)
#plt.axis('off')
plt.title("log mel power spectrum of " + name)
plt.colorbar(format='%+02.0f dB')
plt.tight_layout()
plt.savefig(plot+"/"+name+"_plot_static_conv.png")
plt.show()

outputs:

(474, 40)
<type 'numpy.ndarray'>
-1.828067
22.70058
Got bus address:  "unix:abstract=/tmp/dbus-aYbBS1JWyw,guid=17dd413abcda54272e1d93d159174cdf" 
Connected to accessibility bus at:  "unix:abstract=/tmp/dbus-aYbBS1JWyw,guid=17dd413abcda54272e1d93d159174cdf" 
Registered DEC:  true 
Registered event listener change listener:  true 

enter image description here

Similar plot created in Librosa as such:

audio_path="../../../../Dropbox/SI1392.wav"
#audio_path = librosa.util.example_audio_file()
print "Example audio found"
y, sr = librosa.load(audio_path)
print "Example audio loaded"
specto = librosa.feature.melspectrogram(y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
print "Example audio spectogram"
log_specto = librosa.core.logamplitude(specto)

print "min and max"
print np.min(log_specto)
print np.max(log_specto)
print "Example audio log specto"

plt.figure(figsize=(12,4))
librosa.display.specshow(log_specto,sr=sr,x_axis='frames', y_axis='mel', hop_length=160,cmap=cm.jet)

plt.title('mel power spectrogram')

plt.colorbar(format='%+02.0f dB')

plt.tight_layout()
print "See"

print specto.shape

print log_specto.shape
plt.show()

outputs this:

libraries loaded!
Example audio found
Example audio loaded
Example audio spectogram
min and max
-84.6796661558
-4.67966615584
Example audio log specto
See
(40, 657)
(40, 657)

enter image description here

Both shows similar plots despite the colors, but the energy ranges seems a bit different.

Kaldi has a min/max of -1.828067/22.70058

And Librosa has a min/max -84.6796661558/-4.67966615584

The problem is I am trying to store these plots as numpy arrays, for further processing.

Which seem to create a different plots.. Using Librosa data, I create the plot as :

plt.figure()
min_max_scaled_log_specto = min_max_scaler.fit_transform(log_specto)
convert = plt.get_cmap(cm.jet)
numpy_static = convert(min_max_scaled_log_specto)
plt.imshow(np.flipud(log_specto), aspect='auto')
plt.colorbar()
print "Sooo?"
plt.show()

enter image description here

Which is perfect... It resembles the original dataset..

But with Kaldi I get this plot from this code:

convert = plt.get_cmap(cm.jet)
numpy_output_static = convert(np.flipud(static.T))
plt.imshow(numpy_output_static,aspect = 'auto')
plt.show()
raw_input("sadas")

enter image description here

I found from a prior post that the reason for the red occuring could be due to the ranges, and a normalization before would help - but this caused this:

min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0,1))
convert = plt.get_cmap(cm.jet)
numpy_output_static = convert(min_max_scaler.fit_transform(np.flipud(static.T)))
plt.imshow(numpy_output_static,aspect = 'auto')
plt.show()

enter image description here

But this can in no way be related to the original plot from the Kaldi plot... So why does it look like this?.. Why am I able plot it with energies extracted from Librosa, but not from Kaldi?

Minimal working example for Librosa:

#
#   Minimal example of Librosa plot example.
#   Made for testing the plot, and test for accurat
#   Conversion between the two parts.
#

import os
import sys
from os import listdir
from os.path import isfile, join
import numpy as np
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import Normalize
import matplotlib
from PIL import Image
import librosa
import colormaps as cmaps
import librosa.display
import ast
from scipy.misc import toimage
from matplotlib import cm
from sklearn import preprocessing

print "libraries loaded!"
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0,1))

audio_path="../../../../Dropbox/SI1392.wav"
#audio_path = librosa.util.example_audio_file()
print "Example audio found"
y, sr = librosa.load(audio_path)
print "Example audio loaded"
specto = librosa.feature.melspectrogram(y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
print "Example audio spectogram"
log_specto = librosa.core.logamplitude(specto)

print "min and max"
print np.min(log_specto)
print np.max(log_specto)
print "Example audio log specto"

plt.figure(figsize=(12,4))
librosa.display.specshow(log_specto,sr=sr,x_axis='frames', y_axis='mel', hop_length=160,cmap=cm.jet)

plt.title('mel power spectrogram')

plt.colorbar(format='%+02.0f dB')

plt.tight_layout()
print "See"
#plt.show()

print specto.shape

print log_specto.shape

plt.figure()
min_max_scaled_log_specto = min_max_scaler.fit_transform(log_specto)
convert = plt.get_cmap(cm.jet)
numpy_static = convert(min_max_scaled_log_specto)
plt.imshow(np.flipud(log_specto), aspect='auto')
plt.colorbar()
print "Sooo?"
plt.show()

Minimal working example with kaldi - (Real data):

#
#   Extracted version:
#
#
#

import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from PIL import Image
import librosa
import librosa.display
from matplotlib import cm
from sklearn import preprocessing
import ast
import urllib
import os
import sys
from os import listdir
from os.path import isfile, join

min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0,1))

def make_plot_store_data(name,interweaved,static,delta,delta_delta,isTrain,isTest,isDev):

    print static.shape
    print type(static)
    print np.min(static)
    print np.max(static)
    fig = plt.figure()

    librosa.display.specshow(static.T,sr=16000,x_axis='frames',y_axis='mel',hop_length=160,cmap=cm.jet)
    #plt.axis('off')
    plt.title("log mel power spectrum of " + name)
    plt.colorbar(format='%+02.0f dB')
    plt.tight_layout()
    #plt.show()
    #plt.close()
    #raw_input("asd")

    if isTrain == True:
        plt.figure()
        convert = plt.get_cmap(cm.jet)
        numpy_output_static = convert(min_max_scaler.fit_transform(np.flipud(static.T)))
        plt.imshow(numpy_output_static,aspect = 'auto')
        plt.show()
        raw_input("sadas")

link = "https://gist.githubusercontent.com/Miail/51311b34f5e5333bbddf9cb17c737ea4/raw/786b72477190023e93b9dd0cbbb43284ab59921b/feature.txt"
f = urllib.urlopen(link)

temp_list = []
for line in f:
    entries = 0
    data_splitted = line.split()
    if len(data_splitted) == 2:
            file_name = data_splitted[0]
    else:
        entries = 1+entries
        if data_splitted[-1] == ']':
            temp_list.extend([ast.literal_eval(i) for i in data_splitted[:-1]])
        else:
            temp_list.extend([ast.literal_eval(i) for i in data_splitted])


dimension = 120
entries = len(temp_list)/dimension
data = np.array(temp_list)
interweaved = data.reshape(entries,dimension)
static =interweaved[:,:-80]
delta =interweaved[:,40:-40]
delta_delta =interweaved[:,80:]
plot_interweaved = data.reshape(entries*3,dimension/3)
print static.shape
print delta.shape
print delta_delta.shape
make_plot_store_data(file_name,plot_interweaved,static,delta,delta_delta,True,False,False)
like image 739
I am not Fat Avatar asked May 18 '17 08:05

I am not Fat


People also ask

Why is trust important?

Trust means that you rely on someone else to do the right thing. You believe in the person's integrity and strength, to the extent that you're able to put yourself on the line, at some risk to yourself. Trust is essential to an effective team, because it provides a sense of safety.

Why trust is important in business?

Without trust, transactions cannot occur, influence is destroyed, leaders can lose teams and salespeople can lose sales. The list goes on. Trust and relationships, much more than money, are the currency of business. Trust is the natural result of thousands of tiny actions, words, thoughts, and intentions.


1 Answers

I seem to have found the answer in this post. The problem was my normalization. So instead of doing:

numpy_output_static = convert(min_max_scaler.fit_transform(np.flipud(static.T)))

I should have done:

norm_static = matplotlib.colors.Normalize(vmin=static.min(),vmax=static.max())
numpy_output_static = convert(norm_static(np.flipud(static.T)))
like image 157
I am not Fat Avatar answered Sep 24 '22 21:09

I am not Fat