I am working on speech recognition using a neural network. To do so, I need the spectrograms of my training audio files (.wav). How can I compute those spectrograms in Python?
There are numerous ways to do so. The easiest is to look at the methods proposed in the Kernels of the Kaggle competition TensorFlow Speech Recognition Challenge (just sort by most voted). This one is particularly clear and simple and contains the following function. Its inputs are a numeric vector of samples extracted from the wav file, the sample rate, the frame size in milliseconds, the step (stride or skip) size in milliseconds, and a small offset (eps) that keeps the logarithm finite when a spectrogram value is zero.
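from scipy.io import wavfile
from scipy import signal
import numpy as np

# path_to_wav_file is a placeholder for the path to your own .wav file
sample_rate, audio = wavfile.read(path_to_wav_file)

def log_specgram(audio, sample_rate, window_size=20,
                 step_size=10, eps=1e-10):
    # window_size and step_size are given in milliseconds; convert to samples
    nperseg = int(round(window_size * sample_rate / 1e3))
    step = int(round(step_size * sample_rate / 1e3))
    # SciPy expects the overlap between frames, not the step between them
    # (identical for the default 20 ms window and 10 ms step)
    noverlap = nperseg - step
    freqs, times, spec = signal.spectrogram(audio,
                                            fs=sample_rate,
                                            window='hann',
                                            nperseg=nperseg,
                                            noverlap=noverlap,
                                            detrend=False)
    # Transpose to (time, frequency) and take the log; eps avoids log(0)
    return freqs, times, np.log(spec.T.astype(np.float32) + eps)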
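To make the millisecond-to-sample arithmetic concrete, here is a small sketch of how the frame and step sizes translate into the shape of the output. The 16 kHz sample rate and the synthetic 440 Hz tone are my own assumptions standing in for a real recording. Note that SciPy's noverlap is the overlap between frames, i.e. frame length minus step; with a 20 ms frame and a 10 ms step the two happen to be equal.

```python
import numpy as np
from scipy import signal

sample_rate = 16000      # assumed: 16 kHz audio
window_size_ms = 20      # frame length in milliseconds
step_size_ms = 10        # hop between frame starts in milliseconds

nperseg = int(round(window_size_ms * sample_rate / 1e3))  # samples per frame
step = int(round(step_size_ms * sample_rate / 1e3))       # samples per hop
noverlap = nperseg - step                                 # samples shared by neighbouring frames

# One second of a synthetic 440 Hz tone stands in for real audio.
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)

freqs, times, spec = signal.spectrogram(
    audio, fs=sample_rate, window='hann',
    nperseg=nperseg, noverlap=noverlap, detrend=False)

# Without padding, the frame count follows from the hop size,
# and a real signal yields nperseg // 2 + 1 frequency bins.
n_frames = (len(audio) - noverlap) // step
n_bins = nperseg // 2 + 1

print("samples per frame:", nperseg, "hop:", step, "overlap:", noverlap)
print("expected shape:", (n_bins, n_frames), "actual shape:", spec.shape)
print("frequency resolution (Hz):", freqs[1] - freqs[0])
print("time between frames (s):", times[1] - times[0])
```

Longer frames give finer frequency resolution (sample_rate / nperseg) but coarser time resolution, so the 20 ms / 10 ms pair is a trade-off rather than a fixed rule.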
The outputs are as defined in the SciPy manual, with two exceptions: the spectrogram is transposed, so that its first axis corresponds to time, and it is rescaled with a monotonic function, the natural logarithm. The logarithm compresses larger values much more than smaller ones while preserving their order, so no extreme value in spec dominates the computation. Alternatively, one can cap the values at some quantile, but log (or even square root) is preferred. There are many other ways to normalize the magnitudes of the spectrogram, i.e. to prevent extreme values from "bullying" the output :)
freqs (f) : ndarray, Array of sample frequencies.
times (t) : ndarray, Array of segment times.
spec (Sxx) : ndarray, Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.
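For comparison, the three rescaling options mentioned above (log, square root, quantile cap) can be sketched side by side. The random array below is only a stand-in for a real power spectrogram, and the 99th-percentile cap is an arbitrary choice of mine, not something the kernel prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a power spectrogram: non-negative values with a heavy right tail.
spec = rng.exponential(scale=1.0, size=(161, 99)) ** 3

eps = 1e-10
log_spec = np.log(spec + eps)        # strong compression, as in log_specgram
sqrt_spec = np.sqrt(spec)            # milder compression
cap = np.quantile(spec, 0.99)        # assumed threshold: 99th percentile
capped_spec = np.minimum(spec, cap)  # everything above the cap is flattened

for name, arr in [("raw", spec), ("log", log_spec),
                  ("sqrt", sqrt_spec), ("capped", capped_spec)]:
    print(f"{name:>6}: min={arr.min():.3g}  median={np.median(arr):.3g}  max={arr.max():.3g}")

# log and sqrt are monotonic, so the largest cell stays the largest cell.
print(log_spec.argmax() == spec.argmax(), sqrt_spec.argmax() == spec.argmax())
```

Capping throws away the ordering among the values above the threshold, which is one reason to prefer log or square root.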
Alternatively, you can check the train.py and models.py code in the GitHub repo of the TensorFlow example on audio recognition.
Here is another thread that explains, with code, how to build spectrograms in Python.
SciPy serves this purpose.
from scipy import signal
from scipy.io import wavfile
# Read the .wav file
sample_rate, data = wavfile.read('directory_path/file_name.wav')
# Spectrogram of the .wav file
sample_freq, segment_time, spec_data = signal.spectrogram(data, sample_rate)
# Note: sample_freq is the array of frequency bins (in Hz) and segment_time the array of segment times (in seconds); neither is the same thing as sample_rate
Use Matplotlib to visualize the spectrogram:
import matplotlib.pyplot as plt
plt.pcolormesh(segment_time, sample_freq, spec_data)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
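One practical caveat with the snippet above: wavfile.read returns an array of shape (samples, channels) for stereo files, usually with an integer dtype, while signal.spectrogram works along the last axis by default. Below is a minimal sketch of one way to handle that, assuming signed integer PCM and simple channel averaging. The synthetic stereo file and its name are made up so the example runs on its own.

```python
import os
import tempfile

import numpy as np
from scipy import signal
from scipy.io import wavfile

# Build a short synthetic stereo signal (16-bit PCM) to stand in for a real file.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
left = np.sin(2 * np.pi * 440.0 * t)
right = np.sin(2 * np.pi * 880.0 * t)
stereo = (np.stack([left, right], axis=1) * 0.5 * 32767).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp_dir:
    wav_path = os.path.join(tmp_dir, 'example_stereo.wav')  # hypothetical file name
    wavfile.write(wav_path, sample_rate, stereo)
    rate, data = wavfile.read(wav_path)  # data has shape (samples, channels)

# Scale signed integer PCM to floats in roughly [-1, 1].
# (Unsigned 8-bit files would also need an offset; not handled here.)
if np.issubdtype(data.dtype, np.integer):
    scale = float(np.iinfo(data.dtype).max)
    data = data.astype(np.float32) / scale

# Average the channels so the spectrogram is computed over time, not over channels.
if data.ndim == 2:
    data = data.mean(axis=1)

sample_freq, segment_time, spec_data = signal.spectrogram(data, rate)

# A decibel scale usually makes the plot far easier to read than raw power.
spec_db = 10.0 * np.log10(spec_data + 1e-10)

print(data.shape, spec_data.shape, spec_db.shape)
```

Passing spec_db instead of spec_data to plt.pcolormesh gives a plot where quieter components remain visible.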