plotting spectrogram in audio analysis

Question

I am working on speech recognition using neural network. To do so I need to get the spectrograms of those training audio files (.wav) . How to get those spectrograms in python ?

Oleg Melnikov · Accepted Answer

There are numerous ways to do so. The easiest is to check out the methods proposed in Kernels on Kaggle competition TensorFlow Speech Recognition Challenge (just sort by most voted). This one is particularly clear and simple and contains the following function. The input is a numeric vector of samples extracted from the wav file, the sample rate, the size of the frame in milliseconds, the step (stride or skip) size in milliseconds and a small offset.

from scipy.io import wavfile
from scipy import signal
import numpy as np

sample_rate, audio = wavfile.read(path_to_wav_file)

def log_specgram(audio, sample_rate, window_size=20,
                 step_size=10, eps=1e-10):
    nperseg = int(round(window_size * sample_rate / 1e3))
    noverlap = int(round(step_size * sample_rate / 1e3))
    freqs, times, spec = signal.spectrogram(audio,
                                    fs=sample_rate,
                                    window='hann',
                                    nperseg=nperseg,
                                    noverlap=noverlap,
                                    detrend=False)
    return freqs, times, np.log(spec.T.astype(np.float32) + eps)

Outputs are defined in the SciPy manual, with an exception that the spectrogram is rescaled with a monotonic function (Log()), which depresses larger values much more than smaller values, while leaving the larger values still larger than the smaller values. This way no extreme value in spec will dominate the computation. Alternatively, one can cap the values at some quantile, but log (or even square root) are preferred. There are many other ways to normalize the heights of the spectrogram, i.e. to prevent extreme values from "bullying" the output :)

freq (f) : ndarray, Array of sample frequencies.
times (t) : ndarray, Array of segment times.
spec (Sxx) : ndarray, Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.

Alternatively, you can check the train.py and models.py code on github repo from the Tensorflow example on audio recognition.

Here is another thread that explains and gives code on building spectrograms in Python.

Saranraj Nambusubramaniyan · Answer

Scipy serve this purpose.

import scipy
# Read the .wav file
sample_rate, data = scipy.io.wavfile.read('directory_path/file_name.wav')

# Spectrogram of .wav file
sample_freq, segment_time, spec_data = signal.spectrogram(data, sample_rate)  
# Note sample_rate and sampling frequency values are same but theoretically they are different measures

Use matplot library to visualize the spectrogram

import matplotlib.pyplot as plt
plt.pcolormesh(segment_time, sample_freq, spec_data )
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()

plotting spectrogram in audio analysis

Tags:

python

neural-network

tensorflow

audio

spectrogram

kks

2 Answers

Oleg Melnikov

Saranraj Nambusubramaniyan

Recent Activity

Donate For Us

plotting spectrogram in audio analysis

Tags:

python

neural-network

tensorflow

audio

spectrogram

kks

2 Answers

Oleg Melnikov

Saranraj Nambusubramaniyan

Related questions

Recent Activity

Donate For Us