I am trying to create a spectrogram from a .wav file in python3.
I want the final saved image to look similar to this image:
I have tried the following:
This stack overflow post: Spectrogram of a wave file
This post worked, somewhat. After running it, I got
However, This graph does not contain the colors that I need. I need a spectrogram that has colors. I tried to tinker with this code to try and add the colors however after spending significant time and effort on this, I couldn't figure it out!
I then tried this tutorial.
This code crashed(on line 17) when I tried to run it with the error TypeError: 'numpy.float64' object cannot be interpreted as an integer.
line 17:
samples = np.append(np.zeros(np.floor(frameSize/2.0)), sig)
I tried to fix it by casting
samples = int(np.append(np.zeros(np.floor(frameSize/2.0)), sig))
and I also tried
samples = np.append(np.zeros(int(np.floor(frameSize/2.0)), sig))
However neither of these worked in the end.
I would really like to know how to convert my .wav files to spectrograms with color so that I can analyze them! Any help would be appreciated!!!!!
Please tell me if you want me to provide any more information about my version of python, what I tried, or what I want to achieve.
The wave module in Python's standard library is an easy interface to the audio WAV format. The functions in this module can write audio data in raw format to a file like object and read the attributes of a WAV file.
Use scipy.signal.spectrogram
.
import matplotlib.pyplot as plt from scipy import signal from scipy.io import wavfile sample_rate, samples = wavfile.read('path-to-mono-audio-file.wav') frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate) plt.pcolormesh(times, frequencies, spectrogram) plt.imshow(spectrogram) plt.ylabel('Frequency [Hz]') plt.xlabel('Time [sec]') plt.show()
Be sure that your wav file is mono (single channel) and not stereo (dual channel) before trying to do this. I highly recommend reading the scipy documentation at https://docs.scipy.org/doc/scipy- 0.19.0/reference/generated/scipy.signal.spectrogram.html.
Putting plt.pcolormesh
before plt.imshow
seems to fix some issues, as pointed out by @Davidjb, and if unpacking error occurs, follow the steps by @cgnorthcutt below.
I have fixed the errors you are facing for http://www.frank-zalkow.de/en/code-snippets/create-audio-spectrograms-with-python.html
This implementation is better because you can change the binsize
(e.g. binsize=2**8
)
import numpy as np from matplotlib import pyplot as plt import scipy.io.wavfile as wav from numpy.lib import stride_tricks """ short time fourier transform of audio signal """ def stft(sig, frameSize, overlapFac=0.5, window=np.hanning): win = window(frameSize) hopSize = int(frameSize - np.floor(overlapFac * frameSize)) # zeros at beginning (thus center of 1st window should be for sample nr. 0) samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig) # cols for windowing cols = np.ceil( (len(samples) - frameSize) / float(hopSize)) + 1 # zeros at end (thus samples can be fully covered by frames) samples = np.append(samples, np.zeros(frameSize)) frames = stride_tricks.as_strided(samples, shape=(int(cols), frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy() frames *= win return np.fft.rfft(frames) """ scale frequency axis logarithmically """ def logscale_spec(spec, sr=44100, factor=20.): timebins, freqbins = np.shape(spec) scale = np.linspace(0, 1, freqbins) ** factor scale *= (freqbins-1)/max(scale) scale = np.unique(np.round(scale)) # create spectrogram with new freq bins newspec = np.complex128(np.zeros([timebins, len(scale)])) for i in range(0, len(scale)): if i == len(scale)-1: newspec[:,i] = np.sum(spec[:,int(scale[i]):], axis=1) else: newspec[:,i] = np.sum(spec[:,int(scale[i]):int(scale[i+1])], axis=1) # list center freq of bins allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1]) freqs = [] for i in range(0, len(scale)): if i == len(scale)-1: freqs += [np.mean(allfreqs[int(scale[i]):])] else: freqs += [np.mean(allfreqs[int(scale[i]):int(scale[i+1])])] return newspec, freqs """ plot spectrogram""" def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"): samplerate, samples = wav.read(audiopath) s = stft(samples, binsize) sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate) ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel timebins, freqbins = np.shape(ims) print("timebins: ", timebins) print("freqbins: ", freqbins) plt.figure(figsize=(15, 7.5)) plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none") plt.colorbar() plt.xlabel("time (s)") plt.ylabel("frequency (hz)") plt.xlim([0, timebins-1]) plt.ylim([0, freqbins]) xlocs = np.float32(np.linspace(0, timebins-1, 5)) plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate]) ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10))) plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs]) if plotpath: plt.savefig(plotpath, bbox_inches="tight") else: plt.show() plt.clf() return ims ims = plotstft(filepath)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With