 

How to convert a .wav file to a spectrogram in python3

I am trying to create a spectrogram from a .wav file in python3.

I want the final saved image to look similar to this image:

I have tried the following:

This Stack Overflow post: Spectrogram of a wave file

This post worked, somewhat. After running it, I got a spectrogram plot.

However, this graph does not contain the colors that I need. I need a spectrogram with color. I tried to tinker with the code to add the colors, but after spending significant time and effort on it, I couldn't figure it out.

I then tried this tutorial.

This code crashed (on line 17) when I tried to run it, with the error TypeError: 'numpy.float64' object cannot be interpreted as an integer.

line 17:

samples = np.append(np.zeros(np.floor(frameSize/2.0)), sig) 

I tried to fix it by casting

samples = int(np.append(np.zeros(np.floor(frameSize/2.0)), sig)) 

and I also tried

samples = np.append(np.zeros(int(np.floor(frameSize/2.0)), sig))     

However neither of these worked in the end.

I would really like to know how to convert my .wav files to spectrograms with color so that I can analyze them. Any help would be appreciated!

Please tell me if you want me to provide any more information about my version of Python, what I tried, or what I want to achieve.

asked Jun 27 '17 by Sreehari R

People also ask

Can Python read WAV files?

The wave module in Python's standard library provides an easy interface to the WAV audio format. The functions in this module can write audio data in raw format to a file-like object and read the attributes of a WAV file.
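
For example, a minimal sketch (the filename 'example.wav' is a placeholder) that reads a WAV file's attributes and raw frames with the wave module:

import wave

# Open a WAV file and inspect its basic attributes.
with wave.open('example.wav', 'rb') as wf:
    print("channels:", wf.getnchannels())
    print("sample width (bytes):", wf.getsampwidth())
    print("frame rate (Hz):", wf.getframerate())
    print("frames:", wf.getnframes())
    raw_data = wf.readframes(wf.getnframes())  # raw PCM audio as a bytes object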


2 Answers

Use scipy.signal.spectrogram.

import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

sample_rate, samples = wavfile.read('path-to-mono-audio-file.wav')
frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate)

plt.pcolormesh(times, frequencies, spectrogram)
plt.imshow(spectrogram)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()

Be sure that your wav file is mono (single channel) and not stereo (dual channel) before trying to do this. I highly recommend reading the scipy documentation at https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.signal.spectrogram.html.

Putting plt.pcolormesh before plt.imshow seems to fix some issues, as pointed out by @Davidjb. If an unpacking error occurs, follow the steps by @cgnorthcutt below.
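
If you want output closer to the colored spectrogram in the question, a variant of the above (a sketch, not part of the original answer; the file path is a placeholder) converts the power values to decibels and passes an explicit colormap to pcolormesh:

import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
from scipy.io import wavfile

sample_rate, samples = wavfile.read('path-to-mono-audio-file.wav')  # placeholder path
frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate)

# Convert power to decibels so quiet components remain visible; the small
# offset avoids log10(0). Any matplotlib colormap name works here.
plt.pcolormesh(times, frequencies, 10 * np.log10(spectrogram + 1e-10), cmap='inferno')
plt.colorbar(label='Power [dB]')
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()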

answered Oct 05 '22 by Tom Wyllie


I have fixed the errors you were facing in the code from http://www.frank-zalkow.de/en/code-snippets/create-audio-spectrograms-with-python.html.
This implementation is better because you can change the binsize (e.g. binsize=2**8).
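
The key fix for the crash you hit on line 17 is that int() must wrap only the np.floor(frameSize/2.0) passed to np.zeros, not the whole np.append(...) expression, i.e.:

samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)

You can see this line in the full code below.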

import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks

""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing
    cols = np.ceil((len(samples) - frameSize) / float(hopSize)) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(samples, shape=(int(cols), frameSize),
                                      strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win

    return np.fft.rfft(frames)

""" scale frequency axis logarithmically """
def logscale_spec(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)

    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    scale = np.unique(np.round(scale))

    # create spectrogram with new freq bins
    newspec = np.complex128(np.zeros([timebins, len(scale)]))
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):int(scale[i+1])], axis=1)

    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[int(scale[i]):])]
        else:
            freqs += [np.mean(allfreqs[int(scale[i]):int(scale[i+1])])]

    return newspec, freqs

""" plot spectrogram """
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"):
    samplerate, samples = wav.read(audiopath)

    s = stft(samples, binsize)

    sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate)

    ims = 20.*np.log10(np.abs(sshow)/10e-6)  # amplitude to decibel

    timebins, freqbins = np.shape(ims)

    print("timebins: ", timebins)
    print("freqbins: ", freqbins)

    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    plt.colorbar()

    plt.xlabel("time (s)")
    plt.ylabel("frequency (hz)")
    plt.xlim([0, timebins-1])
    plt.ylim([0, freqbins])

    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])

    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()

    plt.clf()

    return ims

ims = plotstft(filepath)
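
For example (the input path and output filename here are placeholders):

ims = plotstft('my-audio.wav', binsize=2**8, plotpath='spectrogram.png')

Passing plotpath saves the figure to disk instead of showing it interactively.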
answered Oct 05 '22 by Beginner