I want to get frequency with maximum power for every moment in wav file. So I wrote STFT in Python using fft from scipy. I used kaiser window function from scipy. Everything looking great, but my output looks strange. It has some very small numbers and some very high.
here is the output for one wav file: http://pastebin.com/5Ryd2uXj and here is the code in python:
import scipy, pylab
import wave
import struct
import sys
def stft(data, cp, do, hop):
dos = int(do*cp)
w = scipy.kaiser(dos,12) //12 is very high for kaiser window
temp=[]
wyn=[]
for i in range(0, len(data)-dos, hop):
temp=scipy.fft(w*data[i:i+dos])
max=-1
for j in range(0, len(temp),1):
licz=temp[j].real**2+temp[j].imag**2
if( licz>max ):
max = licz
maxj = j
wyn.append(maxj)
#wyn = scipy.array([scipy.fft(w*data[i:i+dos])
#for i in range(0, len(data)-dos, 1)])
return wyn
file = wave.open(sys.argv[1])
bity = file.readframes(file.getnframes())
data=struct.unpack('{n}h'.format(n=file.getnframes()), bity)
file.close()
cp=44100 #sampling frequency
do=0.05 #window size
hop = 5
wyn=stft(data,cp,do,hop)
print len(wyn)
for i in range(0, len(wyn), 1):
print wyn[i]
The actual FT of a sine wave is a pair of delta functions equidistant from 0-frequency. With a discrete function (samples), this is repeated every fs
(sampling rate) in the frequency domain. Small errors in FFT computation will mean these two deltas (FT of your sine wave) will not be exactly the same height, so your algorithm is simply picking the taller one.
The scipy FFT function will give you frequency components with the domain [0, fs]
. Since (as I mentioned above) this is periodic, these values could also be remapped as [-fs/2, fs/2]
by swapping the result at the center point - look into using fftshift
to do this.
It sounds like you may only be interested in the positive frequencies, however, so you can simply discard the second half of the result of your FFT.
From the notes of scipy.fftpack.fft
:
The packing of the result is “standard”: If A = fft(a, n), then A[0] contains the zero-frequency term, A[1:n/2+1] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency. So for an 8-point transform, the frequencies of the result are [ 0, 1, 2, 3, 4, -3, -2, -1].
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With