I am trying to do a quick spectral analysis on streaming audio data to capture vowels (something like JLip-sync). I am using PyAudio to capture the voice data in small chunks (1024 samples) of short duration (0.0625 s), numpy.fft for the analysis, and a numpy.hanning window to reduce leakage. I am using 4096*4 (16384 Hz) as the sampling rate (not 44100 or 22050, though I am open to discussion; 4096*4 being nearest to 22050).
Considering the frequencies I am interested in (ranging from 300 Hz to 3000 Hz), how can the ideal window size be calculated from the data length and the min/max frequencies I am looking for?
Thanks.
Kadir
You can minimize the effects of performing an FFT over a noninteger number of cycles by using a technique called windowing. Windowing reduces the amplitude of the discontinuities at the boundaries of each finite sequence acquired by the digitizer.
The FFT window size is typically a power of 2. If your sampling rate is 44,100 samples per second, then a window size of 32 samples is about 0.0007 s, and a window size of 65536 is about 1.486 s. There's a tradeoff in the choice of window size.
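To make the tradeoff concrete, here is a minimal sketch that computes, for a few window sizes at 44,100 samples per second, both the time span covered and the resulting frequency bin width (assuming the FFT size equals the window size). The helper name `describe_window` is made up for illustration.

```python
def describe_window(n_samples, sample_rate):
    """Return (duration in seconds, bin width in Hz) for a window of n_samples.

    Assumes the FFT size equals the window size, so bin width = sample_rate / n_samples.
    """
    duration = n_samples / sample_rate
    bin_width = sample_rate / n_samples
    return duration, bin_width


for n in (32, 4096, 65536):
    duration, bin_width = describe_window(n, 44100)
    print(f"{n:6d} samples: {duration:.4f} s, bins {bin_width:.2f} Hz apart")
```

A longer window gives narrower frequency bins but averages over a longer stretch of time, so fast changes in the signal get smeared.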
The frequency resolution is FR = Fmax / N(bins). For a 44100 Hz sampling rate, we have a 22050 Hz band. With a 1024-point FFT, we divide this band into 512 bins, so FR = 22050/512 ≈ 43.07 Hz (equivalently, 44100/1024). Basically, the FFT size can be defined independently from the window size.
The critical factor is how much resolution you need in the frequency domain to discriminate between different vowels. Resolution is 1/T, where T is the duration of your FFT window. So if you sample for 62.5 ms then your maximum resolution is 16 Hz (i.e. each FFT bin is 16 Hz wide) if your FFT is the same size as your sampling interval (1024 samples). If you go to a smaller FFT then obviously your resolution will worsen proportionately, e.g. a 512 point FFT would only have a resolution of 32 Hz.
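As a rough check of those numbers, the sketch below uses the figures from the question (16384 Hz sampling rate, 1024-sample chunks) and `numpy.fft.rfftfreq` to list which FFT bins fall inside the 300 Hz to 3000 Hz band. The function name `band_bins` is just an illustrative helper, not part of NumPy.

```python
import numpy as np

SAMPLE_RATE = 16384  # 4096 * 4, as in the question
N_FFT = 1024         # one 62.5 ms chunk


def band_bins(sample_rate, n_fft, f_lo, f_hi):
    """Return (bin width in Hz, indices of rfft bins with f_lo <= f <= f_hi)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    resolution = sample_rate / n_fft  # equals 1 / T, with T = n_fft / sample_rate
    idx = np.nonzero((freqs >= f_lo) & (freqs <= f_hi))[0]
    return resolution, idx


resolution, idx = band_bins(SAMPLE_RATE, N_FFT, 300.0, 3000.0)
print(f"bin width: {resolution} Hz")
print(f"bins {idx[0]}..{idx[-1]} ({len(idx)} bins) cover the vowel band")
```

Calling `band_bins(SAMPLE_RATE, 512, 300.0, 3000.0)` instead shows the coarser 32 Hz spacing of the shorter FFT and roughly half as many bins in the band.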
@Kadir:
The purpose of windowing your data before processing it with a discrete Fourier transform (DFT or FFT) is to minimize spectral leakage, which happens when you try to Fourier-transform non-cyclical data.
Windowing works by forcing your data smoothly to zero at exactly the start and end of the sequence, but not before. Shortening your window destroys information unnecessarily.
So your window length should match the length of your sample sequences. For instance, with 1024 samples, your window length should be 1024.
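A minimal sketch of that rule with NumPy: the Hann window is built with the same length as the chunk, whatever that length is. The 700 Hz sine below is a synthetic stand-in for a real audio chunk, and `windowed_spectrum` is a hypothetical helper name.

```python
import numpy as np


def windowed_spectrum(chunk, sample_rate):
    """Apply a Hann window as long as the chunk, then return (freqs, magnitudes)."""
    chunk = np.asarray(chunk, dtype=float)
    window = np.hanning(len(chunk))  # window length == chunk length
    magnitudes = np.abs(np.fft.rfft(chunk * window))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
    return freqs, magnitudes


# Synthetic test tone (assumption: stands in for one 1024-sample PyAudio chunk).
sample_rate = 16384
t = np.arange(1024) / sample_rate
chunk = np.sin(2 * np.pi * 700.0 * t)

freqs, magnitudes = windowed_spectrum(chunk, sample_rate)
peak_hz = freqs[np.argmax(magnitudes)]
print(f"strongest bin is centred at {peak_hz} Hz")
```

The peak lands on the bin nearest the tone, so it can be off by up to half a bin width (8 Hz here).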
If the highest frequency you want to resolve is 3 kHz, use 8192 samples or more, such as 16384 or 32768 samples, at various sampling rates.
Also, try a different FFT algorithm, different sample lengths, and different windows, including the Hann (Hanning), but also other windows with better side lobe attenuation, such as the Blackman-Harris series, and the Kaiser-Bessel series, etc.
If your application is noisy, you may have to choose between the better noise suppression windows, and the higher spectral resolution windows. So it's a good idea to try different windows, so you can find the best one for your application.
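One way to run such a comparison is sketched below, using only windows that ship with NumPy (`np.hanning`, `np.blackman`, `np.kaiser`); Blackman-Harris is not in NumPy, so it is left out here. The metric, the worst leakage more than 8 bins away from the peak, is an arbitrary choice for illustration, as are the test tone and the Kaiser beta of 14.

```python
import numpy as np


def far_leakage_db(window, sample_rate=16384, tone_hz=700.0, guard_bins=8):
    """Worst spectral level (dB relative to the peak) outside peak +/- guard_bins."""
    n = len(window)
    t = np.arange(n) / sample_rate
    tone = np.sin(2 * np.pi * tone_hz * t)  # off-bin tone, so leakage is visible
    spectrum = np.abs(np.fft.rfft(tone * window))
    peak = int(np.argmax(spectrum))
    mask = np.ones(len(spectrum), dtype=bool)
    mask[max(0, peak - guard_bins):peak + guard_bins + 1] = False
    return 20 * np.log10(spectrum[mask].max() / spectrum[peak])


N = 1024
windows = {
    "rectangular": np.ones(N),  # i.e. no window at all
    "hann": np.hanning(N),
    "blackman": np.blackman(N),
    "kaiser(beta=14)": np.kaiser(N, 14),
}
results = {name: far_leakage_db(w) for name, w in windows.items()}
for name, db in results.items():
    print(f"{name:16s} worst leakage beyond 8 bins: {db:7.1f} dB")
```

Lower (more negative) numbers mean less leakage far from the peak; the price is a wider main lobe, which is the resolution side of the tradeoff.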
Now, write down your results with each setup (i.e. with each window, sample length, sampling rate, etc.), and look for results that agree across multiple setups. You will learn much about your data, and very likely find the answer to your problem.
You can do this with Matlab: http://www.mathworks.com/help/techdoc/ref/fft.html
Or with this online FFT spectrum analyzer: http://www.sooeet.com/math/fft.php
And don't forget to post your results here.