 

Why should I discard half of what a FFT returns?

Tags: python, fft, wav

Looking at this answer: Python Scipy FFT wav files

The technical part is obvious and working, but I have three theoretical questions (the code mentioned is below):

1) Why do I have to normalize (b=...) the frames? What would happen if I used the raw data?

2) Why should I only use half of the FFT result (d=...)?

3) Why should I abs(c) the FFT result?

Perhaps I'm missing something due to inadequate understanding of WAV format or FFT, but while this code works just fine, I'd be glad to understand why it works and how to make the best use of it.

Edit: in response to the comment by @Trilarion:

I'm trying to write a simple, not 100% accurate, proof-of-concept Speaker Diarisation in Python. That means taking a wav file (right now I am using this one for my tests) and, for each second (or any other resolution), saying whether the speaker is person #1 or person #2. I know in advance that there are 2 persons, and I am not trying to link them to any known voice signatures, just to separate them. Right now I take each second, FFT it (and thus get a list of frequencies), and cluster the results using KMeans with the number of clusters between 2 and 4 (A, B [,Silence [,A+B]]).

I'm still new to analyzing wav files and audio in general.

import numpy as np
import matplotlib.pyplot as plt
import scipy.fft as sfft  # sfft was used below but never imported
from scipy.io import wavfile  # get the api

fs, data = wavfile.read('test.wav')  # load the data
a = data.T[0]  # this is a two-channel soundtrack, I get the first track
b = (a / 2**8.) * 2 - 1  # this is an 8-bit (unsigned) track, b is now normalized on [-1,1)
c = sfft.fft(b)  # create an array of complex numbers
d = len(c) // 2  # you only need half of the fft list; // keeps d an int in Python 3
plt.plot(np.abs(c[:d]), 'r')
plt.show()
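The "FFT each second" step described in the edit above can be sketched as follows. This is a minimal illustration, assuming a mono floating-point signal (for example `b` from the snippet) and non-overlapping one-second frames; the function name `per_second_spectra` is hypothetical. Each row of the returned matrix is one second's magnitude spectrum, which is the kind of feature vector that could then be handed to a clusterer such as scikit-learn's `KMeans`.

```python
import numpy as np

def per_second_spectra(signal, fs, frame_seconds=1.0):
    """Hypothetical helper: split a mono signal into non-overlapping frames and
    return (freqs, spectra), the bin frequencies in Hz and one magnitude
    spectrum (non-negative frequencies only) per frame."""
    frame_len = int(fs * frame_seconds)
    n_frames = len(signal) // frame_len  # any trailing partial frame is dropped
    frames = np.asarray(signal[:n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    spectra = np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, frame_len // 2 + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)  # frequency of each bin in Hz
    return freqs, spectra

# Synthetic check: three 1-second tones at 440, 880 and 440 Hz
fs = 8000
t = np.arange(fs) / fs
sig = np.concatenate([np.sin(2 * np.pi * f * t) for f in (440, 880, 440)])
freqs, spectra = per_second_spectra(sig, fs)
print(spectra.shape)                  # (3, 4001)
print(freqs[spectra.argmax(axis=1)])  # dominant frequency of each second
```

Using `rfft` here already discards the redundant negative-frequency half, which is the same thing the `d = len(c) // 2` slice does by hand.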
Asked by Guy Rapaport, Jul 27 '15 at 20:07




1 Answer

To address these in order:

1) You don't need to normalize, but the raw input values mirror the storage format of the digitized waveform (for 8-bit WAV, unsigned integers from 0 to 255), so the numbers are unintuitive. For example, how loud is a value of 67? It's easier to interpret the values once they are normalized to the range -1 to 1. (But if you wanted to implement a filter, for example, where you did an FFT, modified the FFT values, and then did an IFFT, normalizing would be an unnecessary hassle.)
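A small numeric check of point 1, assuming unsigned 8-bit samples (random integers stand in for real audio here; they are hypothetical data, not from the question). Because the normalization `b = (a / 2**8) * 2 - 1` is just a scale plus an offset, the FFT of the normalized signal equals the raw FFT divided by 128 in every bin except bin 0, where the offset of 8-bit audio (silence sits at 128, not 0) shows up. The last line shows that an FFT followed by an IFFT recovers the raw samples, so a filter pipeline has no need for the normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=1024).astype(np.float64)  # hypothetical unsigned 8-bit samples
norm = raw / 2**8 * 2 - 1  # same mapping as b=... in the question: [0, 256) -> [-1, 1)

raw_spec = np.fft.fft(raw)
norm_spec = np.fft.fft(norm)

# Away from bin 0 the spectra differ only by the constant factor 2/256 = 1/128
print(np.allclose(norm_spec[1:], raw_spec[1:] / 128))  # True

# Bin 0 (sum of all samples, i.e. N times the mean) also loses the +128 offset
print(raw_spec[0].real / len(raw), norm_spec[0].real / len(norm))

# FFT followed by IFFT gives back the raw samples, no normalization needed
print(np.allclose(np.fft.ifft(raw_spec).real, raw))  # True
```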

2) and 3) are similar in that they both have to do with the math living primarily in the space of complex numbers. That is, FFTs take a waveform of complex numbers (e.g., [.5+.1j, .4+.7j, .4+.6j, ...]) to another sequence of complex numbers.
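To make the complex-in, complex-out view concrete, here is a short sketch contrasting a complex input with a real one (the frame length of 16 and the tone at 3 cycles per frame are arbitrary choices for illustration). A complex exponential puts energy in a single bin, so both halves of the output matter; a real cosine is the sum of a positive- and a negative-frequency exponential, so it shows up twice.

```python
import numpy as np

N = 16
n = np.arange(N)

z = np.exp(2j * np.pi * 3 * n / N)  # complex input: one tone at +3 cycles per frame
Z_mag = np.abs(np.fft.fft(z))
print(np.round(Z_mag, 6))  # energy only in bin 3; bin N-3 (i.e. -3) stays empty

x = np.cos(2 * np.pi * 3 * n / N)  # real input: cos = (e^{+i...} + e^{-i...}) / 2
X_mag = np.abs(np.fft.fft(x))
print(np.round(X_mag, 6))  # equal peaks in bin 3 and bin N-3
```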

So in detail:

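2) It turns out that if the input waveform is real instead of complex, then the FFT is conjugate-symmetric about frequency 0 (the value at frequency -f is the complex conjugate of the value at +f), so only the values with a frequency >= 0 are uniquely interesting. NumPy and SciPy store the negative frequencies in the second half of the output array, which is why the code keeps only the first half.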
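The symmetry in point 2 can be checked directly with NumPy. This sketch uses an arbitrary random real signal of length 16; `np.fft.fftfreq` shows where the negative frequencies live in the output, and `np.fft.rfft` returns exactly the non-redundant first `N // 2 + 1` bins.

```python
import numpy as np

x = np.random.default_rng(1).standard_normal(16)  # any real-valued signal
X = np.fft.fft(x)
N = len(x)

# Hermitian symmetry: X[N-k] is the complex conjugate of X[k] for real input
k = np.arange(1, N)
print(np.allclose(X[N - k], np.conj(X[k])))  # True

# NumPy stores the negative frequencies in the second half of the array
freqs = np.fft.fftfreq(N, d=1.0)
print(freqs)  # 0, 1/16, ..., 7/16, then -8/16, -7/16, ..., -1/16

# rfft returns only the N//2 + 1 non-redundant bins
print(np.allclose(np.fft.rfft(x), X[:N // 2 + 1]))  # True
```

Note that for even `N` the bin at index `N // 2` (the Nyquist frequency) is its own mirror image, so `rfft` keeps it while a plain `c[:len(c)//2]` slice drops it.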

3) The values output by the FFT are complex, so they have real and imaginary parts, but each value can also be expressed as a magnitude and a phase. For audio signals, it's usually the magnitude that's most interesting, because that is primarily what we hear. Therefore people often use abs (which gives the magnitude), but the phase can be important for other problems as well.
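A sketch of point 3, assuming a synthetic one-second signal at a 1000 Hz sample rate made of a 50 Hz sine and a 120 Hz cosine (values chosen only for illustration). `np.abs` and `np.angle` split each complex bin into magnitude and phase, the two together reconstruct the original complex values, and scaling the magnitude by `2/N` turns the peaks back into the amplitudes of the input tones.

```python
import numpy as np

fs = 1000  # assumed sample rate in Hz
t = np.arange(fs) / fs  # one second of samples
x = 0.5 * np.sin(2 * np.pi * 50 * t) + 0.2 * np.cos(2 * np.pi * 120 * t)

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # frequency of each bin in Hz

magnitude = np.abs(X)  # what abs(c) in the question computes
phase = np.angle(X)  # radians in (-pi, pi]

# Magnitude and phase together carry the same information as Re/Im
print(np.allclose(magnitude * np.exp(1j * phase), X))  # True

# 2/N scaling maps a bin's magnitude to the amplitude of a sinusoid
# (valid for bins other than DC and Nyquist)
amplitude = 2 * magnitude / len(x)
top = np.sort(np.argsort(amplitude)[-2:])  # indices of the two largest peaks
print(freqs[top], amplitude[top])  # peaks near 50 Hz and 120 Hz, amplitudes ~0.5 and ~0.2
```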

Answered by tom10, Oct 22 '22 at 14:10