Why should I discard half of what an FFT returns?

Tags: python, fft, wav

Looking at this answer: Python Scipy FFT wav files

The technical part is obvious and working, but I have three theoretical questions (the code mentioned is below):

1) Why do I have to normalize (b=...) the frames? What would happen if I used the raw data?

2) Why should I only use half of the FFT result (d=...)?

3) Why should I abs(c) the FFT result?

Perhaps I'm missing something due to inadequate understanding of WAV format or FFT, but while this code works just fine, I'd be glad to understand why it works and how to make the best use of it.

Edit: in response to the comment by @Trilarion:

I'm trying to write a simple Speaker Diarisation in Python, not 100% accurate but more of a proof of concept. That means taking a wav file (right now I am using this one for my tests) and, for each second (or any other resolution), saying whether the speaker is person #1 or person #2. I know in advance that there are 2 persons, and I am not trying to link them to any known voice signatures, just to separate them. Right now I take each second, FFT it (and thus get a list of frequencies), and cluster the results using KMeans with the number of clusters between 2 and 4 (A, B [,Silence [,A+B]]).
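The per-second FFT-plus-clustering idea described above might be sketched roughly as follows. This is only an illustration under loud assumptions: the "recording" is synthetic (two pure tones standing in for two speakers), the sample rate is made up, and scipy.cluster.vq.kmeans2 is used as one possible k-means implementation with hand-picked starting centroids.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

fs = 8000                      # assumed sample rate in Hz
t = np.arange(fs) / fs         # sample times for one second

# Synthetic stand-in for a recording: two "speakers" modelled as pure tones
# (200 Hz and 1200 Hz) that alternate every second. Real voices are far messier.
speaker_a = np.sin(2 * np.pi * 200 * t)
speaker_b = np.sin(2 * np.pi * 1200 * t)
signal = np.concatenate([speaker_a, speaker_b] * 3)   # A B A B A B

# One feature vector per second: the magnitude spectrum of that second.
frames = signal.reshape(-1, fs)
features = np.abs(np.fft.rfft(frames, axis=1))

# Cluster into two groups. The first two frames are used as the initial
# centroids (an arbitrary choice) so that the result is deterministic.
centroids, labels = kmeans2(features, features[:2].copy(), minit='matrix')
```

On this toy input the labels should alternate between the two cluster indices, one per second. Real speech would likely need shorter frames and more robust features than a raw magnitude spectrum.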

I'm still new to analyzing wav files and audio in general.

import matplotlib.pyplot as plt
import scipy.fftpack as sfft # FFT routines
from scipy.io import wavfile # get the api
fs, data = wavfile.read('test.wav') # load the data
a = data.T[0] # this is a two channel soundtrack, I get the first track
b=[(ele/2**8.)*2-1 for ele in a] # this is 8-bit track, b is now normalized on [-1,1)
c = sfft.fft(b) # create a list of complex numbers
d = len(c)//2  # you only need half of the fft list; integer division keeps d usable as an index
plt.plot(abs(c[:(d-1)]),'r')
plt.show()

asked Jul 27 '15 20:07

Guy Rapaport

1 Answer

To address these in order:

1) You don't need to normalize, but the raw input values stay close to the structure of the digitized waveform, so the numbers are unintuitive. For example, how loud is a value of 67? Values normalized to the range -1 to 1 are easier to interpret. Because the FFT is linear, normalizing only rescales the spectrum and shifts the zero-frequency (DC) value; its shape is unchanged. (But if you wanted to implement a filter, for example, where you take an FFT, modify the FFT values, and then take an IFFT, normalizing would be an unnecessary hassle.)
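As a rough check of what normalization does to the spectrum, the sketch below compares the FFT of raw and normalized samples. The random 8-bit samples are a made-up stand-in for one channel of the WAV file, and numpy.fft is used here instead of the scipy routine in the question.

```python
import numpy as np

# Hypothetical unsigned 8-bit samples (0..255) standing in for one WAV channel.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=1024).astype(np.float64)

# Same normalization as in the question: map [0, 256) onto [-1, 1).
normalized = raw / 2**8 * 2 - 1

spec_raw = np.fft.fft(raw)
spec_norm = np.fft.fft(normalized)

# Every non-DC bin differs only by the constant scale factor 2 / 256.
scale = 2 / 2**8
non_dc_match = np.allclose(spec_norm[1:], spec_raw[1:] * scale)

# The "- 1" offset only affects bin 0 (the DC term): it subtracts N there.
dc_match = np.isclose(spec_norm[0], spec_raw[0] * scale - len(raw))
```

Both checks come out True, so plotting the raw spectrum gives the same picture with a different vertical scale and a much larger spike at frequency 0.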

2) and 3) are related: both follow from the fact that the FFT is defined over the complex numbers. That is, an FFT maps a sequence of complex numbers (e.g., [.5+.1j, .4+.7j, .4+.6j, ...]) to another sequence of complex numbers.

So in detail:

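2) It turns out that if the input waveform is real instead of complex, the FFT output is conjugate-symmetric about frequency 0: the value at -f is the complex conjugate of the value at +f. The negative-frequency half therefore carries no new information, so only the values with a frequency >= 0 are uniquely interesting.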
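The symmetry can be checked numerically. The following is a minimal sketch using numpy.fft on a made-up random real signal; it also shows numpy.fft.rfft, which returns only the non-redundant half directly.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(16)    # a real-valued signal of even length N = 16
N = len(x)

X = np.fft.fft(x)

# Conjugate symmetry for real input: X[N - k] == conj(X[k]) for k = 1 .. N-1.
symmetric = np.allclose(X[1:][::-1], np.conj(X[1:]))

# rfft computes only the non-redundant bins 0 .. N/2 (N/2 + 1 values).
half = np.fft.rfft(x)
same_half = np.allclose(half, X[: N // 2 + 1])

# Matching frequencies for those bins, in cycles per sample.
freqs = np.fft.rfftfreq(N, d=1.0)
```

For real audio data, calling rfft avoids computing and then slicing away the redundant half.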

3) The values output by the FFT are complex, so each has a real and an imaginary part, but the same value can also be expressed as a magnitude and a phase. For audio signals, it's usually the magnitude that's most interesting, because this is primarily what we hear. That is why people often use abs (which gives the magnitude), though the phase can be important for other problems.
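To make the magnitude/phase split concrete, here is a small sketch on a synthetic tone. The sample rate, the 440 Hz frequency, and the pi/4 phase are all arbitrary assumptions chosen so the tone falls exactly on one FFT bin.

```python
import numpy as np

fs = 8000                       # assumed sample rate in Hz
t = np.arange(fs) / fs          # one second of sample times
x = 0.5 * np.cos(2 * np.pi * 440 * t + np.pi / 4)   # 440 Hz tone, phase pi/4

X = np.fft.rfft(x)
magnitude = np.abs(X)           # how strong each frequency is
phase = np.angle(X)             # where in its cycle each frequency starts

peak_bin = int(np.argmax(magnitude))
peak_freq = peak_bin * fs / len(x)             # bin index converted to Hz
peak_amp = 2 * magnitude[peak_bin] / len(x)    # recovers the tone's amplitude
peak_phase = phase[peak_bin]                   # recovers the tone's phase

# Magnitude and phase together hold the same information as the complex values.
rebuilt = magnitude * np.exp(1j * phase)
```

Plotting magnitude alone, as the question's code does, shows which frequencies are present but discards the timing information held in the phase.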

answered Oct 22 '22 14:10

tom10
