I have recorded an array[1024] of data from my mic on my Android phone, passed it through a 1D forward DFT of the real data (setting a further 1024 bits to 0). I saved the array to a text file, and repeated this 8 times.
I got back 16384 results. I opened the text file in Excel and made a graph to see what it looked like(x=index of array, y=size of number returned). There are some massive spikes (both positive and negative) in magnitude around 110, 232, and small spikes continuing in that fashion until around 1817 and 1941 where the spikes get big again, then drop again.
My problem is that wherever I look for help on the topic it mentions gettng the real and imaginary numbers, I only have a 1D array, that I got back from the method I used from Piotr Wendykier's class:
DoubleFFT_1D.realForwardFull(audioDataArray); // from the library JTransforms.
My question is: What do I need to do to this data to return a frequency? The sound recorded was me playing an 'A' on the bottom string (5th fret) of my guitar (at roughly 440Hz) .
Let X = fft(x) . Both x and X have length N . Suppose X has two peaks at n0 and N-n0 . Then the sinusoid frequency is f0 = fs*n0/N Hertz.
The frequency resolution is defined as Fs/N in FFT. Where Fs is sample frequency, N is number of data points used in the FFT. For example, if the sample frequency is 1000 Hz and the number of data points used by you in FFT is 1000. Then the frequency resolution is equal to 1000 Hz/1000 = 1 Hz.
In many examples including the documentation about fft, the one-sided frequency vector is defined as: f = Fs*(0:(L/2))/L where L=length of the FFT.
The complex data is interleaved, with real components at even indices and imaginary components at odd indices, i.e. the real components are at index 2*i
, the imaginary components are at index 2*i+1
.
To get the magnitude of the spectrum at index i, you want:
re = fft[2*i]; im = fft[2*i+1]; magnitude[i] = sqrt(re*re+im*im);
Then you can plot magnitude[i] for i = 0 to N / 2 to get the power spectrum. Depending on the nature of your audio input you should see one or more peaks in the spectrum.
To get the approximate frequency of any given peak you can convert the index of the peak as follows:
freq = i * Fs / N;
where:
freq = frequency in Hz i = index of peak Fs = sample rate in Hz (e.g. 44100 Hz, or whatever you are using) N = size of FFT (e.g. 1024 in your case)
Note: if you have not previously applied a suitable window function to the time-domain input data then you will get a certain amount of spectral leakage and the power spectrum will look rather "smeared".
To expand on this further, here is pseudo-code for a complete example where we take audio data and identify the frequency of the largest peak:
N = 1024 // size of FFT and sample window Fs = 44100 // sample rate = 44.1 kHz data[N] // input PCM data buffer fft[N * 2] // FFT complex buffer (interleaved real/imag) magnitude[N / 2] // power spectrum // capture audio in data[] buffer // ... // apply window function to data[] // ... // copy real input data to complex FFT buffer for i = 0 to N - 1 fft[2*i] = data[i] fft[2*i+1] = 0 // perform in-place complex-to-complex FFT on fft[] buffer // ... // calculate power spectrum (magnitude) values from fft[] for i = 0 to N / 2 - 1 re = fft[2*i] im = fft[2*i+1] magnitude[i] = sqrt(re*re+im*im) // find largest peak in power spectrum max_magnitude = -INF max_index = -1 for i = 0 to N / 2 - 1 if magnitude[i] > max_magnitude max_magnitude = magnitude[i] max_index = i // convert index of largest peak to frequency freq = max_index * Fs / N
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With