Why do I need to apply a window function to samples when building a power spectrum of an audio signal?

I have come across the following guidelines several times for getting the power spectrum of an audio signal:

- collect N samples, where N is a power of 2
- apply a suitable window function to the samples, e.g. Hanning
- pass the windowed samples to an FFT routine - ideally you want a real-to-complex FFT, but if all you have is a complex-to-complex FFT then pass 0 for all the imaginary input parts
- calculate the squared magnitude of your FFT output bins (re * re + im * im)
- (optional) calculate 10 * log10 of each magnitude-squared output bin to get a magnitude value in dB
- Now that you have your power spectrum you just need to identify the peak(s), which should be pretty straightforward if you have a reasonable S/N ratio. Note that frequency resolution improves with larger N. For the above example of 44.1 kHz sample rate and N = 32768 the frequency resolution of each bin is 44100 / 32768 = 1.35 Hz.
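The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the code the guidelines came from: the 440 Hz test tone, the block size of 4096, the small recursive `fft` helper, and the `1e-20` offset that keeps `log10` away from zero are all assumptions made for the example.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def power_spectrum_db(samples):
    """Hann-window the block, FFT it, and return power in dB for bins 0..N/2."""
    n = len(samples)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [s * w for s, w in zip(samples, window)]
    bins = fft(windowed)[: n // 2 + 1]  # real input: upper half mirrors the lower
    power = [b.real * b.real + b.imag * b.imag for b in bins]
    return [10 * math.log10(p + 1e-20) for p in power]  # offset avoids log10(0)

SAMPLE_RATE = 44100   # Hz, as in the guidelines
N = 4096              # assumed block size (power of 2), smaller than 32768 for speed
TONE_HZ = 440.0       # assumed test tone standing in for real audio

samples = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE) for i in range(N)]
spectrum_db = power_spectrum_db(samples)

bin_width_hz = SAMPLE_RATE / N  # frequency resolution of each bin
peak_bin = max(range(len(spectrum_db)), key=spectrum_db.__getitem__)
peak_hz = peak_bin * bin_width_hz

print(f"bin width: {bin_width_hz:.2f} Hz, peak near {peak_hz:.1f} Hz")
```

Bin k corresponds to the frequency k * sample_rate / N, so the bins of a real-input FFT cover 0 Hz up to half the sample rate.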

But... why do I need to apply a window function to the samples? What does that really mean?

What about the power spectrum: is it the power at each frequency in the range covered by the sample rate? (Example: the sound visualizer in Windows Media Player?)

asked Sep 07 '11 17:09

Nuno Santos

1 Answer

Most real-world audio signals are non-periodic: they do not generally repeat exactly over any given time span.

However, the math of the Fourier transform, as computed by the FFT, treats the block of samples being transformed as one period of a signal that repeats forever.

This mismatch between the Fourier assumption of periodicity and the real-world fact that audio signals are generally non-periodic leads to errors in the transform. When the block is repeated end to end, its two edges generally do not line up, so the transform effectively sees a discontinuity at each seam.

These errors are called "spectral leakage", and they generally show up as energy smeared away from the signal's true frequencies into neighboring bins of the power spectrum.
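A small experiment can make the leakage visible. The sketch below is an illustration only, under assumed values: a 64-sample block, a direct (slow) DFT, and two hypothetical test tones, one that fits exactly 8 cycles into the block (periodic over the block) and one that fits 8.5 cycles (not periodic over the block). It counts how many bins lie within 60 dB of the strongest bin.

```python
import cmath
import math

N = 64  # assumed block length, kept small so a direct DFT is fast enough

def dft_power(x):
    """Squared magnitude of DFT bins 0..N/2, computed directly from the definition."""
    n = len(x)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        power.append(acc.real * acc.real + acc.imag * acc.imag)
    return power

def sine(cycles_per_block):
    """Sine wave with the given number of cycles inside one N-sample block."""
    return [math.sin(2 * math.pi * cycles_per_block * i / N) for i in range(N)]

def significant_bins(power, floor_db=-60.0):
    """Count bins whose power is within |floor_db| dB of the strongest bin."""
    peak = max(power)
    return sum(1 for p in power if p > 0 and 10 * math.log10(p / peak) > floor_db)

periodic = significant_bins(dft_power(sine(8.0)))       # whole number of cycles
non_periodic = significant_bins(dft_power(sine(8.5)))   # block ends mid-cycle

print("bins above -60 dB, 8 cycles per block:  ", periodic)
print("bins above -60 dB, 8.5 cycles per block:", non_periodic)
```

The tone that repeats exactly over the block lands in a single bin, while the tone that is cut off mid-cycle spreads energy across many bins, even though both are pure sine waves.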

The plot below shows a closeup of the power spectrum of an acoustic guitar playing the A4 note. The spectrum was calculated with the FFT (Fast Fourier Transform), but the signal was not windowed prior to the FFT.

Notice the distribution of energy above the -60 dB line, and the three distinct peaks at roughly 440 Hz, 880 Hz, and 1320 Hz. This particular distribution of energy contains "spectral leakage" errors.

[Figure: Power spectrum of guitar playing an A4 note, no window applied. Image: https://i.stack.imgur.com/UQwbe.jpg]

To somewhat mitigate the "spectral leakage" errors, you can pre-multiply the signal by a window function designed specifically for that purpose, such as the Hann window function.

The plot below shows the Hann window function in the time-domain. Notice how the tails of the function go smoothly to zero, while the center portion of the function tends smoothly towards the value 1.

[Figure: Hann window function. Image: https://i.stack.imgur.com/v2h5S.jpg]
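For reference, the Hann window has a simple closed form. The sketch below uses the symmetric definition w[i] = 0.5 * (1 - cos(2*pi*i / (N - 1))); the function names `hann_window` and `apply_window` and the length of 1025 samples are assumptions chosen for illustration, not taken from the analysis above.

```python
import math

def hann_window(n):
    """Symmetric Hann window: w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1)))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1))) for i in range(n)]

def apply_window(samples, window):
    """Pre-multiply each sample by the matching window coefficient."""
    return [s * w for s, w in zip(samples, window)]

# Assumed odd length so that there is an exact center sample.
window = hann_window(1025)

print("first:", window[0], "center:", window[512], "last:", window[-1])

# Windowing a constant signal simply reproduces the window shape.
shaped = apply_window([1.0] * 1025, window)
```

Because both ends of the window are zero, a windowed block starts and ends at zero, which removes the jump that would otherwise appear where the block is implicitly repeated.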

Now let's apply the Hann window to the guitar's audio data, and then FFT the resulting signal.

The plot below shows a closeup of the power spectrum of the same signal (an acoustic guitar playing the A4 note), but this time the signal was pre-multiplied by the Hann window function prior to the FFT.

Notice how the distribution of energy above the -60 dB line has changed significantly, and how the three distinct peaks have changed shape and height. This particular distribution of spectral energy contains fewer "spectral leakage" errors.

[Figure: Power spectrum of guitar playing an A4 note, Hann window applied. Image: https://i.stack.imgur.com/yVuks.jpg]
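The same kind of comparison can be reproduced on a synthetic tone. The sketch below is not the guitar analysis; it assumes a 64-sample block, a direct DFT, a hypothetical test tone with 8.5 cycles per block, and an arbitrarily chosen bin (24) far from the tone. The `1e-20` offset only guards against taking the logarithm of zero.

```python
import cmath
import math

N = 64         # assumed block length, small enough for a direct DFT
CYCLES = 8.5   # hypothetical test tone: not a whole number of cycles per block
FAR_BIN = 24   # assumed bin well away from the tone (which sits between bins 8 and 9)

def power_db(x):
    """Power spectrum in dB for bins 0..N/2, using a direct DFT."""
    n = len(x)
    result = []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        power = acc.real * acc.real + acc.imag * acc.imag
        result.append(10.0 * math.log10(power + 1e-20))  # offset avoids log10(0)
    return result

signal = [math.sin(2.0 * math.pi * CYCLES * i / N) for i in range(N)]
hann = [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (N - 1))) for i in range(N)]

rect_db = power_db(signal)                                  # no window applied
hann_db = power_db([s * w for s, w in zip(signal, hann)])   # Hann window applied

print(f"peak:     no window {max(rect_db):6.1f} dB, Hann {max(hann_db):6.1f} dB")
print(f"bin {FAR_BIN}:   no window {rect_db[FAR_BIN]:6.1f} dB, Hann {hann_db[FAR_BIN]:6.1f} dB")
```

The window trades one kind of error for another: energy far from the tone drops sharply, while the peak itself becomes somewhat lower and wider, which is why the peaks in the windowed plot change shape and height.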

The acoustic guitar's A4 note used for this analysis was sampled at 44.1 kHz with a high-quality microphone under studio conditions. It contains essentially zero background noise, no other instruments or voices, and no post-processing.

References:

Real audio signal data, Hann window function, plots, FFT, and spectral analysis were done here:

Fast Fourier Transform, spectral analysis, Hann window function, audio data

answered Oct 07 '22 00:10

Babson
