I've been working on a simple frequency detection setup on the iPhone. Analyzing the frequency domain with FFT results has been somewhat unreliable in the presence of harmonics, so I was hoping to use the cepstrum to help decide which fundamental frequency is playing.
I am working with AudioQueues in the AudioToolbox framework, and do the Fourier transforms using the Accelerate framework.
My process follows exactly the Real Power Cepstrum recipe in Wikipedia's Cepstrum article: signal → FT → abs() → square → log → FT → abs() → square → power cepstrum.
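For reference, that recipe can be sketched in plain Python instead of vDSP calls. This is a minimal sketch, not the asker's code: the recursive radix-2 `fft` helper and the small `eps` added before `log()` (to keep it finite if a bin is exactly zero) are my own assumptions, and the input length must be a power of two.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_cepstrum(signal, eps=1e-12):
    """signal -> FT -> abs() -> square -> log -> FT -> abs() -> square."""
    # eps is an assumption: it avoids log(0) for an exactly empty bin.
    log_power = [math.log(abs(X) ** 2 + eps) for X in fft(signal)]
    return [abs(C) ** 2 for C in fft(log_power)]
```

Because the log-power spectrum of a real signal is real and even, its transform is symmetric: bin N - q of the power cepstrum equals bin q. The last values are therefore a mirror image of the first ones, which is why both ends look "astronomical" together.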
The problem is that the cepstrum results are extremely noisy. I have to drop the first and last 20 values because they are astronomical compared with the others. Even after "cleaning" the data, there is still a huge amount of variation, far more than I would expect given the first graph. The pictures below show the frequency domain and the quefrency domain.
[Figures: FFT (frequency domain); Cepstrum (quefrency domain)]
When I see such a clear winner in the frequency domain as in that graph, I expect a similarly clear result in the quefrency domain. I played A440 and would expect bin 82 or so to have the highest magnitude; the third peak on the graph is bin 79, which is close enough. As I said, the first 20 or so bins are so astronomical in magnitude as to be unusable, and I had to delete them from the data set to see anything. Another odd quality of the cepstrum data is that the even bins seem to be much higher than the odd bins. Here are the cepstrum bins from 77 to 86:
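Which bin to expect depends on the sample rate and on how long the second transform is, neither of which is stated here. A minimal sketch, assuming the common convention that, when both transforms have the same length N, cepstrum bin q is a lag of q samples, so f0 = fs / q. The function names and the `half_spectrum` flag are hypothetical.

```python
def quefrency_bin_to_hz(q_bin, sample_rate_hz, half_spectrum=False):
    """Pitch in Hz for cepstrum bin q_bin.

    Assumes both transforms have the same length N. If only the N/2
    positive-frequency bins are fed to the second transform, each bin
    spans two samples of lag instead of one.
    """
    lag_samples = 2 * q_bin if half_spectrum else q_bin
    return sample_rate_hz / lag_samples


def hz_to_quefrency_bin(f0_hz, sample_rate_hz, half_spectrum=False):
    """Expected cepstrum bin for a fundamental of f0_hz (same assumptions)."""
    lag_samples = sample_rate_hz / f0_hz
    return lag_samples / 2 if half_spectrum else lag_samples
```

For example, at an assumed 44.1 kHz sample rate, A440 would land near bin 100.2 with full-length transforms, or near bin 50.1 if only the positive half of the log spectrum goes into the second FFT.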
77: 151150.0313
78: 22385.92773
79: 298753.1875
80: 56532.72656
81: 114177.4766
82: 31222.88281
83: 4620.785156
84: 13382.5332
85: 83.668259
86: 1205.023193
My question is how to clean up the frequency-domain data so that my cepstrum results are not so wild. Alternatively, help me better understand how to interpret these results if they are what one would expect from cepstrum analysis. I can post examples of the code I'm using, but it mostly uses vDSP calls and I don't know how helpful that would be.
The cepstrum is a representation used in homomorphic signal processing, to convert signals combined by convolution (such as a source and filter) into sums of their cepstra, for linear separation. In particular, the power cepstrum is often used as a feature vector for representing the human voice and musical signals.
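The "convolution becomes a sum" property can be checked numerically. This minimal sketch uses the real cepstrum (inverse DFT of log|DFT|) and circular convolution, with a naive DFT for brevity; it is an illustration, not the power-cepstrum recipe from the question.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def real_cepstrum(x):
    """Inverse DFT of log|DFT(x)|; assumes no DFT bin is exactly zero."""
    return [c.real for c in idft([math.log(abs(X)) for X in dft(x)])]


def circular_convolve(x, h):
    """y[t] = sum_m x[m] * h[(t - m) mod n], so DFT(y) = DFT(x) * DFT(h)."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


source = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
filt = [1.0, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
combined = circular_convolve(source, filt)
# cepstrum(source * filter) equals cepstrum(source) + cepstrum(filter)
```

The additivity lives in the log spectrum. The extra abs() and square at the end of the power-cepstrum recipe break it, since |a + b|^2 is not |a|^2 + |b|^2 in general, which is why this sketch uses the real cepstrum.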
Cepstrum Analysis is a tool for the detection of periodicity in a frequency spectrum, and seems so far to have been used mainly in speech analysis for voice pitch determination and related questions.
A cepstrum, or cepstral analysis, is a technique that tries to separate a signal with high overtone content into two portions. The portion near DC (low quefrency) represents the spectral envelope of all the overtones, or the speech formants, which might be useful for speaker or instrument recognition. Later peaks in the cepstrum represent the exciter frequency or frequencies, provided that frequency generates enough harmonic overtone content.
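That split is usually done by "liftering": keeping only the low-quefrency cepstral coefficients and transforming back gives a smoothed log spectrum (the envelope), while the discarded higher coefficients carry the periodic, pitch-related detail. A minimal sketch using the real cepstrum and a naive DFT; the `keep` cutoff is a hypothetical parameter you would tune for your signals.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def lifter_envelope(x, keep):
    """Smoothed log-magnitude spectrum from the low-quefrency cepstrum only.

    Keeps quefrency bins 0..keep-1 and their mirror bins n-keep+1..n-1,
    zeroes the rest, then transforms back to the log-spectrum domain.
    """
    n = len(x)
    ceps = [c.real for c in idft([math.log(abs(X)) for X in dft(x)])]
    liftered = [ceps[q] if (q < keep or q > n - keep) else 0.0 for q in range(n)]
    return [v.real for v in dft(liftered)]
```

For a signal with a smooth spectrum, such as a short two-tap impulse response, almost all cepstral energy sits in the first few bins, so a small `keep` already reproduces its log spectrum closely.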
Since a cepstrum is usually computed without any (non-rectangular) window, it can produce a sinc response even to a clean overtone sequence, with the width of the response roughly inversely proportional to the length of the overtone sequence, i.e. the number of overtones. And any slightly inharmonic overtones (as found in real musical instruments) make the cepstrum results even messier. So a cepstrum peak may only give the approximate location of the fundamental frequency, which can still be useful for rejecting other frequency candidates during frequency estimation.
A "clean looking" cepstrum might be the result of a very long sequence of exactly harmonic overtones with a nearly flat frequency response, which is perhaps not what is found in real life signals.
The following analysis illustrates Cepstrum's performance on synthetic and real-world signals.
First we examine a synthetic signal.
The plot below shows a synthetic steady-state E2 note, synthesized from a typical near-DC component, a fundamental at 82.4 Hz, and a total of 8 harmonics at integer multiples of 82.4 Hz. The synthetic signal was 4096 samples long.
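A signal like this can be generated roughly as follows. The 44.1 kHz sample rate, the 1/k partial amplitudes, the `dc_offset` value standing in for the near-DC component, and reading "8 harmonics" as partials 1 through 8 are all my assumptions; the answer does not state them.

```python
import math


def synth_e2(num_samples=4096, sample_rate_hz=44100.0, f0_hz=82.4,
             num_partials=8, dc_offset=0.1):
    """Sum of harmonic sinusoids plus a constant offset.

    Assumed (not given in the text): 44.1 kHz sample rate, partial k at
    k * f0 with amplitude 1/k, and dc_offset as the near-DC component.
    """
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        value = dc_offset
        for k in range(1, num_partials + 1):
            value += math.sin(2 * math.pi * k * f0_hz * t) / k
        samples.append(value)
    return samples
```

At the assumed 44.1 kHz rate, one period of 82.4 Hz spans 44100 / 82.4 ≈ 535.2 samples, which is the lag where a cepstral peak would be expected when both transforms have the same length.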
The plot below shows a closeup of the input that was used for the Cepstrum calculation of the synthetic E2 note. It is the log(|FFT|^2) output from the synthetic E2 note.
The plot below shows the Cepstrum of the synthetic E2 note. Observe the prominent non-DC peak at 12.36. The Cepstrum width is 1024 (the output of the second FFT), therefore the peak corresponds to 1024/12.36 = 82.8 Hz which is very close to the actual 82.4 Hz of the fundamental.
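Finding the prominent non-DC peak can be automated instead of deleting bins by hand: skip the low-quefrency bins that carry the spectral envelope, and search only the first half, since for a real input signal the power cepstrum is symmetric (bin N - q equals bin q). A minimal sketch; `strongest_cepstral_peak` and its `min_bin`/`max_bin` bounds are hypothetical, and in practice you would derive them from the highest and lowest pitches you expect.

```python
def strongest_cepstral_peak(cepstrum, min_bin, max_bin=None):
    """Index of the largest cepstrum value in [min_bin, max_bin].

    Bins below min_bin (spectral envelope) are ignored, and the search
    stops at N/2 because the upper half mirrors the lower half.
    """
    half = len(cepstrum) // 2
    hi = half if max_bin is None else min(max_bin, half)
    if not 0 <= min_bin <= hi:
        raise ValueError("empty search range")
    return max(range(min_bin, hi + 1), key=lambda q: cepstrum[q])
```

With `min_bin=0` the huge envelope bins win, which is exactly the symptom described in the question; raising `min_bin` past the envelope region lets the pitch peak surface.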
Now we examine a real-world signal.
The plot below shows the spectrum of the E2 note from a real acoustic guitar.
The plot below shows a closeup of the input that was used for the Cepstrum calculation of the acoustic guitar's E2 note. It is the log(|FFT|^2) output from the acoustic guitar's E2 note.
The plot below shows the Cepstrum of the acoustic guitar's E2 note. Observe the prominent non-DC peak at 542.8. The Cepstrum width is 32768 (the output of the second FFT), therefore the peak corresponds to 32768/542.8 = 60.4 Hz which is fairly far from the actual 82.4 Hz of the fundamental.
The recording of the E2 guitar note used for this analysis was sampled at 44.1 kHz with a high-quality microphone under studio conditions; it contains essentially no background noise and no other instruments or voices.
This illustrates the significant challenge of using Cepstral analysis for pitch determination in real-world audio signals.
References:
Real audio signal data, synthetic signal generation, plots, FFT, and Cepstral analysis were done here: Musical instrument cepstrum