Cleaning up noisy Cepstrum results

Tags: iphone, signal-processing, fft, frequency-analysis, pitch

I've been working on a simple frequency detection setup on the iPhone. Analyzing FFT results in the frequency domain has been somewhat unreliable in the presence of harmonics. I was hoping to use cepstrum results to help decide which fundamental frequency is playing.

I am working with AudioQueues in the AudioToolbox framework, and do the Fourier transforms using the Accelerate framework.

My process has been exactly what is listed on Wikipedia's Cepstrum article for the Real Power Cepstrum, specifically: signal → FT → abs() → square → log → FT → abs() → square → power cepstrum.
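For reference, the pipeline above can be sketched in a few lines. This is a minimal Python sketch, not the vDSP code used in the question: the radix-2 `fft` helper, the `eps` guard inside the logarithm, the test tone, and the quefrency search range are all assumptions made for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def power_cepstrum(signal, eps=1e-12):
    """signal -> FT -> abs -> square -> log -> FT -> abs -> square."""
    power = [abs(c) ** 2 for c in fft(signal)]
    # eps keeps log() finite for empty bins (an assumption, not part of the recipe)
    log_power = [math.log(p + eps) for p in power]
    return [abs(c) ** 2 for c in fft(log_power)]

# Hypothetical test tone: 8 harmonics of a fundamental with a 32-sample period.
n, period = 1024, 32
signal = [sum(math.sin(2 * math.pi * h * i / period) / h for h in range(1, 9))
          for i in range(n)]

ceps = power_cepstrum(signal)
# Search only a plausible quefrency range; bins near 0 hold the spectral envelope.
peak_bin = max(range(20, 48), key=lambda q: ceps[q])
print(peak_bin)  # expected: 32, the period in samples
```

In this idealized case the peak lands exactly on the period in samples, so the fundamental would be the sample rate divided by `peak_bin`. Restricting the search range is one way to sidestep the very large values near quefrency zero.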

The problem I have is that the Cepstrum results are extremely noisy. I have to drop the first and last 20 values as they are astronomical compared to the other values. Even after "cleaning" the data, there is still a huge amount of variation - far more than I would expect given the first graph. See the pictures below for the visualizations of the frequency domain and the quefrency domain.

[Figure: FFT (https://i.stack.imgur.com/C2ExC.png)]

[Figure: Cepstrum (https://i.stack.imgur.com/sw3Ot.png)]

When I see such a clear winner in the frequency domain as on that graph, I expect to see a similarly clear result in the quefrency domain. I played A440 and would expect bin 82 or so to have the highest magnitude. The third peak on the graph represents bin 79, which is close enough. As I said, the first 20 or so bins are so astronomical in magnitude as to be unusable, and I had to delete them from the data set in order to see anything. Another odd quality of the cepstrum data is that the even bins seem to be much higher than the odd bins. Here are the values for bins 77-86:


77: 151150.0313
78:  22385.92773
79: 298753.1875
80:  56532.72656
81: 114177.4766
82:  31222.88281
83:   4620.785156
84:  13382.5332
85:     83.668259
86: 1205.023193

My question is how to clean up the frequency domain so that my Cepstrum domain results are not so wild. Alternately, help me better understand how to interpret these results if they are as one would expect in a Cepstrum analysis. I can post examples of the code I'm using, but it mostly uses vDSP calls and I don't know how helpful that would be.


asked Mar 12 '11 18:03

brodney

2 Answers

A cepstrum, or cepstral analysis, is a technique used to try to separate a signal with high overtone content into two portions. The portion near DC represents the spectral envelope of all the overtones, or the speech formant, which might be useful for speaker or instrument recognition. Later peaks in the cepstrum result represent the exciter frequency or frequencies, if that frequency generates enough harmonic overtone content.

Since a cepstrum is usually done without any (non-rectangular) window, it can produce a sinc response even to a clean overtone sequence, with the width of the response roughly inversely proportional to the length of the overtone sequence or the number of overtones. And, of course, any slightly inharmonic overtones (as found in actual musical instruments) will make the cepstrum results even messier. So a cepstrum peak may only be good at giving one the approximate location of the fundamental frequency, which could still be a useful result in rejecting other frequency candidates when doing frequency estimation.
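One way to use a rough cepstral estimate to reject other candidates might look like the sketch below. The function name `pick_fundamental`, the semitone tolerance, and the example candidate list are hypothetical; the candidates are assumed to come from some separate spectral peak picker.

```python
import math

def pick_fundamental(candidates_hz, cepstral_estimate_hz, max_semitones=2.0):
    """Return the candidate closest (in semitones) to the rough cepstral
    estimate, or None if no candidate falls within max_semitones."""
    best = None
    best_dist = max_semitones
    for f in candidates_hz:
        # Distance on a log-frequency scale, so octave errors count as 12 semitones.
        dist = abs(12.0 * math.log2(f / cepstral_estimate_hz))
        if dist <= best_dist:
            best, best_dist = f, dist
    return best

# Hypothetical FFT peaks for a note with strong harmonics, plus a rough cepstral estimate.
fft_peaks_hz = [220.0, 440.0, 880.0, 1320.0]
print(pick_fundamental(fft_peaks_hz, 452.0))  # 440.0
```

Comparing in semitones rather than Hz keeps the tolerance musically uniform across the frequency range, which is a design choice of this sketch rather than anything stated in the answer.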

A "clean looking" cepstrum might be the result of a very long sequence of exactly harmonic overtones with a nearly flat frequency response, which is perhaps not what is found in real life signals.


answered Oct 06 '22 16:10

hotpaw2

The following analysis illustrates Cepstrum's performance on synthetic and real-world signals.

First we examine a synthetic signal.

The plot below shows a synthetic steady-state E2 note, synthesized using a typical near-DC component, a fundamental at 82.4 Hz, and a total of 8 harmonics at integer multiples of 82.4 Hz. The synthetic signal is 4096 samples long.

[Figure: Synthetic E2 note spectrum (https://i.stack.imgur.com/qGq6S.jpg)]
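A signal like the synthetic note described above could be generated roughly as follows. This is only a sketch: the 44.1 kHz sample rate, the 1/h amplitude roll-off, modelling the near-DC component as a constant offset, and counting the fundamental as the first of the 8 harmonics are all assumptions, since the answer does not state them.

```python
import math

def synth_harmonic_tone(f0_hz=82.4, n_harmonics=8, n_samples=4096,
                        sample_rate_hz=44100.0, dc_offset=0.1):
    """Sum of sinusoids at integer multiples of f0_hz plus a constant offset."""
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        value = dc_offset  # stand-in for the near-DC component (assumed)
        for h in range(1, n_harmonics + 1):
            # 1/h amplitude roll-off is an assumed envelope
            value += math.sin(2 * math.pi * h * f0_hz * t) / h
        samples.append(value)
    return samples

tone = synth_harmonic_tone()
print(len(tone))  # 4096
```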

The plot below shows a closeup of the input that was used for the Cepstrum calculation of the synthetic E2 note. It is the log(|FFT|^2) output from the synthetic E2 note.

[Figure: Cepstrum input: synthetic E2 note's spectrum (https://i.stack.imgur.com/6sp9x.jpg)]

The plot below shows the Cepstrum of the synthetic E2 note. Observe the prominent non-DC peak at 12.36. The Cepstrum width is 1024 (the output of the second FFT), therefore the peak corresponds to 1024/12.36 = 82.8 Hz which is very close to the actual 82.4 Hz of the fundamental.

[Figure: Synthetic E2 note cepstrum closeup (https://i.stack.imgur.com/uIzlR.jpg)]

Now we examine a real-world signal.

The plot below shows the spectrum of the E2 note from a real acoustic guitar.

[Figure: Guitar E2 note spectrum closeup (https://i.stack.imgur.com/9P5NB.jpg)]

The plot below shows a closeup of the input that was used for the Cepstrum calculation of the acoustic guitar's E2 note. It is the log(|FFT|^2) output from the acoustic guitar's E2 note.

[Figure: Cepstrum input: guitar E2 note's log(|FFT|^2) spectrum (https://i.stack.imgur.com/uV0cS.jpg)]

The plot below shows the Cepstrum of the acoustic guitar's E2 note. Observe the prominent non-DC peak at 542.8. The Cepstrum width is 32768 (the output of the second FFT), therefore the peak corresponds to 32768/542.8 = 60.4 Hz which is fairly far from the actual 82.4 Hz of the fundamental.

[Figure: Guitar E2 note cepstrum closeup (https://i.stack.imgur.com/hRmK0.jpg)]
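To put the two results on a musical scale, the peak positions quoted above can be converted to Hz and then to a pitch error in cents. The helper names below are hypothetical, and the width-divided-by-index rule is simply the conversion used in this answer.

```python
import math

def peak_to_hz(cepstrum_width, peak_index):
    """Convert a cepstrum peak position to Hz using the width/index rule from the text."""
    return cepstrum_width / peak_index

def cents_off(estimate_hz, reference_hz):
    """Signed pitch error in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(estimate_hz / reference_hz)

E2_HZ = 82.4
synthetic_hz = peak_to_hz(1024, 12.36)   # about 82.8 Hz
guitar_hz = peak_to_hz(32768, 542.8)     # about 60.4 Hz

print(round(cents_off(synthetic_hz, E2_HZ)))  # roughly +9 cents
print(round(cents_off(guitar_hz, E2_HZ)))     # roughly -539 cents, over 5 semitones flat
```

An error of about 9 cents is well under a semitone, while the guitar estimate is off by more than five semitones.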

The recording of the E2 guitar note used for this analysis was sampled at 44.1 kHz with a high-quality microphone under studio conditions. It contains essentially zero background noise and no other instruments or voices.

This illustrates the significant challenge of using Cepstral analysis for pitch determination in real-world audio signals.

References:

Real audio signal data, synthetic signal generation, plots, FFT, and Cepstral analysis were done here: Musical instrument cepstrum

answered Oct 06 '22 16:10

Babson
