 

FFT Inaccuracy for C#

I've been experimenting with the FFT algorithm. I use NAudio along with a working FFT implementation I found on the internet. From what I've observed of its performance, the resulting pitch is inaccurate.

What happens is that I have a MIDI file (generated from Guitar Pro) converted to a WAV file (44.1 kHz, 16-bit, mono) that contains a pitch progression starting from E2 (the lowest guitar note) up to about E6. For the lower notes (around E2-B3) the result is generally very wrong. From about C4 onward it's somewhat correct, in that you can already see the proper progression (the next note is C#4, then D4, etc.). The problem there, however, is that the pitch detected is a half-note lower than the actual pitch (e.g. C4 should be the note, but D#4 is displayed).

What do you think may be wrong? I can post the code if necessary. Thanks very much! I'm still beginning to grasp the field of DSP.

Edit: Here is a rough sketch of what I'm doing:

// pull the stream block by block; the Read override below runs pitch detection on each block
byte[] buffer = new byte[8192];
int bytesRead;
do
{
  bytesRead = stream16.Read(buffer, 0, buffer.Length);
} while (bytesRead != 0);

And then (waveBuffer is simply a class that converts the byte[] into a float[], since the function only accepts float[]):

public int Read(byte[] buffer, int offset, int bytesRead)
{
  int frames = bytesRead / sizeof(float); // number of float samples in this block
  float pitch = DetectPitch(waveBuffer.FloatBuffer, frames);
  return bytesRead;
}
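
For clarity, the byte-to-float conversion is roughly what NAudio's WaveBuffer does (a sketch, assuming WaveBuffer or something equivalent is used; it overlays a float[] view on the same bytes without copying):

using NAudio.Wave;

// WaveBuffer is a union-style wrapper: FloatBuffer views the same memory as
// float[], so bytesRead bytes correspond to bytesRead / 4 float samples.
byte[] buffer = new byte[8192];
int bytesRead = stream16.Read(buffer, 0, buffer.Length);

WaveBuffer waveBuffer = new WaveBuffer(buffer);
int frames = bytesRead / sizeof(float);
float pitch = DetectPitch(waveBuffer.FloatBuffer, frames);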

And lastly (SmbPitchShift is the class that has the FFT algorithm ... I believe there's nothing wrong with it, so I'm not posting it here):

private float DetectPitch(float[] buffer, int inFrames)
{
  Func<int, int, float> window = HammingWindow;
  if (prevBuffer == null)
  {
    prevBuffer = new float[inFrames]; // only contains zeroes on the first call
  }

  // double frames since we are combining present and previous buffers
  int frames = inFrames * 2;
  if (fftBuffer == null)
  {
    fftBuffer = new float[frames * 2]; // times 2 because it is complex (re, im) input
  }

  for (int n = 0; n < frames; n++)
  {
    if (n < inFrames)
    {
      fftBuffer[n * 2] = prevBuffer[n] * window(n, frames);
      fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
    }
    else
    {
      fftBuffer[n * 2] = buffer[n - inFrames] * window(n, frames);
      fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
    }
  }
  SmbPitchShift.smbFft(fftBuffer, frames, -1);
  // ... the interpretation of fftBuffer (shown below) follows here and its
  // result is returned as the detected pitch
}
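
HammingWindow isn't shown above; presumably it's the standard Hamming window (NAudio also ships one as FastFourierTransform.HammingWindow). A minimal sketch matching the (n, frameSize) signature that DetectPitch expects:

// Standard Hamming window: w(n) = 0.54 - 0.46 * cos(2 * pi * n / (N - 1))
private static float HammingWindow(int n, int frameSize)
{
  return (float)(0.54 - 0.46 * Math.Cos((2 * Math.PI * n) / (frameSize - 1)));
}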

And for interpreting the result:

float binSize = (float)sampleRate / frames; // cast guards against integer division
int minBin = (int)(82.407 / binSize);   // lowest E string on the guitar
int maxBin = (int)(1244.508 / binSize); // highest E string on the guitar

float maxIntensity = 0f;
int maxBinIndex = 0;

for (int bin = minBin; bin <= maxBin; bin++)
{
    float real = fftBuffer[bin * 2];
    float imaginary = fftBuffer[bin * 2 + 1];
    float intensity = real * real + imaginary * imaginary;
    if (intensity > maxIntensity)
    {
        maxIntensity = intensity;
        maxBinIndex = bin;
    }
}

return binSize * maxBinIndex;

UPDATE (if anyone is still interested):

So, one of the answers below stated that the frequency peak from the FFT is not always equivalent to pitch. I understand that. But I wanted to try something for myself, on the assumption that there are times when the frequency peak IS the resulting pitch. So basically, I got two programs (SpectraPLUS and FFT Properties by DewResearch; credit to them) that can display the frequency domain of an audio signal.

So here are the frequency peaks reported by each:

SpectraPLUS: [screenshot of the frequency spectrum]

FFT Properties: [screenshot of the frequency spectrum]

This was done using a test note of A2 (around 110 Hz). Looking at the images, the frequency peaks are around 102-112 Hz for SpectraPLUS and 108 Hz for FFT Properties. In my code, I get 104 Hz (I use blocks of 8192 samples and a sample rate of 44.1 kHz; the 8192 is then doubled to make the complex input, so in the end I get a bin size of around 5 Hz, compared to the 10 Hz bin size of SpectraPLUS).

So now I'm a bit confused, since the two programs seem to return the correct result, but in my code I always get 104 Hz (note that I have compared the FFT function I used against others such as Math.NET and it seems to be correct).

Do you think the problem may be with my interpretation of the data? Or do these programs do something else before displaying the frequency spectrum? Thanks!

asked Feb 11 '11 by user488792



1 Answer

It sounds like you may have an interpretation problem with your FFT output. A few random points:

  • the FFT has a finite resolution - each output bin has a resolution of Fs / N, where Fs is the sample rate and N is the size of the FFT

  • for notes which are low on the musical scale, the difference in frequency between successive notes is relatively small, so you will need a sufficiently large N to discriminate between notes which are a semitone apart (see note 1 below)

  • the first bin (index 0) contains energy centered at 0 Hz but includes energy from +/- Fs / 2N

  • bin i contains energy centered at i * Fs / N but includes energy from +/- Fs / 2N either side of this center frequency

  • you will get spectral leakage from adjacent bins - how bad this is depends on what window function you use - with no window (i.e. a rectangular window) spectral leakage will be very bad (very broad peaks) - for frequency estimation you want to pick a window function that gives you sharp peaks

  • pitch is not the same thing as frequency - pitch is a percept, frequency is a physical quantity - the perceived pitch of a musical instrument may be slightly different from the fundamental frequency, depending on the type of instrument (some instruments do not even produce significant energy at their fundamental frequency, yet we still perceive their pitch as if the fundamental were present)

My best guess from the limited information available though is that perhaps you are "off by one" somewhere in your conversion of bin index to frequency, or perhaps your FFT is too small to give you sufficient resolution for the low notes, and you may need to increase N.

You can also improve your pitch estimation via several techniques, such as cepstral analysis, or by looking at the phase component of your FFT output and comparing it for successive FFTs (this allows for a more accurate frequency estimate within a bin for a given FFT size).
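
As a rough sketch of the phase-based idea (illustrative only; the hop size, array layout, and names here are my own assumptions, not code from the question): take two FFTs a known number of samples apart, look at the phase of the peak bin in each, and use the deviation from the expected phase advance to refine the frequency within the bin.

// Refine the frequency of peak bin k using the phase difference between two
// successive FFTs taken 'hop' samples apart (phase-vocoder style estimate).
// prevSpectrum / currSpectrum are interleaved complex arrays (re, im).
static float RefineFrequency(float[] prevSpectrum, float[] currSpectrum,
                             int k, int fftSize, int hop, float sampleRate)
{
    double phasePrev = Math.Atan2(prevSpectrum[2 * k + 1], prevSpectrum[2 * k]);
    double phaseCurr = Math.Atan2(currSpectrum[2 * k + 1], currSpectrum[2 * k]);

    // Expected phase advance over 'hop' samples for the bin-center frequency.
    double expected = 2.0 * Math.PI * k * hop / fftSize;

    // Deviation from the expected advance, wrapped into [-pi, pi].
    double deviation = (phaseCurr - phasePrev) - expected;
    deviation -= 2.0 * Math.PI * Math.Round(deviation / (2.0 * Math.PI));

    // Convert the phase deviation into a fractional-bin correction.
    double trueBin = k + deviation * fftSize / (2.0 * Math.PI * hop);
    return (float)(trueBin * sampleRate / fftSize);
}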


Notes

(1) Just to put some numbers on this, E2 is 82.4 Hz, F2 is 87.3 Hz, so you need a resolution somewhat better than 5 Hz to discriminate between the lowest two notes on a guitar (and much finer than this if you actually want to do, say, accurate tuning). At a 44.1 kHz sample rate you probably need an FFT of at least N = 8192 to give you sufficient resolution (44100 / 8192 = 5.4 Hz); N = 16384 would probably be better.
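
Putting the same numbers into a quick snippet (illustrative only, using the figures above):

// Each bin k is centered at k * Fs / N and has a width of Fs / N.
const float Fs = 44100f;
const int N = 8192;
float binWidth = Fs / N;

float BinCenter(int k) => k * Fs / N;
int NearestBin(float freq) => (int)Math.Round(freq * N / Fs);

// E2 (82.4 Hz) and F2 (87.3 Hz) are only about 4.9 Hz apart - less than one
// bin width - so at N = 8192 they land in adjacent bins (15 and 16) whose
// centers are roughly 80.8 Hz and 86.1 Hz.
Console.WriteLine($"bin width = {binWidth:F2} Hz");
Console.WriteLine($"E2 -> bin {NearestBin(82.4f)}, center {BinCenter(NearestBin(82.4f)):F1} Hz");
Console.WriteLine($"F2 -> bin {NearestBin(87.3f)}, center {BinCenter(NearestBin(87.3f)):F1} Hz");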

answered by Paul R