 

FFT Inaccuracy for C#

I've been experimenting with the FFT algorithm. I use NAudio along with a working FFT implementation I found on the internet. From what I've observed of its performance, the resulting pitch is inaccurate.

What happens is that I have a MIDI file (generated from Guitar Pro) converted to a WAV file (44.1 kHz, 16-bit, mono) that contains a pitch progression starting from E2 (the lowest guitar note) up to about E6. For the lower notes (around E2-B3) the result is generally very wrong. From about C4 onward it's somewhat correct, in that you can already see the proper progression (the next note is C#4, then D4, etc.). The problem there, however, is that the pitch detected is a half-note lower than the actual pitch (e.g. C4 should be the note, but D#4 is displayed).

What do you think may be wrong? I can post the code if necessary. Thanks very much! I'm still beginning to grasp the field of DSP.

Edit: Here is a rough sketch of what I'm doing:

// pull the stream block by block; the Read override below runs pitch detection on each block
byte[] buffer = new byte[8192];
int bytesRead;
do
{
  bytesRead = stream16.Read(buffer, 0, buffer.Length);
} while (bytesRead != 0);

And then (waveBuffer is simply a class that converts the byte[] into a float[], since the function only accepts float[]):

public int Read(byte[] buffer, int offset, int bytesRead)
{
  int frames = bytesRead / sizeof(float); // number of float samples in this block
  float pitch = DetectPitch(waveBuffer.FloatBuffer, frames);
  return bytesRead;
}
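
For clarity, the byte-to-float conversion is roughly what NAudio's WaveBuffer does (a sketch, assuming WaveBuffer or something equivalent is used; it overlays a float[] view on the same bytes without copying):

using NAudio.Wave;

// WaveBuffer is a union-style wrapper: FloatBuffer views the same memory as
// float[], so bytesRead bytes correspond to bytesRead / 4 float samples.
byte[] buffer = new byte[8192];
int bytesRead = stream16.Read(buffer, 0, buffer.Length);

WaveBuffer waveBuffer = new WaveBuffer(buffer);
int frames = bytesRead / sizeof(float);
float pitch = DetectPitch(waveBuffer.FloatBuffer, frames);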

And lastly (SmbPitchShift is the class that has the FFT algorithm ... I believe there's nothing wrong with it, so I'm not posting it here):

private float DetectPitch(float[] buffer, int inFrames)
{
  Func<int, int, float> window = HammingWindow;
  if (prevBuffer == null)
  {
    prevBuffer = new float[inFrames]; // only contains zeroes on the first call
  }

  // double frames since we are combining present and previous buffers
  int frames = inFrames * 2;
  if (fftBuffer == null)
  {
    fftBuffer = new float[frames * 2]; // times 2 because it is complex (re, im) input
  }

  for (int n = 0; n < frames; n++)
  {
    if (n < inFrames)
    {
      fftBuffer[n * 2] = prevBuffer[n] * window(n, frames);
      fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
    }
    else
    {
      fftBuffer[n * 2] = buffer[n - inFrames] * window(n, frames);
      fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
    }
  }
  SmbPitchShift.smbFft(fftBuffer, frames, -1);
  // ... the interpretation of fftBuffer (shown below) follows here and its
  // result is returned as the detected pitch
}
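
HammingWindow isn't shown above; presumably it's the standard Hamming window (NAudio also ships one as FastFourierTransform.HammingWindow). A minimal sketch matching the (n, frameSize) signature that DetectPitch expects:

// Standard Hamming window: w(n) = 0.54 - 0.46 * cos(2 * pi * n / (N - 1))
private static float HammingWindow(int n, int frameSize)
{
  return (float)(0.54 - 0.46 * Math.Cos((2 * Math.PI * n) / (frameSize - 1)));
}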

And for interpreting the result:

float binSize = (float)sampleRate / frames; // cast guards against integer division
int minBin = (int)(82.407 / binSize);   // lowest E string on the guitar
int maxBin = (int)(1244.508 / binSize); // highest E string on the guitar

float maxIntensity = 0f;
int maxBinIndex = 0;

for (int bin = minBin; bin <= maxBin; bin++)
{
    float real = fftBuffer[bin * 2];
    float imaginary = fftBuffer[bin * 2 + 1];
    float intensity = real * real + imaginary * imaginary;
    if (intensity > maxIntensity)
    {
        maxIntensity = intensity;
        maxBinIndex = bin;
    }
}

return binSize * maxBinIndex;

UPDATE (if anyone is still interested):

So, one of the answers below stated that the frequency peak from the FFT is not always equivalent to pitch. I understand that. But I wanted to try something for myself, on the assumption that there are times when the frequency peak IS the resulting pitch. So basically, I got two programs (SpectraPLUS and FFT Properties by DewResearch; credit to them) that can display the frequency domain of an audio signal.

So here are the frequency peaks reported by each:

SpectraPLUS: [screenshot of the frequency spectrum]

FFT Properties: [screenshot of the frequency spectrum]

This was done using a test note of A2 (around 110 Hz). Looking at the images, the frequency peaks are around 102-112 Hz for SpectraPLUS and 108 Hz for FFT Properties. In my code, I get 104 Hz (I use blocks of 8192 samples and a sample rate of 44.1 kHz; the 8192 is then doubled to make the complex input, so in the end I get a bin size of around 5 Hz, compared to the 10 Hz bin size of SpectraPLUS).

So now I'm a bit confused, since the two programs seem to return the correct result, but in my code I always get 104 Hz (note that I have compared the FFT function I used against others such as Math.NET and it seems to be correct).

Do you think the problem may be with my interpretation of the data? Or do these programs do something else before displaying the frequency spectrum? Thanks!

asked Feb 11 '11 by user488792



1 Answer

It sounds like you may have an interpretation problem with your FFT output. A few random points:

  • the FFT has a finite resolution - each output bin has a resolution of Fs / N, where Fs is the sample rate and N is the size of the FFT

  • for notes which are low on the musical scale, the difference in frequency between successive notes is relatively small, so you will need a sufficiently large N to discriminate between notes which are a semitone apart (see note 1 below)

  • the first bin (index 0) contains energy centered at 0 Hz but includes energy from +/- Fs / 2N

  • bin i contains energy centered at i * Fs / N but includes energy from +/- Fs / 2N either side of this center frequency

  • you will get spectral leakage from adjacent bins - how bad this is depends on what window function you use - with no window (i.e. a rectangular window) spectral leakage will be very bad (very broad peaks) - for frequency estimation you want to pick a window function that gives you sharp peaks

  • pitch is not the same thing as frequency - pitch is a percept, frequency is a physical quantity - the perceived pitch of a musical instrument may be slightly different from the fundamental frequency, depending on the type of instrument (some instruments do not even produce significant energy at their fundamental frequency, yet we still perceive their pitch as if the fundamental were present)

My best guess from the limited information available though is that perhaps you are "off by one" somewhere in your conversion of bin index to frequency, or perhaps your FFT is too small to give you sufficient resolution for the low notes, and you may need to increase N.

You can also improve your pitch estimation via several techniques, such as cepstral analysis, or by looking at the phase component of your FFT output and comparing it for successive FFTs (this allows for a more accurate frequency estimate within a bin for a given FFT size).
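
As a rough sketch of the phase-based idea (illustrative only; the hop size, array layout, and names here are my own assumptions, not code from the question): take two FFTs a known number of samples apart, look at the phase of the peak bin in each, and use the deviation from the expected phase advance to refine the frequency within the bin.

// Refine the frequency of peak bin k using the phase difference between two
// successive FFTs taken 'hop' samples apart (phase-vocoder style estimate).
// prevSpectrum / currSpectrum are interleaved complex arrays (re, im).
static float RefineFrequency(float[] prevSpectrum, float[] currSpectrum,
                             int k, int fftSize, int hop, float sampleRate)
{
    double phasePrev = Math.Atan2(prevSpectrum[2 * k + 1], prevSpectrum[2 * k]);
    double phaseCurr = Math.Atan2(currSpectrum[2 * k + 1], currSpectrum[2 * k]);

    // Expected phase advance over 'hop' samples for the bin-center frequency.
    double expected = 2.0 * Math.PI * k * hop / fftSize;

    // Deviation from the expected advance, wrapped into [-pi, pi].
    double deviation = (phaseCurr - phasePrev) - expected;
    deviation -= 2.0 * Math.PI * Math.Round(deviation / (2.0 * Math.PI));

    // Convert the phase deviation into a fractional-bin correction.
    double trueBin = k + deviation * fftSize / (2.0 * Math.PI * hop);
    return (float)(trueBin * sampleRate / fftSize);
}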


Notes

(1) Just to put some numbers on this, E2 is 82.4 Hz, F2 is 87.3 Hz, so you need a resolution somewhat better than 5 Hz to discriminate between the lowest two notes on a guitar (and much finer than this if you actually want to do, say, accurate tuning). At a 44.1 kHz sample rate you probably need an FFT of at least N = 8192 to give you sufficient resolution (44100 / 8192 = 5.4 Hz); N = 16384 would probably be better.
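
Putting the same numbers into a quick snippet (illustrative only, using the figures above):

// Each bin k is centered at k * Fs / N and has a width of Fs / N.
const float Fs = 44100f;
const int N = 8192;
float binWidth = Fs / N;

float BinCenter(int k) => k * Fs / N;
int NearestBin(float freq) => (int)Math.Round(freq * N / Fs);

// E2 (82.4 Hz) and F2 (87.3 Hz) are only about 4.9 Hz apart - less than one
// bin width - so at N = 8192 they land in adjacent bins (15 and 16) whose
// centers are roughly 80.8 Hz and 86.1 Hz.
Console.WriteLine($"bin width = {binWidth:F2} Hz");
Console.WriteLine($"E2 -> bin {NearestBin(82.4f)}, center {BinCenter(NearestBin(82.4f)):F1} Hz");
Console.WriteLine($"F2 -> bin {NearestBin(87.3f)}, center {BinCenter(NearestBin(87.3f)):F1} Hz");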

answered by Paul R