I am creating a pitch detection program that extracts the fundamental frequency from the power spectrum obtained from the FFT of a frame. This is what I have so far:
Now the program produces an integer from 0 to 87 for each frame. Each integer corresponds to a piano note according to a formula I found here. I am now trying to imitate the melodies in the input signal by synthesizing sounds based on the calculated notes. I tried simply generating a sine wave whose magnitude and frequency match the fundamental frequency, but the result sounded nothing like the original (it was almost like random beeps).
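The question doesn't say which formula was used. Assuming it is the standard equal-temperament mapping with A4 = 440 Hz at index 48 (key 49 of 88), the conversion in both directions can be sketched like this. The names `key_to_freq` and `freq_to_key` are hypothetical; the rounding in `freq_to_key` is where the continuous pitch gets quantized to a note.

```python
import math

def key_to_freq(k):
    """Frequency in Hz of piano key index k (0 = A0, 48 = A4, 87 = C8).

    Assumes 12-tone equal temperament with A4 = 440 Hz.
    """
    return 440.0 * 2.0 ** ((k - 48) / 12.0)

def freq_to_key(f):
    """Nearest piano key index (clamped to 0..87) for a frequency f in Hz."""
    k = round(12.0 * math.log2(f / 440.0) + 48)
    return max(0, min(87, k))
```

For example, `freq_to_key(261.63)` returns 39, which is middle C (C4).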
I don't really understand music, so: based on what I have, can I generate a sound with melodies similar to the input (instrument, voice, or instrument + voice) from the fundamental frequency information I get? If not, what other ideas can I try with the code I currently have?
Thanks!
It depends greatly on the musical content you want to work with - extracting the pitch of a monophonic recording (i.e. single instrument or voice) is not the same as extracting the pitch of a single instrument from a polyphonic mixture (e.g. extracting the pitch of the melody from a polyphonic recording).
For monophonic pitch extraction there are various algorithms you could try to implement, both in the time domain and in the frequency domain. A couple of examples are YIN (time domain) and HPS, the harmonic product spectrum (frequency domain); links to further details on both are provided on Wikipedia:
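A minimal pure-Python sketch of the first four steps of YIN (difference function, cumulative mean normalized difference, absolute threshold, parabolic interpolation) is shown below. It is not a reference implementation: the search range `fmin`/`fmax`, the `threshold` of 0.1 and the name `yin_pitch` are assumptions for illustration, and silent or unvoiced frames are not detected.

```python
import math

def yin_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0, threshold=0.1):
    """Simplified YIN (steps 1-4): returns an f0 estimate in Hz for one frame."""
    n = len(frame)
    tau_min = max(2, int(sample_rate / fmax))
    tau_max = min(int(sample_rate / fmin), n // 2)
    w = n - tau_max  # same integration window length for every lag

    # Step 1: difference function d(tau) = sum_j (x[j] - x[j + tau])^2
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(w))

    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0

    # Step 3: first lag below the threshold, then walk down to its local minimum
    tau = tau_min
    while tau <= tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            break
        tau += 1
    else:
        # Nothing crossed the threshold: fall back to the global minimum
        tau = min(range(tau_min, tau_max + 1), key=lambda t: cmnd[t])

    # Step 4: parabolic interpolation around the chosen lag (sub-sample period)
    best = float(tau)
    if tau < tau_max:
        a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
        denom = a - 2.0 * b + c
        if denom > 0:
            best = tau + 0.5 * (a - c) / denom
    return sample_rate / best
```

The frame must be long enough to hold at least two periods of the lowest pitch you expect, since the lag search only goes up to half the frame length.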
However, neither will work well if you want to extract the melody from polyphonic material. Melody extraction from polyphonic music is still a research problem, and there isn't a simple set of steps you can follow. There are some tools out there provided by the research community that you can try out (for non-commercial use only though), namely:
As a final note, when synthesizing your output I'd recommend synthesizing the continuous pitch curve that you extract rather than discrete notes. The easiest way to do this is to estimate the pitch every X ms (e.g. 10 ms) and synthesize a sine wave that changes frequency every 10 ms while keeping the phase continuous. This will make your result sound a lot more natural, and you avoid the extra error involved in quantizing a continuous pitch curve into discrete notes (which is a problem in its own right).
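The phase-continuous synthesis described above can be sketched as follows, assuming a pitch track with one f0 value in Hz per 10 ms hop and 0 marking unvoiced frames (both conventions, and the name `synthesize_pitch_curve`, are assumptions). The key detail is that `phase` is carried over from one hop to the next instead of restarting at zero, which avoids the clicks you get when every frame starts a fresh sine.

```python
import math

def synthesize_pitch_curve(f0_track, sample_rate=16000, hop_ms=10.0, amplitude=0.5):
    """Render a sine that follows f0_track (one value in Hz per hop; 0 = silence)."""
    hop = int(round(sample_rate * hop_ms / 1000.0))
    out = []
    phase = 0.0
    for f0 in f0_track:
        amp = amplitude if f0 > 0 else 0.0
        step = 2.0 * math.pi * f0 / sample_rate  # phase increment per sample
        for _ in range(hop):
            out.append(amp * math.sin(phase))
            phase += step
        # Wrap to keep the number small; sin() is unaffected, so continuity holds
        phase %= 2.0 * math.pi
    return out
```

The output is a list of floats in the range [-amplitude, amplitude]. Switching abruptly between voiced and silent hops can still click; a short amplitude ramp at those transitions would smooth it.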
You probably don't want to be picking peaks from an FFT to calculate the pitch; you probably want to use autocorrelation instead. I wrote up a long answer to a very similar question here: Cepstral Analysis for pitch detection
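For comparison with FFT peak picking, a basic autocorrelation estimator can be sketched as below. It picks the lag with the largest autocorrelation inside an assumed 50 to 1000 Hz search range and refines it with parabolic interpolation; the name `autocorr_pitch` and the default range are assumptions, not part of the linked answer.

```python
import math

def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate f0 in Hz from the strongest autocorrelation peak of one frame."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset so it doesn't inflate r(lag)
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), n - 2)

    def r(lag):
        # Unnormalized autocorrelation at a single lag
        return sum(x[i] * x[i + lag] for i in range(n - lag))

    # Include one extra lag on each side so the peak can be interpolated
    acf = {lag: r(lag) for lag in range(lag_min - 1, lag_max + 2)}
    best = max(range(lag_min, lag_max + 1), key=lambda lag: acf[lag])

    # Parabolic interpolation, only if best is a genuine local maximum
    a, b, c = acf[best - 1], acf[best], acf[best + 1]
    denom = a - 2.0 * b + c
    offset = 0.0
    if a <= b and c <= b and denom != 0:
        offset = 0.5 * (a - c) / denom
    return sample_rate / (best + offset)
```

This naive version costs roughly N operations per lag; an FFT-based autocorrelation is much faster for long frames. On real recordings plain autocorrelation can also lock onto a multiple of the true period (an octave error), which is one of the issues YIN's normalization is designed to reduce.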