I am looking to figure out how to separate the data in a WAV file into its constituent notes. I load the WAV file with:
import numpy as np
import scipy.io.wavfile as wavfile
rate, data = wavfile.read('scale.wav')
time = np.arange(len(data[:,0]))*1.0/rate
and plot with:
import matplotlib.pyplot as plt
plt.plot(time, data[:,0])
plt.show()
This gives me this picture, which shows a piano scale with eight notes in it. I want a way to isolate each note, so that I can then find its frequency and figure out which note is being played. Once I have the notes isolated, I can take care of the rest.
I have tried finding the maxima, but there are so many of them that it takes multiple iterations to narrow them down to the ones I want, and it is an unreliable method since doing too many iterations discards some of the lower-amplitude peaks. Getting the length of each note in time would also be nice.
EDIT: So this is quite complicated, as you gentlemen stated. I am now thinking that I just want to find the "extreme" peaks, then find the extreme minima that follow these peaks, and use that slice as my note, since we don't need too large a slice of the data to figure out its frequency. The problem I have with that is that there are a lot of peaks, and it's hard to find only the ones I want. Any ideas?
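One way to attack the peak-finding problem described above is to smooth the rectified signal into an amplitude envelope first, so the many individual oscillation peaks collapse into one hump per note, and then pick peaks with a minimum height and spacing. This is a minimal sketch of that idea using `scipy.signal.find_peaks`; the synthesized four-note signal, the 50 ms window, and the threshold values are all assumptions standing in for the real recording and would need tuning:

```python
import numpy as np
from scipy.signal import find_peaks

# Stand-in for the real recording: four decaying tones, one second each
rate = 8000
freqs = [261.63, 293.66, 329.63, 349.23]   # C4, D4, E4, F4 (assumed)
t = np.arange(rate) / rate
data = np.concatenate(
    [np.sin(2 * np.pi * f * t) * np.exp(-3 * t) for f in freqs])

# Smooth the rectified signal into an amplitude envelope: a moving
# average over a 50 ms window removes the per-cycle oscillation peaks
win = int(0.05 * rate)
envelope = np.convolve(np.abs(data), np.ones(win) / win, mode='same')

# Each note attack is now a single hump; require a minimum height and a
# minimum spacing (here 0.5 s, assumed) so only one peak per note survives
peaks, _ = find_peaks(envelope, height=0.2, distance=int(0.5 * rate))
onsets = peaks / rate    # note onset times in seconds
```

The gap between consecutive onsets also gives a rough note duration, which addresses the "length of the note in time" part of the question.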
Probably the easiest and most interesting thing to do is to calculate a spectrogram of your data, which is basically a plot of the spectra of short sections of your data, plotted against time. Do make the frequency scale logarithmic, since the frequencies of the keys on a piano are spaced exponentially. In Python, you could use the function specgram to calculate this, which is included with matplotlib. See for example this google image search for how this looks for different types of music. Also have a look at some computer programs that can play MP3/WAV and have visualization plugins; I remember that Winamp had a way of playing live spectrograms more than 10 years ago.
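As a minimal sketch of the spectrogram approach: the toy two-tone signal below stands in for your scale recording, and the NFFT/noverlap values are just reasonable defaults, not tuned settings. Each column of the returned array is the spectrum of one short section, so a note shows up as a horizontal band at its frequency:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')        # headless backend, so this runs without a display
import matplotlib.pyplot as plt

# Toy signal standing in for the scale: 1 s of 440 Hz, then 1 s of 880 Hz
rate = 8000
t = np.arange(2 * rate) / rate
sig = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t),
                        np.sin(2 * np.pi * 880 * t))

# specgram returns the power in each (frequency bin, time segment) cell,
# the bin frequencies, the segment centre times, and the plotted image
Pxx, fbins, tbins, im = plt.specgram(sig, NFFT=1024, Fs=rate, noverlap=512)
plt.yscale('log')            # log frequency axis: piano keys are spaced exponentially
plt.xlabel('time [s]')
plt.ylabel('frequency [Hz]')
plt.savefig('spectrogram.png')
```

Reading the note off the array directly: the strongest bin of an early column sits near 440 Hz, and of a late column near 880 Hz, which is exactly the per-note frequency information you are after.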
This is a fun exercise, but let me warn you if you want to use this technique to automatically transcribe the notes of some piece of music: this is a very hard problem, which has been studied by scientists for many years. One problem, for example, is that most instruments produce a lot of harmonics, which can confuse any algorithm that tries to find notes automatically. And forget about any music with human voices or percussion, since these produce a lot of wide-band noise (especially the letter 's' and hi-hats), making it almost impossible to recognize any other note.
If you want to get fancy, have a look at the constant-Q transform (see wikipedia and the papers referenced from there). You can consider this as a spectrogram, but with the bins along the frequency axis spaced logarithmically (e.g. a bin for every half or quarter note on the piano scale). The advantage of this method over a standard spectrogram is that it has a constant number of frequency bins per note, while a linear frequency scale has few bins for the low notes and too many for the high notes. I don't know if this is available for numpy; you might have to write the code yourself.
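If you do end up writing it yourself, a very naive version is straightforward with plain numpy: for each log-spaced centre frequency, correlate the signal with a windowed complex exponential whose length is a fixed number of periods, so every bin has the same ratio Q of centre frequency to bandwidth. This is only a sketch under assumed parameters (fmin of 55 Hz, 12 bins per octave, Q = 17), far slower than the FFT-based kernel method in the literature:

```python
import numpy as np

def naive_cq_frame(x, rate, fmin=55.0, bins_per_octave=12, n_bins=48, Q=17):
    """Naive constant-Q spectrum of one frame: bin k is centred at
    fmin * 2**(k / bins_per_octave), and its window spans Q periods of
    that frequency, so frequency resolution scales with pitch."""
    out = np.empty(n_bins)
    for k in range(n_bins):
        fk = fmin * 2.0 ** (k / bins_per_octave)   # log-spaced bin centre
        n = min(int(round(Q * rate / fk)), len(x)) # window length: Q periods
        t = np.arange(n) / rate
        kernel = np.hanning(n) * np.exp(-2j * np.pi * fk * t)
        out[k] = np.abs(np.dot(x[:n], kernel)) / n
    return out

# Sanity check: a pure 220 Hz tone should light up the bin centred on
# 220 Hz, which with fmin = 55 Hz and 12 bins/octave is bin 24
rate = 8000
tone = np.sin(2 * np.pi * 220 * np.arange(rate) / rate)
spectrum = naive_cq_frame(tone, rate)
```

Note how the low bins automatically get long windows (fine frequency resolution) and the high bins short ones, which is exactly the constant-bins-per-note property described above.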