I am trying to extract pitch features from an audio file to use in a classification problem. I am using Python (SciPy/NumPy) for the classification.
I think I can get frequency features using scipy.fft, but I don't know how to approximate musical notes from frequencies. I researched a bit and found that I need chroma features, which map frequencies to 12 bins for the notes of a chromatic scale.
I think there's a chroma toolbox for MATLAB, but I don't think there's anything similar for Python.
How should I go forward with this? Could anyone also suggest reading material I should look into?
Audio feature extraction is a necessary step in audio signal processing, which is a subfield of signal processing. It deals with the processing or manipulation of audio signals. It removes unwanted noise and balances the time-frequency ranges by converting digital and analog signals.
In MATLAB, call extract to extract the audio features from the audio signal: features = extract(aFE,audioIn); Then use info to determine which column of the feature matrix corresponds to the requested pitch feature.
If you have a .wav file you can calculate the frequency directly. If the file is a single pure tone, count n, the number of samples between successive zero crossings in the same direction (one full period), and compute 44.1/n to get the frequency in kHz, assuming a 44.1 kHz sample rate. If the file is a mix of tones then you will need to do a Fourier transform.
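As a rough illustration of both approaches, here is a minimal NumPy sketch. The function names, the Hann window, and the synthetic 440 Hz test tone are my own choices for the example and do not come from any particular library. The zero-crossing version only counts rising crossings, so that n covers one full period.

```python
import numpy as np

def zero_crossing_frequency(signal, sample_rate):
    """Estimate the frequency of a single pure tone from rising zero crossings."""
    signal = np.asarray(signal, dtype=float)
    # Indices where the signal goes from negative to non-negative
    rising = np.flatnonzero((signal[:-1] < 0) & (signal[1:] >= 0))
    if len(rising) < 2:
        raise ValueError("need at least two rising zero crossings")
    # Average number of samples per full period
    period = np.mean(np.diff(rising))
    return sample_rate / period

def fft_peak_frequency(signal, sample_rate):
    """Return the frequency of the strongest FFT bin (resolution is sample_rate / len(signal))."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(len(signal))  # reduces spectral leakage
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

# Synthetic test tone (an assumption for the demo): one second of 440 Hz at 44.1 kHz
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

zc_estimate = zero_crossing_frequency(tone, sample_rate)
fft_estimate = fft_peak_frequency(tone, sample_rate)
print(zc_estimate, fft_estimate)
```

For real recordings you would read the samples with something like scipy.io.wavfile.read instead of synthesizing them. Note that the FFT peak is only the strongest partial, which is not necessarily the perceived pitch.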
You can map frequencies to musical notes:
$n = 69 + 12 \log_2(f / f_0)$
with $n$ being the MIDI note number to be calculated, $f$ the frequency and $f_0$ the chamber pitch (in modern music 440.0 Hz is common).
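In Python this mapping could look like the sketch below. It assumes the usual MIDI convention n = 69 + 12 * log2(f / f0), where note 69 is the A at the chamber pitch. The helper names, the sharp-only note spelling, and the octave numbering (MIDI 60 = C4) are conventions I picked for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_midi(frequency, reference=440.0):
    """Fractional MIDI note number for a frequency in Hz (reference = chamber pitch)."""
    return 69 + 12 * math.log2(frequency / reference)

def midi_to_note_name(midi_number):
    """Name of the nearest note, assuming the MIDI 60 = C4 octave convention."""
    nearest = int(round(midi_number))
    octave = nearest // 12 - 1
    return f"{NOTE_NAMES[nearest % 12]}{octave}"

def frequency_to_pitch_class(frequency, reference=440.0):
    """Pitch class 0..11 (C = 0), i.e. the chroma bin the frequency falls into."""
    return int(round(frequency_to_midi(frequency, reference))) % 12

print(midi_to_note_name(frequency_to_midi(440.0)))
```

The pitch class (note number modulo 12) is exactly the index of the chroma bin a frequency belongs to.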
As you may know a single frequency doesn't make a musical pitch. "Pitch" arises from the sensation of the fundamental of harmonic sounds, i.e. sounds that mainly consist of integer multiples of one single frequency (= the fundamental).
If you want to have Chroma Features in Python, you can use the Bregman Audio-Visual Information Toolbox. Note that chroma features don't give you information about the octave of a pitch, so you just get information about the pitch class.
from bregman.suite import Chromagram
audio_file = "mono_file.wav"
F = Chromagram(audio_file, nfft=16384, wfft=8192, nhop=2205)
F.X # all chroma features
F.X[:,0] # one feature
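If you would rather not depend on a toolbox, a very rough chroma vector can be built with NumPy alone by folding FFT magnitudes into 12 pitch-class bins. This is only a sketch of the idea and not what Bregman does internally. The function name, the 50 Hz to 5000 Hz range, the Hann window, and the normalization by the strongest bin are all assumptions of mine.

```python
import numpy as np

def chroma_from_frame(frame, sample_rate, reference=440.0, fmin=50.0, fmax=5000.0):
    """Fold the FFT magnitudes of one audio frame into 12 pitch-class bins (C = 0)."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Ignore DC and very low/high bins (also avoids log2(0))
    keep = (freqs >= fmin) & (freqs <= fmax)
    # MIDI note number of every remaining bin, then its pitch class
    midi = 69 + 12 * np.log2(freqs[keep] / reference)
    pitch_class = np.round(midi).astype(int) % 12
    # Sum the magnitudes that land in each pitch class
    chroma = np.bincount(pitch_class, weights=spectrum[keep], minlength=12)
    peak = chroma.max()
    return chroma / peak if peak > 0 else chroma

# Synthetic C major triad (C4, E4, G4) as a demo input
sample_rate = 44100
t = np.arange(16384) / sample_rate
chord = sum(np.sin(2 * np.pi * f * t) for f in (261.63, 329.63, 392.00))

chroma = chroma_from_frame(chord, sample_rate)
strongest = sorted(np.argsort(chroma)[-3:].tolist())
print(strongest)  # indices of the three strongest pitch classes
```

For a full chromagram you would slide this over the signal frame by frame. Real implementations usually add tuning estimation and smoother bin weighting, which this sketch leaves out.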
The general problem of extracting pitch information from audio is called pitch detection.
You can try reading the literature on pitch detection, which is quite extensive. Generally autocorrelation-based methods seem to work pretty well; frequency-domain or zero-crossing methods are less robust (so FFT doesn't really help much). A good starting point may be to implement one of these two algorithms:
YAAPT, from: Stephen A. Zahorian and Hongbing Hu, "A spectral-temporal method for robust fundamental frequency tracking", J. Acoust. Soc. Am. 123, 4559 (2008). http://bingweb.binghamton.edu/~hhu1/paper/Zahorian2008spectral.pdf and MATLAB code here: http://ws2.binghamton.edu/zahorian/yaapt.htm
YIN, from: De Cheveigné, A., Kawahara, H. "YIN, a fundamental frequency estimator for speech and music", J. Acoust. Soc. Am. 111, 1917-1930 (2002). http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
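To get a feel for the autocorrelation idea before tackling either paper, here is a bare-bones estimator. To be clear, this is neither YIN nor YAAPT, just plain autocorrelation peak picking on one frame. The function name and the 50 Hz to 1000 Hz search range are assumptions for the example.

```python
import numpy as np

def autocorrelation_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame from its autocorrelation peak."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC offset
    # Full autocorrelation; keep only the non-negative lags
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Only search lags that correspond to plausible pitches
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(corr) - 1)
    if max_lag <= min_lag:
        raise ValueError("frame too short for the requested frequency range")
    lag = min_lag + int(np.argmax(corr[min_lag:max_lag + 1]))
    return sample_rate / lag

# Harmonic test sound: 220 Hz fundamental plus two weaker overtones
sample_rate = 44100
t = np.arange(2048) / sample_rate
sound = (np.sin(2 * np.pi * 220.0 * t)
         + 0.5 * np.sin(2 * np.pi * 440.0 * t)
         + 0.25 * np.sin(2 * np.pi * 660.0 * t))

estimate = autocorrelation_pitch(sound, sample_rate)
print(estimate)
```

Because the lag is an integer number of samples, the result is quantized. YIN refines this with a difference function, cumulative normalization, and parabolic interpolation, which is a large part of why it is more robust.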
As for off-the-shelf solutions, check out aubio: C code with a Python wrapper, with several pitch-extraction algorithms available, including YIN and multiple-comb.
If you're willing to use 3rd party libraries (at least as a reference for how other people accomplished this):
Extracting musical information from sound, a presentation from PyCon 2012, shows how to use the EchoNest Python API:
Here is the relevant EchoNest documentation:
Relevant excerpt:
pitch content is given by a “chroma” vector, corresponding to the 12 pitch classes C, C#, D to B, with values ranging from 0 to 1 that describe the relative dominance of every pitch in the chromatic scale. For example a C Major chord would likely be represented by large values of C, E and G (i.e. classes 0, 4, and 7). Vectors are normalized to 1 by their strongest dimension, therefore noisy sounds are likely represented by values that are all close to 1, while pure tones are described by one value at 1 (the pitch) and others near 0.
EchoNest does the analysis on their servers. They provide free API keys for non-commercial use.
If EchoNest is not an option, I would look at the open-source aubio project. It has Python bindings, and you can examine the source to see how they accomplished pitch detection.