
Graphing the pitch (frequency) of a sound

I want to plot the pitch of a sound into a graph.

Currently I can plot the amplitude. The graph below is created by the data returned by getUnscaledAmplitude():

[Image: amplitude graph]

    AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(
            new BufferedInputStream(new FileInputStream(file)));
    byte[] bytes = new byte[(int) (audioInputStream.getFrameLength())
            * (audioInputStream.getFormat().getFrameSize())];
    audioInputStream.read(bytes);

    // Get amplitude values for each audio channel in an array.
    graphData = type.getUnscaledAmplitude(bytes, 1);


    public int[][] getUnscaledAmplitude(byte[] eightBitByteArray, int nbChannels) {
        int[][] toReturn = new int[nbChannels][eightBitByteArray.length / (2 * nbChannels)];
        int index = 0;

        for (int audioByte = 0; audioByte < eightBitByteArray.length;)
        {
            for (int channel = 0; channel < nbChannels; channel++)
            {
                // Do the byte to sample conversion.
                int low = (int) eightBitByteArray[audioByte];
                audioByte++;
                int high = (int) eightBitByteArray[audioByte];
                audioByte++;
                int sample = (high << 8) + (low & 0x00ff);

                toReturn[channel][index] = sample;
            }
            index++;
        }

        return toReturn;
    }

But I need to show the audio's pitch, not amplitude. Fast Fourier transform appears to get the pitch, but it needs to know more variables than the raw bytes I have, and is very complex and mathematical.

Is there a way I can do this?

Amy B asked Jan 16 '11 22:01




2 Answers

Frequency (an objective metric) is not the same as pitch (a subjective quantity). In general, pitch detection is a very tricky problem.

Assuming you just want to graph the frequency content (the spectrum) for now, you have little choice but to use the FFT, as it is THE method to obtain the frequency content of time-domain data. (Well, there are other methods, such as the discrete cosine transform, but they're just as tricky to implement, and trickier to interpret.)

If you're struggling with the implementation of the FFT, note that it's really just an efficient algorithm for calculating the discrete Fourier transform (DFT); see http://en.wikipedia.org/wiki/Discrete_Fourier_transform. The basic DFT algorithm is much easier (just two nested loops), but runs a lot slower (O(N^2) rather than O(N log N)).
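The "two nested loops" version can be sketched in Java roughly as follows. This is a minimal illustration, not a tuned implementation: the class name `NaiveDft`, the method `magnitudes`, and the test signal are all made up for this example, and the input is assumed to be real-valued samples already converted to `double`.

```java
/** Naive O(N^2) discrete Fourier transform; a sketch, not optimized. */
class NaiveDft {

    /**
     * Returns the magnitude of DFT bins 0..N/2 for a real-valued signal.
     * The remaining bins mirror these for real input, so they are skipped.
     */
    static double[] magnitudes(double[] signal) {
        int n = signal.length;
        double[] mags = new double[n / 2 + 1];
        for (int k = 0; k < mags.length; k++) {      // outer loop: one output bin
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {            // inner loop: every input sample
                double angle = 2.0 * Math.PI * k * t / n;
                re += signal[t] * Math.cos(angle);
                im -= signal[t] * Math.sin(angle);
            }
            mags[k] = Math.sqrt(re * re + im * im);
        }
        return mags;
    }

    public static void main(String[] args) {
        // Hypothetical test signal: a sine with exactly 5 cycles in 64 samples.
        int n = 64;
        double[] signal = new double[n];
        for (int t = 0; t < n; t++) {
            signal[t] = Math.sin(2.0 * Math.PI * 5 * t / n);
        }
        double[] mags = magnitudes(signal);
        int peak = 0;
        for (int k = 1; k < mags.length; k++) {
            if (mags[k] > mags[peak]) {
                peak = k;
            }
        }
        System.out.println("Strongest bin: " + peak); // prints "Strongest bin: 5"
    }
}
```

For a few thousand samples this is usually fast enough to experiment with; the quadratic cost only starts to hurt on long recordings or real-time use.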

If you wish to do anything more complex than simply plotting frequency content (like pitch detection, or windowing, as others have suggested), I'm afraid you are going to have to learn what the maths means.

Oliver Charlesworth answered Sep 18 '22 23:09


The Fast Fourier Transform doesn't need to know more than the input data you have. Don't be scared off by the Wikipedia article. An FFT algorithm will take your input signal (with the common FFT algorithms the number of samples is required to be a power of 2, e.g. 256, 512, 1024) and return a vector of complex numbers of the same size. Because your input is real, not complex (imaginary portion set to zero), the magnitudes of the returned vector will be symmetric, so only the first half of it carries unique information. Since you do not care about the phase, you can simply take the magnitude of each complex number a+bi, which is sqrt(a^2+b^2). Taking the absolute value of a complex number may also work; in some languages this is equivalent to the previous expression.

There are Java implementations of FFT available, e.g.: http://www.cs.princeton.edu/introcs/97data/FFT.java.html

Pseudo code will look something like:

    Complex in[1024];
    Complex out[1024];
    Copy your signal into in
    FFT(in, out)
    for every member of out compute sqrt(a^2+b^2)
    To find frequency with highest power scan for the maximum value in the first 512 points in out
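A runnable Java version of that pseudocode might look something like the sketch below. It uses a small iterative radix-2 FFT on separate real and imaginary arrays rather than a `Complex` class; that layout, the class name `SimpleFft`, and the methods `fft` and `peakBin` are choices made for this illustration, not part of the Princeton code linked above.

```java
/** Minimal radix-2 FFT plus peak search; a sketch under the assumptions stated above. */
class SimpleFft {

    /** In-place forward FFT. Both arrays must have the same power-of-two length. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 0 || (n & (n - 1)) != 0 || im.length != n) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Combine sub-transforms of length 2, 4, 8, ... up to n.
        for (int len = 2; len <= n; len <<= 1) {
            double step = -2.0 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(step * k);
                    double wi = Math.sin(step * k);
                    int a = start + k;
                    int b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    /** Index of the largest magnitude in the first half of the output. */
    static int peakBin(double[] re, double[] im) {
        int best = 0;
        double bestMag = -1.0;
        for (int k = 0; k < re.length / 2; k++) {
            double mag = Math.hypot(re[k], im[k]); // sqrt(a^2 + b^2)
            if (mag > bestMag) {
                bestMag = mag;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical input: a sine with exactly 50 cycles in 1024 samples.
        int n = 1024;
        double[] re = new double[n];
        double[] im = new double[n]; // imaginary part stays zero for real input
        for (int t = 0; t < n; t++) {
            re[t] = Math.sin(2.0 * Math.PI * 50 * t / n);
        }
        fft(re, im);
        System.out.println("Peak bin: " + peakBin(re, im)); // prints "Peak bin: 50"
    }
}
```

If the recording has a constant offset, bin 0 (the DC component) can dominate the scan; starting the search at bin 1 is a common workaround.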

The output will contain entries for frequencies between zero and half your sampling frequency.
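To label the graph in Hz, each output index can be converted to a frequency: bin k of an N-point FFT corresponds to k * sampleRate / N. A small sketch follows; the class name `BinFrequency` and the 44100 Hz / 1024-point figures are only example values. In the question's code the sample rate should be available from `audioInputStream.getFormat().getSampleRate()`.

```java
/** Converts FFT bin indices to frequencies in Hz; an illustrative helper. */
class BinFrequency {

    /** Frequency in Hz represented by the given bin of an fftSize-point FFT. */
    static double binToHz(int bin, int fftSize, double sampleRate) {
        return bin * sampleRate / fftSize;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0; // example value, e.g. CD-quality audio
        int fftSize = 1024;          // example FFT length

        // Spacing between adjacent bins, i.e. the frequency resolution.
        System.out.println(binToHz(1, fftSize, sampleRate));           // prints 43.06640625
        // The last useful bin sits at half the sampling frequency.
        System.out.println(binToHz(fftSize / 2, fftSize, sampleRate)); // prints 22050.0
    }
}
```

A longer FFT gives finer frequency spacing but covers a longer stretch of time, so there is a trade-off when plotting how the frequency changes over the course of the sound.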

Since FFT assumes a repeating signal you may want to apply a window to your input signal. But don't worry about this at first.
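When you do get to windowing, one common choice is the Hann window, which tapers the block smoothly to zero at both ends before the FFT is applied. The answer does not name a specific window, so Hann is just picked here as an example, and the class name `HannWindow` and method `apply` are made up for this sketch.

```java
/** Applies a Hann window to a block of samples; an illustrative sketch. */
class HannWindow {

    /** Returns a new array holding the windowed samples; the input is not modified. */
    static double[] apply(double[] signal) {
        int n = signal.length;
        double[] out = new double[n];
        if (n < 2) {
            // Too short to taper; return an unmodified copy.
            System.arraycopy(signal, 0, out, 0, n);
            return out;
        }
        for (int i = 0; i < n; i++) {
            // Weight rises from 0 at the first sample to 1 in the middle and back to 0.
            double w = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
            out[i] = signal[i] * w;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] block = {1.0, 1.0, 1.0, 1.0, 1.0};
        double[] windowed = apply(block);
        for (double v : windowed) {
            System.out.println(v);
        }
    }
}
```

The windowed block is then what gets copied into the FFT input in place of the raw samples.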

You can find more information on the web, e.g.: FFT for beginners

Also, as Oli notes, when multiple frequencies are present the perceived pitch is a more complex phenomenon.

Guy Sirton answered Sep 22 '22 23:09