 

Sound spectrogram

I have made an app that paints an FFT of the microphone input to the screen in real time: time on the x-axis, frequency on the y-axis, and the color of each pixel representing the amplitude (pretty much a vanilla FFT spectrogram).

My problem is that even though I can see a pattern from the music, there is also a lot of noise. Searching online, I see people applying a logarithm to the amplitude. Should I be doing this? And if so, what would the formula look like? (I'm using C#, but I can translate the math into code, so any sample is OK.)

I can work around this problem by applying a color scheme that shows lower values as darker colors. I'm just not sure whether the audio is correctly represented without a logarithmic calculation.

Tedd Hansen asked Dec 21 '22 15:12



1 Answer

Representing the amplitude on a logarithmic scale approximates the sensitivity of the human auditory system, and therefore gives a better representation of what you hear than a linear scale does. Mathematically, all you have to do is:

Alog = 20*log10 (abs (A))

Where A is the (complex) FFT output for a frequency bin, abs(A) is its magnitude, and Alog is the output in decibels. The factor of 20 is just a convention and has no effect on the image, since you probably scale the values to a color scheme anyway.
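A minimal sketch of the formula above, in Python rather than C# since any sample is OK. It assumes you already have the complex FFT output for one time slice. The function names, the −120 dB floor (needed because log10(0) is undefined for silent bins) and the 0 dB upper limit used for the color mapping are all hypothetical choices for illustration, not part of the original answer.

```python
import math

def magnitudes_to_db(spectrum, floor_db=-120.0):
    """Convert complex FFT bins to decibels: Alog = 20*log10(abs(A)).

    floor_db is an assumed lower limit so that empty bins (abs(A) == 0)
    do not hit log10(0), which raises an error in Python.
    """
    floor_amp = 10 ** (floor_db / 20)  # amplitude that corresponds to floor_db
    return [20 * math.log10(max(abs(a), floor_amp)) for a in spectrum]

def db_to_intensity(db_values, min_db=-120.0, max_db=0.0):
    """Map dB values linearly onto 0.0..1.0 (clamped) for a color scheme."""
    span = max_db - min_db
    return [min(1.0, max(0.0, (d - min_db) / span)) for d in db_values]

column = magnitudes_to_db([1 + 0j, 0.1j, 0j])  # roughly [0.0, -20.0, -120.0]
print(db_to_intensity(column))
```

The color intensity is then linear in dB, which is what makes quiet but real content visible without the loud bins saturating. In C#, the same lines translate directly using `Math.Log10` and the `Magnitude` property of `System.Numerics.Complex`.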

EDIT

Explanation regarding the factor of 20: the dB (decibel) is a logarithmic unit for measuring ratios. It represents a scale on which the distance between 100 and 10 is the same as the distance between 1000 and 100, since they have the same ratio (1000/100 = 100/10). Measured in dB, you get:

10*log10 (1000/100) = 10*log10 (100/10) = 10
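The worked arithmetic above can be checked with a tiny sketch; `ratio_to_db` is a hypothetical helper name that simply applies 10*log10 to a power ratio.

```python
import math

def ratio_to_db(ratio):
    """Express a power ratio in decibels: 10*log10(ratio)."""
    return 10 * math.log10(ratio)

# Equal ratios give equal distances on the dB scale.
print(ratio_to_db(1000 / 100))  # 10.0
print(ratio_to_db(100 / 10))    # 10.0
```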

The factor of 10 comes from the prefix deci, meaning one tenth: 1 bel equals 10 decibels (just as 1 kilogram equals 1000 grams).

Since the human auditory system also (approximately) responds to ratios, it makes sense to measure sound level on a logarithmic scale, i.e., to measure the ratio of the sound level to some reference value. Since the level of a sound is associated with the power (in watts) of the sound wave, you actually measure the ratio of powers P/Pref. Also, the power is proportional to the amplitude squared, so all in all you get:

10*log10 (P/Pref) = 10*log10 (A^2 / Aref^2) = 20*log10 (A/Aref)
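A short sketch checking that identity numerically: computing the level from powers with a factor of 10 gives the same number as computing it from amplitudes with a factor of 20. The helper names and the amplitude values are hypothetical, chosen only for illustration.

```python
import math

def power_db(p, p_ref):
    """Level of power p relative to p_ref: 10*log10(P/Pref)."""
    return 10 * math.log10(p / p_ref)

def amplitude_db(a, a_ref):
    """Same level computed from amplitudes: 20*log10(A/Aref).

    Valid because power is proportional to amplitude squared.
    """
    return 20 * math.log10(a / a_ref)

# Hypothetical amplitudes; power is taken as amplitude squared.
a, a_ref = 0.5, 1.0
assert math.isclose(power_db(a ** 2, a_ref ** 2), amplitude_db(a, a_ref))
```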

by the logarithm rule log(x^2) = 2*log(x). That's the origin of the factor of 20. Remember that in the computer, audio is represented by the instantaneous amplitude of the sound wave, not its power.
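To see the factor of 20 at work on sampled amplitudes, here is a small sketch: doubling the amplitude of a test sine wave raises its spectral peak by 20*log10(2), about 6.02 dB. It uses a naive DFT as a stand-in for an FFT (same result, just slower), and the 64-sample frame and 5-cycle sine are hypothetical test values.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) discrete Fourier transform (fine for a small demo)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def sine_frame(amplitude, cycles, n=64):
    """n samples of a sine wave holding `cycles` whole periods (hypothetical test signal)."""
    return [amplitude * math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

def peak_db(samples, bin_index):
    """Level of one frequency bin in dB, using 20*log10(abs(A))."""
    return 20 * math.log10(abs(dft(samples)[bin_index]))

quiet = peak_db(sine_frame(0.25, cycles=5), 5)
loud = peak_db(sine_frame(0.5, cycles=5), 5)
print(round(loud - quiet, 2))  # 6.02, i.e. 20*log10(2)
```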

Itamar Katz answered Dec 28 '22 07:12
