I have used several audio programs such as SDL_mixer, Audacity, etc., but I want to see what's inside these little audio toys and how audio data actually gets processed. I've also seen some sample code for an MP3 player in C++ that uses void* for the audio data.
But none of this has helped me understand, in general, how audio works in a computer. Could somebody explain (or recommend some books on) how computers store and process digital audio data? (For instance, if you store a triangle waveform in a .wav file, how does that waveform get stored as a bit pattern?)
How Waveforms are represented
There is a more detailed explanation of how audio is represented in the Audacity manual:
...the height of each vertical line is represented as a signed number.
More about Digital Audio
You may notice that all these links come from the Audacity project. That's not a coincidence.
Digital audio† is stored as a sequence of numbers, called samples. Example:
5, 18, 6, -4, -12, -3, 7, 14, 4
If you plot these numbers as points on a Cartesian graph, the sample value determines the position along the Y axis, and the sample's sequence number (0, 1, 2, 3, etc.) determines the position along the X axis; the X axis is just a monotonically increasing number line.
Now trace a line through the points you've just plotted.
Congratulations, you have just rendered the waveform of your digital audio. :-)
The Y axis is amplitude and the X axis is time.
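To see this in action (a toy illustration of my own, not part of any real audio API), here is a tiny C++ sketch that "plots" the nine samples above as a crude text graph: one row per sample, with the column position standing in for the Y axis.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // The sample sequence from the example above.
    std::vector<int> samples = {5, 18, 6, -4, -12, -3, 7, 14, 4};

    const int minY = -20, maxY = 20;  // plotting range for this toy example
    for (std::size_t i = 0; i < samples.size(); ++i) {
        std::string row(maxY - minY + 1, ' ');
        row[0 - minY] = '|';                    // mark the zero-amplitude axis
        row[samples[i] - minY] = '*';           // mark this sample's amplitude
        std::cout << i << "  " << row << '\n';  // row = time (X), column = amplitude (Y)
    }
}
```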
"Sample rate" determines how quickly the playback device (e.g. soundcard) advances through the samples. This is the "time value" of a sample. For example CD quality digital audio traverses 44,100 samples every second, reading the amplitude (Y axis value) at every sample point.
† The discussion above ignores compression. Compression changes little about the essential nature of digital audio, much as zipping up a bitmap image doesn't change the core nature of the bitmap. (The topic of audio compression is a rich one - I don't mean to oversimplify it; it's just that all compressed audio is eventually decompressed before it is rendered -- that is, played as audible sound or drawn as a waveform -- at which point its compressed origins are of little consequence.)
Taking your WAV file example:
A WAV file has a header, which gives a player or audio processor key information such as the number of channels, sample rate, bit depth, length of the data, and so on. After the header comes the raw bit pattern, which stores the audio samples (I'm assuming you know what sampling is - if not, see Wikipedia). Each sample is made up of a number of bytes (specified in the header) and gives the amplitude of the waveform at a given point in time. Each sample can be stored in signed or unsigned form (also specified in the header).
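To make that concrete, here is a rough sketch (my own, with a made-up helper name writeWav) of writing 16-bit mono PCM samples into a minimal canonical WAV file. A real writer would need more care with endianness, error handling, and the optional chunks WAV allows.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write a minimal 16-bit mono PCM WAV file. Assumes a little-endian machine,
// since RIFF/WAV stores its header fields little-endian.
void writeWav(const std::string& path, const std::vector<int16_t>& samples,
              uint32_t sampleRate = 44100) {
    const uint16_t channels      = 1;
    const uint16_t bitsPerSample = 16;
    const uint32_t dataBytes     = static_cast<uint32_t>(samples.size() * sizeof(int16_t));
    const uint32_t byteRate      = sampleRate * channels * bitsPerSample / 8;
    const uint16_t blockAlign    = channels * bitsPerSample / 8;
    const uint32_t riffSize      = 36 + dataBytes;  // file size after "RIFF" + size field

    std::ofstream out(path, std::ios::binary);
    auto put = [&out](const void* p, std::size_t n) {
        out.write(static_cast<const char*>(p), n);
    };

    // RIFF header
    out.write("RIFF", 4);  put(&riffSize, 4);  out.write("WAVE", 4);
    // "fmt " chunk: format tag 1 = uncompressed PCM
    const uint32_t fmtSize = 16;
    const uint16_t pcm     = 1;
    out.write("fmt ", 4);  put(&fmtSize, 4);   put(&pcm, 2);
    put(&channels, 2);     put(&sampleRate, 4);
    put(&byteRate, 4);     put(&blockAlign, 2);  put(&bitsPerSample, 2);
    // "data" chunk: the raw samples themselves
    out.write("data", 4);  put(&dataBytes, 4);
    put(samples.data(), dataBytes);
}
```

Combined with a triangle-wave generator like the sketch earlier on this page, something like writeWav("triangle.wav", makeTriangleWave(440.0, 1.0)) should give you a one-second 440 Hz triangle tone you can open in Audacity and inspect sample by sample.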