
Deinterleaving PCM (*.wav) stereo audio data

I understand that stereo PCM data is stored interleaved as [left][right][left][right].... I am trying to convert a stereo PCM file to mono Vorbis (*.ogg), which I understand is achievable by averaging the left and right channels ((left+right)*0.5). I have achieved this by amending the encoder example in the libvorbis SDK like this:

#define READ 1024
signed char readbuffer[READ*4];

and the PCM data is read like this:

fread(readbuffer, 1, READ*4, stdin)

I then averaged the two channels:

buffer[0][i] = ((((readbuffer[i*4+1]<<8) | (0x00ff&(int)readbuffer[i*4]))/32768.f) + (((readbuffer[i*4+3]<<8) | (0x00ff&(int)readbuffer[i*4+2]))/32768.f)) * 0.5f;

It worked perfectly, but I don't understand how this expression deinterleaves the left and right channels from the PCM data (i.e. all the bit shifting, ANDing, and ORing).

MOHW asked Jun 17 '14 13:06



1 Answer

A .wav file typically stores its PCM data in little endian format, with 16 bits per sample per channel. For the usual signed 16-bit PCM file, this means that the data is physically stored as

[LEFT LSB] [LEFT MSB] [RIGHT LSB] [RIGHT MSB] ...

so that every group of 4 bytes makes up a single stereo frame (one left sample followed by one right sample). Hence, you can find frame i by looking at bytes 4*i through 4*i+3, inclusive.
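As a minimal sketch of that layout, assuming 16-bit stereo data, the byte offset of any sample can be computed as below. The names `sample_offset`, `CHANNELS`, and `BYTES_PER_SAMPLE` are made up for illustration and are not part of libvorbis.

```c
#include <stddef.h>

/* Assumed format: 2 channels, 2 bytes (16 bits) per sample. */
enum { CHANNELS = 2, BYTES_PER_SAMPLE = 2 };

/* Byte offset of the LSB of one sample inside the interleaved buffer.
 * channel 0 = left, channel 1 = right; the MSB sits at offset + 1. */
static size_t sample_offset(size_t frame, size_t channel)
{
    return frame * CHANNELS * BYTES_PER_SAMPLE + channel * BYTES_PER_SAMPLE;
}
```

For frame i this gives 4*i for the left LSB and 4*i+2 for the right LSB, which matches the indices used in the question's expression.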

To decode a single 16-bit value from two bytes, you do this:

(MSB << 8) | LSB

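Because your read buffer values are stored as signed chars, you have to be a bit careful: both MSB and LSB are sign-extended when they are promoted to int. This is what you want for the MSB (it carries the sign of the sample), but it is undesirable for the LSB, because a low byte of 0x80 or above would turn into a negative int and set all the upper bits. Therefore, the code uses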

0xff & (int)LSB

to obtain the unsigned version of the low byte (technically, this works by upcasting to an int, and selecting the low 8 bits; an alternate formulation would be to just write (uint8_t)LSB).
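To see why the mask matters, here is a small sketch that decodes the same two bytes with and without it. The function names are hypothetical. It multiplies by 256 where the question's code uses `<< 8`, because left-shifting a negative signed value is undefined behavior in C99 and C11, even though common compilers produce the expected result. It also assumes two's-complement integers, which is what practically every current platform uses.

```c
/* Masked version: the low byte contributes only its 8 bits. */
static int decode_s16le(signed char lsb, signed char msb)
{
    return (msb * 256) | (0xff & (int)lsb);
}

/* Deliberately wrong version: a negative low byte is sign-extended,
 * so its upper bits overwrite the high byte in the OR. */
static int decode_s16le_unmasked(signed char lsb, signed char msb)
{
    return (msb * 256) | (int)lsb;
}
```

With the bytes 0x80 (LSB) and 0x00 (MSB), the masked version yields 128, while the unmasked version yields -128 because the sign-extended low byte sets every upper bit.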

Note that the MSBs are at indices 1 and 3, and the LSBs are at indices 0 and 2. So,

((readbuffer[i*4+1]<<8) | (0x00ff&(int)readbuffer[i*4]))

and

((readbuffer[i*4+3]<<8) | (0x00ff&(int)readbuffer[i*4+2]))

are just obtaining the values of the left and right channels as 16-bit signed values by using some bit manipulation to assemble the bytes into numbers.

Then, each of these values is divided by 32768.0. Note that a signed 16-bit value has a range of [-32768, 32767]. Thus, dividing by 32768 gives a floating-point value in the range [-1, 1) (the largest result is 32767/32768, just under 1). The two divided values are added to give a number in the range [-2, 2), and then the whole thing is multiplied by 0.5 to obtain the average, which is again in the range [-1, 1).
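Putting the steps together, the downmix could be written as a pair of small functions, which may be easier to follow than the one-line expression. This is only a sketch under some assumptions: it reads the buffer as `unsigned char` (so no masking is needed), the names `s16le` and `downmix_to_mono` are invented here, and it is not the code from the libvorbis encoder example.

```c
#include <stddef.h>

/* Decode one signed 16-bit little-endian sample starting at p. */
static int s16le(const unsigned char *p)
{
    int v = p[0] | (p[1] << 8);          /* unsigned value, 0..65535 */
    return v >= 32768 ? v - 65536 : v;   /* map to -32768..32767 */
}

/* Average the left and right channels of `frames` interleaved stereo
 * frames (4 bytes each) into `mono`, scaled to the range [-1, 1). */
static void downmix_to_mono(const unsigned char *pcm, size_t frames,
                            float *mono)
{
    for (size_t i = 0; i < frames; i++) {
        float left  = s16le(pcm + 4 * i)     / 32768.f;
        float right = s16le(pcm + 4 * i + 2) / 32768.f;
        mono[i] = (left + right) * 0.5f;
    }
}
```

For example, a frame with left = 16384 and right = -16384 averages to 0, and a frame where both channels are -32768 averages to -1.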

nneonneo answered Sep 17 '22 23:09
