I'm attempting to write a JPEG encoder and am stuck on the algorithm that gathers the appropriate Y, Cb, and Cr color components to pass to the method performing the transform.
As I understand it, the four most common subsampling variants are set up as follows (I could be way off here):
The most explicit description of the layout I have found so far is here.
What I don't understand is how to gather those components in the correct order to pass as an 8x8 block for transforming and quantizing.
Would someone be able to write an example (pseudocode would be fine, C# even better) of how to group the bytes for the transform?
I'll include the current, incorrect code I am running.
/// <summary>
/// Writes the Scan header structure
/// </summary>
/// <param name="image">The image to encode from.</param>
/// <param name="writer">The writer to write to the stream.</param>
private void WriteStartOfScan(ImageBase image, EndianBinaryWriter writer)
{
// Marker
writer.Write(new[] { JpegConstants.Markers.XFF, JpegConstants.Markers.SOS });
// Length (high byte, low byte), must be 6 + 2 * (number of components in scan)
writer.Write((short)0xc); // 12
byte[] sos = {
3, // Number of components in a scan, usually 1 or 3
1, // Component Id Y
0, // DC/AC Huffman table
2, // Component Id Cb
0x11, // DC/AC Huffman table
3, // Component Id Cr
0x11, // DC/AC Huffman table
0, // Ss - Start of spectral selection.
0x3f, // Se - End of spectral selection.
0 // Ah + Al (Successive approximation bit position high + low)
};
writer.Write(sos);
// Compress and write the pixels
// Buffers for each Y'Cb Cr component
float[] yU = new float[64];
float[] cbU = new float[64];
float[] crU = new float[64];
// The previous DC coefficient values for each component.
int[] dcValues = new int[3];
// TODO: Why null?
this.huffmanTable = new HuffmanTable(null);
// TODO: Color output is incorrect after this point.
// I think I've got my looping all wrong.
// For each row
for (int y = 0; y < image.Height; y += 8)
{
// For each column
for (int x = 0; x < image.Width; x += 8)
{
// Convert the 8x8 array to YCbCr
this.RgbToYcbCr(image, yU, cbU, crU, x, y);
// For each component
this.CompressPixels(yU, 0, writer, dcValues);
this.CompressPixels(cbU, 1, writer, dcValues);
this.CompressPixels(crU, 2, writer, dcValues);
}
}
this.huffmanTable.FlushBuffer(writer);
}
/// <summary>
/// Converts the pixel block from the RGBA colorspace to YCbCr.
/// </summary>
/// <param name="image"></param>
/// <param name="yComponant">The container to house the Y' luma componant within the block.</param>
/// <param name="cbComponant">The container to house the Cb chroma componant within the block.</param>
/// <param name="crComponant">The container to house the Cr chroma componant within the block.</param>
/// <param name="x">The x-position within the image.</param>
/// <param name="y">The y-position within the image.</param>
private void RgbToYcbCr(ImageBase image, float[] yComponant, float[] cbComponant, float[] crComponant, int x, int y)
{
int height = image.Height;
int width = image.Width;
for (int a = 0; a < 8; a++)
{
// Complete with the remaining right and bottom edge pixels.
int py = y + a;
if (py >= height)
{
py = height - 1;
}
for (int b = 0; b < 8; b++)
{
int px = x + b;
if (px >= width)
{
px = width - 1;
}
YCbCr color = image[px, py];
int index = a * 8 + b;
yComponant[index] = color.Y;
cbComponant[index] = color.Cb;
crComponant[index] = color.Cr;
}
}
}
/// <summary>
/// Compresses and encodes the pixels.
/// </summary>
/// <param name="componantValues">The current color component values within the image block.</param>
/// <param name="componantIndex">The componant index.</param>
/// <param name="writer">The writer.</param>
/// <param name="dcValues">The descrete cosine values for each componant</param>
private void CompressPixels(float[] componantValues, int componantIndex, EndianBinaryWriter writer, int[] dcValues)
{
// TODO: This should be an option.
byte[] horizontalFactors = JpegConstants.ChromaFourTwoZeroHorizontal;
byte[] verticalFactors = JpegConstants.ChromaFourTwoZeroVertical;
byte[] quantizationTableNumber = { 0, 1, 1 };
int[] dcTableNumber = { 0, 1, 1 };
int[] acTableNumber = { 0, 1, 1 };
for (int y = 0; y < verticalFactors[componantIndex]; y++)
{
for (int x = 0; x < horizontalFactors[componantIndex]; x++)
{
// TODO: This can probably be combined reducing the array allocation.
float[] dct = this.fdct.FastFDCT(componantValues);
int[] quantizedDct = this.fdct.QuantizeBlock(dct, quantizationTableNumber[componantIndex]);
this.huffmanTable.HuffmanBlockEncoder(writer, quantizedDct, dcValues[componantIndex], dcTableNumber[componantIndex], acTableNumber[componantIndex]);
dcValues[componantIndex] = quantizedDct[0];
}
}
}
This code is part of an open-source library I am writing on GitHub.
JPEG color subsampling can be implemented in a simple yet functional manner without much code. The basic idea is that your eyes are less sensitive to changes in color than to changes in luminance, so the JPEG file can be made much smaller by throwing away some of the color information. There are many ways to subsample the color information, but JPEG images tend to use four variants: none, 1/2 horizontal, 1/2 vertical, and 1/2 horizontal+vertical. There are additional TIFF/EXIF options such as the "center point" of the subsampled color, but for simplicity we'll use a simple averaging technique.
In the simplest case (no subsampling), each MCU (minimum coded unit) is an 8x8 block of pixels made up of 3 components - Y, Cb, Cr. The image is processed in 8x8 pixel blocks where the 3 color components are separated, passed through a DCT transform and written to the file in the order (Y, Cb, Cr). In all cases of subsampling, the DCT blocks are always composed of 8x8 coefficients or 64 values, but the meaning of those values varies due to the color subsampling.
The next simplest case is subsampling in one dimension (horizontal or vertical). Let's use 1/2 horizontal subsampling for this example. The MCU is now 16 pixels wide by 8 pixels tall. The compressed output of each MCU will now be four 8x8 DCT blocks (Y0, Y1, Cb, Cr). Y0 represents the luma values of the left 8x8 pixel block and Y1 represents the luma values of the right 8x8 pixel block. The Cb and Cr values are each 8x8 blocks based on the average value of horizontal pairs of pixels. I couldn't find any good images to insert here, but some pseudo-code can come in handy.
Here's a simple loop which does the color subsampling of our 1/2 horizontal case:
unsigned char ucCb[8][8], ucCr[8][8];
int x, y;
for (y=0; y<8; y++)
{
    for (x=0; x<8; x++)
    {
        ucCb[y][x] = (srcCb[y][x*2] + srcCb[y][(x*2)+1] + 1)/2; // average each horizontal pair
        ucCr[y][x] = (srcCr[y][x*2] + srcCr[y][(x*2)+1] + 1)/2;
    } // for x
} // for y
As you can see, there's not much to it. Each pair of Cb and Cr pixels from the source image is averaged horizontally to form a new Cb/Cr pixel. These are then DCT transformed, zigzagged and encoded in the same form as always.
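The luma half of that 16x8 MCU needs no averaging at all; it simply splits into a left and a right 8x8 block. A rough sketch, where srcY is a placeholder name for the 16x8 luma region of the MCU:
// Split the 16x8 luma region into left (Y0) and right (Y1) 8x8 blocks
unsigned char ucY0[8][8], ucY1[8][8];
int x, y;
for (y=0; y<8; y++)
{
    for (x=0; x<8; x++)
    {
        ucY0[y][x] = srcY[y][x];     // left half
        ucY1[y][x] = srcY[y][x+8];   // right half
    } // for x
} // for y
// Write order for this MCU: Y0, Y1, Cb, Cr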
Finally, for the 2x2 subsampled case, the MCU is now 16x16 pixels and the DCT blocks written will be Y0, Y1, Y2, Y3, Cb, Cr, where Y0 represents the upper-left 8x8 luma block, Y1 the upper right, Y2 the lower left and Y3 the lower right. The Cb and Cr values in this case each represent 4 source pixels (2x2) that have been averaged together. Just in case you were wondering, the color values are averaged together in the YCbCr colorspace. If you average the pixels together in RGB colorspace, it won't work correctly.
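Here's a rough sketch of that 2x2 gathering and averaging step (srcY, srcCb and srcCr are placeholder names for the 16x16 region of each component, already converted to YCbCr):
// 4:2:0: 16x16-pixel MCU -> Y0, Y1, Y2, Y3, Cb, Cr
unsigned char ucY[4][8][8], ucCb[8][8], ucCr[8][8];
int x, y;
for (y=0; y<16; y++)
{
    for (x=0; x<16; x++)
    {
        // Route each luma sample into the correct 8x8 block:
        // 0 = upper left, 1 = upper right, 2 = lower left, 3 = lower right
        ucY[(y/8)*2 + (x/8)][y%8][x%8] = srcY[y][x];
    } // for x
} // for y
for (y=0; y<8; y++)
{
    for (x=0; x<8; x++)
    {
        // Average each 2x2 group of chroma samples (with rounding)
        ucCb[y][x] = (srcCb[y*2][x*2] + srcCb[y*2][x*2+1] +
                      srcCb[y*2+1][x*2] + srcCb[y*2+1][x*2+1] + 2) / 4;
        ucCr[y][x] = (srcCr[y*2][x*2] + srcCr[y*2][x*2+1] +
                      srcCr[y*2+1][x*2] + srcCr[y*2+1][x*2+1] + 2) / 4;
    } // for x
} // for y
// DCT, quantize and encode in the order Y0, Y1, Y2, Y3, Cb, Cr
The +2 before the divide just rounds the average to the nearest integer, the same way the +1 does in the horizontal case above.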
FYI - Adobe supports JPEG images in the RGB colorspace (instead of YCbCr). These images can't use color subsampling because R, G and B are of equal importance and subsampling them in this colorspace would lead to much worse visual artifacts.