
Chroma subsampling algorithm for JPEG

Tags: c#, algorithm, jpeg

I'm attempting to write a JPEG encoder and am stumbling over the algorithms that gather the appropriate Y, Cb, and Cr color components to pass to the method performing the transform.

As I understand it, the most common subsampling variants are set up as follows (I could be way off here):

  • 4:4:4 - An MCU block of 8x8 pixels with Y, Cb, and Cr represented in each pixel.
  • 4:2:2 - An MCU block of 16x8 pixels with Y in each pixel and Cb, Cr every two pixels
  • 4:2:0 - An MCU block of 16x16 pixels with Y every two pixels and Cb, Cr every four

The most explicit description of the layout I have found so far is described here.

What I don't understand is how to gather those components in the correct order to pass as an 8x8 block for transforming and quantizing.

Would someone be able to write an example (pseudocode would be fine, C# even better) of how to group the bytes for the transform?

I'll include the current, incorrect code I am running.

/// <summary>
/// Writes the Scan header structure
/// </summary>
/// <param name="image">The image to encode from.</param>
/// <param name="writer">The writer to write to the stream.</param>
private void WriteStartOfScan(ImageBase image, EndianBinaryWriter writer)
{
    // Marker
    writer.Write(new[] { JpegConstants.Markers.XFF, JpegConstants.Markers.SOS });

    // Length (high byte, low byte), must be 6 + 2 * (number of components in scan)
    writer.Write((short)0xc); // 12

    byte[] sos = {
        3, // Number of components in a scan, usually 1 or 3
        1, // Component Id Y
        0, // DC/AC Huffman table 
        2, // Component Id Cb
        0x11, // DC/AC Huffman table 
        3, // Component Id Cr
        0x11, // DC/AC Huffman table 
        0, // Ss - Start of spectral selection.
        0x3f, // Se - End of spectral selection.
        0 // Ah + Al (Successive approximation bit position high + low)
    };

    writer.Write(sos);

    // Compress and write the pixels
    // Buffers for each Y'CbCr component
    float[] yU = new float[64];
    float[] cbU = new float[64];
    float[] crU = new float[64];

    // The DC coefficient values for each component.
    int[] dcValues = new int[3];

    // TODO: Why null?
    this.huffmanTable = new HuffmanTable(null);

    // TODO: Color output is incorrect after this point. 
    // I think I've got my looping all wrong.
    // For each row
    for (int y = 0; y < image.Height; y += 8)
    {
        // For each column
        for (int x = 0; x < image.Width; x += 8)
        {
            // Convert the 8x8 array to YCbCr
            this.RgbToYcbCr(image, yU, cbU, crU, x, y);

            // For each component
            this.CompressPixels(yU, 0, writer, dcValues);
            this.CompressPixels(cbU, 1, writer, dcValues);
            this.CompressPixels(crU, 2, writer, dcValues);
        }
    }

    this.huffmanTable.FlushBuffer(writer);
}

/// <summary>
/// Converts the pixel block from the RGBA colorspace to YCbCr.
/// </summary>
/// <param name="image"></param>
/// <param name="yComponant">The container to house the Y' luma componant within the block.</param>
/// <param name="cbComponant">The container to house the Cb chroma componant within the block.</param>
/// <param name="crComponant">The container to house the Cr chroma componant within the block.</param>
/// <param name="x">The x-position within the image.</param>
/// <param name="y">The y-position within the image.</param>
private void RgbToYcbCr(ImageBase image, float[] yComponant, float[] cbComponant, float[] crComponant, int x, int y)
{
    int height = image.Height;
    int width = image.Width;

    for (int a = 0; a < 8; a++)
    {
        // Complete with the remaining right and bottom edge pixels.
        int py = y + a;
        if (py >= height)
        {
            py = height - 1;
        }

        for (int b = 0; b < 8; b++)
        {
            int px = x + b;
            if (px >= width)
            {
                px = width - 1;
            }

            YCbCr color = image[px, py];
            int index = a * 8 + b;
            yComponant[index] = color.Y;
            cbComponant[index] = color.Cb;
            crComponant[index] = color.Cr;
        }
    }
}

/// <summary>
/// Compresses and encodes the pixels.
/// </summary>
/// <param name="componantValues">The current color component values within the image block.</param>
/// <param name="componantIndex">The componant index.</param>
/// <param name="writer">The writer.</param>
/// <param name="dcValues">The descrete cosine values for each componant</param>
private void CompressPixels(float[] componantValues, int componantIndex, EndianBinaryWriter writer, int[] dcValues)
{
    // TODO: This should be an option.
    byte[] horizontalFactors = JpegConstants.ChromaFourTwoZeroHorizontal;
    byte[] verticalFactors = JpegConstants.ChromaFourTwoZeroVertical;
    byte[] quantizationTableNumber = { 0, 1, 1 };
    int[] dcTableNumber = { 0, 1, 1 };
    int[] acTableNumber = { 0, 1, 1 };

    for (int y = 0; y < verticalFactors[componantIndex]; y++)
    {
        for (int x = 0; x < horizontalFactors[componantIndex]; x++)
        {
            // TODO: This can probably be combined reducing the array allocation.
            float[] dct = this.fdct.FastFDCT(componantValues);
            int[] quantizedDct = this.fdct.QuantizeBlock(dct, quantizationTableNumber[componantIndex]);
            this.huffmanTable.HuffmanBlockEncoder(writer, quantizedDct, dcValues[componantIndex], dcTableNumber[componantIndex], acTableNumber[componantIndex]);
            dcValues[componantIndex] = quantizedDct[0];
        }
    }
}

This code is part of an open-source library I am writing on GitHub.

Asked by James South

1 Answer

JPEG color subsampling can be implemented in a simple yet functional manner without much code. The basic idea is that your eyes are less sensitive to changes in color than to changes in luminance, so the JPEG file can be made much smaller by throwing away some of the color information. There are many ways to subsample the color information, but JPEG images tend to use 4 variants: none, 1/2 horizontal, 1/2 vertical and 1/2 horizontal+vertical. There are additional TIFF/EXIF options such as the "center point" of the subsampled color, but for simplicity we'll simply average the samples together.

In the simplest case (no subsampling), each MCU (minimum coded unit) is an 8x8 block of pixels made up of 3 components - Y, Cb, Cr. The image is processed in 8x8 pixel blocks where the 3 color components are separated, passed through a DCT transform and written to the file in the order (Y, Cb, Cr). In all cases of subsampling, the DCT blocks are always composed of 8x8 coefficients or 64 values, but the meaning of those values varies due to the color subsampling.
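
To map this simplest case onto your C#, here is a minimal sketch of gathering and writing one 4:4:4 MCU. The names getYCbCr and encodeBlock are hypothetical stand-ins for your image accessor (which already converts RGB to YCbCr) and your DCT/quantize/Huffman stage, edge clamping is omitted, and it assumes "using System;" for Func and Action:

// A minimal 4:4:4 sketch. getYCbCr and encodeBlock are hypothetical stand-ins
// for the image accessor and the DCT/quantize/Huffman stage.
void Encode444(
    int width,
    int height,
    Func<int, int, (float Y, float Cb, float Cr)> getYCbCr,
    Action<float[], int> encodeBlock)
{
    float[] yBlock = new float[64];
    float[] cbBlock = new float[64];
    float[] crBlock = new float[64];

    for (int my = 0; my < height; my += 8)
    {
        for (int mx = 0; mx < width; mx += 8)
        {
            for (int y = 0; y < 8; y++)
            {
                for (int x = 0; x < 8; x++)
                {
                    (float luma, float cb, float cr) = getYCbCr(mx + x, my + y);
                    int i = (y * 8) + x;
                    yBlock[i] = luma;
                    cbBlock[i] = cb;
                    crBlock[i] = cr;
                }
            }

            // One MCU = three 8x8 blocks, written in the order Y, Cb, Cr.
            encodeBlock(yBlock, 0);  // luma, component index 0
            encodeBlock(cbBlock, 1); // chroma blue, component index 1
            encodeBlock(crBlock, 2); // chroma red, component index 2
        }
    }
}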

The next simplest case is subsampling in one dimension (horizontal or vertical). Let's use 1/2 horizontal subsampling for this example. The MCU is now 16 pixels wide by 8 pixels tall. The compressed output of each MCU will now be four 8x8 DCT blocks (Y0, Y1, Cb, Cr). Y0 represents the luma values of the left 8x8 pixel block and Y1 represents the luma values of the right 8x8 pixel block. The Cb and Cr values are each 8x8 blocks based on the average value of horizontal pairs of pixels. I couldn't find any good images to insert here, but some pseudo-code can come in handy.


Here's a simple loop which does the color subsampling of our 1/2 horizontal case:

unsigned char srcCb[8][16], srcCr[8][16]; // source chroma samples for the current 16x8 MCU
unsigned char ucCb[8][8], ucCr[8][8];     // subsampled 8x8 chroma blocks
int x, y;

for (y=0; y<8; y++)
{
   for (x=0; x<8; x++)
   {
      // Average each horizontal pair of source samples; the +1 rounds to nearest.
      ucCb[y][x] = (srcCb[y][x*2] + srcCb[y][(x*2)+1] + 1)/2;
      ucCr[y][x] = (srcCr[y][x*2] + srcCr[y][(x*2)+1] + 1)/2;
   } // for x
} // for y

As you can see, there's not much to it. Each pair of Cb and Cr pixels from the source image is averaged horizontally to form a new Cb/Cr pixel. These are then DCT transformed, zigzagged and encoded in the same form as always.
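
If it helps to map this back to your C#, the same 16x8 MCU can be gathered with the block ordering made explicit. As before, getYCbCr and encodeBlock are hypothetical stand-ins for your image accessor and the DCT/quantize/Huffman stage, and edge clamping is left out:

// Assembling one 4:2:2 MCU (16 pixels wide, 8 tall): two luma blocks plus one
// horizontally averaged Cb block and one Cr block, written as Y0, Y1, Cb, Cr.
void EncodeMcu422(
    int mcuX,
    int mcuY,
    Func<int, int, (float Y, float Cb, float Cr)> getYCbCr,
    Action<float[], int> encodeBlock)
{
    float[] y0 = new float[64]; // left 8x8 luma block
    float[] y1 = new float[64]; // right 8x8 luma block
    float[] cb = new float[64];
    float[] cr = new float[64];

    for (int y = 0; y < 8; y++)
    {
        for (int x = 0; x < 8; x++)
        {
            int i = (y * 8) + x;
            y0[i] = getYCbCr(mcuX + x, mcuY + y).Y;
            y1[i] = getYCbCr(mcuX + 8 + x, mcuY + y).Y;

            // Each chroma sample is the average of a horizontal pair.
            var left = getYCbCr(mcuX + (x * 2), mcuY + y);
            var right = getYCbCr(mcuX + (x * 2) + 1, mcuY + y);
            cb[i] = (left.Cb + right.Cb) / 2f;
            cr[i] = (left.Cr + right.Cr) / 2f;
        }
    }

    encodeBlock(y0, 0);
    encodeBlock(y1, 0);
    encodeBlock(cb, 1);
    encodeBlock(cr, 2);
}

Note that both luma blocks use component index 0 (and so share the luma DC predictor and tables), while Cb and Cr each use their own, which lines up with the per-component dcValues array in your code.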

Finally, for the 2x2 subsampled case, the MCU is now 16x16 pixels and the DCT blocks written will be Y0, Y1, Y2, Y3, Cb, Cr, where Y0 represents the upper-left 8x8 luma pixels, Y1 the upper right, Y2 the lower left and Y3 the lower right. The Cb and Cr values in this case each represent 4 source pixels (2x2) that have been averaged together. Just in case you were wondering, the color values are averaged together in the YCbCr colorspace. If you average the pixels together in RGB colorspace, it won't work correctly.
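
Under the same assumptions, the 2x2 averaging for 4:2:0 is only slightly more work. Here the hypothetical parameter src is assumed to hold the 16x16 Cb (or Cr) samples already gathered for the current MCU:

// 4:2:0: average each 2x2 group of chroma samples down to one value,
// producing one 8x8 block from a 16x16 source plane.
float[] AverageChroma420(float[,] src)
{
    float[] block = new float[64];

    for (int y = 0; y < 8; y++)
    {
        for (int x = 0; x < 8; x++)
        {
            int sy = y * 2;
            int sx = x * 2;
            block[(y * 8) + x] =
                (src[sy, sx] + src[sy, sx + 1] +
                 src[sy + 1, sx] + src[sy + 1, sx + 1]) / 4f;
        }
    }

    return block;
}

// Usage: the six blocks for one MCU are then encoded as Y0 (upper left),
// Y1 (upper right), Y2 (lower left), Y3 (lower right), followed by
// AverageChroma420(srcCb) and AverageChroma420(srcCr).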

FYI - Adobe supports JPEG images in the RGB colorspace (instead of YCbCr). These images can't use color subsampling because R, G and B are of equal importance and subsampling them in this colorspace would lead to much worse visual artifacts.

Answered by BitBank