LockBits appears to be too slow for my needs - alternatives?

Question

I'm working on 10 megapixel images taken by a video camera.

The aim is to register in a matrix (a two-dimensional array) the grayscale values for each pixel.

I first used GetPixel, but it took 25 seconds. Now I use LockBits, but it still takes 10 seconds, or 3 if I don't save the results to a text file.

My tutor said they don't need the results saved, but 3 seconds is still too slow. So am I doing something wrong in my program, or is there something faster than LockBits for my application?

Here is my code:

public void ExtractMatrix()
{
    Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp");

    int[,] GRAY = new int[3840, 2748]; //Matrix with "grayscales" in INTeger values

    unsafe
    {
        //create an empty bitmap the same size as original
        Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height);

        //lock the original bitmap in memory
        BitmapData originalData = bmpPicture.LockBits(
           new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
           ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

        //lock the new bitmap in memory
        BitmapData newData = bmp.LockBits(
           new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
           ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);

        //set the number of bytes per pixel
        // here is set to 3 because I use an Image with 24bpp
        int pixelSize = 3;

        for (int y = 0; y < bmpPicture.Height; y++)
        {
            //get the data from the original image
            byte* oRow = (byte*)originalData.Scan0 + (y * originalData.Stride);

            //get the data from the new image
            byte* nRow = (byte*)newData.Scan0 + (y * newData.Stride);

            for (int x = 0; x < bmpPicture.Width; x++)
            {
                //create the grayscale version
                byte grayScale =
                   (byte)((oRow[x * pixelSize] * .114) + //B
                   (oRow[x * pixelSize + 1] * .587) +  //G
                   (oRow[x * pixelSize + 2] * .299)); //R

                //set the new image's pixel to the grayscale version
                //   nRow[x * pixelSize] = grayScale; //B
                //   nRow[x * pixelSize + 1] = grayScale; //G
                //   nRow[x * pixelSize + 2] = grayScale; //R

                GRAY[x, y] = (int)grayScale;
            }
        }

        //unlock the bitmaps
        bmpPicture.UnlockBits(originalData);
        bmp.UnlockBits(newData);
    }
}

Jesse · Accepted Answer

Here are some more optimizations that may help:

Use jagged arrays ([][]); in .NET, accessing them is faster than accessing multidimensional arrays;
Cache properties that are used inside a loop. Though this answer states that the JIT will optimize it, we don't know what's happening internally;
Multiplication is (generally) slower than addition;

As others have stated, float being faster than double applies mainly to older processors (~10+ years old). The only upside here is that you're using them as constants, so they consume less memory (especially given the many iterations);

Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp");

// jagged instead of multidimensional 
int[][] GRAY = new int[3840][]; //Matrix with "grayscales" in INTeger values
for (int i = 0, icnt = GRAY.Length; i < icnt; i++)
    GRAY[i] = new int[2748];

unsafe
{
    //create an empty bitmap the same size as original
    Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height);

    //lock the original bitmap in memory
    BitmapData originalData = bmpPicture.LockBits(
       new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
       ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

    //lock the new bitmap in memory
    BitmapData newData = bmp.LockBits(
       new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
       ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);

    //set the number of bytes per pixel
    // here is set to 3 because I use an Image with 24bpp
    const int pixelSize = 3; // const because it doesn't change
    // cache Scan0 as byte pointers for reuse...we don't know if BitmapData caches it internally, or recalculates it every time, or whatnot
    byte* originalScan0 = (byte*)originalData.Scan0;
    byte* newScan0 = (byte*)newData.Scan0;
    // cache the strides; the row offsets below advance by one stride per row
    int originalStride = originalData.Stride;
    int newStride = newData.Stride;
    // store certain properties, because accessing a variable is normally faster than a property (and we don't really know if the property recalculates anything internally)
    int bmpwidth = bmpPicture.Width;
    int bmpheight = bmpPicture.Height;
    // running row offsets: adding the stride each row replaces the y * Stride multiplication
    int originalOffset = 0;
    int newOffset = 0;

    for (int y = 0; y < bmpheight; y++)
    {
        //get the data from the original image
        byte* oRow = originalScan0 + originalOffset;
        originalOffset += originalStride;

        //get the data from the new image
        byte* nRow = newScan0 + newOffset;
        newOffset += newStride;

        int pixelPosition = 0;
        for (int x = 0; x < bmpwidth; x++)
        {
            //create the grayscale version
            byte grayScale =
               (byte)((oRow[pixelPosition] * .114f) + //B
               (oRow[pixelPosition + 1] * .587f) +  //G
               (oRow[pixelPosition + 2] * .299f)); //R

            //set the new image's pixel to the grayscale version
            //   nRow[pixelPosition] = grayScale; //B
            //   nRow[pixelPosition + 1] = grayScale; //G
            //   nRow[pixelPosition + 2] = grayScale; //R

            GRAY[x][y] = (int)grayScale;

            pixelPosition += pixelSize;
        }
    }

    //unlock the bitmaps
    bmpPicture.UnlockBits(originalData);
    bmp.UnlockBits(newData);
}

Daniel · Answer

Your code is converting from a row-major representation into a column-major representation. In the bitmap, pixel (x,y) is followed by (x+1,y) in memory; but in your GRAY array, pixel (x,y) is followed by (x,y+1).

This causes inefficient memory access when writing, as every write touches a different cache line, and you end up thrashing the CPU cache if the image is big enough. This is especially bad if your image size is a power of two (see "Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?").

Store your array in row-major order as well if possible to avoid the inefficient memory access (replace GRAY[x,y] with GRAY[y,x]).
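A minimal sketch of the row-major variant, assuming the locked pixel data has already been copied into a managed byte[] (for example with Marshal.Copy from Scan0). The class and method names (GrayRowMajor, Convert) are hypothetical; the weights are the ones from the question:

```csharp
using System;

static class GrayRowMajor
{
    // Hypothetical helper: converts a 24bpp buffer in B, G, R byte order
    // (the layout LockBits yields for Format24bppRgb) into a matrix indexed [y, x].
    public static int[,] Convert(byte[] pixels, int width, int height, int stride)
    {
        var gray = new int[height, width]; // row-major: consecutive x values are adjacent in memory

        for (int y = 0; y < height; y++)
        {
            int p = y * stride; // start of row y; stride may include padding bytes

            for (int x = 0; x < width; x++, p += 3)
            {
                // reads and writes both walk memory sequentially
                gray[y, x] = (byte)(pixels[p] * .114        // B
                                  + pixels[p + 1] * .587    // G
                                  + pixels[p + 2] * .299);  // R
            }
        }
        return gray;
    }
}
```

The only change from the question's loop is the index order on the destination, so each inner-loop write lands next to the previous one.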

If you really need it in column-major order, look at more cache-friendly algorithms for matrix transposition (e.g. "A Cache Efficient Matrix Transpose Program?").
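One common cache-friendlier approach is loop tiling: process the image in small square blocks so that the strided writes stay within a limited set of cache lines. The sketch below is an illustration, not taken from the linked question. It assumes a managed byte[] copy of the pixel data, and the names (GrayColumnMajorTiled, Convert) and the tile size of 32 are hypothetical choices that would need to be measured:

```csharp
using System;

static class GrayColumnMajorTiled
{
    const int Tile = 32; // assumed tile edge in pixels; tune by measurement

    // Hypothetical helper: same conversion as in the question, result indexed [x, y],
    // but visiting the image tile by tile instead of whole rows at a time.
    public static int[,] Convert(byte[] pixels, int width, int height, int stride)
    {
        var gray = new int[width, height]; // column-major, as in the question

        for (int y0 = 0; y0 < height; y0 += Tile)
        {
            int yEnd = Math.Min(y0 + Tile, height);
            for (int x0 = 0; x0 < width; x0 += Tile)
            {
                int xEnd = Math.Min(x0 + Tile, width);

                // within one tile, at most Tile destination rows are touched
                for (int y = y0; y < yEnd; y++)
                {
                    int p = y * stride + x0 * 3;
                    for (int x = x0; x < xEnd; x++, p += 3)
                    {
                        gray[x, y] = (byte)(pixels[p] * .114        // B
                                          + pixels[p + 1] * .587    // G
                                          + pixels[p + 2] * .299);  // R
                    }
                }
            }
        }
        return gray;
    }
}
```

Whether this beats the plain loop depends on the image size and the CPU's cache, so it is worth timing both before committing to either.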

Eamon Nerbonne · Answer

Your code may not be optimal, but a quick skim seems to show even this version should run in a fraction of a second. This suggests there's some other problem:

Are you:

Compiling in Release mode? Debug mode turns off various optimizations.
Running with a debugger attached? If you run from Visual Studio using F5 (with the default C# keyboard shortcuts), the debugger will be attached. This can dramatically slow down your program, particularly if you have any breakpoints or IntelliTrace enabled.
Running on some limited device? It sounds like you're running on a PC, but if you're not, then device-specific limitations might be relevant.
I/O limited? Although you talk about a video camera, your code suggests you're dealing with the filesystem. Any filesystem interaction can be a bottleneck, particularly once networked disks, virus scanners, physical platters and fragmentation come into play. A 10 MP image is 30 MB (if uncompressed RGB without an alpha channel), and reading or writing that could easily take 3 seconds depending on the details of the filesystem.
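To tell these cases apart, it may help to time the stages separately. A minimal sketch using System.Diagnostics.Stopwatch; the StageTimer class and MeasureMs method are hypothetical names introduced here for illustration:

```csharp
using System;
using System.Diagnostics;

static class StageTimer
{
    // Hypothetical helper: runs one stage and returns its wall-clock time in milliseconds.
    public static long MeasureMs(Action stage)
    {
        var sw = Stopwatch.StartNew();
        stage();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

For instance, wrapping the new Bitmap(nameNumber + ".bmp") call in one MeasureMs call and the pixel loop in another would show whether loading or conversion dominates. The numbers are only meaningful from a Release build started without the debugger (Ctrl+F5 with the default Visual Studio shortcuts).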

Tags:

c#

lockbits

Elo Monval
