Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speed up Matrix Addition in C#

I'd like to optimize this piece of code :

public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
{            
        for (int x = 0; x < Width; x++)
        {
            for (int y = 0; y < Height; y++)
            {
                Byte  pixelValue = image.GetPixel(x, y).B;
                this.sumOfPixelValues[x, y] += pixelValue;
                this.sumOfPixelValuesSquared[x, y] += pixelValue * pixelValue;
            }
        }
}

This is to be used for image processing, and we're currently running this for about 200 images. We've optimized the GetPixel value to use unsafe code, and we're not using image.Width, or image.Height, as those properties were adding to our runtime costs.

However, we're still stuck at a low speed. The problem is that our images are 640x480, so the middle of the loop is being called about 640x480x200 times. I'd like to ask if there's a way to speed it up somehow, or convince me that it's fast enough as it is. Perhaps a way is through some fast Matrix Addition, or is Matrix Addition inherently an n^2 operation with no way to speed it up?

Perhaps doing array accesses via unsafe code would speed it up, but I'm not sure how to go about doing it, and whether it would be worth the time. Probably not. Thanks.

EDIT : Thank you for all your answers.

This is the GetPixel method we're using:

 public Color GetPixel(int x, int y)
    {
        int offsetFromOrigin = (y * this.stride) + (x * 3);
        unsafe
        {
            return Color.FromArgb(this.imagePtr[offsetFromOrigin + 2], this.imagePtr[offsetFromOrigin + 1], this.imagePtr[offsetFromOrigin]);
        }
    }
like image 654
Jean Azzopardi Avatar asked Dec 08 '09 16:12

Jean Azzopardi


7 Answers

Despite using unsafe code, GetPixel may well be the bottleneck here. Have you looked at ways of getting all the pixels in the image in one call rather than once per pixel? For instance, Bitmap.LockBits may be your friend...

On my netbook, a very simply loop iterating 640 * 480 * 200 times only take about 100 milliseconds - so if you're finding it's all going slowly, you should take another look at the bit inside the loop.

Another optimisation you might want to look at: avoid multi-dimensional arrays. They're significantly slower than single-dimensional arrays.

In particular, you can have a single-dimensional array of size Width * Height and just keep an index:

int index = 0;
for (int x = 0; x < Width; x++)
{
    for (int y = 0; y < Height; y++)
    {
        Byte pixelValue = image.GetPixel(x, y).B;
        this.sumOfPixelValues[index] += pixelValue;
        this.sumOfPixelValuesSquared[index] += pixelValue * pixelValue;
        index++;
    }
}

Using the same simple test harness, adding a write to a 2-D rectangular array took the total time of looping over 200 * 640 * 480 up to around 850ms; using a 1-D rectangular array took it back down to around 340ms - so it's somewhat significant, and currently you've got two of those per loop iteration.

like image 171
Jon Skeet Avatar answered Oct 02 '22 14:10

Jon Skeet


Read this article which also has some code and mentions about the slowness of GetPixel.

link text

From the article this is code to simply invert bits. This shows you the usage of LockBits as well.

It is important to note that unsafe code will not allow you to run your code remotely.

public static bool Invert(Bitmap b)
{

BitmapData bmData = b.LockBits(new Rectangle(0, 0, b.Width, b.Height), 
                               ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb); 

int stride = bmData.Stride; 
System.IntPtr Scan0 = bmData.Scan0; 
unsafe 
{ 
    byte * p = (byte *)(void *)Scan0;
    int nOffset = stride - b.Width*3; 
    int nWidth = b.Width * 3;
    for(int y=0;y < b.Height;++y)
    {
        for(int x=0; x < nWidth; ++x )
        {
            p[0] = (byte)(255-p[0]);
            ++p;
        }
        p += nOffset;
    }
}

b.UnlockBits(bmData);

return true;

}

like image 20
anirudhgarg Avatar answered Oct 02 '22 12:10

anirudhgarg


I recommend that you profile this code and find out what's taking the most time.

You may find that it's the subscripting operation, in which case you might want to change your data structures from:

long sumOfPixelValues[n,m];
long sumOfPixelValuesSquared[n,m];

to

struct Sums
{
    long sumOfPixelValues;
    long sumOfPixelValuesSquared;
}

Sums sums[n,m];

This would depend on what you find once you profile the code.

like image 37
John Saunders Avatar answered Oct 02 '22 13:10

John Saunders


Code profiling is the best place to start.

Matrix addition is a highly parallel operation and can be speed up by parallelizing the operation w/ multiple threads.

I would recommend using Intels IPP library that contains threaded highly optimized API for this sort of operation. Perhaps surprisingly it's only about $100 - but would add significant complexity to your project.

If you don't want to trouble yourself with mixed language programming and IPP, you could try out centerspace's C# math libraries. The NMath API contains easy to used, forward scaling, matrix operations.

Paul

like image 41
Paul Avatar answered Oct 02 '22 12:10

Paul


System.Drawing.Color is a structure, which on current versions of .NET kills most optimizations. Since you're only interested in the blue component anyway, use a method that only gets the data you need.

public byte GetPixelBlue(int x, int y)
{
    int offsetFromOrigin = (y * this.stride) + (x * 3);
    unsafe
    {
        return this.imagePtr[offsetFromOrigin];
    }
}

Now, exchange the order of iteration of x and y:

public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
{            
    for (int y = 0; y < Height; y++)
    {
        for (int x = 0; x < Width; x++)
        {
            Byte  pixelValue = image.GetPixelBlue(x, y);
            this.sumOfPixelValues[y, x] += pixelValue;
            this.sumOfPixelValuesSquared[y, x] += pixelValue * pixelValue;
        }
    }
}

Now you're accessing all values within a scan line sequentially, which will make much better use of CPU cache for all three matrices involved (image.imagePtr, sumOfPixelValues, and sumOfPixelValuesSquared. [Thanks to Jon for noticing that when I fixed access to image.imagePtr, I broke the other two. Now the output array indexing is swapped to keep it optimal.]

Next, get rid of the member references. Another thread could theoretically be setting sumOfPixelValues to another array midway through, which does horrible horrible things to optimizations.

public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
{          
    uint [,] sums = this.sumOfPixelValues;
    ulong [,] squares = this.sumOfPixelValuesSquared;
    for (int y = 0; y < Height; y++)
    {
        for (int x = 0; x < Width; x++)
        {
            Byte  pixelValue = image.GetPixelBlue(x, y);
            sums[y, x] += pixelValue;
            squares[y, x] += pixelValue * pixelValue;
        }
    }
}

Now the compiler can generate optimal code for moving through the two output arrays, and after inlining and optimization, the inner loop can step through the image.imagePtr array with a stride of 3 instead of recalculating the offset all the time. Now an unsafe version for good measure, doing the optimizations that I think .NET ought to be smart enough to do but probably isn't:

unsafe public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
{          
    byte* scanline = image.imagePtr;
    fixed (uint* sums = &this.sumOfPixelValues[0,0])
    fixed (uint* squared = &this.sumOfPixelValuesSquared[0,0])
    for (int y = 0; y < Height; y++)
    {
        byte* blue = scanline;
        for (int x = 0; x < Width; x++)
        {
            byte pixelValue = *blue;
            *sums += pixelValue;
            *squares += pixelValue * pixelValue;
            blue += 3;
            sums++;
            squares++;
        }
        scanline += image.stride;
    }
}
like image 42
Ben Voigt Avatar answered Oct 02 '22 14:10

Ben Voigt


Where are images stored? If each is on disk, then a bit of your processing time issue may be in fetching them from the disk. You might examine this to see if it is an issue, and if so, then rewrite to pre-fetch the image data so that the array procesing code does not have to wait for the data...

If the overall application logic will allow it (Is each matrix addition independant, or dependant on output of a previous matrix addition?) If they are independant, I'd examine executing them all on separate threads, or in parallel..

like image 34
Charles Bretana Avatar answered Oct 02 '22 12:10

Charles Bretana


The only possible way I can think of to speed it up would be to try do some of the additions in parallel, which with your size might be beneficial over the threading overhead.

like image 30
Yuriy Faktorovich Avatar answered Oct 02 '22 13:10

Yuriy Faktorovich