I'm working on 10 megapixel images taken by a video camera.
The aim is to register in a matrix (a two-dimensional array) the grayscale values for each pixel.
I first used GetPixel but it took 25 seconds to do it. Now I use Lockbits but it sill takes 10 seconds, and 3 if I don't save the results in a text file.
My tutor said they don't need to register the results but 3 seconds is still too slow. So am I doing something wrong in my program or is there something faster than Lockbits for my application?
Here is my code:
public void ExtractMatrix()
{
Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp");
int[,] GRAY = new int[3840, 2748]; //Matrix with "grayscales" in INTeger values
unsafe
{
//create an empty bitmap the same size as original
Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height);
//lock the original bitmap in memory
BitmapData originalData = bmpPicture.LockBits(
new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
//lock the new bitmap in memory
BitmapData newData = bmp.LockBits(
new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);
//set the number of bytes per pixel
// here is set to 3 because I use an Image with 24bpp
int pixelSize = 3;
for (int y = 0; y < bmpPicture.Height; y++)
{
//get the data from the original image
byte* oRow = (byte*)originalData.Scan0 + (y * originalData.Stride);
//get the data from the new image
byte* nRow = (byte*)newData.Scan0 + (y * newData.Stride);
for (int x = 0; x < bmpPicture.Width; x++)
{
//create the grayscale version
byte grayScale =
(byte)((oRow[x * pixelSize] * .114) + //B
(oRow[x * pixelSize + 1] * .587) + //G
(oRow[x * pixelSize + 2] * .299)); //R
//set the new image's pixel to the grayscale version
// nRow[x * pixelSize] = grayScale; //B
// nRow[x * pixelSize + 1] = grayScale; //G
// nRow[x * pixelSize + 2] = grayScale; //R
GRAY[x, y] = (int)grayScale;
}
}
Here are some more optimizations that may help:
Use jagged arrays ([][]
); in .NET, accessing them is faster than multidimensional;
Cache properties that will be used inside of a loop. Though this answer states that JIT will optimize it, we don't know what's happening internally;
Multiplication is (generally) slower than addition;
As others have stated, float
is faster than double
, which applies to older processors (~10+ years). The only upside here is that you're using them as constants, and thus consume less memory (especially because of the many iterations);
Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp");
// jagged instead of multidimensional
int[][] GRAY = new int[3840][]; //Matrix with "grayscales" in INTeger values
for (int i = 0, icnt = GRAY.Length; i < icnt; i++)
GRAY[i] = new int[2748];
unsafe
{
//create an empty bitmap the same size as original
Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height);
//lock the original bitmap in memory
BitmapData originalData = bmpPicture.LockBits(
new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
//lock the new bitmap in memory
BitmapData newData = bmp.LockBits(
new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);
//set the number of bytes per pixel
// here is set to 3 because I use an Image with 24bpp
const int pixelSize = 3; // const because it doesn't change
// store Scan0 value for reuse...we don't know if BitmapData caches it internally, or recalculated it every time, or whatnot
int originalScan0 = originalData.Scan0;
int newScan0 = newData.Scan0;
// incrementing variables
int originalStride = originalData.Stride;
int newStride = newData.Stride;
// store certain properties, because accessing a variable is normally faster than a property (and we don't really know if the property recalculated anything internally)
int bmpwidth = bmpPicture.Width;
int bmpheight = bmpPicture.Height;
for (int y = 0; y < bmpheight; y++)
{
//get the data from the original image
byte* oRow = (byte*)originalScan0 + originalStride++; // by doing Variable++, you're saying "give me the value, then increment one" (Tip: DON'T add parenthesis around it!)
//get the data from the new image
byte* nRow = (byte*)newScan0 + newStride++;
int pixelPosition = 0;
for (int x = 0; x < bmpwidth; x++)
{
//create the grayscale version
byte grayScale =
(byte)((oRow[pixelPosition] * .114f) + //B
(oRow[pixelPosition + 1] * .587f) + //G
(oRow[pixelPosition + 2] * .299f)); //R
//set the new image's pixel to the grayscale version
// nRow[pixelPosition] = grayScale; //B
// nRow[pixelPosition + 1] = grayScale; //G
// nRow[pixelPosition + 2] = grayScale; //R
GRAY[x][y] = (int)grayScale;
pixelPosition += pixelSize;
}
}
Your code is converting from a row-major representation into a column-major representation.
In the bitmap, pixel (x,y) is followed by (x+1,y) in memory; but in your GRAY
array, pixel (x,y) is followed by (x,y+1).
This causes inefficient memory access when writing, as every write touches a different cache line; and you end up trashing the CPU cache if the image is big enough. This is especially bad if your image size is a power of two (see Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?).
Store your array in row-major order as well if possible to avoid the inefficient memory access (replace GRAY[x,y]
with GRAY[y,x]
).
If you really need it in column-major order, look at more cache-friendly algorithms for matrix transposition (e.g. A Cache Efficient Matrix Transpose Program?)
Your code may not be optimal, but a quick skim seems to show even this version should run in a fraction of a second. This suggests there's some other problem:
Are you:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With