For my project, I first load an image from a file and put every pixel into a 2D pixels[,] array. Then I want to examine the pixels, split them up into "bins" based on their color, and sort each bin. So I have a Bin object, which encapsulates a List<Pixel>, and I have a List<Bin> containing a (relatively small) number of bins.
My problem is that when I try to fill these bins from very large images (e.g. 1920x1200 = 2.3 million pixels), the algorithm I'm using is slower than I'd like, and I've traced the problem to some C#-specific features which seem to take much longer than I expected. I'd like some advice on how to use C# better and remove these bottlenecks.
After loading an image, I call a function called "fillBinsFromSource", which takes an enumerable collection of pixels, finds which bin each one belongs in, and puts it there:
public void fillBinsFromSource(IEnumerable<Pixel> source)
{
    Stopwatch s = new Stopwatch();
    foreach (Pixel p in source)
    {
        s.Start();
        // algorithm removed for brevity
        // involves a binary search of all Bins and a List.Add call
        s.Stop();
    }
}
For large images, it's expected that my algorithm will take a while, but when I put a Stopwatch outside the function call, it turns out that it takes about twice as long as the time accrued on s
, which means doing the enumeration using foreach
is taking up half the time of this function (about 800 ms of the 1.6 seconds for a 1920x1200 image).
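Roughly, the outer measurement looks like this (simplified; list is the ImageList shown below):
// simplified sketch of the outer measurement
Stopwatch outer = new Stopwatch();
outer.Start();
fillBinsFromSource(list);
outer.Stop();
// outer.ElapsedMilliseconds is ~1600 ms, while s accrues only ~800 ms,
// so roughly half the time is spent just walking the enumerators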
The reason I need to pass an enumerable list is because sometimes users will add only a small region of a picture, not an entire picture. The time-consuming call passes down several iterators, first from a list of images, then from each image in the list, like so (simplified):
class ImageList : IEnumerable<Pixel>
{
    private List<Image> imageList;

    public IEnumerator<Pixel> GetEnumerator()
    {
        foreach (Image i in imageList)
            foreach (Pixel p in i)
                yield return p;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

class Image : IEnumerable<Pixel>
{
    private Pixel[,] pixels;            // all pixels in the image
    private List<Pixel> selectedPixels; // all pixels in the user's selection

    public IEnumerator<Pixel> GetEnumerator()
    {
        if (selectedPixels == null)
        {
            for (int i = 0; i < pixels.GetLength(0); i++)
                for (int j = 0; j < pixels.GetLength(1); j++)
                    yield return pixels[i, j];
        }
        else
        {
            for (int i = 0; i < selectedPixels.Count; i++)
                yield return selectedPixels[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Then, finally, I call this:
ImageList list; // pretend it contains only 1 image, and it's large
fillBinsFromSource(list);
Question 1) Because of the need to enumerate over both the 2D array of pixels AND the selected region, depending on what the user has chosen, the enumeration is just really slow. How can I speed this up?
Then, after filling all these bins with Pixel
objects, I sort them. I call List<Pixel>.Sort()
and rely on IComparable
, like so:
ImageList list; // pretend it contains only 1 image, and it's large
fillBinsFromSource(list);
foreach (Bin b in allBins)
    b.Sort(); // calls List<Pixel>.Sort()
class Pixel : IComparable
{
    // store both HSV and RGB
    float h, s, v;
    byte r, g, b;

    // we sort by HSV's value property
    public int CompareTo(object obj)
    {
        // this is much faster than calling CompareTo on a float
        Pixel rhs = obj as Pixel;
        if (v < rhs.v)
            return -1;
        else if (v > rhs.v)
            return 1;
        return 0;
    }
}
Question 2) Suppose allBins
has 7 elements; sorting 7 separate lists which have a total of 2.3 million Pixel
s in them takes about 2 seconds. Sorting one list of 2.3 million random int
s takes under 200 milliseconds. I can appreciate that there's some level of speed-up using primitive types, but is it really over 10x slower to use IComparable
? Are there any speedups to be had here?
I apologize for the long question, if anyone has any advice for me, I'd appreciate it!
You really need to profile your code and see what is slow.
Most obvious:
- Pixel does not implement the generic IComparable<Pixel>, and as a result Compare has to do much more work.
- Iterate over the Pixel objects directly by index instead of via iterators if you find that performance is below what you find acceptable. Iterators are nice, but not free.
All kinds of indirection, like a visitor pattern or virtual inheritance, are poison if you want raw-metal performance. Virtual calls, allocation, and unpredictable branching do a lot of damage to the kind of algorithm where there is a small, tight inner loop in which 99.99% of the time is spent.
Why? Because the CPU likes to execute many (dozens) of instructions in parallel. It can do that only if it can peek ahead in the instruction stream. The aforementioned things prevent that.
You really need to get the innermost loop right. Don't allocate there, don't call virtual functions (or interface methods or delegates).
Probably, your innermost loop should process a rectangle of a given image with a hard-coded kernel. Instead of implementing your processing function per-pixel, implement it per-rectangle.
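A sketch of what a per-rectangle inner loop could look like, assuming direct access to the Pixel[,] array; FindBinIndex stands in for the binary search you already have, and Bin is assumed to expose an Add method:
// Hypothetical per-rectangle version: plain for loops over the array,
// no iterators, no interface calls, no allocation inside the loop.
public void FillBinsFromRect(Pixel[,] pixels, int x0, int y0, int width, int height)
{
    for (int x = x0; x < x0 + width; x++)
    {
        for (int y = y0; y < y0 + height; y++)
        {
            Pixel p = pixels[x, y];
            int binIndex = FindBinIndex(p); // your existing binary search over the bins
            allBins[binIndex].Add(p);       // assumes Bin wraps the List<Pixel>.Add call
        }
    }
}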
In contrast, it doesn't matter how you provide the stream of images. Use LINQ there all you want. It is a low-volume operation because there are millions of pixels per image.
Instead of using an iterator, or even building up an array / list of pixels to begin with, you could use the visitor pattern. Images, Image Lists and other objects representing arbitrary selections could all accept a visitor class with a single method VisitPixel
and call that method for each pixel the object represents. The visitor class would then be responsible for binning all the pixels as they're visited.
This might eliminate the need to extract all of your pixels into a separate array. It might also eliminate the creation of iterators in favor of simple for
loops.
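One possible shape for this, as a sketch (IPixelVisitor, BinningVisitor, and FindBinIndex are illustrative names, not part of the question's code, and Bin is assumed to expose an Add method):
public interface IPixelVisitor
{
    void VisitPixel(Pixel p);
}

// A visitor that does the binning as each pixel is visited.
class BinningVisitor : IPixelVisitor
{
    private readonly List<Bin> bins;
    public BinningVisitor(List<Bin> bins) { this.bins = bins; }

    public void VisitPixel(Pixel p)
    {
        bins[FindBinIndex(p)].Add(p); // same binary search + Add as before
    }

    private int FindBinIndex(Pixel p) { /* binary search over bins */ return 0; }
}

// Image walks its own pixels with plain for loops and calls back into the visitor.
class Image
{
    private Pixel[,] pixels;
    private List<Pixel> selectedPixels;

    public void Accept(IPixelVisitor visitor)
    {
        if (selectedPixels == null)
        {
            for (int i = 0; i < pixels.GetLength(0); i++)
                for (int j = 0; j < pixels.GetLength(1); j++)
                    visitor.VisitPixel(pixels[i, j]);
        }
        else
        {
            for (int i = 0; i < selectedPixels.Count; i++)
                visitor.VisitPixel(selectedPixels[i]);
        }
    }
}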
Alexei Levenkov has some good points regarding your second question. It might be even faster to use the Sort overload that takes an instance of IComparer<T>.
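A minimal sketch of both changes (the field names are taken from the question; PixelValueComparer is just an illustrative name):
// Implementing the generic interface lets List<Pixel>.Sort() compare
// without casting through object.
class Pixel : IComparable<Pixel>
{
    public float h, s, v;
    public byte r, g, b;

    public int CompareTo(Pixel other)
    {
        if (v < other.v) return -1;
        if (v > other.v) return 1;
        return 0;
    }
}

// Alternatively, pass a comparer to the Sort overload explicitly.
class PixelValueComparer : IComparer<Pixel>
{
    public int Compare(Pixel x, Pixel y)
    {
        if (x.v < y.v) return -1;
        if (x.v > y.v) return 1;
        return 0;
    }
}

// usage: pixelList.Sort(new PixelValueComparer());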