
How to find the most common int in a 2d array of ints?

Tags:

c#

paint.net

OK, so I'm just starting to think how to implement a new graphical plugin for Paint.NET and I will need to know how to find the most common integer in a 2d array of integers. Is there a built-in to C# way to do this? Or, does anyone have a slick way to do it?

The array will look something like this:

300 300 300 300 300 300 300
  0 150 300 300 300 300 300
  0   0 150 300 300 300 300
  0   0   0   0 300 300 300
  0   0   0   0 150 300 300
  0   0   0   0   0 150 300
  0   0   0   0   0   0 300

I would need to know that 300 is the most common number in the array. If there is no "most common" number, then just return the center number (the array dimensions will always be odd x odd), which would be 0 in this example.

I'll be implementing this using a "brute force" algorithm unless you experts can come up with something faster.

Any help would be very much appreciated.

Thanks!

EDIT: More info...

The values will almost always be VERY diverse (more diverse than my example array). The values will be in the range of 0-360. The size of the array will be 5x5 to about 17x17, depending on the speed of the algorithm. The result will be calculated once for each pixel in a large image... so faster is better. ;)

asked Jan 29 '09 by BoltBait

5 Answers

It's at least O(n*m) any way you slice it -- you are going to have to look at each cell at least once. The place to economize is where you accumulate the counts of each value before looking for the most common; if your integers vary over a relatively small range (they are uint16, let's say), then you might be able to simply use a flat array instead of a map.

I guess you could also keep running counts x and y for the current top candidate and the runner-up for "most common", and early-out as soon as there are fewer than (x-y) cells left to look at, since at that point there's no way the runner-up could outpace the top candidate.
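A sketch of that early-out idea in C# (the method name, the flattened-cells input, and the numLevels parameter are my illustrative assumptions, not part of the answer):

```csharp
using System;

class EarlyOutDemo
{
    // Sketch: scan cells, keep the two highest counts seen so far, and stop
    // as soon as the runner-up can no longer catch the leader.
    public static int MostCommon(int[] cells, int numLevels)
    {
        int[] counts = new int[numLevels];
        int leader = cells[0];
        int second = 0; // count of the best non-leader value seen so far
        for (int i = 0; i < cells.Length; i++)
        {
            int v = cells[i];
            counts[v]++;
            if (v != leader && counts[v] > second)
                second = counts[v];
            if (counts[v] > counts[leader])
            {
                second = counts[leader]; // old leader becomes the runner-up
                leader = v;
            }
            int remaining = cells.Length - (i + 1);
            if (counts[leader] - second > remaining)
                break; // early-out: no other value can overtake the leader
        }
        return leader;
    }
}
```

Whether the early-out actually pays off depends on how front-loaded the mode is in scan order; for small windows the branch cost may outweigh the savings.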

Integer ops like this are pretty fast; even for a megapixel image the brute force algorithm should only take a couple milliseconds.

I notice you've edited your original question to say that the pixel values range over 0..360 -- in that case, definitely go with a simple flat array; that's small enough to easily fit into the L1 data cache, and a lookup in a flat array is very quick.
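A minimal C# sketch of the flat-array counting step, assuming (per the question's edit) every cell holds an int in [0, 360); the names here are illustrative:

```csharp
using System;

class HistogramDemo
{
    // Sketch: count occurrences with a flat array instead of a dictionary.
    // Assumes every cell of 'window' holds a value in [0, 360).
    public static int[] BuildHistogram(int[,] window)
    {
        int[] counts = new int[360];
        foreach (int v in window)
            counts[v]++; // the value itself is the array index
        return counts;
    }
}
```

Compared with a dictionary, this trades a little fixed memory (360 ints) for branch-free, allocation-free counting.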

[edit] : Dealing with the "no most common number" case is very simple once you've built the histogram array: all you have to do is walk through it to find the "most" and "second most" common numbers; if they're equally frequent, then by definition there is no one most common.

const int numLevels = 360; // you said each cell contains a number [0..360)
int levelFrequencyCounts[numLevels]; // assume this has been populated such that levelFrequencyCounts[i] = number of cells containing "i"
int mostCommon = 0, runnerUp = -1;
for (int i = 1 ; i < numLevels ; ++i)
{
  if ( levelFrequencyCounts[i] > levelFrequencyCounts[mostCommon] )
  {
    runnerUp = mostCommon;
    mostCommon = i;
  }
  else if ( runnerUp < 0 || levelFrequencyCounts[i] > levelFrequencyCounts[runnerUp] )
  {
    runnerUp = i; // track the second-most-common level as well
  }
}

if ( levelFrequencyCounts[mostCommon] != levelFrequencyCounts[runnerUp] )
{
   return mostCommon;
}
else
{
   return CenterOfInputData; // (something like InputData[n/2][m/2])
}
answered Nov 20 '22 by Crashworks


how would I do something like this in C#?

Something like this:

Dictionary<int, int> d = new Dictionary<int, int>();
foreach (int value in matrix)
{
    if (!d.ContainsKey(value))
        d.Add(value, 1);
    else
        d[value] = d[value] + 1;
}
// KeyValuePair<int, int> is a struct, so it can't be null -- use a nullable
KeyValuePair<int, int>? biggest = null;
foreach (KeyValuePair<int, int> found in d)
{
    if ((biggest == null) || (biggest.Value.Value < found.Value))
        biggest = found;
}
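The loop above finds the single biggest count; the question also wants the center cell returned when there is no unique "most common" value. A sketch of that extension (the method name and tie-tracking structure are mine, not from the answer):

```csharp
using System;
using System.Collections.Generic;

class ModeWithFallback
{
    // Sketch: dictionary counting plus the asker's tie rule.
    // 'matrix' dimensions are assumed odd x odd, as the question states.
    public static int MostCommonOrCenter(int[,] matrix)
    {
        var counts = new Dictionary<int, int>();
        foreach (int v in matrix)
            counts[v] = counts.TryGetValue(v, out int c) ? c + 1 : 1;

        int best = 0, bestCount = -1, secondCount = -1;
        foreach (var pair in counts)
        {
            if (pair.Value > bestCount)
            {
                secondCount = bestCount;   // old leader becomes runner-up
                bestCount = pair.Value;
                best = pair.Key;
            }
            else if (pair.Value > secondCount)
                secondCount = pair.Value;
        }

        if (bestCount == secondCount) // no unique "most common"
            return matrix[matrix.GetLength(0) / 2, matrix.GetLength(1) / 2];
        return best;
    }
}
```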
answered Nov 20 '22 by ChrisW


Take a look at the LocalHistogramEffect code in Paint.NET, notably LocalHistogramEffect.RenderRect.

It walks the input image, maintaining a histogram of intensities for each source pixel within 'r' pixels of the destination pixel. As the output pixels are traversed, it adds the leading edge to the histogram and subtracts the trailing edge. It handles all the edge cases well, and is quite fast. It's the basis for the Median, Unfocus, Outline, and Remove Noise effects.

Adapting this to support Hue instead of RGB intensity would be rather trivial.
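For reference, a minimal sketch of the standard RGB-to-hue conversion that such an adaptation would need, returning an int in [0, 360) to match the question's value range (this is the textbook formula, not Paint.NET's actual code):

```csharp
using System;

class HueDemo
{
    // Standard RGB -> hue conversion; r, g, b in [0, 255].
    public static int Hue(int r, int g, int b)
    {
        int max = Math.Max(r, Math.Max(g, b));
        int min = Math.Min(r, Math.Min(g, b));
        if (max == min) return 0;           // grey: hue is undefined, use 0
        double h;
        if (max == r)      h = (g - b) / (double)(max - min);
        else if (max == g) h = 2.0 + (b - r) / (double)(max - min);
        else               h = 4.0 + (r - g) / (double)(max - min);
        h *= 60.0;                           // scale sextant to degrees
        if (h < 0) h += 360.0;
        return (int)h;
    }
}
```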

The performance is quite good, and for your purposes it operates in O(r^2+wr+nw), where r is the radius, w is the width of the image, and n is the number of levels in the histogram.

-tjackson

answered Nov 20 '22 by ddrcoder


One option is LINQ - a bit inefficient, but OK for non-huge arrays:

    var max = (from cell in data.Cast<int>()
               group cell by cell into grp
               select new { Key = grp.Key, Count = grp.Count() } into agg
               orderby agg.Count descending
               select agg).First();
    Console.WriteLine(max.Key + ": " + max.Count);

Or with a jagged array:

    var max = (from row in data
              from cell in row
              group cell by cell into grp
              select new {Key = grp.Key, Count = grp.Count()} into agg
              orderby agg.Count descending
              select agg).First();
    Console.WriteLine(max.Key + ": " + max.Count);

In reality, I would probably use a dictionary/count. This example is without LINQ, just "because":

    Dictionary<int, int> counts = new Dictionary<int, int>();
    foreach (int value in data)
    {
        int count;
        counts.TryGetValue(value, out count);
        counts[value] = count + 1;
    }
    int maxCount = -1, maxValue = 0;
    foreach (KeyValuePair<int, int> pair in counts)
    {
        if (pair.Value > maxCount)
        {
            maxCount = pair.Value;
            maxValue = pair.Key;
        }
    }
    Console.WriteLine(maxValue + ": " + maxCount);
answered Nov 20 '22 by Marc Gravell


Your image:

300+ 300+ 300+ 300 300 300 300
  0+ 150+ 300+ 300 300 300 300
  0+   0+ 150+ 300 300 300 300
  0    0    0    0 300 300 300
  0    0    0    0 150 300 300
  0    0    0    0   0 150 300
  0    0    0    0   0   0 300

Marked (+) numbers are your window; w,h are your window dimensions. Apply bucket sorting (as other people suggested, since your value range is quite limited). Don't cut your evaluation halfway as Crashworks suggests, and don't report a result yet. This is just the first step.

300- 300- 300- 300 300 300 300
  0. 150. 300. 300 300 300 300
  0.   0. 150. 300 300 300 300
  0+   0+   0+   0 300 300 300
  0    0    0    0 150 300 300
  0    0    0    0   0 150 300
  0    0    0    0   0   0 300

Shift your window. Instead of recounting everything, subtract the buckets for the row/column you just left behind and add the buckets for the new row/column. This way you examine each pixel 2(w+h) times, i.e. only when it crosses the window boundary, instead of w*h times, i.e. once for every window that contains it, as in a naive implementation.
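A minimal C# sketch of that incremental update for a one-column shift to the right (the method name, parameter layout, and bucket range are my assumptions, not from the answer):

```csharp
using System;

class SlidingWindowDemo
{
    // Sketch: slide an (h x w) window one column to the right over 'image',
    // updating the bucket counts incrementally instead of recounting.
    // 'top' and 'left' locate the window's corner before the shift.
    public static void SlideRight(int[,] image, int[] counts,
                                  int top, int left, int w, int h)
    {
        for (int row = top; row < top + h; row++)
        {
            counts[image[row, left]]--;      // trailing column leaves the window
            counts[image[row, left + w]]++;  // leading column enters the window
        }
    }
}
```

A downward shift is symmetric (subtract the old top row, add the new bottom row), which is why the serpentine traversal shown below only ever needs these two moves.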

In other words, you need to move your window like this:

|  ^->|  ^
|  |  |  |
|  |  |  |
V->|  V->|

I assume you are trying to implement a nonlinear convolution filter.

Corrections welcome.

answered Nov 20 '22 by artificialidiot