.Net Dictionary<int,int> out of memory exception at around 6,000,000 entries

I am using a Dictionary<int, int> to store the frequency of colors in an image, where the key is the color (as an int) and the value is the number of times that color has been found in the image.

When I process larger / more colorful images, this dictionary grows very large. I get an out of memory exception at around 6,000,000 entries. Is this the expected capacity when running in 32-bit mode? If so, is there anything I can do about it? And what alternative methods of keeping track of this data won't run out of memory?

For reference, here is the code that loops through the pixels in a bitmap and saves the frequency in the Dictionary<int,int>:

Bitmap b; // = something...
Dictionary<int, int> count = new Dictionary<int, int>();
System.Drawing.Color color;

for (int i = 0; i < b.Width; i++)
{
    for (int j = 0; j < b.Height; j++)
    {
        color = b.GetPixel(i, j);
        int argb = color.ToArgb(); // pack the color into a single int key
        if (!count.ContainsKey(argb))
        {
            count.Add(argb, 0);
        }
        count[argb] = count[argb] + 1;
    }
}

Edit: In case you were wondering what image has that many different colors in it: http://allrgb.com/images/mandelbrot.png

Edit: I should also mention that this is running inside an ASP.NET web application on .NET 4.0, so there may be additional memory restrictions.

Edit: I just ran the same code inside a console application and had no problems. The problem only happens in ASP.NET.

asked Dec 17 '13 by Rafe


2 Answers

Update: Given the OP's sample image, it seems that the maximum number of items would be over 16 million, and apparently even that is too much to allocate when instantiating the dictionary. I see three options here:

  • Resize the image down to a manageable size and work from that.
  • Try to convert to a color scheme with fewer color possibilities.
  • Go for an array of fixed size, as others have suggested (see the sketch below).
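
For that last option, here is a minimal sketch (mine, not the OP's code) of counting with a fixed-size array; it assumes you only care about 24-bit RGB and can ignore alpha, so the array has 2^24 entries, about 64 MB of ints:

int[] counts = new int[1 << 24]; // one slot per possible 24-bit RGB value

for (int i = 0; i < b.Width; i++)
{
    for (int j = 0; j < b.Height; j++)
    {
        int rgb = b.GetPixel(i, j).ToArgb() & 0xFFFFFF; // strip the alpha byte
        counts[rgb]++;
    }
}

The 64 MB array is a single up-front allocation, so it avoids the grow-and-copy cycle entirely; the cost is memory proportional to the whole color space rather than to the number of distinct colors actually present.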

Previous answer: The problem is that you don't allocate enough space for your dictionary up front. At some point, while it is expanding, the allocation for the new, larger internal array fails, even though the final dictionary itself would fit in memory.

Example: this code runs out of memory at nearly 24 million entries (on my machine, running in 32-bit mode):

Dictionary<int, int> count = new Dictionary<int, int>();
for (int i = 0; ; i++)
     count.Add(i, i);

because during the last expansion the dictionary is still holding the space for the entries already there while it tries to allocate a new array for several million more, and the two together are too much.

Now, if we initially allocate space for, say, 40 million entries, it runs without problem:

Dictionary<int, int> count = new Dictionary<int, int>(40000000);

So try to indicate how many entries there will be when creating the dictionary.

From MSDN:

The capacity of a Dictionary<TKey, TValue> is the number of elements that can be added to the Dictionary<TKey, TValue> before resizing is necessary. As elements are added to a Dictionary<TKey, TValue>, the capacity is automatically increased as required by reallocating the internal array. If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary<TKey, TValue>.

answered Sep 29 '22 by Julián Urbano


Each dictionary entry holds two 4-byte integers: 8 bytes of payload. 8 bytes * 6 million entries is only about 48 MB, plus some extra for the dictionary's internal bookkeeping (hash codes, bucket links), object overhead, alignment, and so on. There's plenty of space in memory for this: .NET gives a 32-bit process up to 2 GB of virtual address space, and 48 MB or so shouldn't cause a problem.

I expect what's actually happening here is related to how the dictionary auto-expands and how the garbage collector handles (or doesn't handle) compaction.

First, the auto-expanding part. Last time I checked (back around .Net 2.0*), collections in .Net tended to use arrays internally. They would allocate a reasonably-sized array in the collection constructor (say, 10 items), and then use a doubling algorithm to create additional space whenever the array filled up. All the existing items would have to be copied to the new array, but then the old array could be garbage collected. The garbage collector is pretty reliable about this, so it means you're left using space for at most 2n - 1 items in the collection.
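
As a quick illustration of that doubling (a sketch using List<int>, which exposes its Capacity; Dictionary grows similarly, though to prime-sized tables rather than exact powers of two):

var list = new List<int>();
int last = list.Capacity;
for (int i = 0; i < 1000000; i++)
{
    list.Add(i);
    if (list.Capacity != last)
    {
        // Each resize allocates a new, larger array and copies every element.
        Console.WriteLine("resized: {0} -> {1}", last, list.Capacity);
        last = list.Capacity;
    }
}
// Prints resized: 0 -> 4, 4 -> 8, 8 -> 16, ... doubling all the way past 1,000,000.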

Now the garbage collector compaction part. After a certain size (85,000 bytes in current implementations), these arrays end up in a section of memory called the Large Object Heap. Garbage collection still works here (though less often). What doesn't work well here is compaction (think memory defragmentation). The physical memory used by the old object will be released, returned to the operating system, and made available to other processes. However, the virtual address space in your process... the table that maps program memory offsets to physical memory addresses... will still have the (empty) space reserved.
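
A quick way to see this boundary (a sketch, assuming the standard 85,000-byte LOH threshold): the GC reports LOH objects as generation 2 from the moment they are allocated.

var small = new byte[80 * 1024];  // below the threshold: regular small object heap
var large = new byte[100 * 1024]; // above it: goes to the Large Object Heap
Console.WriteLine(GC.GetGeneration(small)); // 0 (new small objects start in gen 0)
Console.WriteLine(GC.GetGeneration(large)); // 2 (LOH objects report as gen 2)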

This is important, because remember: we're working with a rapidly growing object. It's possible for such an object to take up address space far larger than the final size of the object itself. An object grows enough, fast enough, and suddenly you get an OutOfMemoryException, even though your app isn't really using all that much RAM.

The first solution here is to allocate enough space in the initial collection for all of your data. This allows you to skip all those re-allocations and copying. Your data will live in a single array, and use only the space you actually asked for. Most collections, including the Dictionary, have a constructor overload that lets you specify the number of items the first array should hold. Be careful here: you don't need to allocate an item for every pixel in your image. There will be a lot of repeated colors. You only need to allocate enough to have space for each distinct color in your image. If it's only large images that give you problems, and you're almost handling them with six million records, you might find that 8 million is plenty.
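
Applied to the question's code, that's a one-line change (8,000,000 here is just a guessed upper bound, per the reasoning above):

// One up-front allocation instead of a series of grow-and-copy cycles.
Dictionary<int, int> count = new Dictionary<int, int>(8000000);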

My next suggestion is to group your pixel colors. A human can't tell, and doesn't care, if two colors are just one bit apart in any of the RGB components. You might go as far as to look at the separate RGB values for each pixel and normalize the pixel so that you only care about changes of more than 5 or so in an R, G, or B value. That would get you from 16.7 million potential colors all the way down to only about 141,000 (52 buckets per channel, cubed), and the data will likely be more useful, too. That might look something like this:

// 52 buckets per channel (0, 5, ..., 255), so 52^3 = 140,608 possible keys.
var colorCounts = new Dictionary<Color, int>(140608);
foreach (Color c in GetImagePixels()
    .Select(p => Color.FromArgb((p.R / 5) * 5, (p.G / 5) * 5, (p.B / 5) * 5)))
{
    int n;
    colorCounts.TryGetValue(c, out n); // n stays 0 for colors not seen yet
    colorCounts[c] = n + 1;
}

* IIRC, somewhere in a recent or upcoming version of .Net both of these issues are being addressed: one by allowing you to force compaction of the LOH, and the other by using a set of arrays for collection backing stores, rather than trying to keep everything in one big array.
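
For reference, the LOH-compaction switch that footnote alludes to shipped in .NET 4.5.1 as GCSettings.LargeObjectHeapCompactionMode, so it isn't available to the OP's .NET 4.0 app without upgrading; a minimal sketch:

// Requires .NET 4.5.1 or later; GCSettings lives in System.Runtime.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); // the next full blocking collection compacts the LOH, then the flag resets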

answered Sep 29 '22 by Joel Coehoorn