
Where is the maximum capacity of a C# Collection<T> defined?

I tried to add a large number of elements to a Collection<T>; the elements are each simple data transfer objects with five properties of basic data types, nothing special.

When adding new entries in a loop I always get an OutOfMemoryException. The interesting thing is that I always get the exception when trying to add the 8388608th element (which is 8 * 1024 * 1024). I therefore assume there is a built-in limit on the capacity (number of elements) allowed in such collections, but I couldn't find any information about it.
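Here's roughly what my code looks like (the DTO below is a simplified stand-in; the real one just has five properties of basic types):

using System;
using System.Collections.ObjectModel;

// Simplified stand-in for the real DTO: five properties of basic data types.
class ItemDto
{
    public int Id { get; set; }
    public string Name { get; set; }
    public double Value { get; set; }
    public bool Active { get; set; }
    public DateTime Created { get; set; }
}

class Program
{
    static void Main()
    {
        var items = new Collection<ItemDto>();
        for (int i = 0; ; i++)
        {
            // Throws OutOfMemoryException when adding the 8388608th
            // (8 * 1024 * 1024) element on my machine.
            items.Add(new ItemDto { Id = i, Name = "item", Value = i, Active = true, Created = DateTime.Now });
        }
    }
}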

Does this limit indeed exist? Where would I find this documented?

asked Dec 19 '12 by Gorgsenegger

1 Answer

This is an OutOfMemoryException, so it's not the size or capacity of the collection at issue here: it is memory use in your application. The trick is that you don't have to use up the memory in your machine or even in your process to get this exception.

What I think is happening is that you're filling up the large object heap. As collections grow they need to add storage in the background to accommodate the new items. Once the new storage is allocated and the items are copied in, the old storage is released and should be eligible for garbage collection.

The issue is that once you get beyond a certain size (used to be 85000 bytes, but might be different now), the Garbage Collector (GC) tracks your memory using something called the Large Object Heap (LOH). When the GC frees memory from the LOH (which happens only rarely to begin with), the memory will return to your operating system and be available for other processes, but the virtual address space from that memory will still be in use within your own process. You'll have a great gaping hole in your program's address table, and because this hole is on the Large Object Heap it will never be compacted or reclaimed.
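You can see the threshold in action with GC.GetGeneration: objects on the LOH report generation 2 from the moment they're allocated (a quick sketch; the exact threshold value is an implementation detail):

using System;

class LohDemo
{
    static void Main()
    {
        var small = new byte[80000]; // below the 85,000 byte threshold: normal (small object) heap
        var large = new byte[90000]; // above the threshold: Large Object Heap

        // Small-heap objects start in generation 0; LOH objects
        // are treated as generation 2 immediately.
        Console.WriteLine(GC.GetGeneration(small)); // 0
        Console.WriteLine(GC.GetGeneration(large)); // 2
    }
}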

The reason you see this exception on an exact power of two is that most .Net collections use a doubling algorithm for adding storage to the collection. It will always throw at the point where it needs to double again, because up until that point the RAM was already allocated.
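You can watch the doubling by printing List<T>.Capacity whenever it changes (a small sketch; the exact growth sequence is an implementation detail of the framework version):

using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        var list = new List<int>();
        int lastCapacity = list.Capacity;
        for (int i = 0; i < 1000; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // Prints 4, 8, 16, 32, 64, ... : each reallocation
                // doubles the backing array.
                Console.WriteLine(list.Capacity);
                lastCapacity = list.Capacity;
            }
        }
    }
}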

A quick solution, then, is to take advantage of a little-used feature of most .Net Collections. If you look at the constructor overloads, most of the collection types will have one that allows you to set the capacity during initial construction. This capacity is not a hard limit — it's just a starting point — but it is useful in a few cases, including when you have collections that will grow very large. You can set the initial capacity to something obscene... hopefully something just large enough to hold all your items, or at least only need to "double" once or twice.

You can kind of see this effect by running the following code in a Console Application:

var x = new List<int>();
for (long y = 0; y < long.MaxValue; y++)
    x.Add(0);

On my system, that throws an OutOfMemoryException after 134217728 items. 134217728 * 4 bytes per int is only (and exactly) 512MB of RAM. It shouldn't throw yet, because that's the only thing of any real size in the process, but it does anyway because of address space lost to old versions of the collection.

Now let's change the code to set the capacity like this:

var x = new List<int>(134217728 * 2);
for (long y = 0; y < long.MaxValue; y++)
    x.Add(0);

Now my system makes it all the way to 268435456 items (1GB of RAM) when it throws, which it does because it can't double that 1GB thanks to other RAM used by the process eating part of the 2GB virtual address table limit (i.e. the loop counter and any overhead from the collection object and the process itself).

What I can't explain is that it does not allow me to use 3 as a multiplier, even though that would be only(!) 1.5GB. A little experimentation with different multipliers, trying to find out just how large I could get, showed that the number is not consistent. At one point I was able to get above 2.6, but then had to back off to under 2.4. Something new to discover, I guess.
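Since all of the above assumes a 32-bit process, it's worth confirming that's what you're actually running (a quick check; Environment.Is64BitProcess requires .Net 4.0):

using System;

class BitnessCheck
{
    static void Main()
    {
        // 4 in a 32-bit process (2GB of user address space), 8 in a 64-bit one.
        Console.WriteLine(IntPtr.Size);

        // Available from .Net 4.0 onward.
        Console.WriteLine(Environment.Is64BitProcess);
    }
}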

If this solution does not get enough space for you, there is also a trick you can use to get 3GB of virtual address space, or you can force your app to compile for x64 rather than x86 or AnyCPU. If you're using a version of the framework based on the 2.0 runtime (anything up through .Net 3.5) you might try updating to .Net 4.0 or later, which is reportedly a little better about this. Failing those, you will have to look at a complete rewrite of how you handle your data, likely keeping it on disk and holding only a single item or a small sample of the items (a cache) in memory at a time. I really recommend this last option, because anything else is likely to eventually break again unexpectedly (and if your dataset is this large to begin with, it's likely growing as well).
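To illustrate that last option, here's a rough sketch of streaming records from disk one at a time instead of materializing them all (the file name and comma-separated format are hypothetical):

using System;
using System.Collections.Generic;
using System.IO;

class StreamingDemo
{
    // Yields one record at a time; only the current line is ever held in memory.
    static IEnumerable<string[]> ReadItems(string path)
    {
        foreach (var line in File.ReadLines(path)) // lazy, unlike File.ReadAllLines (.Net 4.0+)
            yield return line.Split(',');
    }

    static void Main()
    {
        long count = 0;
        foreach (var fields in ReadItems("data.csv"))
            count++; // process each record here instead of collecting them all
        Console.WriteLine(count);
    }
}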

answered Oct 16 '22 by Joel Coehoorn