
Should I call GC.Collect immediately after using the large object heap to prevent fragmentation?

My application does a good deal of binary serialization and compression of large objects. Uncompressed, the serialized dataset is about 14 MB; compressed, it is around 1.5 MB. Whenever I call the serialize method on my dataset, my large object heap (LOH) performance counter jumps from under 1 MB to about 90 MB. I also know that on a relatively heavily loaded system, usually after it has been running for a while (days) and this serialization has happened a few times, the application has been known to throw out-of-memory exceptions when the serialization method is called, even though there seems to be plenty of memory. I'm guessing that fragmentation is the issue (I can't say I'm 100% sure, but I'm pretty close).

The simplest short-term fix I can think of (I guess I'm looking for both a short-term and a long-term answer) is to call GC.Collect right after I'm done with the serialization process. My thinking is that this will collect the object from the LOH, and will likely do so BEFORE other objects can be added to it. Other objects can then fit tightly against the remaining objects in the heap without causing much fragmentation.

Other than this ridiculous 90 MB allocation, I don't think I have anything else that uses a lot of the LOH. The 90 MB allocation is also relatively rare (around every 4 hours). Of course, we will still have the 1.5 MB array in there, and maybe some other smaller serialized objects.

Any ideas?

Update as a result of good responses

Here is my code which does the work. I've tried changing this to compress WHILE serializing, so that the data is serialized to a stream and compressed at the same time, and the result isn't much better. I've also tried preallocating the memory stream to 100 MB and using the same stream twice in a row; the LOH goes up to 180 MB anyway. I'm using Process Explorer to monitor it. It's insane. I think I'm going to try the UnmanagedMemoryStream idea next.

I would encourage you guys to try it out if you want. It doesn't have to be this exact code. Just serialize a large dataset and you will get surprising results (mine has lots of tables, around 15, and lots of strings and columns).

        byte[] bytes;
        System.Runtime.Serialization.Formatters.Binary.BinaryFormatter serializer =
            new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        // Serialize into an in-memory buffer, then compress a copy of that buffer.
        using (System.IO.MemoryStream memStream = new System.IO.MemoryStream())
        {
            serializer.Serialize(memStream, obj);
            // ToArray() allocates a second full-size array before compression.
            bytes = CompressionHelper.CompressBytes(memStream.ToArray());
        }
        return bytes;

Update after trying binary serialization with UnmanagedMemoryStream

Even if I serialize to an UnmanagedMemoryStream, the LOH jumps up to the same size. It seems that no matter what I do, calling the BinaryFormatter to serialize this large object will use the LOH. Pre-allocating doesn't seem to help much either: say I preallocate 100 MB and then serialize, it will use 170 MB. Here is the code for that, even simpler than the code above:

BinaryFormatter serializer  = new BinaryFormatter();
MemoryStream memoryStream = new MemoryStream(1024*1024*100);
GC.Collect();
serializer.Serialize(memoryStream, assetDS);

The GC.Collect() in the middle there is just to update the LOH performance counter. You will see that the stream allocates the expected 100 MB. But when you then call Serialize, the LOH grows further, on top of the 100 MB you have already allocated.

asked Dec 18 '09 21:12 by Mark



2 Answers

Beware of the way collection classes and streams like MemoryStream work in .NET. They have an underlying buffer, a simple array. Whenever the collection or stream buffer grows beyond the allocated size of the array, the array gets re-allocated, now at double the previous size.

This can cause many copies of the array in the LOH. Arrays of 85,000 bytes or more are allocated on the LOH, so your 14MB dataset will start using the LOH when the buffer reaches 128KB, then take another 256KB, then another 512KB, and so on. The last one, the one actually used, will be around 16MB. The LOH contains the sum of these, around 30MB, only one of which is in actual use.

Do this three times without a gen2 collection and your LOH has grown to 90MB.
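The doubling can be observed directly by watching MemoryStream.Capacity while data is written. The following is a minimal sketch, not the asker's code: it uses plain 4 KB writes as a stand-in for the serializer, and the 85,000-byte LOH threshold and the 14 MB payload size are hard-coded as assumptions. It is written as a top-level program for a recent C# compiler.

```csharp
using System;
using System.IO;

// Assumptions: 14 MB payload, and arrays of 85,000 bytes or more land on the LOH.
const int TotalBytes = 14 * 1024 * 1024;
const int LohThreshold = 85000;

long lohBytesAllocated = 0;   // sum of every LOH-sized buffer the stream went through
int lastCapacity = 0;
byte[] piece = new byte[4096];

using (var stream = new MemoryStream())
{
    while (stream.Length < TotalBytes)
    {
        stream.Write(piece, 0, piece.Length);

        // A capacity change means the stream allocated a new, larger buffer
        // and copied the old contents into it.
        if (stream.Capacity != lastCapacity)
        {
            lastCapacity = stream.Capacity;
            if (lastCapacity >= LohThreshold)
                lohBytesAllocated += lastCapacity;
        }
    }
}

Console.WriteLine("Final buffer: {0:N0} bytes", lastCapacity);
Console.WriteLine("All LOH-sized buffers together: {0:N0} bytes", lohBytesAllocated);
```

The second number counts buffers that were allocated at some point, not what is live at the end; until a gen2 collection runs, the abandoned ones still occupy LOH space.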

Avoid this by pre-allocating the buffer to the expected size. MemoryStream has a constructor that takes an initial capacity. So do all collection classes. Calling GC.Collect() after you've nulled all references can help unclog the LOH and purge those intermediate buffers, at the cost of clogging the gen1 and gen2 heaps too soon, since objects that survive the forced collection get promoted early.
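A minimal sketch of the pre-allocation advice, assuming a rough upper bound on the output size is known. WritePayload is a hypothetical stand-in for serializer.Serialize(stream, obj), and the sizes are made up for illustration. It only shows the MemoryStream side of things; it says nothing about what a particular serializer allocates internally.

```csharp
using System;
using System.IO;

// Assumption: a rough upper bound on the serialized size is known in advance.
const int ExpectedSize = 16 * 1024 * 1024;
const int PayloadSize = 14 * 1024 * 1024;

byte[] result;
bool reallocated;

using (var stream = new MemoryStream(ExpectedSize))
{
    int capacityBefore = stream.Capacity;

    // Hypothetical stand-in for serializer.Serialize(stream, obj).
    WritePayload(stream, PayloadSize);

    // If the capacity is unchanged, the stream never had to grow its buffer.
    reallocated = stream.Capacity != capacityBefore;

    // ToArray() still allocates one more array, sized to the data written.
    result = stream.ToArray();
}

Console.WriteLine("Reallocated: {0}, result: {1:N0} bytes", reallocated, result.Length);

// Writes byteCount bytes in 4 KB pieces, imitating a serializer's many small writes.
static void WritePayload(Stream target, int byteCount)
{
    byte[] piece = new byte[4096];
    for (int written = 0; written < byteCount; written += piece.Length)
        target.Write(piece, 0, Math.Min(piece.Length, byteCount - written));
}
```

If the extra copy made by ToArray() matters, reading from GetBuffer() up to stream.Length avoids it, as long as the consumer can work with an array that is larger than the data.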

answered Oct 05 '22 22:10 by Hans Passant


Unfortunately, the only way I could fix this was to break up the data into chunks so as not to allocate large blocks on the LOH. All the proposed answers here were good and were expected to work, but they did not. It seems that binary serialization in .NET (I'm using .NET 2.0 SP2) does its own little magic under the hood, which prevents users from having control over memory allocation.

The answer to the question, then, would be "this is not likely to work". When it comes to using .NET serialization, your best bet is to serialize the large objects in smaller chunks. For all other scenarios, the answers mentioned above are great.
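As a rough illustration of the chunking idea (not the code I actually used), here is a sketch that serializes a list in fixed-size batches so that no single output buffer gets anywhere near the LOH threshold. Everything in it is hypothetical: the row data, the SerializeInChunks helper, the batch size, and the use of BinaryWriter in place of BinaryFormatter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical data: many rows that would form one large blob if serialized together.
var rows = new List<string>();
for (int i = 0; i < 50000; i++)
    rows.Add("row-" + i);

// Assumption: 1000 rows per chunk keeps each buffer far below 85,000 bytes.
const int RowsPerChunk = 1000;
List<byte[]> chunks = SerializeInChunks(rows, RowsPerChunk);

int largest = 0;
foreach (byte[] chunk in chunks)
    largest = Math.Max(largest, chunk.Length);
Console.WriteLine("{0} chunks, largest is {1:N0} bytes", chunks.Count, largest);

// Serializes the items in batches, one small byte array per batch.
static List<byte[]> SerializeInChunks(List<string> items, int itemsPerChunk)
{
    var result = new List<byte[]>();
    for (int start = 0; start < items.Count; start += itemsPerChunk)
    {
        int count = Math.Min(itemsPerChunk, items.Count - start);
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream, Encoding.UTF8))
        {
            writer.Write(count);              // item count, so a reader knows how many follow
            for (int i = start; i < start + count; i++)
                writer.Write(items[i]);       // length-prefixed string
            writer.Flush();
            result.Add(stream.ToArray());
        }
    }
    return result;
}
```

Each chunk can then be compressed and stored or sent on its own. For a DataSet, a natural unit would be one table, or a range of rows within a table, per chunk.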

answered Oct 06 '22 00:10 by Mark