Should I call GC.Collect immediately after using the large object heap to prevent fragmentation

My application does a good deal of binary serialization and compression of large objects. Uncompressed, the serialized dataset is about 14 MB. Compressed, it is around 1.5 MB. I find that whenever I call the serialize method on my dataset, my large object heap performance counter jumps from under 1 MB to about 90 MB. I also know that on a relatively heavily loaded system, usually after it has been running for a while (days) and this serialization process has happened a few times, the application has been known to throw out-of-memory exceptions when this serialization method is called, even though there seems to be plenty of memory. I'm guessing that fragmentation is the issue (I can't say I'm 100% sure, but I'm pretty close).

The simplest short-term fix I can think of (I guess I'm looking for both a short-term and a long-term answer) is to call GC.Collect right after I'm done with the serialization process. This, in my opinion, will garbage collect the object from the LOH and will likely do so BEFORE other objects can be added to it. This will allow other objects to fit tightly against the remaining objects in the heap without causing much fragmentation.

Other than this ridiculous 90 MB allocation, I don't think I have anything else that uses a lot of the LOH. This 90 MB allocation is also relatively rare (around every 4 hours). We will of course still have the 1.5 MB array in there, and maybe some other smaller serialized objects.

Any ideas?

Update as a result of good responses

Here is the code that does the work. I've actually tried changing this to compress WHILE serializing, so that the serializer writes straight into the compression stream, and I don't get much better results. I've also tried preallocating the memory stream to 100 MB and using the same stream twice in a row; the LOH goes up to 180 MB anyway. I'm using Process Explorer to monitor it. It's insane. I think I'm going to try the UnmanagedMemoryStream idea next.

I would encourage you guys to try it out if you want. It doesn't have to be this exact code. Just serialize a large dataset and you will get surprising results (mine has lots of tables, around 15, and lots of strings and columns).

        // Serialize obj into an in-memory buffer, then compress the result.
        System.Runtime.Serialization.Formatters.Binary.BinaryFormatter serializer =
            new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        // The using block disposes the stream even if Serialize throws.
        using (System.IO.MemoryStream memStream = new System.IO.MemoryStream())
        {
            serializer.Serialize(memStream, obj);
            // ToArray copies the written bytes into a new array.
            return CompressionHelper.CompressBytes(memStream.ToArray());
        }

Update after trying binary serialization with UnmanagedMemoryStream

Even if I serialize to an UnmanagedMemoryStream, the LOH jumps up to the same size. It seems that no matter what I do, calling the BinaryFormatter to serialize this large object will use the LOH. As for pre-allocating, it doesn't seem to help much. Say I preallocate 100 MB and then serialize: it will use 170 MB. Here is the code for that. It is even simpler than the code above.

        BinaryFormatter serializer = new BinaryFormatter();
        MemoryStream memoryStream = new MemoryStream(1024 * 1024 * 100);
        GC.Collect();
        serializer.Serialize(memoryStream, assetDS);
The GC.Collect() in the middle there is just to update the LOH performance counter. You will see that it will allocate the correct 100 MB. But then when you call the serialize, you will notice that it seems to add that on top of the 100 that you have already allocated.

917

asked Dec 18 '09 21:12

Mark

2 Answers

Beware of the way collection classes and streams like MemoryStream work in .NET. They have an underlying buffer, a simple array. Whenever the collection or stream buffer grows beyond the allocated size of the array, the array gets re-allocated, now at double the previous size.

This can cause many copies of the array in the LOH. Your 14MB dataset will start using the LOH at 128KB, then take another 256KB, then another 512KB, etcetera. The last one, the one actually used, will be around 16MB. The LOH contains the sum of these, around 30MB, only one of which is in actual use.

Do this three times without a gen2 collection and your LOH has grown to 90MB.
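The arithmetic behind that estimate can be checked with a small sketch. It is only a model, not the runtime's actual code: it assumes the buffer starts at 256 bytes and doubles until it holds the payload, and that any array of 85,000 bytes or more lands on the LOH. The class and method names are made up for illustration.

```csharp
using System;

static class LohGrowthSketch
{
    // Assumed LOH threshold: arrays of this size or larger go to the LOH.
    const long LohThresholdBytes = 85000;

    // Models a buffer that doubles until it can hold payloadBytes and
    // returns the combined size of every intermediate buffer that was
    // large enough to be placed on the LOH.
    public static long SumOfLohBuffers(long payloadBytes, long initialCapacity)
    {
        if (initialCapacity <= 0)
            throw new ArgumentOutOfRangeException("initialCapacity");

        long capacity = initialCapacity;
        long lohTotal = 0;
        while (true)
        {
            if (capacity >= LohThresholdBytes)
                lohTotal += capacity;   // this buffer lives on the LOH
            if (capacity >= payloadBytes)
                break;                  // the payload now fits
            capacity *= 2;              // grow by doubling
        }
        return lohTotal;
    }
}
```

Under those assumptions, `LohGrowthSketch.SumOfLohBuffers(14L * 1024 * 1024, 256)` returns 33,423,360 bytes, roughly 32 MB, of which only the final 16 MB buffer is in use. That is in line with the "around 30MB" figure above.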

Avoid this by pre-allocating the buffer to the expected size. MemoryStream has a constructor that takes an initial capacity. So do all collection classes. Calling GC.Collect() after you've nulled all references can help unclog the LOH and purge those intermediate buffers, at the cost of clogging the gen1 and gen2 heaps too soon.
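A minimal sketch of that advice, with hypothetical class and method names: the first method writes a payload in small slices and reports whether the stream had to reallocate its buffer, and the second forces a full collection once the caller has dropped its references. Note that the question's updates report the BinaryFormatter still allocating on top of a pre-sized stream, so this only illustrates the stream's own behavior.

```csharp
using System;
using System.IO;

static class PreallocationSketch
{
    // Writes payload in small slices, the way a serializer might, and
    // returns true if the stream's internal buffer had to grow.
    public static bool GrewWhileWriting(byte[] payload, int initialCapacity)
    {
        using (var stream = new MemoryStream(initialCapacity))
        {
            int capacityBefore = stream.Capacity;
            const int sliceSize = 4096;
            for (int offset = 0; offset < payload.Length; offset += sliceSize)
            {
                int count = Math.Min(sliceSize, payload.Length - offset);
                stream.Write(payload, offset, count);
            }
            return stream.Capacity != capacityBefore;
        }
    }

    // Call only after all references to the large buffers are gone.
    // This is a blocking full collection, so use it sparingly.
    public static void ReleaseLargeBuffers()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
}
```

With a 1 MB payload, `GrewWhileWriting(payload, payload.Length)` reports no growth, while `GrewWhileWriting(payload, 0)` reports that the buffer was reallocated along the way.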

102

answered Oct 05 '22 22:10

Hans Passant

Unfortunately, the only way I could fix this was to break up the data in chunks so as not to allocate large chunks on the LOH. All the proposed answers here were good and were expected to work but they did not. It seems that the binary serialization in .NET (using .NET 2.0 SP2) does its own little magic under the hood which prevents users from having control over memory allocation.

The answer to the question, then, is "this is not likely to work". When it comes to using .NET serialization, your best bet is to serialize the large objects in smaller chunks. For all other scenarios, the answers mentioned above are great.
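The answer does not show how the data was split, so the following is only one possible shape for the chunking step, with hypothetical names. It cuts a sequence (for example the rows of one table) into fixed-size batches; each batch would then be serialized on its own so that no single buffer grows large enough to reach the LOH. The right batch size depends on the data and would have to be found by measurement.

```csharp
using System;
using System.Collections.Generic;

static class ChunkingSketch
{
    // Splits a sequence into consecutive batches of at most batchSize items.
    // This is an iterator, so the argument check runs on first enumeration.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> items, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException("batchSize");

        var batch = new List<T>(batchSize);
        foreach (T item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;              // hand off a full batch
                batch = new List<T>(batchSize);  // start a fresh one
            }
        }
        if (batch.Count > 0)
            yield return batch;                  // trailing partial batch
    }
}
```

A caller would loop over `InBatches(rows, n)` and pass each batch to the serializer separately, writing the results out one after another.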

answered Oct 06 '22 00:10

Mark

Tags:

memory-management

c#

out-of-memory

fragmentation

large-object-heap
