Large Object Heap and String Objects coming from a queue

I have a Windows console app that is supposed to run without restarts for days or months. The app retrieves "work" from an MSMQ queue and processes it. There are 30 threads, each processing a work chunk simultaneously.

Each work chunk coming from the MSMQ is approximately 200 KB, most of which is allocated in a single String object.

I have noticed that after processing about 3,000-4,000 of these work chunks, the memory consumption of the application is extremely high, at 1-1.5 GB.

I ran the app through a profiler and noticed that most of this memory (maybe a gigabyte or so) is unused space in the large object heap (LOH), and that the heap is fragmented.

I have found that 90% of these unused (garbage collected) bytes were previously allocated as String objects. I then started to suspect that the strings coming in from the MSMQ were allocated, used and then deallocated, and are therefore the cause of the fragmentation.

I understand that things like GC.Collect(2 or GC.Max...) won't help, since they collect the large object heap but don't compact it (which is the problem here). So I think that what I need is to cache these Strings and re-use them somehow, but since Strings are immutable I would have to use StringBuilders.

My question is: is there any way to keep the underlying structure (i.e. using the MSMQ, as this is something I can't change) and still avoid initializing a new String every time, so as not to fragment the LOH?

Thanks, Yannis

UPDATE: About how these "work" chunks are currently retrieved

Currently these are stored as WorkChunk objects in the MSMQ. Each of these objects contains a String called Contents and another String called Headers. These are actual textual data. I can change the storage structure to something else if needed, and potentially the underlying storage mechanism to something other than an MSMQ.

On the worker node side we currently do:

WorkChunk chunk = _Queue.Receive();

So there is little we can cache at this stage. If we changed the structure(s) somehow then I suppose we could make some progress. In any case, we will have to sort out this problem, so we will do whatever is needed to avoid throwing out months of work.

UPDATE: I went on to try some of the suggestions below and noticed that this issue cannot be reproduced on my local machine (running Windows 7 x64 and a 64-bit app). This makes things much more difficult. If anyone knows why, it would really help with reproducing this issue locally.

asked Oct 14 '11 10:10 by Yannis



2 Answers

Your problem appears to be due to memory allocation on the large object heap - the large object heap is not compacted and so can be a source of fragmentation. There is a good article here that goes into more detail including some debugging steps that you can follow to confirm that fragmentation of the large object heap is happening:

Large Object Heap Uncovered

You appear to have three options:

  1. Alter your application to perform processing on chunks / shorter strings, where each chunk is smaller than 85,000 bytes - this avoids the allocation of large objects.
  2. Alter your application to allocate a few large chunks of memory up-front and re-use those chunks by copying new messages into the allocated memory instead. See Heap fragmentation when using byte arrays.
  3. Leave things as they are - As long as you don't experience out of memory exceptions and the application isn't interfering with other applications running on the system you should probably leave things as they are.
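Option 1 could be sketched roughly as below. This is only an illustration: the `LohChunking` class and the 40,000-character limit are assumptions, not part of the original application. A .NET string stores two bytes per character, so a piece of 40,000 characters occupies about 80,000 bytes and stays below the 85,000-byte threshold. Note that the split only helps if it happens before the message is enqueued; splitting after `Receive()` still allocates the large string once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original application.
public static class LohChunking
{
    // 40,000 chars * 2 bytes per char = 80,000 bytes plus a small header,
    // which keeps each piece under the 85,000-byte large object threshold.
    public const int MaxCharsPerPiece = 40000;

    // Lazily yields consecutive pieces of at most maxChars characters.
    // Caveat: a surrogate pair may be split across two pieces; this does
    // not matter if the pieces are simply concatenated or processed in order.
    public static IEnumerable<string> Split(string text, int maxChars = MaxCharsPerPiece)
    {
        for (int offset = 0; offset < text.Length; offset += maxChars)
        {
            int length = Math.Min(maxChars, text.Length - offset);
            yield return text.Substring(offset, length);
        }
    }
}
```

The producer would then enqueue each piece as its own message (or as a list of short strings inside one WorkChunk), so that no single string on the worker side reaches the large object heap.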

It's important here to understand the distinction between virtual memory and physical memory. Even though the process is using a large amount of virtual memory, if the number of objects allocated is relatively low then it can be that the physical memory use of that process is low (the unused memory is paged to disk), meaning little impact on other processes on the system. You may also find that the "VM Hoarding" option helps; read the "Large Object Heap Uncovered" article for more information.

Either of the first two options involves changing your application to perform some or all of its processing using byte arrays and short substrings instead of a single large string. How difficult this is going to be for you will depend on what sort of processing you are doing.
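A minimal sketch of the second option (pre-allocated, re-used buffers) might look like this. The `BufferPool` class, its method names and the buffer sizes are all assumptions made for illustration; the idea is that the large byte arrays are allocated once at startup and then live on the LOH permanently, so no holes are created by repeatedly allocating and freeing them.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical pool of fixed-size buffers, allocated once and re-used.
public sealed class BufferPool
{
    private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferCount, int bufferSize)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < bufferCount; i++)
            _buffers.Add(new byte[bufferSize]);
    }

    // Takes a pooled buffer; allocates a new one only if the pool is empty.
    public byte[] Rent()
    {
        byte[] buffer;
        return _buffers.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    // Puts a buffer back so the next message can re-use it.
    public void Return(byte[] buffer)
    {
        if (buffer != null && buffer.Length == _bufferSize)
            _buffers.Add(buffer);
    }

    // Copies a message body into the buffer; returns the number of bytes read.
    public static int Fill(Stream source, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = source.Read(buffer, total, buffer.Length - total);
            if (read == 0) break; // end of stream
            total += read;
        }
        return total;
    }
}
```

With 30 worker threads, a pool of 30 buffers somewhat larger than the biggest expected message would be a plausible starting point. Each worker would rent a buffer, fill it from the message's `BodyStream`, process the bytes in place (decoding only short substrings as needed), and return the buffer in a `finally` block.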

answered Sep 28 '22 08:09 by Justin


When there is fragmentation on the LOH, it means that there are still allocated objects on it. If you can afford the delay, you can once in a while wait until all currently running tasks are finished and call GC.Collect(). When there are no referenced large objects, they will all be collected, effectively removing the fragmentation of the LOH. Of course this only works if (almost) all large objects are unreferenced.
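One way to implement "wait until all running tasks are finished, then collect" is sketched below. The `IdleCollector` class and its method names are hypothetical. It uses a reader/writer lock so that many workers can process chunks at once, while the collection step waits for in-flight work to finish and holds back new work until the collection is done.

```csharp
using System;
using System.Threading;

// Hypothetical gate: workers hold a read lock while processing,
// the collector takes the write lock so that it runs only when idle.
public sealed class IdleCollector
{
    private readonly ReaderWriterLockSlim _gate = new ReaderWriterLockSlim();

    // Each worker wraps the receive-and-process step for one chunk in this call,
    // so no reference to the chunk survives once the read lock is released.
    public void Process(Action work)
    {
        _gate.EnterReadLock();
        try { work(); }
        finally { _gate.ExitReadLock(); }
    }

    // Waits for in-flight work to finish, blocks new work, then collects.
    public void CollectWhenIdle()
    {
        _gate.EnterWriteLock();
        try
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
        }
        finally { _gate.ExitWriteLock(); }
    }
}
```

`CollectWhenIdle()` could be called from a timer or after every few thousand chunks. For this to free the LOH, the large strings must be local to the delegate passed to `Process`, so that nothing still references them when the collection runs.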

Moving to a 64-bit OS might also help, since running out of memory due to fragmentation is much less likely to be a problem on 64-bit systems, because the virtual address space is almost unlimited.
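Note that a 64-bit OS alone is not enough: an executable built for x86 still runs as a 32-bit process on 64-bit Windows. A small check like the following (the `Bitness` class is just an illustrative name) can be logged at startup to confirm what the process is actually running as, which may also help explain differences between machines.

```csharp
using System;

// Hypothetical startup check, not part of the original application.
public static class Bitness
{
    // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit process.
    public static string Describe()
    {
        return (IntPtr.Size == 8 ? "64" : "32") + "-bit process";
    }
}
```

For example, calling `Console.WriteLine(Bitness.Describe());` at the top of `Main` on both the local and the production machine shows whether the two environments are comparable.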

answered Sep 28 '22 10:09 by Steven