Optimizing collection of long lived objects

Background: I have a service whose purpose in life is to provide objects to requestors. It gets complicated data from a database and transforms it once (a bit like a view over the data) to produce a simplified record. It then serves requests from other services by providing up to 100k records on demand, depending on the nature of the request.

The idea is that the complicated transformation is done once and cached by the service. This works out quicker than letting the database work it out each time a view is accessed, and for my purposes it works just fine. (I believe this is called SSOS by some.)

The data is cached in a list of objects which are property bags for standard .NET types. These objects have no references to anything else. Periodically a record will change, and the cache must be updated, which means that the original record must be located, thrown away, and replaced.
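A minimal sketch of that setup, with hypothetical names (`CustomerRecord`, `RecordCache`) chosen purely for illustration:

```csharp
using System.Collections.Generic;

// Hypothetical property-bag record: standard .NET types only,
// no references to anything else, as described above.
public class CustomerRecord
{
    public int Id { get; set; }
    public string Region { get; set; }
    public decimal Balance { get; set; }
}

public class RecordCache
{
    private readonly List<CustomerRecord> cache = new List<CustomerRecord>();

    public void Replace(CustomerRecord updated)
    {
        int index = cache.FindIndex(r => r.Id == updated.Id);
        if (index >= 0)
        {
            // The old record becomes unreachable here, but having survived
            // long enough to be promoted, it can only be reclaimed by a
            // full (Gen 2) collection.
            cache[index] = updated;
        }
        else
        {
            cache.Add(updated);
        }
    }
}
```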

Now, a record in the cache will have been there for a long time and will have been promoted to Gen 2; pretty much all the collections of these objects will happen in the Gen 2 phase, as they hang around for ages (on purpose).

My understanding of Gen 2 collections is that they are slow, and if the garbage is mostly sitting in Gen 2 then the garbage collector is going to run full collections more often.

I would like to be able to de-reference an object in the list in a way that doesn't end up triggering a full Gen 2 collection. I was thinking that maybe there is a way of marking it as Gen 0 and then de-referencing it before replacing it - but I don't think that is possible.

I am constrained to .NET 4 for this, and the application is a service which serves up to 100 clients that request full lists, or changes to the list, over a period of time.

Question: Can anyone suggest a way to de-reference long-lived objects in a GC-friendly way, or perhaps another way to approach this problem?

asked Feb 18 '23 by Jay

1 Answer

There is no simple answer to this. If you have lots of long-lived objects, then full collections really can hurt, as I discussed here. Since a picture tells a thousand words:

[Graph: response times over time, with tall vertical spikes coinciding with garbage collections]

Those vertical spikes are where garbage collection happens and slaughters the response times.

The way we reduced the impact of this was: don't have a gazillion long-lived objects. What we did was to change the classes to structs, which meant that the only object was the array that contained them. We were fortunate here in that the data was simple and didn't involve strings, which would of course themselves be objects. We also did some crazy fixed-size buffer work to reduce things that were previously collections, and changed what were references into indices (into the array).

If you do have to use string data, perhaps try to ensure you don't have 20,000 different string instances with the same value - some kind of manual interner (a Dictionary<string,string> would suffice) can be really useful there.
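Very loosely, something like the following; the type names are hypothetical, and this is only a sketch of the idea rather than what we actually shipped:

```csharp
using System.Collections.Generic;

// The bulk data as an immutable struct: the only large heap object is the
// array that holds these, so Gen 2 has almost nothing to walk.
public struct CustomerData
{
    public readonly int Id;
    public readonly decimal Balance;
    public readonly string Region;   // interned, so duplicates share one instance
    public readonly int ParentIndex; // an index into the same array, not a reference

    public CustomerData(int id, decimal balance, string region, int parentIndex)
    {
        Id = id;
        Balance = balance;
        Region = region;
        ParentIndex = parentIndex;
    }
}

// A manual interner along the lines suggested above: one canonical
// string instance per distinct value.
public class StringInterner
{
    private readonly Dictionary<string, string> pool =
        new Dictionary<string, string>();

    public string Intern(string value)
    {
        if (value == null) return null;
        string existing;
        if (pool.TryGetValue(value, out existing)) return existing;
        pool.Add(value, value);
        return value;
    }
}
```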

Note that this needn't impact your public API, since you can always create the old class data from the struct storage - the difference is that this class will only exist briefly, as a DTO, and so will be collected cheaply in the next Gen 0 sweep.
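Continuing the hypothetical sketch above, the projection back to the original class might look like this:

```csharp
public class RecordService
{
    // Long-lived backing store: the array itself is the only large heap object.
    private readonly CustomerData[] data;

    public RecordService(CustomerData[] data)
    {
        this.data = data;
    }

    public CustomerRecord GetRecord(int index)
    {
        CustomerData d = data[index];
        // This DTO is allocated per request and becomes garbage almost
        // immediately, so it dies cheaply in a Gen 0 sweep.
        return new CustomerRecord { Id = d.Id, Region = d.Region, Balance = d.Balance };
    }
}
```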

YMMV, but this worked well enough for us.

The problem is: you need to be really careful when working with structs; I strongly advise making them immutable.
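To illustrate why (using a deliberately bad, hypothetical mutable struct): structs are copied on assignment, so mutating a copy silently leaves the stored element untouched.

```csharp
using System;

public struct MutablePoint // mutable struct - for demonstration only, don't do this
{
    public int X;
}

public static class StructPitfall
{
    public static void Main()
    {
        MutablePoint[] points = { new MutablePoint { X = 1 } };

        MutablePoint copy = points[0]; // value copy, not a reference
        copy.X = 99;                   // mutates only the local copy

        Console.WriteLine(points[0].X); // prints 1, not 99
    }
}
```

Immutable structs (readonly fields, set only in the constructor) make this entire class of bug impossible.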

answered Feb 26 '23 by Marc Gravell