Some facts: We have developed wcf service that acts as a layer between clients and the database. It's selfhosted and runs as a windows service.
The service keeps several caches, where the largest are about 1-2gb in memory. Total memory usage is usually about 5-8gb. Connections are duplex and uses tcp protocol and the serialization is made with protobuf-net. Our connected client count usually range from 1000-1500. The server is a 8-core xeon of newish model with 64gb memory and runs nothing more then the service.
The problem: After x amount of time, it has been everywhere from a day to a week the service gets extremely slow. Requests that takes 0.5 seconds can take over a minute. This behaviour goes on for 15-40 minutes or til the service is restarted.
What we have done : We have checked the network and network connection to the server and there is no problem. CPU utilization goes up somewhat during this time from f.eks. 30% avg to 40-50% avg. We have taken memory dumps and there are no logical locks in code that blocks the users and not much activity at all. Our latest lead is the Garbage collector. In perfmon we can see that "% time in gc" is constantly over 90%,(90-97%) and the collection counts rises. Both GC0 and GC1. We suspect there is a blocking GC2 running also but we had to restart the service as this is in production so it didn't count up during the 5min window we ran perfmon. Memory usage was 7,6 Gb. Note : Calls outstanding rises so the calls get there but the service does not handle them.
My questions are, Can the garbage collector get in a state where it runs and blocks constantly for over 15minutes? or are the problem probably related to some other issue?
Our service ran GC in workstation mode and latencymode : Interactive We have now changed this to Server and SustainedLowLatency and hopes this will help somewhat. Are there anything else we can do if its the garbage collector?
Edit : The large memory usage is by design, the data in the caches is that large and there is lots of more memory available.
Excessive garbage collection is often caused by code issues. You either create too many objects in a short time, or you keep allocating memory without releasing it.
There is actually an extensive checklist available on MSDN that should help you diagnose the problem.
A very large GC2 means that the objects in there survived multiple garbage collections, which means they are kept in memory for a longer period of time. That could be the root cause of your issue. Maybe there is a caching mechanism that could use some tuning / retention policy (remove data that isn't used for a long time).
I have a similar situation. Large database data cache in a service using protobuf with WCF for client communication. The cache is not purely just for clients, the business layer uses the cache to perform operations. The memory footprint of the service can be anywhere between 2 and 10 GB. I release a segment of the cache after 8 hours of inactivity. The machine has 8 virtual cores and 32 GB of memory. I am using .Net 4.5.1.
The GC would consume 98% of the CPU for an hour as soon as I loaded the cache from the database. The interesting point here in both our cases there is no memory pressure what so ever.
I think the GC is performed regardless because something was changed where the GC tries to keep available memory for all threads. Since one thread allocated a large amount of memory when loading the cache, the GC kicked in. I had to do several things to fix it.
1) Removed Tuples from the cache. I was using them as dictionary keys and their implementation of StructuralEquality is horrible. It compares all properties as objects so there is a lot of boxing going on for properties that are values and these will have to be garbage collected at some point.
2) When replacing Tuples used as keys I could not simply replace them with structures without implementing Equals as the value comparison uses reflection and it is too expensive so I ended up creating a Generic Pair structure. I decided to use structures to remove the number of objects when they were in arrays.
3) To remove the Tuples I had to create my own Pair structure that compares the properties using default equals for property types. Eseentially the same thing that PowerCollections created.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With