Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How much time is the garbage collector using?

My python program has a curious performance behavior: The longer it runs, the slower it gets. Early on, it cranks out tens of work units per minute. After an hour of so it is taking tens of minutes per work unit. My suspicion is that this is cause by a congested garbage collector.

The catch is that my script is too memory hungry for cProfile to work on large runs. (see: cProfile taking a lot of memory)

We have written our own performance plugin and we can observe most parts of our system and none of them seem to be the problem. The one rock that is still unturned is the GC.

Is there some other way (besides profile or cProfile) to see how much time is going to the GC?

like image 974
Matthew Scouten Avatar asked May 24 '11 18:05

Matthew Scouten


People also ask

What is garbage collection time?

The more live objects are found, the longer the suspension, which has a direct impact on response time and throughput. This fundamental tenet of garbage collection and the resulting effect on application execution is called the garbage-collection pause or GC pause time.

What is the acceptable number for of time spent in garbage collection?

Typically, your JVM should: Spend less than 0.5 seconds in each GC cycle. The percentage of time in garbage collection should be less than 3% - This percentage can be calculated by dividing the sum of the garbage collection times over an interval by the interval.

Is garbage collection slow or fast?

Impacts On Performance If it was, every language would use it. GC is slow, mostly because it needs to pause program execution to collect garbage. Think of it like this — your CPU can only work on one thing at a time.

What causes long garbage collection time?

CPU usage will be high during a garbage collection. If a significant amount of process time is spent in a garbage collection, the number of collections is too frequent or the collection is lasting too long. An increased allocation rate of objects on the managed heap causes garbage collection to occur more frequently.


1 Answers

In Python, most garbage is collected using reference counting. One would expect this to be quick and painless, and it seems unlikely that this is what you're after. I assume you're asking about the collector referred to by the gc module, which is only used for circular references.

There are a few things that might be of use: http://docs.python.org/library/gc.html

Although there doesn't appear to be a direct method to time the garbage collector, you can turn it on and off, enable debugging, look at collection count etc. All of this might be helpful in your quest.

For example, on my system gc prints out the elapsed time if you turn on the debug flags:

In [1]: import gc

In [2]: gc.set_debug(gc.DEBUG_STATS)

In [3]: gc.collect()
gc: collecting generation 2...
gc: objects in each generation: 159 2655 7538
gc: done, 10 unreachable, 0 uncollectable, 0.0020s elapsed.

All of this aside, the first thing I would look at is the evolution of the memory usage of your program as it runs. One possibility is that it is simply reaching the limit of available physical RAM and is slowing down due to excessive page faults, rather than due to anything to do with the garbage collector.

like image 135
NPE Avatar answered Oct 06 '22 01:10

NPE