
Why is Garbage Collection so Slow?

Profiling my code in IPython with %prun, I noticed that most of the function's time is spent in garbage collection (0.334 s of 0.428 s total).

79254 function calls (77408 primitive calls) in 0.428 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     5    0.334    0.067    0.334    0.067 {gc.collect}
 15757    0.005    0.000    0.007    0.000 {isinstance}
  1584    0.002    0.000    0.004    0.000 dtypes.py:68(is_dtype)

I've tried disabling garbage collection before calling the function and re-enabling it after the function returns, but the timing is virtually identical.

import gc

gc.disable()
x = foo()
gc.enable()

Does anyone know why this is such a bottleneck and how to speed it up?

My Python/Pandas versions are listed below:

Python 2.7.11 |Continuum Analytics, Inc.| (default, Dec  6 2015, 18:57:58) 
Pandas 0.17.1
Alexander asked Dec 20 '15 06:12



1 Answer

Garbage collection is a high-level feature of many modern languages. It adds some runtime overhead, but it also makes programs much less error-prone and easier to write.
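
One detail that may matter for the profile in the question: it lists {gc.collect} as a called function, which means something in the call path invokes it explicitly. Automatic collections do not show up as function calls in a profile, and gc.disable() only switches off automatic collection, so wrapping the call in gc.disable()/gc.enable() would not be expected to change that line. A minimal sketch of this behavior, written for Python 3 and using a hypothetical Node class that is not part of the original code:

```python
import gc


class Node(object):
    """Hypothetical helper: an object that can take part in a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    # a <-> b reference each other, so reference counting alone cannot free
    # them once the function returns; only the cycle collector can.
    a, b = Node(), Node()
    a.other, b.other = b, a


gc.collect()   # start from a clean slate
gc.disable()   # switches off *automatic* collection only
try:
    make_cycle()
    automatic_enabled = gc.isenabled()
    # An explicit call still performs a full collection while disabled.
    freed = gc.collect()
finally:
    gc.enable()

print("automatic collection enabled:", automatic_enabled)
print("unreachable objects found by explicit collect:", freed)
```

If the explicit calls come from library code rather than your own, the practical options are to avoid the code path that triggers them or to accept the cost; disabling the collector does not help.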

Here are some good articles about this specific topic:

Python Garbage
Only slow if you use it wrong
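
To see which code is calling gc.collect, the profiler's caller table can help. The sketch below assumes Python 3; foo and helper are hypothetical stand-ins (in the real case foo would be the function from the question, and the call to gc.collect would sit somewhere inside the libraries it uses).

```python
import cProfile
import gc
import io
import pstats


def helper():
    # Stands in for library code that forces a collection internally.
    gc.collect()


def foo():
    """Hypothetical stand-in for the function being profiled."""
    helper()
    return 42


profiler = cProfile.Profile()
profiler.enable()
x = foo()
profiler.disable()

# Collect the report in a string so it can be inspected or printed.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
# Restrict the caller table to functions whose name matches "collect".
stats.print_callers("collect")
report = buf.getvalue()
print(report)
```

The report lists each function that called gc.collect, which points at the code path to avoid or change. In IPython, %prun -r should return the same kind of pstats.Stats object, so print_callers can be applied to it directly.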

RFV answered Nov 02 '22 19:11