Profiling my code in IPython using %prun, I've noticed that most of the function's time is spent in garbage collection (0.334 s of 0.428 s total).
79254 function calls (77408 primitive calls) in 0.428 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     5    0.334    0.067    0.334    0.067  {gc.collect}
 15757    0.005    0.000    0.007    0.000  {isinstance}
  1584    0.002    0.000    0.004    0.000  dtypes.py:68(is_dtype)
I've tried disabling garbage collection before calling the function and re-enabling it after the function returns, but the timing is virtually identical.
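import gc
gc.disable()  # switch off automatic collection before the call
x = foo()
gc.enable()   # switch it back on once the function has returned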
Does anyone know why this is such a bottleneck and how to speed it up?
My Python/Pandas versions are listed below:
Python 2.7.11 |Continuum Analytics, Inc.| (default, Dec 6 2015, 18:57:58)
Pandas 0.17.1
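A note on why the gc.disable() attempt above probably changes nothing: the {gc.collect} row in the profile most likely records explicit calls to gc.collect() made somewhere inside the function, and gc.disable() only switches off automatic collection. The sketch below illustrates that behaviour in isolation. The Node class and the collect_while_disabled helper are hypothetical names used only for this demonstration, and the code assumes CPython.

```python
import gc


class Node(object):
    """Tiny container type used only to build a reference cycle."""

    def __init__(self):
        self.other = None


def collect_while_disabled():
    """Return how many unreachable objects an explicit collect finds
    while automatic collection is switched off."""
    was_enabled = gc.isenabled()
    gc.collect()   # start from a clean state
    gc.disable()   # turns off *automatic* collection only
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a   # reference cycle: refcounts never reach zero
        del a, b
        return gc.collect()       # an explicit call still runs a full collection
    finally:
        if was_enabled:
            gc.enable()


found = collect_while_disabled()
print("unreachable objects found with gc disabled:", found)
```

If the explicit calls come from a library rather than from your own code, disabling the collector around the call will not remove them from the profile.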
Short of avoiding garbage collection altogether, there is only one way to make garbage collection faster: ensure that as few objects as possible are reachable during the garbage collection. The fewer objects that are alive, the less there is to be marked.
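The effect of the live object count can be observed in CPython, whose cyclic collector has to walk the container objects it tracks. This is a rough sketch, not a benchmark: the helper names and the figure of 200000 lists are arbitrary choices for illustration, and the timings will vary from machine to machine.

```python
import gc
from timeit import default_timer


def time_full_collect():
    """Time one explicit full collection, in seconds."""
    start = default_timer()
    gc.collect()
    return default_timer() - start


def tracked_objects():
    """Number of container objects the collector currently tracks."""
    return len(gc.get_objects())


gc.collect()  # clear any existing garbage so the counts are comparable
baseline_count = tracked_objects()
baseline_time = time_full_collect()

# Keep a large number of small containers alive (the size is arbitrary).
live = [[i] for i in range(200000)]
loaded_count = tracked_objects()
loaded_time = time_full_collect()

# Dropping the last reference frees them through reference counting.
del live
released_count = tracked_objects()

print("tracked objects: %d -> %d -> %d"
      % (baseline_count, loaded_count, released_count))
print("collect time:    %.4fs -> %.4fs" % (baseline_time, loaded_time))
```

The second collection has far more tracked objects to examine than the first, which is where the extra time goes.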
If your application's object creation rate is very high, the garbage collection rate must also be very high to keep up with it, and a high collection rate increases total GC pause time. Optimizing the application to create fewer objects is therefore an effective strategy for reducing long GC pauses.
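As a minimal sketch of the same idea in CPython: automatic collections are triggered by container allocations, so storing the same values in one flat array instead of many small lists should trigger far fewer of them. The helper count_collections and the two build functions are hypothetical names for this illustration, and the code assumes Python 3.3 or later for gc.callbacks.

```python
import gc
from array import array

N = 100000  # arbitrary size for the demonstration


def count_collections(build):
    """Run build() and return how many automatic collections started meanwhile."""
    started = 0

    def on_gc(phase, info):
        nonlocal started
        if phase == "start":
            started += 1

    was_enabled = gc.isenabled()
    gc.collect()   # start from a clean state before counting
    gc.enable()
    gc.callbacks.append(on_gc)
    try:
        kept = build()   # hold the result so it stays alive while we count
    finally:
        gc.callbacks.remove(on_gc)
        if not was_enabled:
            gc.disable()
    return started


def many_small_lists():
    # One tracked list object per value.
    return [[float(i)] for i in range(N)]


def one_flat_array():
    # A single array of C doubles holding the same values.
    return array("d", (float(i) for i in range(N)))


many_runs = count_collections(many_small_lists)
flat_runs = count_collections(one_flat_array)
print("collections with many small lists:", many_runs)
print("collections with one flat array:  ", flat_runs)
```

In numeric code this usually means preferring array-based containers over nested Python objects where the data allows it.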
On the JVM, one option is to increase the Java heap size. Look at the Garbage Collection subtab to estimate the heap size used by the application, then raise -Xms and -Xmx. The bigger the Java heap, the longer the time between GCs.
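CPython has no heap-size setting of this kind. The closest analogue, offered here as an assumption rather than as an equivalent, is raising the generation-0 threshold with gc.set_threshold() so that automatic collections run less often. The helper collections_during and the threshold of 1000000 are hypothetical choices for this sketch, which assumes Python 3.3 or later.

```python
import gc


def collections_during(build, threshold0):
    """Count automatic collections triggered by build() under the given
    generation-0 threshold, then restore the previous settings."""
    count = 0

    def on_gc(phase, info):
        nonlocal count
        if phase == "start":
            count += 1

    old_thresholds = gc.get_threshold()
    was_enabled = gc.isenabled()
    gc.collect()   # start from a clean state before counting
    gc.enable()
    gc.set_threshold(threshold0, old_thresholds[1], old_thresholds[2])
    gc.callbacks.append(on_gc)
    try:
        kept = build()   # hold the result so it stays alive while we count
    finally:
        gc.callbacks.remove(on_gc)
        gc.set_threshold(*old_thresholds)
        if not was_enabled:
            gc.disable()
    return count


def build():
    # Allocate many tracked container objects (the size is arbitrary).
    return [[i] for i in range(50000)]


default_runs = collections_during(build, gc.get_threshold()[0])
raised_runs = collections_during(build, 1000000)
print("collections at the default threshold:", default_runs)
print("collections at a raised threshold:   ", raised_runs)
```

The trade-off mirrors the heap-size advice: collections happen less often, but unreachable cycles stay in memory longer before they are reclaimed.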
When the garbage collector runs, it can introduce delays into your application because of the way GC is implemented. The JVM's G1GC, for example, pauses your app while it frees unused memory objects and compacts memory regions to reduce wasted space. These GC pauses can introduce visible delays while your app is running.
Garbage collection is a high level feature/abstraction of many modern languages. It makes programs slower, but it also makes programs much less error-prone and easier to create.
Here are some good articles about this specific topic:
Python Garbage
Only slow if you use it wrong