I have a problem with my Python application, and I think it's related to Python's garbage collection, though I'm not sure...
The problem is that my application takes a long time to exit and to switch from one function to the next.
In my application I handle very large dictionaries, containing thousands of large objects which are instantiated from wrapped C++ classes.
I put some timestamp outputs in my program, and I saw that at the end of each function, when the objects created inside it should go out of scope, the interpreter spends a lot of time before calling the next function. And I observe the same problem at the end of the application, when the program should exit: a lot of time (~ hours!) is spent between the last timestamp on screen and the appearance of the fresh prompt.
The memory usage is stable, so I don't really have memory leaks.
Any suggestions?
Can the garbage collection of thousands of large C++ objects really be that slow?
Is there a way to speed it up?
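(A quick way to test this hypothesis, sketched under the assumption that the cyclic collector is the culprit: disable it for one full run and compare the timings. main here is a hypothetical stand-in for the application's entry point.)

    import gc

    gc.disable()   # switch off the cyclic collector for the whole run
    main()         # hypothetical entry point of the application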
UPDATE:
Thanks a lot for all your answers, you gave me a lot of hints to debug my code :-)
I use Python 2.6.5 on Scientific Linux 5, a customized distribution based on Red Hat Enterprise Linux 5. And actually I'm not using SWIG to get Python bindings for our C++ code, but the Reflex/PyROOT framework. I know it's not well known outside particle physics (but it's still open source and freely available), and I have to use it because it's the default for our main framework.
And in this context the del statement from the Python side does not work; I had already tried it. del only deletes the Python variable bound to the C++ object, not the object itself in memory, which is still owned by the C++ side...
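For illustration, del only removes a name binding; the object itself survives as long as anything else references it. A minimal sketch (WrappedCppClass is a hypothetical Reflex/PyROOT-wrapped class):

    import sys

    obj = WrappedCppClass()      # hypothetical wrapped C++ class
    alias = obj
    print sys.getrefcount(obj)   # at least 3: obj, alias, and the call argument
    del obj                      # unbinds the name 'obj' only
    print alias                  # the underlying object is still alive via 'alias'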
...I know, it's non-standard, I guess, and a bit complicated. Sorry :-P
But following your hints, I'll profile my code and I'll come back to you with more details, as you suggested.
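Something like this (a minimal sketch using cProfile and pstats; main is my program's entry point, and the stats file name matches the output below):

    import cProfile, pstats

    # Run the program under the profiler and dump the stats to a file
    cProfile.run("main()", "mainProgram.profile")

    # Later: load the stats and sort by internal and cumulative time
    p = pstats.Stats("mainProgram.profile")
    p.sort_stats("time").print_stats(20)
    p.sort_stats("cumulative").print_stats(20)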
ADDITIONAL UPDATE:
Ok, following your suggestions, I instrumented my code with cProfile, and I discovered that gc.collect() is indeed the function taking most of the running time!!
Here is the output from cProfile + pstats print_stats():
    >>> p.sort_stats("time").print_stats(20)
    Wed Oct 20 17:46:02 2010    mainProgram.profile

             547303 function calls (542629 primitive calls) in 548.060 CPU seconds

       Ordered by: internal time
       List reduced from 727 to 20 due to restriction

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            4  345.701   86.425  345.704   86.426 {gc.collect}
            1  167.115  167.115  200.946  200.946 PlotD3PD_v3.2.py:2041(PlotSamplesBranches)
           28   12.817    0.458   13.345    0.477 PlotROOTUtils.py:205(SaveItems)
         9900   10.425    0.001   10.426    0.001 PlotD3PD_v3.2.py:1973(HistoStyle)
         6622    5.188    0.001    5.278    0.001 PlotROOTUtils.py:403(__init__)
           57    0.625    0.011    0.625    0.011 {built-in method load}
          103    0.625    0.006    0.792    0.008 dbutils.py:41(DeadlockWrap)
           14    0.475    0.034    0.475    0.034 {method 'dump' of 'cPickle.Pickler' objects}
         6622    0.453    0.000    5.908    0.001 PlotROOTUtils.py:421(CreateCanvas)
        26455    0.434    0.000    0.508    0.000 /opt/root/lib/ROOT.py:215(__getattr__)
       [...]

    >>> p.sort_stats("cumulative").print_stats(20)
    Wed Oct 20 17:46:02 2010    mainProgram.profile

             547303 function calls (542629 primitive calls) in 548.060 CPU seconds

       Ordered by: cumulative time
       List reduced from 727 to 20 due to restriction

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001  548.068  548.068 PlotD3PD_v3.2.py:2492(main)
            4    0.000    0.000  346.756   86.689 /usr/lib//lib/python2.5/site-packages/guppy/heapy/Use.py:171(heap)
            4    0.005    0.001  346.752   86.688 /usr/lib//lib/python2.5/site-packages/guppy/heapy/View.py:344(heap)
            1    0.002    0.002  346.147  346.147 PlotD3PD_v3.2.py:2537(LogAndFinalize)
            4  345.701   86.425  345.704   86.426 {gc.collect}
            1  167.115  167.115  200.946  200.946 PlotD3PD_v3.2.py:2041(PlotBranches)
           28   12.817    0.458   13.345    0.477 PlotROOTUtils.py:205(SaveItems)
         9900   10.425    0.001   10.426    0.001 PlotD3PD_v3.2.py:1973(HistoStyle)
        13202    0.336    0.000    6.818    0.001 PlotROOTUtils.py:431(PlottingCanvases)
         6622    0.453    0.000    5.908    0.001 /root/svn_co/rbianchi/SoftwareDevelopment
       [...]
    >>>
So, in both outputs, sorted by "time" and by "cumulative" time respectively, gc.collect() is the function consuming most of the running time of my program! :-P
And this is the output of the memory profiler Heapy, taken just before main() returns:
    memory usage before return:
    Partition of a set of 65901 objects. Total size = 4765572 bytes.
     Index  Count   %     Size    % Cumulative  % Kind (class / dict of class)
         0  25437  39  1452444  30    1452444  30 str
         1   6622  10   900592  19    2353036  49 dict of PlotROOTUtils.Canvas
         2    109   0   567016  12    2920052  61 dict of module
         3   7312  11   280644   6    3200696  67 tuple
         4   6622  10   238392   5    3439088  72 0xa4ab74c
         5   6622  10   185416   4    3624504  76 PlotROOTUtils.Canvas
         6   2024   3   137632   3    3762136  79 types.CodeType
         7    263   0   129080   3    3891216  82 dict (no owner)
         8    254   0   119024   2    4010240  84 dict of type
         9    254   0   109728   2    4119968  86 type
     Index  Count   %     Size    % Cumulative  % Kind (class / dict of class)
        10   1917   3   107352   2    4264012  88 function
        11   3647   5   102116   2    4366128  90 ROOT.MethodProxy
        12    148   0    80800   2    4446928  92 dict of class
        13   1109   2    39924   1    4486852  93 __builtin__.wrapper_descriptor
        14    239   0    23136   0    4509988  93 list
        15     87   0    22968   0    4532956  94 dict of guppy.etc.Glue.Interface
        16    644   1    20608   0    4553564  94 types.BuiltinFunctionType
        17    495   1    19800   0    4573364  94 __builtin__.weakref
        18     23   0    11960   0    4585324  95 dict of guppy.etc.Glue.Share
        19    367   1    11744   0    4597068  95 __builtin__.method_descriptor
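For reference, this is roughly how the snapshot above was taken with Heapy's standard API (a minimal sketch; the actual call site inside my main() is simplified):

    from guppy import hpy

    profiler = hpy()
    print profiler.heap()   # snapshot of all reachable Python objects, by type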
Any idea why, or how to optimize the garbage collection?
Is there any more detailed check I can do?
GC is slow mostly because it needs to pause program execution to collect garbage. Think of it like this: your CPU can only work on one thing at a time.
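You can measure that pause directly. A minimal sketch that forces a full collection and times it:

    import gc, time

    start = time.time()
    unreachable = gc.collect()   # force a full collection and measure the pause
    print "collected %d objects in %.2f s" % (unreachable, time.time() - start)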
The same issue is documented for other garbage-collected runtimes such as Java: the most common performance problem relates to the garbage collection mechanism. If the size of the heap is too large, the heap must reside outside main memory, which causes increased paging activity and hurts performance.
Short of avoiding garbage collection altogether, there is only one way to make garbage collection faster: ensure that as few objects as possible are reachable during the garbage collection. The fewer objects that are alive, the less there is to be marked.
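In this case that means dropping the references to the big dictionaries before the collector runs. A minimal sketch (large_dict is a stand-in for one of the application's real dictionaries):

    import gc

    large_dict = dict((i, [0] * 1000) for i in xrange(10000))  # stand-in data

    large_dict.clear()   # drop the references held by the dictionary
    del large_dict       # drop the dictionary itself
    gc.collect()         # far fewer live objects are left to traverse and mark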
Python has automatic memory management and reclaims unused objects in two ways: reference counting, which frees an object immediately once its reference count drops to zero, and a generational cyclic garbage collector (the gc module), which periodically scans for reference cycles that reference counting alone cannot reclaim.
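A minimal sketch illustrating the two mechanisms:

    import gc

    class Node(object):
        pass

    # 1) Reference counting: freed as soon as the last reference disappears
    a = Node()
    del a                # refcount drops to zero -> deallocated immediately

    # 2) The cyclic collector handles what reference counting cannot
    b = Node()
    b.me = b             # self-referencing cycle keeps the refcount above zero
    del b                # the object survives del...
    print gc.collect()   # ...until the collector finds the cycle (nonzero count)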
This is a known garbage-collector issue in Python 2.6: garbage collection takes quadratic time when many objects are allocated without any of them being deallocated, e.g. while populating a large list.
There are two simple solutions:
either disable garbage collection before populating large lists and re-enable it afterwards:
    import gc

    l = []
    gc.disable()          # switch off the cyclic collector while populating
    for x in xrange(10**6):
        l.append(x)
    gc.enable()           # switch it back on once the list is built
or upgrade to Python 2.7, where the issue has been fixed.
I prefer the second solution, but it's not always an option ;)
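If you go with the first option, it's worth wrapping the disable/enable pair in try/finally so an exception inside the loop cannot leave the collector permanently disabled. A minimal sketch:

    import gc

    def build_list(n):
        result = []
        gc.disable()
        try:
            for x in xrange(n):
                result.append(x)
        finally:
            gc.enable()   # re-enabled even if the loop raises
        return result

    l = build_list(10**6)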