
Python: garbage collection fails?

Consider the following script:

l = [i for i in range(int(1e8))]
l = []
import gc
gc.collect()
# 0
gc.get_referrers(l)
# [{'__builtins__': <module '__builtin__' (built-in)>, 'l': [], '__package__': None, 'i': 99999999, 'gc': <module 'gc' (built-in)>, '__name__': '__main__', '__doc__': None}]
del l
gc.collect()
# 0

The point is, after all these steps the memory usage of this Python process is around 30% on my machine (Python 2.6.5; more details on request). Here's an excerpt of the output of top:

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND  
5478 moooeeeep 20   0 2397m 2.3g 3428 S    0 29.8   0:09.15 ipython  

and of ps aux:

moooeeeep 5478  1.0 29.7 2454720 2413516 pts/2 S+   12:39   0:09 /usr/bin/python /usr/bin/ipython gctest.py

According to the docs for gc.collect:

Not all items in some free lists may be freed due to the particular implementation, in particular int and float.

Does this mean that, if I (temporarily) need a large number of different int or float values, I have to offload this part to C/C++ because the Python GC fails to release the memory?


Update

Probably the interpreter is to blame, as this article suggests:

It’s that you’ve created 5 million integers simultaneously alive, and each int object consumes 12 bytes. “For speed”, Python maintains an internal free list for integer objects. Unfortunately, that free list is both immortal and unbounded in size. floats also use an immortal & unbounded free list.

The problem remains, however, as I cannot avoid this amount of data (timestamp/value pairs from an external source). Am I really forced to drop Python and go back to C/C++?
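
For reference, the effect can be observed directly by reading the process's resident set size before and after dropping the list. This is only a rough sketch (Linux-specific, since it parses /proc/self/status, and written against CPython 2.x, where the int free list is immortal):

# Rough sketch (Linux-only): watch the resident set size before and after
# dropping a large list of ints. Under CPython 2.x the freed int objects
# stay on an immortal free list, so RSS does not drop back to the baseline.
import gc

def rss_kb():
    # /proc/self/status reports VmRSS in kB on Linux.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

print('baseline:            %s kB' % rss_kb())
l = [i for i in range(int(1e7))]   # ten million distinct int objects
print('after allocation:    %s kB' % rss_kb())
del l
gc.collect()
print('after del + collect: %s kB' % rss_kb())  # stays far above the baseline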


Update 2

It's probably indeed the case that the Python implementation causes the problem. I found this answer, which conclusively explains the issue and a possible workaround.

asked Mar 08 '12 by moooeeeep


1 Answer

I found this also answered by Alex Martelli in another thread:

Unfortunately (depending on your version and release of Python) some types of objects use "free lists" which are a neat local optimization but may cause memory fragmentation, specifically by making more and more memory "earmarked" for only objects of a certain type and thereby unavailable to the "general fund".

The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it's done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates. Under such conditions, the operating system WILL do its job, and gladly recycle all the resources the subprocess may have gobbled up. Fortunately, the multiprocessing module makes this kind of operation (which used to be rather a pain) not too bad in modern versions of Python.

In your use case, it seems that the best way for the subprocesses to accumulate some results and yet ensure those results are available to the main process is to use semi-temporary files (by semi-temporary I mean, NOT the kind of files that automatically go away when closed, just ordinary files that you explicitly delete when you're all done with them).
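
A minimal sketch of that semi-temporary-file approach might look as follows (the worker function, the result file name and the use of pickle are illustrative assumptions, not part of the original answer):

# Sketch: do the memory-hungry work in a child process and hand the small
# result back through an ordinary file that the parent deletes afterwards.
import multiprocessing
import os
import pickle

def worker(result_path):
    data = [i for i in range(int(1e7))]   # the memory-hungry part
    with open(result_path, 'wb') as f:
        pickle.dump(sum(data), f)         # keep only the small result

if __name__ == '__main__':
    result_path = 'result.tmp'            # "semi-temporary": an ordinary file
    p = multiprocessing.Process(target=worker, args=(result_path,))
    p.start()
    p.join()                              # the child's memory is returned to the OS here
    with open(result_path, 'rb') as f:
        result = pickle.load(f)
    os.remove(result_path)                # ...which we delete explicitly when done
    print(result)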

Fortunately I was able to split the memory-intensive work into separate chunks, which enabled the interpreter to actually free the temporary memory after each iteration. I used the following wrapper to run the memory-intensive function as a subprocess:

import multiprocessing

def run_as_process(func, *args):
    """Run func(*args) in a child process and wait for it to finish."""
    p = multiprocessing.Process(target=func, args=args)
    try:
        p.start()
        p.join()          # block until the child has finished
    finally:
        p.terminate()     # make sure the child is gone, even if join() was interrupted
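
To illustrate how this wrapper might be called (the per-chunk worker and the chunk file names are hypothetical; any results have to be passed back out-of-band, e.g. via files as suggested above, since the child's return value is discarded):

# Hypothetical per-chunk worker: the names and file handling are
# illustrative only, not part of the original question.
def process_chunk(path):
    with open(path) as f:
        pairs = [line.split() for line in f]   # the memory-intensive part
    # ... process pairs and write the (small) results to a semi-temporary file ...

for path in ('chunk0.dat', 'chunk1.dat'):       # hypothetical chunk files
    run_as_process(process_chunk, path)          # RAM is released when each child exits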
answered Nov 11 '22 by moooeeeep