I have the following code, that creates a million objects of a class foo:
for i in range(1000000):
    bar = foo()
    list_bar.append(bar)
The bar object is only 96 bytes, as determined by getsizeof(). However, the append step takes almost 8 GB of RAM. Once the code exits the loop, RAM usage drops to the expected amount (size of the list plus some overhead, ~103 MB). Only while the loop is running does the RAM usage skyrocket. Why does this happen? Are there any workarounds?
PS: Using a generator is not an option, it has to be a list.
EDIT: xrange doesn't help, I'm using Python 3. The memory usage stays high only during the loop execution and drops once the loop is through. Could append have some non-obvious overhead?
When you create a list object, the list object by itself takes 64 bytes of memory, and each item adds 8 bytes of memory to the size of the list because of references to other objects.
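You can see this by measuring the list itself with sys.getsizeof(), which reports only the list's own array of references, not the objects it points to (the exact baseline varies slightly between CPython versions):

import sys

# An empty list has a small fixed baseline (roughly 56-64 bytes on 64-bit CPython,
# depending on the version).
print(sys.getsizeof([]))

# Each element adds one 8-byte pointer to the list's internal array.
million_refs = [None] * 1_000_000
print(sys.getsizeof(million_refs))  # roughly 8 MB: 1,000,000 * 8 bytes + baseline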
As an illustration of per-object overhead, consider storing a million small integers. Each value fits in a 64-bit machine word, so one might hope Python would store a million of them in no more than ~8MB. In fact, Python uses more like 35MB of RAM to store these numbers. Why? Because Python integers are objects, and objects have a lot of memory overhead.
Also note that getsizeof() returns only the space occupied by the object itself, plus a small garbage-collection header for objects tracked by the cyclic collector; it does not include the memory used by anything the object references, such as its attributes or its __dict__.
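A minimal sketch of that behaviour, using a hypothetical Foo class with a couple of attributes (the class and attribute names here are illustrative, not taken from the question):

import sys

class Foo:
    def __init__(self):
        # Attributes live in the instance __dict__, which getsizeof() does not count.
        self.data = list(range(100))
        self.name = "some fairly long string " * 4

f = Foo()
print(sys.getsizeof(f))           # small: just the instance itself, e.g. ~48-56 bytes
print(sys.getsizeof(f.__dict__))  # the attribute dict is extra
print(sys.getsizeof(f.data))      # and so is each referenced object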
Most probably this is due to some unintended cyclic references created by the foo() constructor. Normally Python objects release memory as soon as their reference count drops to zero, but objects caught in reference cycles are only freed later, when the garbage collector gets a chance to run.
You can try forcing a GC run every, say, 10,000 iterations to see whether that keeps the memory usage flat:
import gc

n = 1000000
list_bar = [None] * n
for i in range(n):
    list_bar[i] = foo()
    if i % 10000 == 0:
        gc.collect()
If this relieves the memory pressure, then the memory usage is indeed due to reference cycles.
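A quick way to check whether foo() really does create cycles is to look at the return value of gc.collect(), which reports how many unreachable objects the collector found (foo here stands in for the questioner's class):

import gc

gc.collect()  # start from a clean slate
objs = [foo() for _ in range(10000)]
del objs
unreachable = gc.collect()
# A large number here means the objects were kept alive by reference cycles
# and had to be reclaimed by the collector rather than by reference counting.
print(unreachable)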
Resizing a list as it grows has some overhead. If you know how many elements there will be, you can create the list up front, e.g. with a list comprehension:
list_bar = [foo() for _ in range(1000000)]
which should know the size of the array and not need to resize it; or create the list pre-filled with None and assign into it:
n = 1000000
list_bar = [None] * n
for i in range(n):
    list_bar[i] = foo()
append should be using realloc to grow the list, but the old memory ought to be released as soon as possible, and all in all the overhead of those allocations should not add up to 8 GB for a list that is ~100 MB in the end; it is possible that the operating system is misreporting the memory used.
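For reference, you can watch the over-allocation happen by printing getsizeof() of a list as it is appended to; the capacity grows in chunks, so the reported size jumps rather than increasing by 8 bytes per append (the exact jump points differ between CPython versions):

import sys

lst = []
prev = sys.getsizeof(lst)
for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != prev:
        # The size only changes when the list's internal array is
        # reallocated to a larger capacity.
        print(f"len={len(lst):>3}  size={size} bytes")
        prev = size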