
Python list of Objects taking up too much memory

I have the following code, which creates a million objects of class foo:

list_bar = []
for i in range(1000000):
    bar = foo()
    list_bar.append(bar)

The bar object is only 96 bytes, as determined by getsizeof(). However, the append step takes almost 8 GB of RAM. Once the code exits the loop, the RAM usage drops to the expected amount (size of the list plus some overhead, ~103 MB). Only while the loop is running does the RAM usage skyrocket. Why does this happen? Are there any workarounds? PS: using a generator is not an option; it has to be a list.

EDIT: xrange doesn't help; I'm using Python 3. The memory usage stays high only during the loop and drops once the loop is done. Could append have some non-obvious overhead?
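For reference, a quick way to confirm where the allocations happen during the loop is tracemalloc; this is only a sketch, with foo standing in for the real class from the question:

import tracemalloc

class foo:                      # stand-in for the real class
    pass

tracemalloc.start()
list_bar = []
for i in range(1000000):
    list_bar.append(foo())
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
tracemalloc.stop()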

asked Mar 11 '15 by Srivathsan Jayaraman



1 Answer

Most probably this is due to unintended reference cycles created by the foo() constructor. Normally Python objects release their memory as soon as their reference count drops to zero; objects caught in a cycle are only freed later, when the garbage collector gets a chance to run.

You can try forcing a garbage-collection run every, say, 10,000 iterations to see whether it keeps memory usage constant:

import gc
n = 1000000
list_bar = [None] * n
for i in range(n):
    list_bar[i] = foo()
    if i % 10000 == 0:
        gc.collect()

If this relieves the memory pressure, then the usage is indeed caused by reference cycles.
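To check whether foo really does create cycles, you can look at the return value of gc.collect(), which reports the number of unreachable objects it found. A minimal sketch (the self-reference here is only an illustration of what a cycle looks like, not the actual foo):

import gc

class foo:                         # illustrative only: the self-reference creates a cycle
    def __init__(self):
        self.me = self

gc.disable()                       # rely on reference counting alone
objs = [foo() for _ in range(100000)]
del objs                           # refcounting cannot free these because of the cycle
print(gc.collect())                # non-zero: the collector had to reclaim cyclic garbage
gc.enable()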


Resizing a list has some overhead. If you know how many elements there will be, you can create the list beforehand, e.g.:

list_bar = [foo() for _ in range(1000000)]

This way the interpreter knows the final size of the list and does not need to resize it. Alternatively, create the list pre-filled with None:

n = 1000000
list_bar = [None] * n
for i in range(n):
    list_bar[i] = foo()

append should be using realloc to grow the list, and the old memory ought to be released as soon as possible. In any case, the overhead of all those reallocations should not add up to 8 GB for a list that ends up around 100 MB; it is possible that the operating system is miscalculating the memory used.
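You can watch the over-allocation pattern of append directly with sys.getsizeof on the list object itself (this measures only the list's own pointer buffer, not the objects it holds):

import sys

lst = []
last = sys.getsizeof(lst)
for i in range(100):
    lst.append(None)
    size = sys.getsizeof(lst)
    if size != last:               # the buffer grows in steps, not on every append
        print(f"len={len(lst):3d}  list object now {size} bytes")
        last = size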