I have the following code, that creates a million objects of a class foo:
for i in range(1000000):
    bar = foo()
    list_bar.append(bar)
The bar object is only 96 bytes, as determined by getsizeof(). However, the append step takes almost 8 GB of RAM. Once the code exits the loop, RAM usage drops to the expected amount (size of the list plus some overhead, ~103 MB). Only while the loop is running does the RAM usage skyrocket. Why does this happen? Are there any workarounds?
PS: Using a generator is not an option, it has to be a list.
EDIT: xrange doesn't help, I'm using Python 3. The memory usage stays high only during the loop execution and drops once the loop is through. Could append have some non-obvious overhead?
When you create a list object, the list object by itself takes 64 bytes of memory, and each item adds 8 bytes of memory to the size of the list because of references to other objects.
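You can see this by measuring the list itself with sys.getsizeof(), which reports only the list's own array of references, not the objects it points to (the exact baseline varies slightly between CPython versions):

import sys

# An empty list has a small fixed baseline (roughly 56-64 bytes on 64-bit CPython,
# depending on the version).
print(sys.getsizeof([]))

# Each element adds one 8-byte pointer to the list's internal array.
million_refs = [None] * 1_000_000
print(sys.getsizeof(million_refs))  # roughly 8 MB: 1,000,000 * 8 bytes + baseline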
As an illustration of per-object overhead, consider storing a million small integers. Each value fits in a 64-bit machine word, so one might hope Python would store a million of them in no more than ~8MB. In fact, Python uses more like 35MB of RAM to store these numbers. Why? Because Python integers are objects, and objects have a lot of memory overhead.
Also note that getsizeof() returns only the space occupied by the object itself, plus a small garbage-collection header for objects tracked by the cyclic collector; it does not include the memory used by anything the object references, such as its attributes or its __dict__.
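A minimal sketch of that behaviour, using a hypothetical Foo class with a couple of attributes (the class and attribute names here are illustrative, not taken from the question):

import sys

class Foo:
    def __init__(self):
        # Attributes live in the instance __dict__, which getsizeof() does not count.
        self.data = list(range(100))
        self.name = "some fairly long string " * 4

f = Foo()
print(sys.getsizeof(f))           # small: just the instance itself, e.g. ~48-56 bytes
print(sys.getsizeof(f.__dict__))  # the attribute dict is extra
print(sys.getsizeof(f.data))      # and so is each referenced object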
Most probably this is due to some unintended cyclic references created by the foo() constructor. Normally Python objects release memory as soon as their reference count drops to zero, but objects caught in reference cycles are only freed later, when the garbage collector gets a chance to run.
You can try forcing a GC run every, say, 10,000 iterations to see whether that keeps the memory usage flat:
import gc

n = 1000000
list_bar = [None] * n
for i in range(n):
    list_bar[i] = foo()
    if i % 10000 == 0:
        gc.collect()
If this relieves the memory pressure, then the memory usage is indeed due to reference cycles.
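A quick way to check whether foo() really does create cycles is to look at the return value of gc.collect(), which reports how many unreachable objects the collector found (foo here stands in for the questioner's class):

import gc

gc.collect()  # start from a clean slate
objs = [foo() for _ in range(10000)]
del objs
unreachable = gc.collect()
# A large number here means the objects were kept alive by reference cycles
# and had to be reclaimed by the collector rather than by reference counting.
print(unreachable)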
Resizing a list as it grows has some overhead. If you know how many elements there will be, you can create the list up front, e.g. with a list comprehension:
list_bar = [foo() for _ in range(1000000)]
which should know the size of the array and not need to resize it; or create the list pre-filled with None and assign into it:
n = 1000000
list_bar = [None] * n
for i in range(n):
    list_bar[i] = foo()
append should be using realloc to grow the list, but the old memory ought to be released as soon as possible, and all in all the overhead of those allocations should not add up to 8 GB for a list that is ~100 MB in the end; it is possible that the operating system is misreporting the memory used.
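For reference, you can watch the over-allocation happen by printing getsizeof() of a list as it is appended to; the capacity grows in chunks, so the reported size jumps rather than increasing by 8 bytes per append (the exact jump points differ between CPython versions):

import sys

lst = []
prev = sys.getsizeof(lst)
for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != prev:
        # The size only changes when the list's internal array is
        # reallocated to a larger capacity.
        print(f"len={len(lst):>3}  size={size} bytes")
        prev = size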