I was doing a foolish thing like:
from itertools import combinations

rows = combinations(range(0, 1140), 17)
all_rows = []
for row in rows:
    all_rows.append(row)
No surprise; I ran out of memory address space (32-bit Python 3.1).
My question is: how do I calculate how much memory address space I will need for a large list? In this case the list is on the order of 2.3×10^37 items.
Is there a function in Python that returns the information I am looking for, or at least the size of a smaller but similar list? What tools are available?
A list is stored as a contiguous array of references (pointers) to the item objects; Python keeps a pointer to this array, and the array's length, in a list head structure. This makes indexing a list independent of the size of the list or the value of the index, and when items are appended or inserted, the array of references is resized. Each reference costs 4 bytes on a 32-bit build (8 bytes on a 64-bit build), on top of the list head itself. The items are full Python objects with their own overhead: a million small integers would fit in ~8 MB as raw 8-byte values, yet Python needs more like 35 MB of RAM to store them, because each integer is an object. Finally, bear in mind that a 32-bit Python process can address at most ~4 GB of RAM, and in practice less, because the operating system reserves part of that address space.
There's a handy function sys.getsizeof() (since Python 2.6) that helps with this:
>>> import sys
>>> sys.getsizeof(1) # integer
12
>>> sys.getsizeof([]) # empty list
36
>>> sys.getsizeof(()) # empty tuple
28
>>> sys.getsizeof((1,)) # tuple with one element
32
From that you can see that each integer takes up 12 bytes and each reference in a list or tuple costs 4 bytes (on a 32-bit machine), plus the container's own overhead (36 bytes for a list, 28 for a tuple).
If your result has tuples of length 17 containing integers, then each tuple takes 17*(12+4)+28, or 300 bytes. The result itself is a list, so that's 36 bytes plus 4 per reference. Find out how long the list will be (call it N) and you have 36+N*(4+300) as the total bytes required.
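As a rough sketch of that arithmetic, using math.comb (available since Python 3.8) to count the rows and assuming the ~300-bytes-per-tuple figure estimated above:

```python
from math import comb

# Number of 17-element combinations of 1140 items.
n = comb(1140, 17)

# Estimated total on a 32-bit build: a 36-byte list header plus,
# per row, a 4-byte reference and ~300 bytes for the tuple itself.
total_bytes = 36 + n * (4 + 300)

print(f"{n:.3e} rows, roughly {total_bytes:.3e} bytes")
```

That row count (~2.3×10^37) is why no amount of per-item thriftiness helps here: the list simply cannot be materialized.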
Edit: There's one other thing that could significantly affect that result. Python creates a new integer object for most integer values, but for small ones (empirically determined to be the range [-5, 256] on Python 2.6.4 on Windows) it pre-creates them all and re-uses them. If a large portion of your values are less than 257, this would significantly reduce the memory consumption. (On Python 2.6.4, 257 is not 257+0 ;-)).
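This CPython-specific caching can be observed directly. The sketch below uses int("...") rather than integer literals, since two identical literals in the same code object may be folded into one constant and would mask the effect:

```python
# CPython pre-creates and re-uses small integers (roughly -5..256),
# so separate computations of the same small value yield one object.
a, b = int("256"), int("256")
print(a is b)   # True: both names point at the cached 256

# Values outside the cache range get a fresh object each time.
c, d = int("257"), int("257")
print(c is d)   # False: two distinct 257 objects
```

References to cached small integers therefore add no per-item object cost; only the 4-byte reference in the container is new.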
Well, first, rather than writing:
all_rows = []
for row in rows:
    all_rows.append(row)
You can simply write:
all_rows = list(rows)
which will be quite a bit more efficient.
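For a small input, the two forms produce identical results; a minimal sketch with an input size where materializing the iterator is safe:

```python
from itertools import combinations

# A small case: C(5, 3) = 10 rows, trivially fits in memory.
rows = combinations(range(5), 3)
all_rows = list(rows)       # one call instead of an explicit append loop
print(len(all_rows))        # 10
```

list() lets the list preallocate and grow its reference array internally instead of going through a Python-level append call per item, which is where the efficiency gain comes from.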
Then, there are two things to consider for memory consumption of a list: the array of references the list itself holds, and the item objects those references point to.
By the way, in recent Python versions you can use sys.getsizeof() to get the size of an object:
>>> import sys
>>> sys.getsizeof([None] * 100)
872
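One caveat worth knowing: sys.getsizeof() is shallow, reporting only the list object itself (header plus reference array), not the objects the references point to. A sketch of adding the item sizes by hand (naively, ignoring that cached small integers are shared):

```python
import sys

lst = list(range(100))
shallow = sys.getsizeof(lst)                          # header + references only
deep = shallow + sum(sys.getsizeof(x) for x in lst)   # plus the item objects
print(shallow, deep)
```

The gap between the two numbers is exactly the per-object integer overhead discussed above.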