As seen in the session below, 50,000,000 records only take 404 MB of memory. Why? Since one record takes 83 bytes, 50,000,000 records should take about 3958 MB.
>>> import sys
>>> a=[]
>>> for it in range(5*10**7):a.append("miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v"+str(it))
...
>>> print(sys.getsizeof(a)/1024**2)
404.4306411743164
>>> print(sys.getsizeof("miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v"))
83
>>> print(83*5*10**7/1024**2)
3957.7484130859375
>>>
When you create a list object, the empty list by itself takes roughly 56-64 bytes of memory (depending on the Python version), and each item adds another 8 bytes to the size of the list on a 64-bit build, because the list stores only references (pointers) to other objects.
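For instance, a quick check on a 64-bit CPython 3 build shows the header-plus-pointers pattern directly (exact byte counts vary between Python versions; the values in the comments are typical):

import sys

print(sys.getsizeof([]))          # empty list: just the header, e.g. 56 bytes
print(sys.getsizeof([None]))      # header + one 8-byte pointer slot, e.g. 64 bytes
print(sys.getsizeof([None] * 3))  # header + three pointer slots, e.g. 80 bytes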
The references are the cheap part; the real cost is the objects they point to. As a comparison, a million small integers could in principle fit in about 8 MB (a million 8-byte values), yet a Python list holding them uses roughly 35 MB of RAM. Why? Because Python integers are objects, and objects have a lot of memory overhead. The same is true of the strings in this question: each one is a full object, far larger than its raw characters.
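For illustration, here is what the per-object cost looks like on a 64-bit CPython 3 build (the exact figures are version-dependent; for ASCII strings the size is roughly a fixed header plus one byte per character):

import sys

print(sys.getsizeof(1))    # a small int object, e.g. 28 bytes, not 8
print(sys.getsizeof(""))   # an empty str object, e.g. 49 bytes
print(sys.getsizeof("miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v"))  # 49 + 34 ASCII chars = 83 bytes

So the 83 bytes in the question is the size of one string object on its own; the list additionally stores an 8-byte pointer to each of those objects, and none of the string objects are counted by sys.getsizeof(a).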
The list is based on an array. An array is a collection of elements that are ① of the same size and ② located in memory one after another, without gaps. Since elements are the same size and placed contiguously, it is easy to get an array item by index: all we need is the memory address of the very first element (the "head" of the array), and the item's address is simply that head address plus the index times the element size.
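To make that indexing arithmetic concrete, here is a tiny conceptual sketch; the base address is made up for illustration, but CPython's list stores 8-byte pointers in exactly this fashion on a 64-bit build:

ELEMENT_SIZE = 8           # one 8-byte pointer per slot on a 64-bit build
head_address = 0x10_0000   # hypothetical address of the first slot

def slot_address(index):
    # address of slot i = head + i * element size; no searching required
    return head_address + index * ELEMENT_SIZE

print(hex(slot_address(0)))   # 0x100000
print(hex(slot_address(10)))  # 0x100050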
sys.getsizeof only reports the cost of the list itself, not its contents. So you're seeing the cost of storing the list object header plus (a little over) 50M pointers; the list over-allocates slightly as it grows through append, and you're likely on a 64-bit system with eight-byte pointers, so storage for 50M-plus pointers comes to ~400 MB. Getting the true size would require sys.getsizeof to be called for each object, each object's __dict__ (if applicable), etc., recursively, and it won't be 100% accurate since some of the objects (e.g. small ints) are likely shared; this is not a rabbit hole you want to go down.
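That said, if an approximate deep size is still wanted, a minimal sketch could look like the following (the name deep_getsizeof and the set of handled container types are just illustrative; it counts each shared object once and remains an estimate):

import sys

def deep_getsizeof(obj, seen=None):
    # Approximate size of obj plus everything it references.
    # Shared objects are counted once; interpreter-level sharing
    # (small ints, interned strings) still makes this an estimate.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):
        size += deep_getsizeof(vars(obj), seen)
    return size

a = ["miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v" + str(it) for it in range(1000)]
print(sys.getsizeof(a))    # list header + pointers only
print(deep_getsizeof(a))   # list + the 1000 string objects it points to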