I just experimented with the size of python data structures in memory. I wrote the following snippet:
import sys lst1=[] lst1.append(1) lst2=[1] print(sys.getsizeof(lst1), sys.getsizeof(lst2))
I tested the code on the following configurations:
52 40
so lst1 has 52 bytes and lst2 has 40 bytes.48 32
48 36
Can anyone explain to me why the two sizes differ although both are lists containing a 1?
In the python documentation for the getsizeof function I found the following: ...adds an additional garbage collector overhead if the object is managed by the garbage collector.
Could this be the case in my little example?
In order to determine the size of the list, we have passed the list object to the getsizeof() function which on execution return the sizes of list1 and list2 in bytes along with garbage collector overhead. List1 and list2 are occupying 112 bytes and 104 bytes in the memory.
An empty list takes 56 bytes, but each additional int adds just 8 bytes, where the size of an int is 28 bytes. A list that contains a long string takes just 64 bytes.
List has a large memory. Tuple is stored in a single block of memory. Creating a tuple is faster than creating a list. Creating a list is slower because two memory blocks need to be accessed.
Python keeps a pointer to this array and the array's length is stored in a list head structure. This makes indexing of a list independent of the size of the list or the value of the index. When items are appended or inserted the array of references is resized.
Here's a fuller interactive session that will help me explain what's going on (Python 2.6 on Windows XP 32-bit, but it doesn't matter really):
>>> import sys >>> sys.getsizeof([]) 36 >>> sys.getsizeof([1]) 40 >>> lst = [] >>> lst.append(1) >>> sys.getsizeof(lst) 52 >>>
Note that the empty list is a bit smaller than the one with [1]
in it. When an element is appended, however, it grows much larger.
The reason for this is the implementation details in Objects/listobject.c
, in the source of CPython.
When an empty list []
is created, no space for elements is allocated - this can be seen in PyList_New
. 36 bytes is the amount of space required for the list data structure itself on a 32-bit machine.
When a list with a single element [1]
is created, space for one element is allocated in addition to the memory required by the list data structure itself. Again, this can be found in PyList_New
. Given size
as argument, it computes:
nbytes = size * sizeof(PyObject *);
And then has:
if (size <= 0) op->ob_item = NULL; else { op->ob_item = (PyObject **) PyMem_MALLOC(nbytes); if (op->ob_item == NULL) { Py_DECREF(op); return PyErr_NoMemory(); } memset(op->ob_item, 0, nbytes); } Py_SIZE(op) = size; op->allocated = size;
So we see that with size = 1
, space for one pointer is allocated. 4 bytes (on my 32-bit box).
When calling append
on an empty list, here's what happens:
PyList_Append
calls app1
app1
asks for the list's size (and gets 0 as an answer)app1
then calls list_resize
with size+1
(1 in our case)list_resize
has an interesting allocation strategy, summarized in this comment from its source.Here it is:
/* This over-allocates proportional to the list size, making room * for additional growth. The over-allocation is mild, but is * enough to give linear-time amortized behavior over a long * sequence of appends() in the presence of a poorly-performing * system realloc(). * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ... */ new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6); /* check for integer overflow */ if (new_allocated > PY_SIZE_MAX - newsize) { PyErr_NoMemory(); return -1; } else { new_allocated += newsize; }
Let's see how the numbers I quoted in the session in the beginning of my article are reached.
So 36 bytes is the size required by the list data structure itself on 32-bit. With a single element, space is allocated for one pointer, so that's 4 extra bytes - total 40 bytes. OK so far.
When app1
is called on an empty list, it calls list_resize
with size=1
. According to the over-allocation algorithm of list_resize
, the next largest available size after 1 is 4, so place for 4 pointers will be allocated. 4 * 4 = 16 bytes, and 36 + 16 = 52.
Indeed, everything makes sense :-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With