A <code>tuple</code> takes less memory space in Python: <pre class="prettyprint"><code>>>> a = (1,2,3) >>> a.__sizeof__() 48 </code></pre> whereas <code>list</code>s takes more memory space: <pre class="prettyprint"><code>>>> b = [1,2,3] >>> b.__sizeof__() 64 </code></pre> What happens internally on the Python memory management?

I assume you're using CPython and with 64bits (I got the same results on my CPython 2.7 64-bit). There could be differences in other Python implementations or if you have a 32bit Python. Regardless of the implementation, <code>list</code>s are variable-sized while <code>tuple</code>s are fixed-size. So <code>tuple</code>s can store the elements directly inside the struct, lists on the other hand need a layer of indirection (it stores a pointer to the elements). This layer of indirection is a pointer, on 64bit systems that's 64bit, hence 8bytes. But there's another thing that <code>list</code>s do: They over-allocate. Otherwise <code>list.append</code> would be an <code>O(n)</code> operation always - to make it amortized <code>O(1)</code> (much faster!!!) it over-allocates. But now it has to keep track of the allocated size and the filled size (<code>tuple</code>s only need to store one size, because allocated and filled size are always identical). That means each list has to store another "size" which on 64bit systems is a 64bit integer, again 8 bytes. So <code>list</code>s need at least 16 bytes more memory than <code>tuple</code>s. Why did I say "at least"? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on "how" you create the list and the append/deletion history: <pre class="prettyprint"><code>>>> l = [1,2,3] >>> l.__sizeof__() 64 >>> l.append(4) # triggers re-allocation (with over-allocation), because the original list is full >>> l.__sizeof__() 96 >>> l = [] >>> l.__sizeof__() 40 >>> l.append(1) # re-allocation with over-allocation >>> l.__sizeof__() 72 >>> l.append(2) # no re-alloc >>> l.append(3) # no re-alloc >>> l.__sizeof__() 72 >>> l.append(4) # still has room, so no over-allocation needed (yet) >>> l.__sizeof__() 72 </code></pre> <h3>Images</h3> I decided to create some images to accompany the explanation above. Maybe these are helpful This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) cycles: <img src="https://i.stack.imgur.com/ZRop8.png" alt="enter image description here"> That's actually just an approximation because <code>int</code> objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be: <img src="https://i.stack.imgur.com/lmR8d.png" alt="enter image description here"> Useful links: <ul> <li><code>tuple</code> struct in CPython repository for Python 2.7</li> <li><code>list</code> struct in CPython repository for Python 2.7</li> <li><code>int</code> struct in CPython repository for Python 2.7</li> </ul> Note that <code>__sizeof__</code> doesn't really return the "correct" size! It only returns the size of the stored values. However when you use <code>sys.getsizeof</code> the result is different: <pre class="prettyprint"><code>>>> import sys >>> l = [1,2,3] >>> t = (1, 2, 3) >>> sys.getsizeof(l) 88 >>> sys.getsizeof(t) 72 </code></pre> There are 24 "extra" bytes. These are real, that's the garbage collector overhead that isn't accounted for in the <code>__sizeof__</code> method. That's because you're generally not supposed to use magic methods directly - use the functions that know how to handle them, in this case: <code>sys.getsizeof</code> (which actually adds the GC overhead to the value returned from <code>__sizeof__</code>).

I'll take a deeper dive into the CPython codebase so we can see how the sizes are actually calculated. In your specific example, no over-allocations have been performed, so I won't touch on that. I'm going to use 64-bit values here, as you are. <hr> The size for <code>list</code>s is calculated from the following function, <code>list_sizeof</code>: <pre class="prettyprint lang-c prettyprint-override"><code>static PyObject * list_sizeof(PyListObject *self) { Py_ssize_t res; res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*); return PyInt_FromSsize_t(res); } </code></pre> Here <code>Py_TYPE(self)</code> is a macro that grabs the <code>ob_type</code> of <code>self</code> (returning <code>PyList_Type</code>) while <code>_PyObject_SIZE</code> is another macro that grabs <code>tp_basicsize</code> from that type. <code>tp_basicsize</code> is calculated as <code>sizeof(PyListObject)</code> where <code>PyListObject</code> is the instance struct. The <code>PyListObject</code> structure has three fields: <pre class="prettyprint lang-c prettyprint-override"><code>PyObject_VAR_HEAD # 24 bytes PyObject **ob_item; # 8 bytes Py_ssize_t allocated; # 8 bytes </code></pre> these have comments (which I trimmed) explaining what they are, follow the link above to read them. <code>PyObject_VAR_HEAD</code> expands into three 8 byte fields (<code>ob_refcount</code>, <code>ob_type</code> and <code>ob_size</code>) so a <code>24</code> byte contribution. So for now <code>res</code> is: <pre class="prettyprint"><code>sizeof(PyListObject) + self->allocated * sizeof(void*) </code></pre> or: <pre class="prettyprint"><code>40 + self->allocated * sizeof(void*) </code></pre> If the list instance has elements that are allocated. the second part calculates their contribution. <code>self->allocated</code>, as it's name implies, holds the number of allocated elements. Without any elements, the size of lists is calculated to be: <pre class="prettyprint"><code>>>> [].__sizeof__() 40 </code></pre> i.e the size of the instance struct. <hr> <code>tuple</code> objects don't define a <code>tuple_sizeof</code> function. Instead, they use <code>object_sizeof</code> to calculate their size: <pre class="prettyprint lang-c prettyprint-override"><code>static PyObject * object_sizeof(PyObject *self, PyObject *args) { Py_ssize_t res, isize; res = 0; isize = self->ob_type->tp_itemsize; if (isize > 0) res = Py_SIZE(self) * isize; res += self->ob_type->tp_basicsize; return PyInt_FromSsize_t(res); } </code></pre> This, as for <code>list</code>s, grabs the <code>tp_basicsize</code> and, if the object has a non-zero <code>tp_itemsize</code> (meaning it has variable-length instances), it multiplies the number of items in the tuple (which it gets via <code>Py_SIZE</code>) with <code>tp_itemsize</code>. <code>tp_basicsize</code> again uses <code>sizeof(PyTupleObject)</code> where the <code>PyTupleObject</code> struct contains: <pre class="prettyprint lang-c prettyprint-override"><code>PyObject_VAR_HEAD # 24 bytes PyObject *ob_item[1]; # 8 bytes </code></pre> So, without any elements (that is, <code>Py_SIZE</code> returns <code>0</code>) the size of empty tuples is equal to <code>sizeof(PyTupleObject)</code>: <pre class="prettyprint"><code>>>> ().__sizeof__() 24 </code></pre> huh? Well, here's an oddity which I haven't found an explanation for, the <code>tp_basicsize</code> of <code>tuple</code>s is actually calculated as follows: <pre class="prettyprint lang-c prettyprint-override"><code>sizeof(PyTupleObject) - sizeof(PyObject *) </code></pre> why an additional <code>8</code> bytes is removed from <code>tp_basicsize</code> is something I haven't been able to find out. (See MSeifert's comment for a possible explanation) <hr> But, this is basically the difference in your specific example. <code>list</code>s also keep around a number of allocated elements which helps determine when to over-allocate again. Now, when additional elements are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. This results in greater sizes as MSeifert's covers nicely in his answer.

Why do tuples take less space in memory than lists?

Tags:

python

list

python-internals

tuples

python-2.7

A tuple takes less memory space in Python:

>>> a = (1,2,3) >>> a.__sizeof__() 48

whereas lists takes more memory space:

>>> b = [1,2,3] >>> b.__sizeof__() 64

What happens internally on the Python memory management?

757

asked Oct 10 '17 10:10

JON

Video Answer

2 Answers

I assume you're using CPython and with 64bits (I got the same results on my CPython 2.7 64-bit). There could be differences in other Python implementations or if you have a 32bit Python.

Regardless of the implementation, lists are variable-sized while tuples are fixed-size.

So tuples can store the elements directly inside the struct, lists on the other hand need a layer of indirection (it stores a pointer to the elements). This layer of indirection is a pointer, on 64bit systems that's 64bit, hence 8bytes.

But there's another thing that lists do: They over-allocate. Otherwise list.append would be an O(n) operation always - to make it amortized O(1) (much faster!!!) it over-allocates. But now it has to keep track of the allocated size and the filled size (tuples only need to store one size, because allocated and filled size are always identical). That means each list has to store another "size" which on 64bit systems is a 64bit integer, again 8 bytes.

So lists need at least 16 bytes more memory than tuples. Why did I say "at least"? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on "how" you create the list and the append/deletion history:

>>> l = [1,2,3] >>> l.__sizeof__() 64 >>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full >>> l.__sizeof__() 96  >>> l = [] >>> l.__sizeof__() 40 >>> l.append(1)  # re-allocation with over-allocation >>> l.__sizeof__() 72 >>> l.append(2)  # no re-alloc >>> l.append(3)  # no re-alloc >>> l.__sizeof__() 72 >>> l.append(4)  # still has room, so no over-allocation needed (yet) >>> l.__sizeof__() 72

Images

I decided to create some images to accompany the explanation above. Maybe these are helpful

This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) cycles:

enter image description here

That's actually just an approximation because int objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:

enter image description here

Useful links:

tuple struct in CPython repository for Python 2.7
list struct in CPython repository for Python 2.7
int struct in CPython repository for Python 2.7

Note that __sizeof__ doesn't really return the "correct" size! It only returns the size of the stored values. However when you use sys.getsizeof the result is different:

>>> import sys >>> l = [1,2,3] >>> t = (1, 2, 3) >>> sys.getsizeof(l) 88 >>> sys.getsizeof(t) 72

There are 24 "extra" bytes. These are real, that's the garbage collector overhead that isn't accounted for in the __sizeof__ method. That's because you're generally not supposed to use magic methods directly - use the functions that know how to handle them, in this case: sys.getsizeof (which actually adds the GC overhead to the value returned from __sizeof__).

154

answered Oct 20 '22 00:10

MSeifert

I'll take a deeper dive into the CPython codebase so we can see how the sizes are actually calculated. In your specific example, no over-allocations have been performed, so I won't touch on that.

I'm going to use 64-bit values here, as you are.

The size for lists is calculated from the following function, list_sizeof:

static PyObject * list_sizeof(PyListObject *self) {     Py_ssize_t res;      res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);     return PyInt_FromSsize_t(res); }

Here Py_TYPE(self) is a macro that grabs the ob_type of self (returning PyList_Type) while _PyObject_SIZE is another macro that grabs tp_basicsize from that type. tp_basicsize is calculated as sizeof(PyListObject) where PyListObject is the instance struct.

The PyListObject structure has three fields:

PyObject_VAR_HEAD     # 24 bytes  PyObject **ob_item;   #  8 bytes Py_ssize_t allocated; #  8 bytes

these have comments (which I trimmed) explaining what they are, follow the link above to read them. PyObject_VAR_HEAD expands into three 8 byte fields (ob_refcount, ob_type and ob_size) so a 24 byte contribution.

So for now res is:

sizeof(PyListObject) + self->allocated * sizeof(void*)

or:

40 + self->allocated * sizeof(void*)

If the list instance has elements that are allocated. the second part calculates their contribution. self->allocated, as it's name implies, holds the number of allocated elements.

Without any elements, the size of lists is calculated to be:

>>> [].__sizeof__() 40

i.e the size of the instance struct.

tuple objects don't define a tuple_sizeof function. Instead, they use object_sizeof to calculate their size:

static PyObject * object_sizeof(PyObject *self, PyObject *args) {     Py_ssize_t res, isize;      res = 0;     isize = self->ob_type->tp_itemsize;     if (isize > 0)         res = Py_SIZE(self) * isize;     res += self->ob_type->tp_basicsize;      return PyInt_FromSsize_t(res); }

This, as for lists, grabs the tp_basicsize and, if the object has a non-zero tp_itemsize (meaning it has variable-length instances), it multiplies the number of items in the tuple (which it gets via Py_SIZE) with tp_itemsize.

tp_basicsize again uses sizeof(PyTupleObject) where the PyTupleObject struct contains:

PyObject_VAR_HEAD       # 24 bytes  PyObject *ob_item[1];   # 8  bytes

So, without any elements (that is, Py_SIZE returns 0) the size of empty tuples is equal to sizeof(PyTupleObject):

>>> ().__sizeof__() 24

huh? Well, here's an oddity which I haven't found an explanation for, the tp_basicsize of tuples is actually calculated as follows:

sizeof(PyTupleObject) - sizeof(PyObject *)

why an additional 8 bytes is removed from tp_basicsize is something I haven't been able to find out. (See MSeifert's comment for a possible explanation)

But, this is basically the difference in your specific example. lists also keep around a number of allocated elements which helps determine when to over-allocate again.

Now, when additional elements are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. This results in greater sizes as MSeifert's covers nicely in his answer.