 

__sizeof__ of a str is larger than __sizeof__ of a tuple containing that string

The following code produces the given output.

import sys

print('ex1:')
ex1 = 'Hello'
print('\t', ex1.__sizeof__())

print('\nex2:')
ex2 = ('Hello', 53)
print('\t', ex2.__sizeof__())

Output:

ex1:
     54    
ex2:
     40

Why does __sizeof__() report a smaller size for the tuple, even though it holds the same string plus a second element? Shouldn't the output be larger? I realize from this answer that I should be using sys.getsizeof(), but the behavior seems odd nonetheless. I'm using Python 3.5.2.

Also, as @Herbert pointed out, according to __sizeof__() 'Hello' takes up more memory than the one-element tuple ('Hello',). Why is this?
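Following up on the sys.getsizeof() remark, a quick comparison shows that the two functions only disagree for objects whose type supports cyclic garbage collection, such as tuples. This is a sketch assuming a CPython interpreter; the exact byte counts vary by version and platform, so only the relationships are shown:

```python
import sys

s = 'Hello'
t = ('Hello', 53)

# The str type does not take part in cyclic GC, so getsizeof adds nothing.
print(sys.getsizeof(s) == s.__sizeof__())   # True
# Tuples are GC-aware containers, so getsizeof adds the GC header on top.
print(sys.getsizeof(t) > t.__sizeof__())    # True
print(sys.getsizeof(t) - t.__sizeof__())    # GC header size; depends on the CPython version
```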

asked Nov 22 '16 16:11 by Clint




1 Answer

This is because tuple objects (and, I'm pretty sure, all containers, as opposed to strings) do not compute their size by including the actual sizes of their contents. Instead, they multiply the size of a pointer to a PyObject by the number of elements they contain. That is, they hold pointers to the (generic) PyObjects they contain, and only those pointers contribute to their overall size.

This is hinted at in the Data Model chapter of the Python Language Reference:

Some objects contain references to other objects; these are called containers. Examples of containers are tuples, lists and dictionaries. The references are part of a container’s value.

(I'm emphasizing the word references.)
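A minimal sketch of what "references" means in practice. A list is used as the element purely so that it can grow after the tuple has been built; the tuple's own size does not change because it only stores a pointer to the list:

```python
items = []        # a mutable object we can grow later
t = (items,)      # the tuple stores a reference to this same list

size_before = t.__sizeof__()
items.extend(range(10000))   # grow the list the tuple refers to
size_after = t.__sizeof__()

print(t[0] is items)              # True: same object, not a copy
print(size_before == size_after)  # True: the tuple's own size is unchanged
```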

In PyTuple_Type, the PyTypeObject struct that holds the information about the tuple type, we see that the tp_itemsize field has sizeof(PyObject *) as its value:

PyTypeObject PyTuple_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "tuple",                                     // tp_name
    sizeof(PyTupleObject) - sizeof(PyObject *),  // tp_basicsize
    sizeof(PyObject *),                          // tp_itemsize <-- size of a pointer to a PyObject
    // ... remaining slots omitted ...
};

On 64-bit builds of Python, sizeof(PyObject *) is 8 bytes; on 32-bit builds it is 4 bytes.
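The pointer size, and the fact that tuples use it as their per-item size, can be checked from Python itself. This sketch assumes a CPython interpreter and uses struct.calcsize('P') (the size of a native void pointer) together with the type attribute tuple.__itemsize__, which mirrors tp_itemsize:

```python
import struct

# Size of a native C pointer on this build of the interpreter.
pointer_size = struct.calcsize('P')

print(pointer_size)                        # 8 on 64-bit builds, 4 on 32-bit builds
print(tuple.__itemsize__ == pointer_size)  # True: tp_itemsize is sizeof(PyObject *)
```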

This is the value that is multiplied by the number of items contained in the tuple instance. When we look at object_sizeof, the __sizeof__ implementation that tuples inherit from object (you can confirm that object.__sizeof__ is tuple.__sizeof__ evaluates to True), we see this clearly:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;  // <-- num_elements * tp_itemsize
    res += self->ob_type->tp_basicsize;

    return PyLong_FromSsize_t(res);
}

See how isize (obtained from tp_itemsize) is multiplied by Py_SIZE(self), another macro, which reads the ob_size field holding the number of elements inside the tuple.
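The formula in object_sizeof can be restated in Python using the type attributes __basicsize__ and __itemsize__, which mirror tp_basicsize and tp_itemsize. object_sizeof_py is a hypothetical name for illustration; the sketch assumes CPython and a type whose len() equals the C-level ob_size, which holds for tuples but not, for example, for int or str:

```python
def object_sizeof_py(obj):
    """Hypothetical Python restatement of CPython's object_sizeof().

    Assumes len(obj) equals the C-level ob_size, which holds for tuples.
    """
    tp = type(obj)
    res = 0
    if tp.__itemsize__ > 0:
        res = len(obj) * tp.__itemsize__  # Py_SIZE(self) * tp_itemsize
    res += tp.__basicsize__               # + tp_basicsize
    return res


for t in [(), ('Hello',), ('Hello', 53), tuple(range(10))]:
    # Both numbers agree: only the element count matters, not the elements.
    print(len(t), t.__sizeof__(), object_sizeof_py(t))
```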

This is why, even if we create a somewhat large string inside a tuple instance:

t = ("Hello" * 2 ** 10,)

with the element inside it having a size of:

t[0].__sizeof__()         # 5169

the size of the tuple instance:

t.__sizeof__()            # 32

equals that of one with simply "Hello" inside:

t2 = ("Hello",)
t2[0].__sizeof__()        # 54
t2.__sizeof__()           # 32 Tuple size stays the same.
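If what you want is a number that also counts the elements, you have to add their sizes yourself. sizeof_with_items below is a hypothetical helper, not part of the standard library, and it only goes one level deep (nested or shared objects are not handled specially):

```python
def sizeof_with_items(container):
    """Hypothetical helper: the container's own __sizeof__ plus the
    __sizeof__ of each direct element (one level deep only)."""
    return container.__sizeof__() + sum(item.__sizeof__() for item in container)


t = ("Hello" * 2 ** 10,)
t2 = ("Hello",)

print(t.__sizeof__() == t2.__sizeof__())             # True: same number of pointers
print(sizeof_with_items(t) > sizeof_with_items(t2))  # True: the long string is now counted
```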

For strings, each individual character increases the value returned from str.__sizeof__. This, along with the fact that tuples only store pointers, gives a misleading impression that "Hello" has a larger size than the tuple containing it.

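Just for completeness, unicode__sizeof__ is the function that computes this for strings. It essentially multiplies the length of the string (plus one for the terminating NUL character) by the character size, which is 1, 2, or 4 bytes depending on the widest character in the string, and adds the size of the string object's header.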
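The per-character cost can be observed by comparing two strings that differ by one character. This is a sketch assuming CPython 3.3 or later (the flexible string representation from PEP 393); per_char_bytes is a hypothetical helper name:

```python
def per_char_bytes(ch):
    """Growth of str.__sizeof__() per extra character, for strings made of ch."""
    return (ch * 11).__sizeof__() - (ch * 10).__sizeof__()


print(per_char_bytes('a'))           # 1: ASCII fits the 1-byte kind
print(per_char_bytes('\u20ac'))      # 2: the euro sign needs the 2-byte kind
print(per_char_bytes('\U0001F600'))  # 4: a character outside the BMP needs the 4-byte kind
```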

The only thing I don't understand about tuples is why their basic size (tp_basicsize) is set to sizeof(PyTupleObject) - sizeof(PyObject *). This removes 8 bytes (on a 64-bit build) from the overall size returned; I haven't found an explanation for this (yet).
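One plausible explanation, assuming PyTupleObject is declared with a trailing one-element array (PyObject *ob_item[1];), is that sizeof(PyTupleObject) already includes room for one item pointer, so subtracting sizeof(PyObject *) keeps that slot from being counted twice once tp_itemsize * ob_size is added. The observable side of this can be checked from Python (a sketch assuming CPython):

```python
import struct

ptr = struct.calcsize('P')  # size of one native pointer

# An empty tuple reserves no item slot: its size is exactly tp_basicsize.
print(().__sizeof__() == tuple.__basicsize__)      # True
# Every element, including the first, adds exactly one pointer.
print((1,).__sizeof__() - ().__sizeof__() == ptr)  # True
```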

answered Sep 28 '22 11:09 by Dimitris Fasarakis Hilliard