
How does Python ensure the return value of __len__ is an integer when len is called?

class foo:
    def __init__(self, data):
        self.data = data
    def __len__(self):
        return self.data

If I pass a string in for data, calling len on an instance of this class raises an error: TypeError: 'str' object cannot be interpreted as an integer.

So does the return value of __len__ have to be an integer? I would think that if I am overriding it, it could return whatever I want, so why is this not possible?

Nate Stemen asked Mar 01 '17 01:03




1 Answer

SHORT ANSWER

At the C level, Python installs __len__ in a special type slot. The slot function calls your __len__ and then validates the result: it must be interpretable as an integer, and it must not be negative.


LONG ANSWER

In order to answer this, we have to go a bit down the rabbit hole of what happens under the hood when len is called in Python.

First, let's establish some behavior.

>>> class foo:
...     def __init__(self, data):
...         self.data = data
...     def __len__(self):
...         return self.data
...
>>> len(foo(-1))
Traceback:
...
ValueError: __len__() should return >= 0
>>> len(foo('5'))
Traceback:
...
TypeError: 'str' object cannot be interpreted as an integer
>>> len(foo(5))
5

When you call len, the C function builtin_len gets called. Let's take a look at this.

static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
    Py_ssize_t res;

    res = PyObject_Size(obj);  // <=== THIS IS WHAT IS IMPORTANT!!!
    if (res < 0 && PyErr_Occurred())
        return NULL;
    return PyLong_FromSsize_t(res);
}

You will notice that the PyObject_Size function is being called - this function will return the size of an arbitrary Python object. Let's move further down the rabbit hole.

Py_ssize_t
PyObject_Size(PyObject *o)
{
    PySequenceMethods *m;

    if (o == NULL) {
        null_error();
        return -1;
    }

    m = o->ob_type->tp_as_sequence;
    if (m && m->sq_length)
        return m->sq_length(o);  // <==== THIS IS WHAT IS IMPORTANT!!!

    return PyMapping_Size(o);
}

It checks whether the type defines the sq_length function (sequence length) and, if so, calls it to get the length. It appears that at the C level, Python categorizes all objects that define __len__ as either sequences or mappings (even if that's not how we would think of them at the Python level); in our case, Python treats this class as a sequence, so it calls sq_length.
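One visible consequence of the o->ob_type lookup in PyObject_Size is that len consults the type, never the instance. Here is a minimal sketch of that behavior; the class name Bare and the variable names are made up for illustration.

```python
class Bare:
    """Hypothetical class that does not define __len__."""


b = Bare()

# Attaching __len__ to the instance does not fill the type's length slot,
# so len() still fails with a TypeError.
b.__len__ = lambda: 3
try:
    len(b)
    instance_attr_worked = True
except TypeError:
    instance_attr_worked = False

# Assigning __len__ on the class itself updates the slot, and len() now works.
Bare.__len__ = lambda self: 3
class_attr_result = len(b)

print(instance_attr_worked, class_attr_result)
```

So it is the class, not the individual object, that decides whether len works.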


Let's take a quick aside: for builtin types (such as list, set, etc.) Python does not count the elements each time len is called; it reads a value stored in a C struct, which makes this very fast. Each of these builtin types defines how to read that value by assigning an accessor function to sq_length. Let's take a quick peek at how this is implemented for lists:

static Py_ssize_t
list_length(PyListObject *a)
{
    return Py_SIZE(a);  // <== Py_SIZE IS A MACRO FOR ((PyVarObject *)(a))->ob_size
}

static PySequenceMethods list_as_sequence = {
    ...
    (lenfunc)list_length,                       /* sq_length */
    ...
};

ob_size stores the object's size (i.e. the number of elements in the list). So, when sq_length is called on a list, the call is dispatched to list_length, which simply returns the value of ob_size.


OK, so that's how it is done for a builtin type... how does it work for a custom class like our foo? Since the "dunder methods" (such as __len__) are special, Python detects them in our classes and treats them specially (specifically, inserting them into special slots).

Most of this is handled in typeobject.c. The __len__ function is intercepted and assigned to the sq_length slot (just like a builtin!) near the bottom of the file.

SQSLOT("__len__", sq_length, slot_sq_length, wrap_lenfunc,
       "__len__($self, /)\n--\n\nReturn len(self)."),

The slot_sq_length function is where we can finally answer your question.

static Py_ssize_t
slot_sq_length(PyObject *self)
{
    PyObject *res = call_method(self, &PyId___len__, NULL);
    Py_ssize_t len;

    if (res == NULL)
        return -1;
    len = PyNumber_AsSsize_t(res, PyExc_OverflowError);  // <=== HERE!!!
    Py_DECREF(res);
    if (len < 0) {  // <== AND HERE!!!
        if (!PyErr_Occurred())
            PyErr_SetString(PyExc_ValueError,
                            "__len__() should return >= 0");
        return -1;
    }
    return len;
}

Two things of note here:

  1. If a negative number is returned, a ValueError is raised with the message "__len__() should return >= 0". This is exactly the error received when I tried to call len(foo(-1))!
  2. Python tries to convert the return value of __len__ to a Py_ssize_t before returning it (Py_ssize_t is the signed counterpart of size_t, an integer type wide enough to index anything in a container).
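The two checks above can be modeled roughly in pure Python. This is only a sketch, not CPython's actual code: checked_len and Foo are illustrative names, operator.index stands in for the C-level integer conversion, and edge cases (such as very large negative values) may be reported differently by the real implementation.

```python
import operator
import sys


class Foo:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data


def checked_len(obj):
    """Rough Python model of the checks len() applies (illustrative only)."""
    result = type(obj).__len__(obj)   # __len__ is looked up on the type
    n = operator.index(result)        # TypeError unless int-like (__index__)
    if n < 0:
        raise ValueError("__len__() should return >= 0")
    if n > sys.maxsize:               # sys.maxsize is the largest Py_ssize_t
        raise OverflowError("cannot fit 'int' into an index-sized integer")
    return n


print(checked_len(Foo(5)))
```

Passing Foo(-1) or Foo('5') to checked_len raises the same exception types as the len calls shown at the top of this answer.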

OK, let's look at the implementation of PyNumber_AsSsize_t. It's a bit long, so I will omit the parts that aren't relevant here.

Py_ssize_t
PyNumber_AsSsize_t(PyObject *item, PyObject *err)
{
    Py_ssize_t result;
    PyObject *runerr;
    PyObject *value = PyNumber_Index(item);
    if (value == NULL)
        return -1;    
    /* OMITTED FOR BREVITY */

The relevant bit here is in PyNumber_Index, which Python uses to convert arbitrary objects to integers suitable for indexing. Here is where the actual answer to your question lies. I have annotated a bit.

PyObject *
PyNumber_Index(PyObject *item)
{
    PyObject *result = NULL;
    if (item == NULL) {
        return null_error();
    }

    if (PyLong_Check(item)) {  // IS THE OBJECT ALREADY AN int? IF SO, RETURN IT NOW.
        Py_INCREF(item);
        return item;
    }
    if (!PyIndex_Check(item)) {  // DOES THE OBJECT DEFINE __index__? IF NOT, FAIL.
        PyErr_Format(PyExc_TypeError,
                     "'%.200s' object cannot be interpreted "
                     "as an integer", item->ob_type->tp_name);
        return NULL;
    }
    result = item->ob_type->tp_as_number->nb_index(item);
    if (!result || PyLong_CheckExact(result))
        return result;
    if (!PyLong_Check(result)) {  // IF __index__ DOES NOT RETURN AN int, FAIL.
        PyErr_Format(PyExc_TypeError,
                     "__index__ returned non-int (type %.200s)",
                     result->ob_type->tp_name);
        Py_DECREF(result);
        return NULL;
    }
    /* Issue #17576: warn if 'result' not of exact type int. */
    if (PyErr_WarnFormat(PyExc_DeprecationWarning, 1,
            "__index__ returned non-int (type %.200s).  "
            "The ability to return an instance of a strict subclass of int "
            "is deprecated, and may be removed in a future version of Python.",
            result->ob_type->tp_name)) {
        Py_DECREF(result);
        return NULL;
    }
    return result;
}

Based on the error that you received, we can see that str (the type of '5') does not define __index__. We can verify that for ourselves:

>>> '5'.__index__()
Traceback:
...
AttributeError: 'str' object has no attribute '__index__'
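Conversely, __len__ does not have to return a literal int, as long as whatever it returns defines __index__. A small sketch follows; the class Five is hypothetical and exists only to show the idea.

```python
class Five:
    """Hypothetical non-int type that can be interpreted as an integer."""

    def __index__(self):
        return 5


class Foo:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data


# Accepted: Five is not an int, but it defines __index__.
length = len(Foo(Five()))

# Rejected: float does not define __index__, even for whole values like 5.0.
try:
    len(Foo(5.0))
    float_accepted = True
except TypeError:
    float_accepted = False

print(length, float_accepted)
```

So the rule is not strictly "must return an int" but "must return something that can be losslessly interpreted as a non-negative integer".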
SethMMorton answered Oct 01 '22 12:10