
How does Python ensure the return value of __len__ is an integer when len is called?

Tags: python, class, operator-overloading

class foo:
    def __init__(self, data):
        self.data = data
    def __len__(self):
        return self.data

If I run this by passing a string in for data I get an error when calling len on an instance of this class. Specifically I get 'str' object cannot be interpreted as an integer.

So does the return statement in __len__ have to be an integer? I would think if I am overriding it, it should be able to output whatever I want, so why is this not possible?

549

asked Mar 01 '17 01:03

Nate Stemen

1 Answer

SHORT ANSWER

At the C level, Python installs __len__ in a special type slot. The slot function captures the value returned by __len__ and validates it before len returns it.

LONG ANSWER

In order to answer this, we have to go a bit down the rabbit hole of what happens under the hood when len is called in Python.

First, let's establish some behavior.

>>> class foo:
...     def __init__(self, data):
...         self.data = data
...     def __len__(self):
...         return self.data
...
>>> len(foo(-1))
Traceback:
...
ValueError: __len__() should return >= 0
>>> len(foo('5'))
Traceback:
...
TypeError: 'str' object cannot be interpreted as an integer
>>> len(foo(5))
5
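For anyone who wants to reproduce this outside the REPL, here is a minimal script version of the same experiment. The helper name outcome is made up for this sketch. It also shows something worth keeping in mind for the rest of the answer: calling the method directly does not trigger any check, only the len builtin does.

```python
class Foo:
    """Same shape as the class in the question."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data


def outcome(value):
    """Return len(Foo(value)), or the name of the exception it raises."""
    try:
        return len(Foo(value))
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


# Map each candidate return value to what len() does with it.
results = {repr(v): outcome(v) for v in (5, -1, "5")}

# Calling the method directly is an ordinary call; nothing validates it.
direct = Foo("5").__len__()

print(results)
print(repr(direct))
```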

When you call len, the C function builtin_len gets called. Let's take a look at this.

static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
    Py_ssize_t res;

    res = PyObject_Size(obj);  // <=== THIS IS WHAT IS IMPORTANT!!!
    if (res < 0 && PyErr_Occurred())
        return NULL;
    return PyLong_FromSsize_t(res);
}

You will notice that the PyObject_Size function is being called - this function will return the size of an arbitrary Python object. Let's move further down the rabbit hole.

Py_ssize_t
PyObject_Size(PyObject *o)
{
    PySequenceMethods *m;

    if (o == NULL) {
        null_error();
        return -1;
    }

    m = o->ob_type->tp_as_sequence;
    if (m && m->sq_length)
        return m->sq_length(o);  // <==== THIS IS WHAT IS IMPORTANT!!!

    return PyMapping_Size(o);
}

It checks whether the type defines the sq_length function (sequence length), and if so, calls it to get the length. It appears that at the C level, Python categorizes all objects that define __len__ as either sequences or mappings (even if that's not how we would think of them at the Python level); in our case, Python treats this class as a sequence, so it calls sq_length.
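Note that PyObject_Size reads the slot from o->ob_type, i.e. from the type and not from the instance. A small sketch of what that means at the Python level (the class name Sized is just for illustration): an attribute named __len__ set on an instance is invisible to len.

```python
class Sized:
    def __len__(self):
        return 2


obj = Sized()
# An instance attribute with the same name; len() never consults it.
obj.__len__ = lambda: 99

via_builtin = len(obj)         # goes through the slot on type(obj)
via_attribute = obj.__len__()  # normal attribute lookup finds the lambda

print(via_builtin, via_attribute)
```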

Let's take a quick aside: for builtin types (such as list, set, etc.), Python does not count the elements to compute the length; it reads a value already stored in a C struct, which makes this very fast. Each of these builtin types defines how to read that value by assigning an accessor function to sq_length. Let's take a quick peek at how this is implemented for lists:

static Py_ssize_t
list_length(PyListObject *a)
{
    return Py_SIZE(a);  // <== THIS IS A MACRO FOR ((PyVarObject *)(a))->ob_size
}

static PySequenceMethods list_as_sequence = {
    ...
    (lenfunc)list_length,                       /* sq_length */
    ...
};

ob_size stores the object's size (i.e. the number of elements in the list). So, when sq_length is called on a list, the call is dispatched to list_length, which simply returns the value of ob_size.
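You can see a trace of this difference from Python itself. The sketch below relies on a CPython implementation detail (the type names it prints are not guaranteed by the language): list.__len__ is a thin wrapper around a C slot, whereas __len__ on a class written in Python is a plain function. Foo here is a stand-in class for the comparison.

```python
class Foo:
    def __len__(self):
        return 0


# On CPython, a builtin's __len__ is a wrapper around the C slot function.
builtin_kind = type(list.__len__).__name__

# A __len__ defined in Python is an ordinary function object.
custom_kind = type(Foo.__len__).__name__

print(builtin_kind, custom_kind)
```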

OK, so that's how it is done for a builtin type... how does it work for a custom class like our foo? Since the "dunder methods" (such as __len__) are special, Python detects them in our classes and treats them specially (specifically, inserting them into special slots).

Most of this is handled in typeobject.c. In the slot table near the bottom of that file, the __len__ function is intercepted and assigned to the sq_length slot (just like a builtin!).

SQSLOT("__len__", sq_length, slot_sq_length, wrap_lenfunc,
       "__len__($self, /)\n--\n\nReturn len(self)."),
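One observable consequence of this slot bookkeeping, sketched below with a throwaway class named Late: the slot is kept in sync with the class, so assigning __len__ on the class after it has been created is enough to make len work on existing instances.

```python
class Late:
    pass


obj = Late()

# No __len__ yet, so the slot is empty and len() refuses.
try:
    len(obj)
    before = "ok"
except TypeError:
    before = "TypeError"

# Assigning on the class (not the instance) fills the slot.
Late.__len__ = lambda self: 7
after = len(obj)

print(before, after)
```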

The slot_sq_length function is where we can finally answer your question.

static Py_ssize_t
slot_sq_length(PyObject *self)
{
    PyObject *res = call_method(self, &PyId___len__, NULL);
    Py_ssize_t len;

    if (res == NULL)
        return -1;
    len = PyNumber_AsSsize_t(res, PyExc_OverflowError);  // <=== HERE!!!
    Py_DECREF(res);
    if (len < 0) {  // <== AND HERE!!!
        if (!PyErr_Occurred())
            PyErr_SetString(PyExc_ValueError,
                            "__len__() should return >= 0");
        return -1;
    }
    return len;
}

Two things of note here:

1. If a negative number is returned, a ValueError is raised with the message "__len__() should return >= 0". This is exactly the error received when I tried to call len(foo(-1))!
2. Python tries to coerce the return value of __len__ to a Py_ssize_t before returning it (Py_ssize_t is a signed integer type with the same width as size_t, so it is guaranteed to be able to index anything in a container). If the coercion fails, PyNumber_AsSsize_t returns -1 with an exception set, and that exception is what you see.
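To make those two points concrete, here is a rough Python model of the checks, compared against the real len. This is only a sketch: checked_len and outcome are hypothetical helpers, operator.index stands in for PyNumber_Index, and sys.maxsize stands in for the largest Py_ssize_t.

```python
import operator
import sys


class Foo:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data


def checked_len(obj):
    """Approximate, in Python, the validation slot_sq_length performs."""
    res = type(obj).__len__(obj)
    n = operator.index(res)  # TypeError if res is not integer-like
    if not -sys.maxsize - 1 <= n <= sys.maxsize:
        # Mirrors passing PyExc_OverflowError to PyNumber_AsSsize_t.
        raise OverflowError("cannot fit 'int' into an index-sized integer")
    if n < 0:
        raise ValueError("__len__() should return >= 0")
    return n


def outcome(fn, value):
    """Return fn(Foo(value)), or the name of the exception it raises."""
    try:
        return fn(Foo(value))
    except Exception as exc:
        return type(exc).__name__


values = (5, -1, "5", sys.maxsize + 1)
model = [outcome(checked_len, v) for v in values]
real = [outcome(len, v) for v in values]

print(model)
print(real)
```

The last value shows a third failure mode that the question did not hit: an integer too large for Py_ssize_t raises OverflowError.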

OK, let's look at the implementation of PyNumber_AsSsize_t. It's a bit long so I will omit the non-relevant stuff.

Py_ssize_t
PyNumber_AsSsize_t(PyObject *item, PyObject *err)
{
    Py_ssize_t result;
    PyObject *runerr;
    PyObject *value = PyNumber_Index(item);
    if (value == NULL)
        return -1;    
    /* OMITTED FOR BREVITY */

The relevant bit here is in PyNumber_Index, which Python uses to convert arbitrary objects to integers suitable for indexing. Here is where the actual answer to your question lies. I have annotated a bit.

PyObject *
PyNumber_Index(PyObject *item)
{
    PyObject *result = NULL;
    if (item == NULL) {
        return null_error();
    }

    if (PyLong_Check(item)) {  // IS THE OBJECT ALREADY AN int? IF SO, RETURN IT NOW.
        Py_INCREF(item);
        return item;
    }
    if (!PyIndex_Check(item)) {  // DOES THE OBJECT DEFINE __index__? IF NOT, FAIL.
        PyErr_Format(PyExc_TypeError,
                     "'%.200s' object cannot be interpreted "
                     "as an integer", item->ob_type->tp_name);
        return NULL;
    }
    result = item->ob_type->tp_as_number->nb_index(item);
    if (!result || PyLong_CheckExact(result))
        return result;
    if (!PyLong_Check(result)) {  // IF __index__ DOES NOT RETURN AN int, FAIL.
        PyErr_Format(PyExc_TypeError,
                     "__index__ returned non-int (type %.200s)",
                     result->ob_type->tp_name);
        Py_DECREF(result);
        return NULL;
    }
    /* Issue #17576: warn if 'result' not of exact type int. */
    if (PyErr_WarnFormat(PyExc_DeprecationWarning, 1,
            "__index__ returned non-int (type %.200s).  "
            "The ability to return an instance of a strict subclass of int "
            "is deprecated, and may be removed in a future version of Python.",
            result->ob_type->tp_name)) {
        Py_DECREF(result);
        return NULL;
    }
    return result;
}
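The closest thing to PyNumber_Index that you can call from Python is operator.index, so the three branches annotated above can be tried out directly. The class Three is made up for this sketch.

```python
import operator


class Three:
    """Not an int, but says how to become one."""

    def __index__(self):
        return 3


as_int = operator.index(7)            # already an int: returned as is
from_index = operator.index(Three())  # __index__ is called

# No __index__ on str, so this is the failing branch.
try:
    operator.index("5")
    message = None
except TypeError as exc:
    message = str(exc)

print(as_int, from_index, message)
```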

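So, to answer the question directly: __len__ must return a non-negative int, or an object that defines __index__ and converts to one. Based on the error that you received, we can see that '5' (a str) does not define __index__. We can verify that for ourselves: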

>>> '5'.__index__()
Traceback:
...
AttributeError: 'str' object has no attribute '__index__'
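The flip side is that __len__ does not have to return a literal int, as long as the value can go through __index__. A short sketch, assuming the same foo-style class as in the question (renamed Foo) plus a made-up Count class:

```python
class Foo:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data


class Count:
    """Integer-like object: defines __index__ but is not an int."""

    def __init__(self, n):
        self.n = n

    def __index__(self):
        return self.n


from_index_obj = len(Foo(Count(4)))  # accepted via __index__
from_bool = len(Foo(True))           # bool is a subclass of int

# float has no __index__, so even a whole-number float is rejected.
try:
    len(Foo(5.0))
    float_result = "ok"
except TypeError:
    float_result = "TypeError"

print(from_index_obj, from_bool, float_result)
```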

137

answered Oct 01 '22 12:10

SethMMorton
