class foo:
    def __init__(self, data):
        self.data = data
    def __len__(self):
        return self.data
If I run this by passing a string in for data, I get an error when calling len on an instance of this class. Specifically, I get 'str' object cannot be interpreted as an integer.
So does the return statement in __len__ have to be an integer? I would think that if I am overriding it, it should be able to output whatever I want, so why is this not possible?
Python's __len__ is one of the language's magic methods. It implements the len() function: whenever len() is called, __len__ is called internally, and it is expected to return an integer, the length of the object.
The function len() is one of Python's built-in functions. It returns the length of an object. For example, it can return the number of items in a list. You can use the function with many different data types.
To get the length of an integer in Python, use the str() class to convert the integer to a string, e.g. my_str = str(my_int), then pass the string to the len() function, e.g. len(my_str). The len() function will return the length of the string.
SHORT ANSWER
At the C level, Python installs a wrapper for __len__ in a special type slot. The wrapper catches the return value of the call to __len__ and validates it: the value must be convertible to a non-negative integer that fits in a Py_ssize_t.
LONG ANSWER
To answer this, we have to go a bit down the rabbit hole of what happens under the hood when len is called in Python.
First, let's establish some behavior.
>>> class foo:
...     def __init__(self, data):
...         self.data = data
...     def __len__(self):
...         return self.data
...
>>> len(foo(-1))
Traceback:
...
ValueError: __len__() should return >= 0
>>> len(foo('5'))
Traceback:
...
TypeError: 'str' object cannot be interpreted as an integer
>>> len(foo(5))
5
When you call len, the C function builtin_len gets called. Let's take a look at this.
static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
    Py_ssize_t res;
    res = PyObject_Size(obj); // <=== THIS IS WHAT IS IMPORTANT!!!
    if (res < 0 && PyErr_Occurred())
        return NULL;
    return PyLong_FromSsize_t(res);
}
You will notice that the PyObject_Size function is being called. This function returns the size of an arbitrary Python object. Let's move further down the rabbit hole.
Py_ssize_t
PyObject_Size(PyObject *o)
{
    PySequenceMethods *m;
    if (o == NULL) {
        null_error();
        return -1;
    }
    m = o->ob_type->tp_as_sequence;
    if (m && m->sq_length)
        return m->sq_length(o); // <==== THIS IS WHAT IS IMPORTANT!!!
    return PyMapping_Size(o);
}
It checks whether the type defines the sq_length function (sequence length) and, if so, calls it to get the length. It appears that at the C level, Python categorizes all objects that define __len__ as either sequences or mappings (even if that's not how we would think of them at the Python level); in our case, Python thinks of this class as a sequence, so it calls sq_length.
Let's take a quick aside: for builtin types (such as list, set, etc.) Python does not actually compute the length when asked, but reads a value stored in a C struct, which makes this very fast. Each of these builtin types defines how to access that value by assigning an accessor function to sq_length. Let's take a quick peek at how this is implemented for lists:
static Py_ssize_t
list_length(PyListObject *a)
{
    return Py_SIZE(a); // <== THIS IS A MACRO for ((PyVarObject *)(a))->ob_size
}
static PySequenceMethods list_as_sequence = {
    ...
    (lenfunc)list_length, /* sq_length */
    ...
};
ob_size stores the object's size (i.e. the number of elements in the list). So, when sq_length is called, the call is routed to the list_length function, which returns the value of ob_size.
OK, so that's how it is done for a builtin type... how does it work for a custom class like our foo? Since the "dunder methods" (such as __len__) are special, Python detects them in our classes and treats them specially (specifically, inserting them into special slots).
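One visible consequence of this slot-based dispatch can be checked from Python itself: len() uses the method found on the type, so a __len__ attribute set on an individual instance is ignored. This is a minimal sketch; the class name Sized is made up for illustration.

```python
class Sized:
    def __len__(self):
        return 3


s = Sized()

# Assigning __len__ on the instance does not touch the type's slot,
# so len() keeps using the method defined on the class.
s.__len__ = lambda: 99

print(len(s))       # 3
print(s.__len__())  # 99: ordinary attribute lookup finds the instance attribute
```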
Most of this is handled in typeobject.c. The __len__ function is intercepted and assigned to the sq_length slot (just like a builtin!) near the bottom of the file.
SQSLOT("__len__", sq_length, slot_sq_length, wrap_lenfunc,
"__len__($self, /)\n--\n\nReturn len(self)."),
The slot_sq_length function is where we can finally answer your question.
static Py_ssize_t
slot_sq_length(PyObject *self)
{
    PyObject *res = call_method(self, &PyId___len__, NULL);
    Py_ssize_t len;
    if (res == NULL)
        return -1;
    len = PyNumber_AsSsize_t(res, PyExc_OverflowError); // <=== HERE!!!
    Py_DECREF(res);
    if (len < 0) { // <== AND HERE!!!
        if (!PyErr_Occurred())
            PyErr_SetString(PyExc_ValueError,
                            "__len__() should return >= 0");
        return -1;
    }
    return len;
}
Two things of note here:
1. If the return value is less than 0, a ValueError is raised with the message "__len__() should return >= 0". This is exactly the error received when I tried to call len(foo(-1))!
2. The return value of __len__ is converted to a Py_ssize_t before returning (Py_ssize_t is a signed version of size_t, an integer type that is guaranteed to be able to index things in a container).
OK, let's look at the implementation of PyNumber_AsSsize_t. It's a bit long so I will omit the non-relevant stuff.
Py_ssize_t
PyNumber_AsSsize_t(PyObject *item, PyObject *err)
{
    Py_ssize_t result;
    PyObject *runerr;
    PyObject *value = PyNumber_Index(item);
    if (value == NULL)
        return -1;
    /* OMITTED FOR BREVITY */
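The PyExc_OverflowError argument passed by slot_sq_length is the exception PyNumber_AsSsize_t raises when the integer does not fit in a Py_ssize_t. A quick way to observe this from Python, assuming CPython, where sys.maxsize is the largest Py_ssize_t value (the class name Huge is made up for illustration):

```python
import sys


class Huge:
    def __len__(self):
        # One more than the largest value a Py_ssize_t can hold.
        return sys.maxsize + 1


try:
    len(Huge())
except OverflowError as exc:
    print(type(exc).__name__, exc)
```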
The relevant bit of PyNumber_AsSsize_t is its call to PyNumber_Index, which Python uses to convert arbitrary objects to integers suitable for indexing. Here is where the actual answer to your question lies. I have annotated it a bit.
PyObject *
PyNumber_Index(PyObject *item)
{
    PyObject *result = NULL;
    if (item == NULL) {
        return null_error();
    }
    if (PyLong_Check(item)) { // IS THE OBJECT ALREADY AN int? IF SO, RETURN IT NOW.
        Py_INCREF(item);
        return item;
    }
    if (!PyIndex_Check(item)) { // DOES THE OBJECT DEFINE __index__? IF NOT, FAIL.
        PyErr_Format(PyExc_TypeError,
                     "'%.200s' object cannot be interpreted "
                     "as an integer", item->ob_type->tp_name);
        return NULL;
    }
    result = item->ob_type->tp_as_number->nb_index(item);
    if (!result || PyLong_CheckExact(result))
        return result;
    if (!PyLong_Check(result)) { // IF __index__ DOES NOT RETURN AN int, FAIL.
        PyErr_Format(PyExc_TypeError,
                     "__index__ returned non-int (type %.200s)",
                     result->ob_type->tp_name);
        Py_DECREF(result);
        return NULL;
    }
    /* Issue #17576: warn if 'result' not of exact type int. */
    if (PyErr_WarnFormat(PyExc_DeprecationWarning, 1,
            "__index__ returned non-int (type %.200s). "
            "The ability to return an instance of a strict subclass of int "
            "is deprecated, and may be removed in a future version of Python.",
            result->ob_type->tp_name)) {
        Py_DECREF(result);
        return NULL;
    }
    return result;
}
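The whole chain can be approximated in pure Python. The sketch below is only a rough model, not CPython's actual code: checked_len is a hypothetical helper, operator.index() stands in for PyNumber_Index, and the comparison against sys.maxsize stands in for the Py_ssize_t range check (very large negative values and objects without __len__ are not handled the way the C code handles them).

```python
import operator
import sys


def checked_len(obj):
    """Rough Python model of builtin_len -> slot_sq_length (hypothetical helper)."""
    # Special methods are looked up on the type, not on the instance.
    result = type(obj).__len__(obj)
    # operator.index() accepts ints and objects defining __index__,
    # and raises TypeError for anything else (such as a str).
    n = operator.index(result)
    if n > sys.maxsize:
        # Stand-in for the overflow check done by PyNumber_AsSsize_t.
        raise OverflowError("cannot fit 'int' into an index-sized integer")
    if n < 0:
        raise ValueError("__len__() should return >= 0")
    return n


class foo:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data


print(checked_len(foo(5)))  # 5
for bad in (-1, "5"):
    try:
        checked_len(foo(bad))
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__, exc)
```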
Based on the error that you received, we can see that '5' does not define __index__. We can verify that for ourselves:
>>> '5'.__index__()
Traceback:
...
AttributeError: 'str' object has no attribute '__index__'
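Conversely, the PyNumber_Index code above suggests that __len__ does not strictly have to return an int: any object that defines __index__ should be accepted. A minimal sketch, assuming CPython; the class Five is made up for illustration:

```python
class Five:
    """Hypothetical non-int type that can still be used as an index."""

    def __index__(self):
        return 5


class foo:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data


print(len(foo(Five())))  # 5
print(len(foo(True)))    # 1, since bool is a subclass of int
```

So the practical rule is: return a non-negative int, or something that can be losslessly converted to one through __index__.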