Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is Python's sequence protocol?

Python does a lot with magic methods and most of these are part of some protocol. I am familiar with the "iterator protocol" and the "number protocol" but recently stumbled over the term "sequence protocol". But even after some research I'm not exactly sure what the "sequence protocol" is.

For example the C API function PySequence_Check checks (according to the documentation) if some object implements the "sequence protocol". The source code indicates that this is a class that's not a dict but implements a __getitem__ method which is roughly identical to what the documentation on iter also states:

[...]must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).[...]

But the requirement to start with 0 isn't something that's "implemented" in PySequence_Check.

Then there is also the collections.abc.Sequence type, which basically says the instance has to implement __reversed__, __contains__, __iter__ and __len__.

But by that definition a class implementing the "sequence protocol" isn't necessarily a Sequence, for example the "data model" and the abstract class guarantee that a sequence has a length. But a class just implementing __getitem__ (passing the PySequence_Check) throws an exception when using len(an_instance_of_that_class).

Could someone please clarify for me the difference between a sequence and the sequence protocol (if there's a definition for the protocol besides reading the source code) and when to use which definition?

like image 225
MSeifert Avatar asked Apr 23 '17 00:04

MSeifert


People also ask

What is sequence protocol in Python?

Sequence type, which basically says the instance has to implement __reversed__ , __contains__ , __iter__ and __len__ . But by that definition a class implementing the "sequence protocol" isn't necessarily a Sequence, for example the "data model" and the abstract class guarantee that a sequence has a length.

Why does Python use sequencing?

Sequences allow you to store multiple values in an organized and efficient fashion. There are seven sequence types: strings, bytes, lists, tuples, bytearrays, buffers, and range objects.

What does it mean for a Python object to be a sequence?

In Python, a sequence is any ordered collection of objects whose contents can be accessed via “indexing”. A sub-sequence can be accessed by “slicing” the sequence. You saw, in the required reading, that Python's lists and strings are both examples of sequences.


1 Answers

It's not really consistent.

Here's PySequence_Check:

int
PySequence_Check(PyObject *s)
{
    if (PyDict_Check(s))
        return 0;
    return s != NULL && s->ob_type->tp_as_sequence &&
        s->ob_type->tp_as_sequence->sq_item != NULL;
}

PySequence_Check checks if an object provides the C sequence protocol, implemented through a tp_as_sequence member in the PyTypeObject representing the object's type. This tp_as_sequence member is a pointer to a struct containing a bunch of functions for sequence behavior, such as sq_item for item retrieval by numeric index and sq_ass_item for item assignment.

Specifically, PySequence_Check requires that its argument is not a dict, and that it provides sq_item.

Types with a __getitem__ written in Python will provide sq_item regardless of whether they're conceptually sequences or mappings, so a mapping written in Python that doesn't inherit from dict will pass PySequence_Check.


On the other hand, collections.abc.Sequence only checks whether an object concretely inherits from collections.abc.Sequence or whether its class (or a superclass) is explicitly registered with collections.abc.Sequence. If you just implement a sequence yourself without doing either of those things, it won't pass isinstance(your_sequence, Sequence). Also, most classes registered with collections.abc.Sequence don't support all of collections.abc.Sequence's methods. Overall, collections.abc.Sequence is a lot less reliable than people commonly expect it to be.


As for what counts as a sequence in practice, it's usually anything that supports __len__ and __getitem__ with integer indexes starting at 0 and isn't a mapping. If the docs for a function say it takes any sequence, that's almost always all it needs. Unfortunately, "isn't a mapping" is hard to test for, for reasons similar to how "is a sequence" is hard to pin down.

like image 179
user2357112 supports Monica Avatar answered Oct 14 '22 23:10

user2357112 supports Monica