Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What exactly is a Sequence?

The Python docs are a bit ambiguous

sequence

An iterable which supports efficient element access using integer indices via the __getitem__() special method and defines a __len__() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and bytes. Note that dict also supports __getitem__() and __len__(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.

The collections.abc.Sequence abstract base class defines a much richer interface that goes beyond just __getitem__() and __len__(), adding count(), index(), __contains__(), and __reversed__(). Types that implement this expanded interface can be registered explicitly using register().

In particular, using abc.collections.Sequence as the gold standard as recommended by some would mean that, for example, numpy arrays are not sequences:

isinstance(np.arange(6),collections.abc.Sequence)
# False

There is also something called the Sequence Protocol but that appears to be exposed only at the C-API. There the criterion is

int PySequence_Check(PyObject *o)

Return 1 if the object provides sequence protocol, and 0 otherwise. Note that it returns 1 for Python classes with a __getitem__() method unless they are dict subclasses since in general case it is impossible to determine what the type of keys it supports. This function always succeeds.

Finally, I don't follow this new (-ish) type annotation business too closely but I would imagine this also would benefit from a clear concept of what a sequence is.

So my question has both a philosophical and a practical side: What exactly is a sequence? and How do I test whether something is a sequence or not? Ideally, in a way that makes numpy arrays sequences. And if I ever start annotating, how would I approach sequences?

like image 662
Paul Panzer Avatar asked Jul 18 '20 15:07

Paul Panzer


1 Answers

Brief introduction to typing in Python

Skip ahead if you know what structural typing, nominal typing and duck typing are.

I think much of the confusion arises from the fact that typing was a provisional module between versions 3.5 and 3.6. And was still subject to change between versions 3.7 and 3.8. This means there has been a lot of flux in how Python has sought to deal with typing through type annotations.

It also doesn't help that python is both duck-typed and nominally typed. That is, when accessing an attribute of an object, Python is duck-typed. The object will only be checked to see if it has an attribute at runtime, and only when immediately requested. However, Python also has nominal typing features (eg. isinstance()and issubclass()). Nominal typing is where one type is declared to be a subclass of another. This can be through inheritance, or with the register() method of ABCMeta.

typing originally introduced its types using the idea of nominal typing. As of 3.8 it is trying to allow for the more pythonic structural typing. Structural typing is related to duck-typing, except that it is taken into consideration at "compile time" rather than runtime. For instance, when a linter is trying to detect possible type errors -- such as if you were to pass a dict to a function that only accepts sequences like tuples or list. With structural typing, a class B should be considered a subtype of A if it implements the all the methods of A, regardless of whether it has been declared to be a subtype of A (as in nominal typing).

Answer

sequences (little s) are a duck type. A sequence is any ordered collection of objects that provides random access to its members. Specifically, if it defines __len__ and __getitem__ and uses integer indices between 0 and n-1 then it is a sequence. A Sequence (big s) is a nominal type. That is, to be a Sequence, a class must be declared as such, either by inheriting from Sequence or being registered as a subclass.

A numpy array is a sequence, but it is not a Sequence as it is not registered as a subclass of Sequence. Nor should it be, as it does not implement the full interface promised by Sequence (things like count() and index() are missing).

It sounds like you want is a structured type for a sequence (small s). As of 3.8 this is possible by using protocols. Protocols define a set of methods which a class must implement to be considered a subclass of the protocol (a la structural typing).

from typing import Protocol
import numpy as np

class MySequence(Protocol):
    def __getitem__(self, index):
        raise NotImplementedError
    def __len__(self):
        raise NotImplementedError
    def __contains__(self, item):
        raise NotImplementedError
    def __iter__(self):
        raise NotImplementedError

def f(s: MySequence):
    for i in range(len(s)):
        print(s[i], end=' ')
    print('end')

f([1, 2, 3, 4]) # should be fine
arr: np.ndarray = np.arange(5)
f(arr) # also fine
f({}) # might be considered fine! Depends on your type checker

Protocols are fairly new, so not all IDEs/type checkers might support them yet. The IDE I use, PyCharm, does. It doesn't like f({}), but it is happy to consider a numpy array a Sequence (big S) though (perhaps not ideal). You can enable runtime checking of protocols by using the runtime_checkable decorator of typing. Be warned, all this does is individually check that each of the Protocols methods can be found on the given object/class. As a result, it can become quite expensive if your protocol has a lot of methods.

like image 116
Dunes Avatar answered Oct 03 '22 06:10

Dunes