
Python typing: Concatenate sequences

In Python, two sequences are typically concatenated with the + operator. However, mypy complains about the following:

from typing import Sequence

def concat1(a: Sequence, b: Sequence) -> Sequence:
    return a + b

And it's right: Sequence has no __add__. However, the function works perfectly fine for the "usual" sequence types list, str, and tuple. Obviously, there are other sequence types where it doesn't work (e.g. numpy.ndarray). A solution could be the following:

from itertools import chain

def concat2(a: Sequence, b: Sequence) -> Sequence:
    return list(chain(a, b))

Now, mypy doesn't complain. But concatenating strings or tuples always gives a list. There seems to be an easy fix:

def concat3(a: Sequence, b: Sequence) -> Sequence:
    T = type(a)
    return T(chain(a, b))

But now mypy is unhappy because the constructor for T gets too many arguments. Even worse, the function no longer reliably returns the concatenated sequence: for str, T(chain(a, b)) produces the string representation of the chain object rather than the joined text.
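The runtime behaviour of concat3 can be checked directly. The following is a minimal sketch that repeats the function from above (with the local variable renamed to cls, which is an editorial choice) and calls it with the three built-in sequence types. It only illustrates what happens at runtime and says nothing about what a type checker reports.

```python
from itertools import chain
from typing import Sequence


def concat3(a: Sequence, b: Sequence) -> Sequence:
    # Rebuild the result using the runtime type of the first argument.
    cls = type(a)
    return cls(chain(a, b))


# list and tuple accept any iterable, so these produce the expected result.
list_result = concat3([1, 2], [3])
tuple_result = concat3((1, 2), (3,))

# str() applied to an iterator returns its repr, not the joined characters.
str_result = concat3("ab", "cd")
print(list_result, tuple_result, str_result)
```

So the approach works for list and tuple, but silently gives the wrong answer for str.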

What is the proper way of doing this? I feel that part of the issue is that a and b should have the same type and that the output will be the same type too, but the type annotations don't convey it.

Note: I am aware that concatenating strings is more efficiently done using ''.join((a, b)). However, I picked this example more for illustration purposes.

asked Dec 18 '20 17:12 by Ingo




2 Answers

There is no general way to solve this: Sequence includes types which cannot be concatenated in a generic way. For example, there is no way to concatenate arbitrary range objects to create a new range and keep all elements.
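The range case is easy to reproduce. This short sketch (variable names are illustrative) shows that range counts as a Sequence yet rejects +, and that the only generic fallback is to give up the range type:

```python
from collections.abc import Sequence

first = range(0, 3)
second = range(10, 13)

# range is registered as a Sequence ...
is_sequence = isinstance(first, Sequence)

# ... but it defines no __add__, so + fails at runtime.
try:
    first + second
    error = None
except TypeError as exc:
    error = exc

# Converting to list keeps all elements but loses the range type.
combined = list(first) + list(second)
```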

One must decide on a concrete means of concatenation, and restrict the accepted types to those providing the required operations.

The simplest approach is for the function to request only the operations it needs. If the pre-built protocols in typing are not sufficient, one can fall back to defining a custom typing.Protocol for the requested operations.


Since concat1/concat_add requires the + implementation, a Protocol with __add__ is needed. Also, since addition usually works on similar types, __add__ must be parameterized over the concrete type – otherwise, the Protocol asks for all addable types that can be added to all other addable types.

from typing import Protocol, TypeVar

# TypeVar to parameterize for specific types
SA = TypeVar('SA', bound='SupportsAdd')


class SupportsAdd(Protocol):
    """Any type T where +(:T, :T) -> T"""
    def __add__(self: SA, other: SA) -> SA: ...


def concat_add(a: SA, b: SA) -> SA:
    return a + b

This is sufficient to type-safely concatenate the basic sequences, and reject mixed-type concatenation.

reveal_type(concat_add([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_add("abc", "xyz"))        # note: Revealed type is 'builtins.str*'
reveal_type(concat_add([1, 2, 3], "xyz"))    # error: ...

Be aware that this allows concatenating any type that implements __add__, for example int. If further restrictions are desired, define the Protocol more narrowly, for example by also requiring __len__ and __getitem__.
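A narrower Protocol along those lines might look like the sketch below. The names SizedAddable, SQ, and concat_sized are hypothetical, and the use of runtime_checkable is an added assumption so that the restriction can also be observed with isinstance. Whether a particular type checker accepts every built-in sequence against this Protocol is not verified here.

```python
from typing import Any, Protocol, TypeVar, runtime_checkable

# Hypothetical names, chosen for illustration only.
SQ = TypeVar("SQ", bound="SizedAddable")


@runtime_checkable
class SizedAddable(Protocol):
    """Any type that supports +, len() and integer indexing."""

    def __add__(self: SQ, other: SQ) -> SQ: ...

    def __len__(self) -> int: ...

    def __getitem__(self, index: int) -> Any: ...


def concat_sized(a: SQ, b: SQ) -> SQ:
    return a + b


# isinstance() on a runtime-checkable Protocol only checks that the
# methods exist, not their signatures.
list_ok = isinstance([1, 2], SizedAddable)
int_ok = isinstance(1, SizedAddable)  # int has __add__ but no __len__
```

With the extra methods required, int no longer satisfies the Protocol, while list, str, and tuple still provide all three operations.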


Typing concatenation via chaining is a bit more complex, but follows the same approach: A Protocol defines the capabilities needed by the function, but in order to be type-safe the elements should be typed as well.

from itertools import chain
from typing import Iterable, Iterator, Protocol, TypeVar

# TypeVar to parameterize for specific types and element types
C = TypeVar('C', bound='Chainable')
T = TypeVar('T', covariant=True)


# Parameterized by the element type T
class Chainable(Protocol[T]):
    """Any type C[T] where C[T](:Iterable[T]) -> C[T] and iter(:C[T]) -> Iterable[T]"""
    def __init__(self, items: Iterable[T]): ...

    def __iter__(self) -> Iterator[T]: ...


def concat_chain(a: C, b: C) -> C:
    # Use a local name other than T so the element TypeVar is not shadowed
    cls = type(a)
    return cls(chain(a, b))

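This is sufficient to type-safely concatenate sequences that can be constructed from an iterable of their own elements, and to reject mixed-type concatenation and non-sequences. Passing the type check does not guarantee the intended runtime result, though: str(chain("abc", "xyz")) returns the representation of the chain object, not "abcxyz".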

reveal_type(concat_chain([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_chain("abc", "xyz"))        # note: Revealed type is 'builtins.str*'
reveal_type(concat_chain([1, 2, 3], "xyz"))    # error: ...
reveal_type(concat_chain(1, 2))                # error: ...
answered Sep 18 '22 15:09 by MisterMiyagi


Sequence does not define __add__, so it cannot be used as the annotation here. Instead, use a TypeVar that is constrained to the types you allow, or use overloading. Overloading is more general than needed here (though you may disagree), but you can read about it at https://docs.python.org/3/library/typing.html#typing.overload. Let's just use a TypeVar:

from typing import TypeVar

ConcatableSequence = TypeVar('ConcatableSequence', list, str, tuple)

def concat1(a: ConcatableSequence, b: ConcatableSequence) -> ConcatableSequence:
    return a + b

Note here that when the type check runs, ConcatableSequence may be list, str, or tuple, but all three of a, b, and the return value must be the same choice, which differs from how Union would work.
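To make the contrast with Union concrete, here is a minimal sketch. The names concat_same, concat_union, and AnyConcatable are hypothetical. With Union, each parameter may independently be any of the three types, so a mixed call is permitted by the signature and only fails at runtime. A type checker is expected to complain about the + inside concat_union for exactly that reason.

```python
from typing import TypeVar, Union

ConcatableSequence = TypeVar("ConcatableSequence", list, str, tuple)
AnyConcatable = Union[list, str, tuple]


def concat_same(a: ConcatableSequence, b: ConcatableSequence) -> ConcatableSequence:
    # Both arguments and the result share one of the three types.
    return a + b


def concat_union(a: AnyConcatable, b: AnyConcatable) -> AnyConcatable:
    # Each argument may be a different member of the Union.
    return a + b  # type: ignore[operator]


same = concat_same("ab", "cd")

# The Union signature permits this call, but list + str fails at runtime.
try:
    concat_union([1, 2], "cd")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
```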

answered Sep 18 '22 15:09 by mCoding