In Python, concatenation of two sequences is typically done with the +
operator. However, mypy complains about the following:
from typing import Sequence
def concat1(a: Sequence, b: Sequence) -> Sequence:
    return a + b
And it's right: Sequence has no __add__. However, the function works perfectly fine for the "usual" sequence types list, str, and tuple. Obviously, there are other sequence types where it doesn't work (e.g. numpy.ndarray, where + adds element-wise instead of concatenating). A solution could be the following:
from itertools import chain
def concat2(a: Sequence, b: Sequence) -> Sequence:
    return list(chain(a, b))
Now, mypy doesn't complain. But concatenating strings or tuples always gives a list. There seems to be an easy fix:
def concat3(a: Sequence, b: Sequence) -> Sequence:
    T = type(a)
    return T(chain(a, b))
But now mypy is unhappy because the constructor for T gets too many arguments. Even worse, the function no longer returns the concatenated sequence for every type: for a str, str(chain(a, b)) returns the repr of the chain object rather than the joined text.
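A quick runtime check (plain Python, mypy not involved) of concat2 and concat3, copied from above with their indentation restored, makes both problems concrete. The last call shows what actually happens for str: calling str() on a chain object returns its repr instead of joining the characters.

```python
from itertools import chain
from typing import Sequence

def concat2(a: Sequence, b: Sequence) -> Sequence:
    # Always materializes the chained elements as a list
    return list(chain(a, b))

def concat3(a: Sequence, b: Sequence) -> Sequence:
    # Rebuilds the result with the type of the first argument
    T = type(a)
    return T(chain(a, b))

print(concat2("ab", "cd"))    # ['a', 'b', 'c', 'd']  (a list, not a str)
print(concat3((1, 2), (3,)))  # (1, 2, 3)
print(concat3("ab", "cd"))    # something like <itertools.chain object at 0x...>
```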
What is the proper way of doing this? I feel that part of the issue is that a and b should have the same type and that the output will be the same type too, but the type annotations don't convey it.
Note: I am aware that concatenating strings is more efficiently done using ''.join((a, b)). However, I picked this example more for illustration purposes.
You can concatenate sequences of the same type with the + operator. You can multiply a sequence S by an integer n with the * operator: S * n or n * S is the concatenation of n copies of S. When n <= 0, S * n is an empty sequence of the same type as S.
If you concatenate a list with 2 items and a list with 4 items, you will get a new list with 6 items (not a list with two sublists). Similarly, repeating a list of 2 items 4 times will give a list with 8 items.
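The rules above can be checked directly with the built-in list, str and tuple types; a short sketch (the values are arbitrary):

```python
# + keeps the sequence type and does not nest: 2 items + 4 items -> 6 items
print([1, 2] + [3, 4, 5, 6])  # [1, 2, 3, 4, 5, 6]
print("ab" + "cd")            # abcd
print((1,) + (2, 3))          # (1, 2, 3)

# Repetition: S * n and n * S both give n copies of S
print([1, 2] * 4)             # [1, 2, 1, 2, 1, 2, 1, 2]  (8 items)
print(3 * "ab")               # ababab

# For n <= 0 the result is an empty sequence of the same type
print([1, 2] * 0)             # []
print(repr("ab" * -1))        # ''
print((1,) * 0)               # ()

# Mixing sequence types is rejected at runtime
try:
    [1, 2] + (3,)
except TypeError as exc:
    print(exc)                # can only concatenate list (not "tuple") to list
```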
Python will always remain a dynamically typed language. However, PEP 484 introduced type hints, which make it possible to also do static type checking of Python code. Unlike how types work in most other statically typed languages, type hints by themselves don't cause Python to enforce types.
There is no general way to solve this: Sequence includes types which cannot be concatenated in a generic way. For example, there is no way to concatenate arbitrary range objects to create a new range and keep all elements.
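To make the range case concrete (the two ranges here are arbitrary picks): range defines no + at all, and chaining two ranges generally yields elements that do not form a single arithmetic progression, so they can only be collected into some other type such as a list.

```python
from itertools import chain

a = range(0, 3)    # 0, 1, 2
b = range(10, 12)  # 10, 11

# range has no __add__, so a + b raises TypeError
try:
    a + b
except TypeError as exc:
    print("TypeError:", exc)

# Chaining keeps every element, but 0, 1, 2, 10, 11 has no single step,
# so no range(start, stop, step) can represent the result
combined = list(chain(a, b))
print(combined)  # [0, 1, 2, 10, 11]
```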
One must decide on a concrete means of concatenation, and restrict the accepted types to those providing the required operations.
The simplest approach is for the function to request only the operations it needs. In case the pre-built protocols in typing are not sufficient, one can fall back to defining a custom typing.Protocol for the required operations.
Since concat1/concat_add requires the + implementation, a Protocol with __add__ is needed. Also, since addition usually works on similar types, __add__ must be parameterized over the concrete type; otherwise, the Protocol asks for all addable types that can be added to all other addable types.
from typing import Protocol, TypeVar

# TypeVar to parameterize for specific types
SA = TypeVar('SA', bound='SupportsAdd')

class SupportsAdd(Protocol):
    """Any type T where +(:T, :T) -> T"""
    def __add__(self: SA, other: SA) -> SA: ...

def concat_add(a: SA, b: SA) -> SA:
    return a + b
This is sufficient to type-safely concatenate the basic sequences, and reject mixed-type concatenation.
reveal_type(concat_add([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_add("abc", "xyz")) # note: Revealed type is 'builtins.str*'
reveal_type(concat_add([1, 2, 3], "xyz")) # error: ...
Be aware that this allows concatenating any type that implements __add__, for example int. If further restrictions are desired, define the Protocol more closely, for example by requiring __len__ and __getitem__.
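A minimal sketch of such a tighter Protocol, assuming that requiring __len__ and an int-indexable __getitem__ is the restriction you want. The names SupportsSeqAdd, SQ and concat_seq are hypothetical, and the expectation that mypy then rejects int follows from int lacking those two methods; it is not verified in the original answer.

```python
from typing import Any, Protocol, TypeVar

# Hypothetical names, for illustration only
SQ = TypeVar('SQ', bound='SupportsSeqAdd')

class SupportsSeqAdd(Protocol):
    """Like SupportsAdd, but the type must also be sized and indexable"""
    def __add__(self: SQ, other: SQ) -> SQ: ...
    def __len__(self) -> int: ...
    def __getitem__(self, index: int) -> Any: ...

def concat_seq(a: SQ, b: SQ) -> SQ:
    return a + b

print(concat_seq([1, 2, 3], [12, 17]))  # [1, 2, 3, 12, 17]
print(concat_seq("abc", "xyz"))         # abcxyz
# concat_seq(1, 2) would still run, but a type checker should flag it,
# since int provides neither __len__ nor __getitem__
```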
Typing concatenation via chaining is a bit more complex, but follows the same approach: a Protocol defines the capabilities needed by the function, but in order to be type-safe the elements should be typed as well.
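from itertools import chain
from typing import Iterable, Iterator, Protocol, TypeVar

# TypeVar to parameterize for specific types and element types
C = TypeVar('C', bound='Chainable')
T = TypeVar('T', covariant=True)

# Parameterized by the element type T
class Chainable(Protocol[T]):
    """Any type C[T] where C[T](:Iterable[T]) -> C[T] and iter(:C[T]) -> Iterable[T]"""
    def __init__(self, items: Iterable[T]): ...
    def __iter__(self) -> Iterator[T]: ...

def concat_chain(a: C, b: C) -> C:
    # Rebuild a value of a's own type (named cls to avoid shadowing the TypeVar T).
    # Caveat: works for list/tuple; for str, str(chain(...)) gives the chain's repr.
    cls = type(a)
    return cls(chain(a, b))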
This is sufficient to type-safely concatenate sequences that can be constructed from an iterable of their own elements, and to reject mixed-type concatenation and non-sequences.
reveal_type(concat_chain([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_chain("abc", "xyz")) # note: Revealed type is 'builtins.str*'
reveal_type(concat_chain([1, 2, 3], "xyz")) # error: ...
reveal_type(concat_chain(1, 2)) # error: ...
Sequence does not support +, so you cannot use Sequence here. Instead, use a TypeVar constrained to the types that you allow, or use overloading. Overloading is more general than needed here (though you may disagree), but you can read about it here: https://docs.python.org/3/library/typing.html#typing.overload. Let's just use a TypeVar:
from typing import TypeVar
ConcatableSequence = TypeVar('ConcatableSequence', list, str, tuple)
def concat1(a: ConcatableSequence, b: ConcatableSequence) -> ConcatableSequence:
    return a + b
Note here that when the type check runs, ConcatableSequence may be list, str, or tuple, but all three of a, b, and the return value must be the same choice, which differs from how Union would work.
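Overloading, mentioned above as the more general alternative, spells out each accepted signature separately. A minimal sketch, assuming the same three types; the name concat and the element TypeVar E are hypothetical. One possible advantage over the constrained TypeVar is that the list and tuple overloads can carry the element type through to the result.

```python
from typing import Any, List, Tuple, TypeVar, overload

E = TypeVar('E')  # element type (hypothetical name)

@overload
def concat(a: str, b: str) -> str: ...
@overload
def concat(a: List[E], b: List[E]) -> List[E]: ...
@overload
def concat(a: Tuple[E, ...], b: Tuple[E, ...]) -> Tuple[E, ...]: ...
def concat(a: Any, b: Any) -> Any:
    # Single runtime implementation; the @overload stubs only guide the type checker
    return a + b

print(concat("ab", "cd"))   # abcd
print(concat([1], [2, 3]))  # [1, 2, 3]
```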