
Type hint function accepting a Union

Here is my (much simplified) code:

def myfun(X:list[str|int]):
    for x in X:
        print(x)


X = [1,2,3]
myfun(X)

Pyright complains on the last line because I provide a list of int while the function requires list[int|str].

  • What is the best way to deal with that case?
  • Is there a way to tell pyright to accept "subtypes"?

Constraints:

  • I do not want to define X as X:list[str|int]=[1,2,3] because, in my real case, I want X to be understood as list of int.

  • I can call the function with myfun(list[str|int](X)) but it is really annoying.

Laurent Claessens asked Mar 06 '26 19:03


1 Answer

Program to interfaces, not implementations

-- Gang of Four

This has been a staple chestnut of OOP lore for a long time (it dates back about three decades now). But if you've never worked in a statically typed system before, it can be confusing what that means. After all, Python has been object-oriented since its inception, and until type hinting arrived, sayings like that were never really applicable other than in a very weak sense, such as using duck typing instead of isinstance.

But now we really need to make sure that we study the history of how to build software well in a statically typed system lest we be doomed to repeat it, and your question offers a great lens to examine this (thanks, btw).

First, let's get a little background out of the way. We can't talk about why your code errors without defining generics. The technical definition is that a generic type is a type that is parameterized by other types. Which is cool and all, but perhaps a more intuitive way is to think of them as types that are incomplete, a sort of fill-in-the-blank type. I realize you may know this already, but not everyone reading this will.

When we talk about a value like a list [], it makes sense on its own: it's a container and we can put all sorts of things inside it, and the fact that it's empty right now doesn't really matter. But types are categories, so it really doesn't make sense to talk about the type (category) of 'List' without saying a List of something specific. Types like List are generic in that you need another type to complete them: you can have a List[int] or a List[Tuple[int, int]] or whatever.

Depending on the circumstances, generics vary in how they handle subtype relationships in their parameters. This variance is the source of your problem, as commenter jonrsharpe already helpfully pointed out. Explaining variance in full is outside the scope of this answer, but in addition to the official mypy docs you might find this resource helpful. Because you use the union type str|int, str and int are both subtypes of that union, and so the variance rules apply when the union is passed as a parameter to a generic type. List is invariant in its parameter, so List[int] is not accepted where List[str|int] is expected, even though int is a subtype of str|int.

TL;DR: because of how mypy treats the variance of different generic container types, you want to use a different one than List. The error message helpfully suggests Sequence instead; as another commenter, STerliakov, points out, you only really need Iterable.

But let's get back to my opening quote and talk about why the system is structured that way.

Part of the reason, I presume, is to encourage the best practice I opened with: when designing an API, down to and including function signatures, you really want to specify the bare minimum contract that the function needs to do its work. If the only thing your function needs is an object with a getTimeStamp method that returns an integer, you really don't want to write def my_fun(x: SomeClass): because now my_fun is tightly coupled to that class. If you want to refactor SomeClass and move the getTimeStamp functionality to SomeOtherClass, now every call site of my_fun is broken and needs to be changed.

Refactoring tools are helpful but not really a solution: what if this is a published library on PyPI? Now what should be an internal implementation detail has leaked, and you have a breaking change that requires a major-version semver bump. Instead, you want to use the type system to say "this function expects that 'x' will be an object with a getTimeStamp method that returns an integer", and any object that qualifies is fine (N.B. this also greatly simplifies writing unit tests for the code and obviates the need for a lot of unnecessary mocking).
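One way to express "any object with a getTimeStamp method that returns an integer" is typing.Protocol. This is a minimal sketch, not necessarily how your real code should look; HasTimeStamp and the returned values are made up for illustration, while SomeClass and SomeOtherClass stand in for the classes mentioned above.

```python
from typing import Protocol


class HasTimeStamp(Protocol):
    """Structural type: anything with getTimeStamp() returning an int."""

    def getTimeStamp(self) -> int: ...


def my_fun(x: HasTimeStamp) -> int:
    # Depends only on the method, not on any concrete class.
    return x.getTimeStamp()


class SomeClass:
    def getTimeStamp(self) -> int:
        return 1_700_000_000  # placeholder value


class SomeOtherClass:
    def getTimeStamp(self) -> int:
        return 42  # placeholder value


# Neither class inherits from HasTimeStamp; both are accepted structurally.
print(my_fun(SomeClass()))
print(my_fun(SomeOtherClass()))
```

Moving the method from one class to another leaves my_fun and its call sites untouched, and a test can pass a tiny stub class instead of a mock.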

So for your code, which iterates over a collection of integers or strings, you want to use a type that describes "something that contains integers or strings and works with a for loop":

from typing import Iterable

def my_fun(xs: Iterable[int|str]):
    for x in xs:
        print(x)

foo = {1, 2, 3} # set
bar = [1, 2, 3] # list
baz = {"a": 1, "b": 2, "c": 3} # dict

my_fun(foo)
my_fun(bar)
my_fun(baz)

You can see now that the concrete data structure doesn't matter: we can pass in anything that conforms, even a custom user-defined object with __iter__ and __next__ methods! Via the generic variance rules, the typechecker encourages you to do this by making it more cumbersome to use a concrete data type like List instead.

Well, almost. See, I'm assuming you typed it that way because you want to be able to pass in [1, 'a'] and not just [1, 2] or ['a', 'b'], and this cool trick fails in mypy when we use a heterogeneous structure:

oops = [1, 'a']
my_fun(oops)

You will get an error saying that you supplied List[object] where Iterable[str|int] was expected. Unlike the best-practice-encouraging generic variance rules, this is arguably a flaw in the mypy typechecker. I note that my IDE gets this correct and does not show that error because it uses pyright as the LSP (I see you tagged the question pyright, so you may be OK). So that stinks: with mypy you would need to explicitly hint oops as oops: Iterable[str|int] = [1, 'a'], which is unfortunate.

Jared Smith answered Mar 09 '26 08:03