
Why do I get a warning when concatenating lists of mixed types in PyCharm?

In PyCharm, the following code produces a warning:

from typing import List

list1: List[int] = [1, 2, 3]
list2: List[str] = ["1", "2", "3"]
list3: List[object] = list1 + list2
#                             ↳ Expected type List[int] (matched generic type List[_T]),
#                               got List[str] instead.

Why? Should I not be concatenating two lists of mixed, hinted types?

Cai asked Jun 24 '19 14:06




1 Answer

As requested in the comments, here are some reasons why type checkers don't allow this.

The first reason is somewhat prosaic: the type signature of list.__add__ simply doesn't allow anything other than a list containing the same element type to be passed in:

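from typing import Generic, List, MutableSequence, TypeVar

_T = TypeVar('_T')

# ...snip...

class list(MutableSequence[_T], Generic[_T]):

    # ...snip...

    # Both the argument and the result must share the receiver's element type _T.
    def __add__(self, x: List[_T]) -> List[_T]: ...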

PyCharm, which supports PEP 484, takes (in part) its type information from typeshed.

It's possible that we could broaden this type signature in some way (e.g. overload it to also accept a List[_S] and return List[Union[_T, _S]] in that case), but I don't think anybody's bothered to investigate the feasibility of this approach. This sort of thing isn't too useful in practice, makes life harder for people who want strictly homogeneous lists or want to subclass them, and would potentially disrupt a lot of existing code that relies on the current type signature.
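To make the broadened signature concrete, here is a minimal sketch written as a standalone function rather than as an overload of list.__add__. The name concat and the second type variable _S are assumptions made up for illustration; nothing like this exists in typeshed.

```python
from typing import List, TypeVar, Union

_T = TypeVar("_T")
_S = TypeVar("_S")


def concat(left: List[_T], right: List[_S]) -> List[Union[_T, _S]]:
    """Hypothetical helper: build a new list and leave both arguments untouched."""
    result: List[Union[_T, _S]] = []
    result.extend(left)   # copies the items of left into the new list
    result.extend(right)  # copies the items of right into the new list
    return result


list1: List[int] = [1, 2, 3]
list2: List[str] = ["1", "2", "3"]
list3 = concat(list1, list2)  # a checker should infer List[Union[int, str]]
print(list3)
```

Because the function always builds a fresh list, it sidesteps the mutation concern, which is something a signature on list.__add__ itself cannot promise for subclasses.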

This type signature is also probably a reflection of the broader choice made during the initial design of PEP 484 to assume that lists are always homogeneous, i.e. always contain values of the same type.

The designers of PEP 484 strictly speaking didn't need to make this choice: they could have required type checkers to special-case interactions with lists, like we currently do for tuples. But it's overall simpler not to do this, I think. (And also arguably better style, but whatever.)
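As a rough illustration of the difference (the variable names are made up for this sketch): a tuple type records one type per position, while a list type records a single element type shared by every item.

```python
from typing import List, Tuple, Union

# A tuple type tracks each position separately, so a checker knows
# that index 0 is an int and index 1 is a str.
record: Tuple[int, str] = (1, "a")
count: int = record[0]
label: str = record[1]

# A list type has one element type for all items, so every item of a
# mixed list is only known to be "int or str".
mixed: List[Union[int, str]] = [1, "a"]
first: Union[int, str] = mixed[0]

print(count, label, first)
```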


The second reason has to do with a fundamental limitation of the PEP 484 type system: there's no way to declare that some function or method does not modify state.

Basically, the behavior you want is safe only if lst1.__add__(lst2) is guaranteed to not mutate either operand. But there's no way of actually guaranteeing this -- what if lst1 is some weird list subclass that copies items from lst2 to itself? Then temporarily relaxing lst1's type from SomeListSubtype[int] to SomeListSubtype[object] would be unsafe: lst1 would no longer contain only ints after adding/injecting the strings from lst2.
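A minimal sketch of such a badly behaved subclass, to show the failure mode. The class name HoardingList is hypothetical, and a strict type checker may well flag the __add__ override itself; the point here is only the runtime behavior.

```python
from typing import List, TypeVar

_T = TypeVar("_T")


class HoardingList(List[_T]):
    """Hypothetical list subclass whose __add__ mutates the left operand."""

    def __add__(self, other: List[_T]) -> List[_T]:
        self.extend(other)  # side effect: copies other's items into self
        return list(self)


lst1: HoardingList[int] = HoardingList([1, 2, 3])
lst2: List[str] = ["1", "2", "3"]

combined = lst1 + lst2  # the very call a checker would have to allow
print(lst1)             # lst1 is declared to hold ints but now also holds strs
```

If a checker accepted the addition by treating lst1 as a list of object for the duration of the call, the annotation HoardingList[int] would no longer describe what lst1 actually contains afterwards.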

Of course, actually writing such a subclass is also bad practice, but type checkers don't have the luxury of assuming users will follow best practices if they're not enforced: type checkers, compilers, and similar tools are fundamentally conservative beasts.


And finally, it's worth noting that none of these problems are intrinsically insurmountable. There are several things type checker implementers could do, such as:

  1. Tinker with the type signature of list (and make sure this doesn't break any existing code).
  2. Introduce some way of declaring that a method is pure, i.e. does no mutation. Basically, generalize the ideas behind PEP 591 to also apply to functions. (But this would require writing a PEP, modifying typeshed to use the new typing construct, doing a lot of careful design and implementation work...)
  3. Special-case this interaction when we know for certain that neither variable is an instance of a list subclass. (But realistically, the number of times we'd know this for certain is pretty limited.)

...and so forth.

But all of these things take time and energy to do: it's a matter of prioritization. The issue trackers for PyCharm (and mypy, etc.) are pretty long, and there's no shortage of other bugs and feature requests to work through.

Michael0x2a answered Oct 13 '22 20:10