Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding len function with iterators

Reading the documentation I have noticed that the built-in function len doesn't support all iterables but just sequences and mappings (and sets). Before reading that, I always thought that the len function used the iteration protocol to evaluate the length of an object, so I was really surprised reading that.

I read the already-posted questions (here and here) but I am still confused, I'm still not getting the real reason why not allow len to work with all iterables in general.

Is it a more conceptual/logical reason than an implementational one? I mean when I'm asking the length of an object, I'm asking for one property (how many elements it has), a property that objects as generators don't have because they do not have elements inside, the produce elements.

Furthermore generator objects can yield infinite elements bring to an undefined length, something that can not happen with other objects as lists, tuples, dicts, etc...

So am I right, or are there more insights/something more that I'm not considering?

like image 424
zer0uno Avatar asked Sep 08 '14 15:09

zer0uno


1 Answers

The biggest reason is that it reduces type safety.

How many programs have you written where you actually needed to consume an iterable just to know how many elements it had, throwing away anything else?

I, in quite a few years of coding in Python, never needed that. It's a non-sensical operation in normal programs. An iterator may not have a length (e.g. infinite iterators or generators that expects inputs via send()), so asking for it doesn't make much sense. The fact that len(an_iterator) produces an error means that you can find bugs in your code. You can see that in a certain part of the program you are calling len on the wrong thing, or maybe your function actually needs a sequence instead of an iterator as you expected.

Removing such errors would create a new class of bugs where people, calling len, erroneously consume an iterator, or use an iterator as if it were a sequence without realizing.

If you really need to know the length of an iterator, what's wrong with len(list(iterator))? The extra 6 characters? It's trivial to write your own version that works for iterators, but, as I said, 99% of the time this simply means that something with your code is wrong, because such an operation doesn't make much sense.

The second reason is that, with that change, you are violating two nice properties of len that currently hold for all (known) containers:

  • It is known to be cheap on all containers ever implemented in Python (all built-ins, standard library, numpy & scipy and all other big third party libraries do this on both dynamically sized and static sized containers). So when you see len(something) you know that the len call is cheap. Making it work with iterators would mean that suddenly all programs might become inefficient due to computations of the length.

    Also note that you can, trivially, implement O(1) __len__ on every container. The cost to pre-compute the length is often negligible, and generally worth paying. The only exception would be if you implement immutable containers that have part of their internal representation shared with other instances (to save memory). However, I don't know of any implementation that does this, and most of the time you can achieve better than O(n) time anyway.

    In summary: currently everybody implements __len__ in O(1) and it's easy to continue to do so. So there is an expectation for calls to len to be O(1). Even if it's not part of the standard. Python developers intentionally avoid C/C++'s style legalese in their documentation and trust the users. In this case, if your __len__ isn't O(1), it's expected that you document that.

  • It is known to be not destructive. Any sensible implementation of __len__ doesn't change its argument. So you can be sure that len(x) == len(x), or that n = len(x);len(list(x)) == n.

    Even this property is not defined in the documentation, however it's expected by everyone, and currently, nobody violates it.

Such properties are good, because you can reason and make assumptions about code using them. They can help you ensure the correctness of a piece of code, or understand its asymptotic complexity. The change you propose would make it much harder to look at some code and understand whether it's correct or what would be it's complexity, because you have to keep in mind the special cases.

In summary, the change you are proposing has one, really small, pro: saving few characters in very particular situations, but it has several, big, disadvantages which would impact a huge portion of existing code.


An other minor reason. If len consumes iterators I'm sure that some people would start to abuse this for its side-effects (replacing the already ugly use of map or list-comprehensions). Suddenly people can write code like:

len(print(something) for ... in ...)

to print text, which is really just ugly. It doesn't read well. Stateful code should be relagated to statements, since they provide a visual cue of side-effects.

like image 69
Bakuriu Avatar answered Sep 22 '22 04:09

Bakuriu