
PEP 424 __length_hint__() - Is there a way to do the same for generators or zips?

Just came across this awesome __length_hint__() method for iterators from PEP 424 (https://www.python.org/dev/peps/pep-0424/). Wow! A way to get the iterator length without exhausting the iterator.

My questions:

  1. Is there a simple explanation of how this magic works? I'm just curious.
  2. Are there limitations and cases where it wouldn't work? ("hint" just sounds a bit suspicious).
  3. Is there a way to get the hint for zips and generators as well? Or is it something fundamental only to iterators?

Edit: BTW, I see that __length_hint__() counts from the current position to the end, i.e. a partially consumed iterator will report the remaining length. Interesting.

Aguy asked Jul 14 '16 22:07


2 Answers

Wow! A way to get the iterator length without exhausting the iterator.

No. It's a way to get a vague hint about what the length might be. There is no requirement that it be in any way accurate.

Is there a simple explanation of how this magic works?

The iterator implements a __length_hint__ method that uses some sort of iterator-specific information to make a guess about how many elements it will output. This guess could be pretty decent, or it could suck horribly. For example, a list iterator knows where it is in the list and how long the list is, so it can report how many elements are left in the list.
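To make the list-iterator case concrete, here is a minimal sketch using operator.length_hint(), the standard-library function PEP 424 added for querying the hint. The variable names are just for illustration.

```python
from operator import length_hint

it = iter([10, 20, 30, 40])

before = length_hint(it)   # the list iterator knows 4 items remain
next(it)                   # consume one item
after = length_hint(it)    # the hint now reflects the 3 remaining items

print(before, after)
```

Asking for the hint does not consume anything: the iterator still yields 20, 30, 40 afterwards.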

Are there limitations and cases where it wouldn't work?

If the iterator doesn't have enough information to guess when it will run out, it can't implement a useful __length_hint__. This is why generators don't have one, for example. Infinite iterators also can't implement a useful __length_hint__, as there is no way to signal an infinite length.
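A quick way to see this for generators, as a sketch: countdown is a made-up generator function, and the hasattr result reflects what CPython's built-in generator type does. When no hint is available, operator.length_hint() falls back to its default argument, which is 0 unless you pass something else.

```python
from operator import length_hint

def countdown(n):
    # Hypothetical generator used only for this demonstration.
    while n > 0:
        yield n
        n -= 1

gen = countdown(5)

has_hint = hasattr(gen, "__length_hint__")  # generators define no hint method
fallback = length_hint(gen)                 # no hint available, so the default 0
custom = length_hint(gen, 100)              # caller-supplied default is returned instead

print(has_hint, fallback, custom)
```

So a result of 0 from length_hint() can mean "no idea" just as well as "empty".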

Is there a way to get the hint for zips and generators as well? Or is it something fundamental only to iterators?

zip instances and generators are both kinds of iterators. Neither zip nor the generator type provides a __length_hint__ method, though.
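If you happen to know roughly how many items a generator or zip will produce, one possible workaround is to wrap it in your own iterator class that supplies the hint. This is only a sketch: HintedIterator and squares are hypothetical names, not part of the standard library, and the estimate is whatever the caller passes in.

```python
from operator import length_hint

class HintedIterator:
    """Hypothetical wrapper pairing an iterator with a caller-supplied estimate."""

    def __init__(self, iterable, expected):
        self._it = iter(iterable)
        self._remaining = expected

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)  # StopIteration propagates when exhausted
        self._remaining = max(self._remaining - 1, 0)
        return item

    def __length_hint__(self):
        # Must return a non-negative int; it is still only an estimate.
        return self._remaining

def squares(n):
    # Hypothetical generator whose length the caller knows in advance.
    for i in range(n):
        yield i * i

hinted = HintedIterator(squares(4), expected=4)
before = length_hint(hinted)
first = next(hinted)
after = length_hint(hinted)

# For zip, the shortest input bounds the number of pairs.
letters, numbers = "abc", [1, 2]
pairs = HintedIterator(zip(letters, numbers), expected=min(len(letters), len(numbers)))
pairs_hint = length_hint(pairs)
```

The wrapper only works when the caller has outside knowledge of the length. It cannot discover the length of an arbitrary generator.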

user2357112 supports Monica answered Nov 17 '22 14:11


The purpose of this is basically just to facilitate more performant allocation of memory in Cython/C code. For example, imagine that a Cython module exposes a function that takes an iterable of custom MyNetworkConnection() objects and, internally, needs to create and allocate memory for data structures to represent them in the Cython/C code. If we can get a rough estimate of the number of items in the iterator, we can allocate a large enough slab of memory in one operation to accommodate all of them with minimal resizing.

If __len__() is implemented, we know the exact length and can use that for memory allocation. But often we won't know the exact length, so the estimate helps us improve performance by giving us a "ballpark figure".

It could be useful in pure-Python code as well, for example for a user-facing completion time estimate for an operation.

For question 2, well, it's a hint, so you can't rely on it to be exact. You must still account for allocating new memory if the hint is too low, or cleaning up if the hint is too high. I'm not personally aware of other limitations or potential problems.
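As a rough pure-Python analogue of that allocation pattern (a sketch, not how any particular Cython/C module does it), the helper below pre-sizes a buffer from the hint, grows it if the hint was too low, and trims it if the hint was too high. collect and FixedHint are hypothetical names introduced for the example.

```python
from operator import length_hint

def collect(iterable):
    """Hypothetical helper: gather items into a list, pre-sized from the hint."""
    it = iter(iterable)
    buf = [None] * length_hint(it, 0)  # one up-front allocation from the estimate
    count = 0
    for item in it:
        if count < len(buf):
            buf[count] = item          # fits in the pre-sized region
        else:
            buf.append(item)           # hint was too low: grow the buffer
        count += 1
    del buf[count:]                    # hint was too high: trim unused slots
    return buf

class FixedHint:
    """Hypothetical iterator that reports a fixed, possibly wrong, hint."""

    def __init__(self, items, hint):
        self._it = iter(items)
        self._hint = hint

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __length_hint__(self):
        return self._hint

too_low = collect(FixedHint("abcde", hint=2))
too_high = collect(FixedHint("abc", hint=10))
```

Either way the result is correct. The hint only affects how much resizing happens along the way.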

For question 3, a generator is an Iterator, so nothing in the protocol rules it out, although the built-in generator type does not define __length_hint__ itself:

>>> import collections.abc
>>> def my_generator(): yield
...
>>> gen = my_generator()
>>> isinstance(gen, collections.abc.Iterator)
True
>>> hasattr(gen, '__length_hint__')
False
Will answered Nov 17 '22 13:11