
Python - Count Elements in Iterator Without Consuming

Tags:

python

Given an iterator it, I would like a function it_count that returns the count of elements that iterator produces, without destroying the iterator. For example:

ita = iter([1, 2, 3])
print(it_count(ita))
print(it_count(ita))

should print

3
3

It has been pointed out that this may not be a well-defined question for all iterators, so I am not looking for a completely general solution, but it should function as anticipated on the example given.


Okay, let me clarify further to my specific case. Given the following code:

import itertools

ita = iter([1, 2, 3])
itb, itc = itertools.tee(ita)
print(sum(1 for _ in itb))
print(sum(1 for _ in itc))

...can we write the it_count function described above, so that it will function in this manner? Even if the answer to the question is "That cannot be done," that's still a perfectly valid answer. It doesn't make the question bad. And the proof that it is impossible would be far from trivial...

Apollys supports Monica asked Dec 10 '22 11:12


2 Answers

Not possible. Until the iterator has been completely consumed, it doesn't have a concrete element count.
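To illustrate the point, here is a minimal sketch of what happens when a plain iterator is counted twice. The helper name consuming_count is hypothetical and not part of the question or of any library; it simply walks the iterator to the end.

```python
def consuming_count(iterator):
    """Count items by exhausting the iterator (hypothetical helper)."""
    return sum(1 for _ in iterator)


ita = iter([1, 2, 3])
first = consuming_count(ita)   # walks all three items
second = consuming_count(ita)  # the iterator is already exhausted
print(first, second)  # 3 0
```

The second call sees an exhausted iterator, which is why a counting function cannot be both exact and non-destructive for an arbitrary iterator.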

user2357112 supports Monica answered Dec 20 '22 03:12


The only way to get the exact length of an arbitrary iterator is to iterate over it, so the basic question here is ill-defined.

Also, the iterator itself may change its contents while being iterated over, so the count may not be constant anyway.
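As a small sketch of that last point, assuming a list iterator in CPython: appending to the underlying list after the iterator has been created changes how many items the iterator will eventually produce. The variable names are only illustrative.

```python
from operator import length_hint

data = [1, 2, 3]
it = iter(data)

hint_before = length_hint(it)  # based on the list's current length
data.append(4)                 # mutate the list behind the iterator
hint_after = length_hint(it)   # the hint follows the mutation

remaining = list(it)           # the appended item is produced too
print(hint_before, hint_after, remaining)  # 3 4 [1, 2, 3, 4]
```

Any count taken before the append would therefore already be out of date by the time the iterator is consumed.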


But there are approaches that might do what you ask. Be warned: none of them is foolproof or really efficient.

When using Python 3.4 or later, you can use operator.length_hint and hope the iterator supports it (be warned: not many iterators do, and it is only meant as a hint, so the actual length might be different):

>>> from operator import length_hint

>>> it_count = length_hint

>>> ita = iter([1, 2, 3])
>>> print(it_count(ita))
3
>>> print(it_count(ita))
3
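The caveats can be seen in a short sketch. A generator does not implement __length_hint__, so operator.length_hint falls back to its default (0 unless a second argument is given), and for a list iterator the hint covers only the items that have not been produced yet. The variable names here are only illustrative.

```python
from operator import length_hint

# Generators provide no hint, so the default value is returned.
gen = (x for x in [1, 2, 3])
gen_hint = length_hint(gen)               # default of 0
gen_hint_custom = length_hint(gen, -1)    # caller-supplied default

# For a list iterator the hint counts remaining items only.
partial = iter([1, 2, 3])
next(partial)                             # consume one item
partial_hint = length_hint(partial)

print(gen_hint, gen_hint_custom, partial_hint)  # 0 -1 2
```

Passing a sentinel such as -1 as the default makes it possible to tell "no hint available" apart from a genuine hint of 0.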

As an alternative, you can use itertools.tee, but read its documentation carefully before using it. It may solve your issue, but it won't really solve the underlying problem.

import itertools

def it_count(iterator):
    return sum(1 for _ in iterator)

ita = iter([1, 2, 3])
it1, it2 = itertools.tee(ita, 2)
print(it_count(it1))  # 3
print(it_count(it2))  # 3

But this is less efficient (in memory and speed) than converting the iterator to a list and calling len on it.
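A minimal sketch of that list-based approach, assuming the iterator is finite and its items fit in memory. The helper name count_and_rebuild is hypothetical; it returns the count together with a fresh iterator over the same items, which the caller must use in place of the original.

```python
def count_and_rebuild(iterator):
    """Materialize the iterator, then return (count, fresh iterator).

    Hypothetical helper: the original iterator is exhausted, so the
    caller has to rebind its variable to the returned iterator.
    """
    items = list(iterator)
    return len(items), iter(items)


ita = iter([1, 2, 3])
count, ita = count_and_rebuild(ita)  # rebind ita to the fresh iterator
print(count)      # 3
print(list(ita))  # [1, 2, 3]
```

This does not give the exact it_count(ita) interface from the question, because a function cannot rewind the caller's iterator in place; it can only hand back a replacement.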

MSeifert answered Dec 20 '22 05:12