Is there a uniform way of knowing whether an iterable object will be consumed by iteration?
Suppose you have a certain function crunch which takes an iterable object as a parameter and iterates over it more than once. Something like:
def crunch (vals):
    for v in vals:
        chomp(v)
    for v in vals:
        yum(v)
(Note: merging the two for loops is not an option.)
An issue arises if the function is called with an iterable that is not a list. In the following call, the yum function is never executed:
crunch(iter(range(4)))
We could in principle fix this by redefining crunch as follows:
def crunch (vals):
    vals = list(vals)
    for v in vals:
        chomp(v)
    for v in vals:
        yum(v)
But this would use twice the memory if crunch is called like this:
hugeList = list(longDataStream)
crunch(hugeList)
We could fix this by defining crunch like this:
def crunch (vals):
    if type(vals) is not list:
        vals = list(vals)
    for v in vals:
        chomp(v)
    for v in vals:
        yum(v)
But there could still be a case in which the calling code stores the data in a container other than a list, so the copy is made anyway. For instance:
from collections import deque
hugeDeque = deque(longDataStream)
crunch(hugeDeque)
It would be nice to have an isconsumable predicate, so that we could define crunch like this:
def crunch (vals):
    if isconsumable(vals):
        vals = list(vals)
    for v in vals:
        chomp(v)
    for v in vals:
        yum(v)
Is there a solution for this problem?
One possibility is to test whether the item is a Sequence, using isinstance(vals, collections.abc.Sequence) (plain collections.Sequence in older versions of Python). Non-consumability still isn't totally guaranteed, but I think it's about the best you can get. A Python sequence has to have a length, which means that at least it can't be an open-ended iterator, and in general implies that the elements have to be known ahead of time, which in turn implies that they can be iterated over without consuming them. It's still possible to write pathological classes that fit the sequence protocol but aren't re-iterable, but you'll never be able to handle those.
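As a sketch, the isconsumable predicate imagined in the question (the name comes from the question, it is not a standard function) could be approximated with exactly this Sequence test:

```python
from collections.abc import Sequence

def isconsumable(vals):
    # Hypothetical predicate: treat anything that is not a
    # Sequence as potentially consumed by a single iteration.
    return not isinstance(vals, Sequence)

assert not isconsumable([1, 2, 3])    # lists are Sequences
assert not isconsumable((1, 2, 3))    # so are tuples
assert not isconsumable(range(4))     # and range objects
assert isconsumable(iter(range(4)))   # a bare iterator is consumable
```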
Note that neither Iterable nor Iterator is the appropriate choice, because these types don't guarantee a length, and hence can't guarantee that the iteration will even be finite, let alone repeatable. You could, however, check for both Sized and Iterable.
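A minimal sketch of crunch with that combined check, using list comprehensions as stand-ins for chomp and yum (which are not defined in the question):

```python
from collections import deque
from collections.abc import Iterable, Sized

def crunch(vals):
    # Copy only when the argument is not both Sized and Iterable;
    # a bare iterator has no __len__, so it is materialized once.
    if not (isinstance(vals, Sized) and isinstance(vals, Iterable)):
        vals = list(vals)
    first = [v for v in vals]    # stand-in for the chomp pass
    second = [v for v in vals]   # stand-in for the yum pass
    return first, second

# A generator has no length, so crunch copies it and both passes run:
assert crunch(v for v in range(3)) == ([0, 1, 2], [0, 1, 2])
# A deque is Sized and Iterable, so it is iterated in place, no copy:
assert crunch(deque([1, 2])) == ([1, 2], [1, 2])
```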
The important thing is to document that your function will iterate over its argument twice, thus warning users that they must pass in an object that supports this.
Another, additional option could be to query whether the iterable is its own iterator:
if iter(vals) is vals:
    vals = list(vals)
because in this case it is just an iterator. This works with generators, iterators, files, and many other objects designed for a single pass; in other words, with all iterables that are iterators themselves, because an iterator returns self from its __iter__().
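A small sketch of this check, wrapped in a hypothetical normalize helper (the name is made up for illustration):

```python
def normalize(vals):
    # If the object is its own iterator, one pass would consume it,
    # so take a copy; otherwise iterate the original in place.
    return list(vals) if iter(vals) is vals else vals

vals = [1, 2, 3]
assert normalize(vals) is vals        # a list returns a fresh iterator
gen = (x for x in range(3))
assert normalize(gen) == [0, 1, 2]    # a generator is its own iterator
```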
But this might not be enough, because there are objects which empty themselves on iteration without being their own iterator.
Normally, a self-consuming object will be its own iterator, but there are cases where this might not be allowed.
Imagine a class which wraps a list and empties this list on iteration, such as
class ListPart(object):
    """Consume a list piece by piece."""
    def __init__(self, data=None):
        if data is None:
            data = []
        self.data = data
    def __next__(self):  # `next` in Python 2
        try:
            return self.data.pop(0)
        except IndexError:
            raise StopIteration
    def __iter__(self):
        return self
    def __len__(self):  # doesn't work with __getattr__...
        return len(self.data)
which you call like
l = [1, 2, 3, 4]
lp = ListPart(l)
for i in lp: process(i)
# now l is empty.
If I now add additional data to that list and iterate over the same object again, I'll get the new data, which is a breach of the protocol:
The intention of the protocol is that once an iterator's next() method raises StopIteration, it will continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken. (This constraint was added in Python 2.3; in Python 2.2, various iterators are broken according to this rule.)
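The breach can be demonstrated with a compact Python 3 version of the class (using __next__ instead of the Python 2 next method):

```python
class ListPart:
    """A list wrapper that empties the list as it is iterated."""
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        return self
    def __next__(self):
        try:
            return self.data.pop(0)
        except IndexError:
            raise StopIteration

l = [1, 2]
lp = ListPart(l)
assert list(lp) == [1, 2]   # first pass drains the underlying list
l.append(3)                 # refill the list behind the iterator's back
# The exhausted iterator yields again - a breach of the protocol:
assert next(lp) == 3
```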
So in this case, the object would have to return an iterator distinct from itself despite being self-consuming. This could be done with
def __iter__(self):
    while True:
        try:
            yield self.data.pop(0)
        except IndexError:  # pop from empty list
            return
which returns a new generator on each iteration, something that would slip through the cracks of the check we are discussing.
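Putting it together, a sketch of the class rewritten with this generator-based __iter__; iter() now yields a distinct object each time, so the iterator protocol is respected even though the data is still consumed:

```python
class ListPart:
    """Self-consuming list wrapper whose __iter__ returns a fresh generator."""
    def __init__(self, data=None):
        if data is None:
            data = []
        self.data = data
    def __iter__(self):
        # Each call produces a new generator, so an exhausted one
        # keeps raising StopIteration as the protocol requires.
        while True:
            try:
                yield self.data.pop(0)
            except IndexError:  # pop from empty list
                return
    def __len__(self):
        return len(self.data)

lp = ListPart([1, 2, 3])
it = iter(lp)
assert it is not lp              # a distinct iterator each time
assert list(it) == [1, 2, 3]
assert list(iter(lp)) == []      # data is gone, but the protocol holds
```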