I just discovered that the various itertools functions return class types which are not considered generators by the Python type system.
First, the setup:
import collections
import glob
import itertools
import types
ig = glob.iglob('*')
iz = itertools.izip([1,2], [3,4])
Then:
>>> isinstance(ig, types.GeneratorType)
True
>>> isinstance(iz, types.GeneratorType)
False
The glob.iglob() result, or any other typical generator, is of type types.GeneratorType, but itertools results are not. This leads to a great deal of confusion if I want to write a function whose input sequence must be eagerly evaluated: I need to know whether it's a generator or not.
I found this alternative:
>>> isinstance(ig, collections.Iterator)
True
>>> isinstance(iz, collections.Iterator)
True
But it's not ideal, because iter(x) is an Iterator regardless of whether x was a concrete (eagerly evaluated) sequence or a generator (lazily evaluated).
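That caveat is easy to demonstrate (sketched with Python 3's `collections.abc.Iterator`, the successor of `collections.Iterator`):

```python
from collections.abc import Iterator  # collections.Iterator in Python 2

concrete = [1, 2, 3]                         # eagerly evaluated sequence
print(isinstance(concrete, Iterator))        # False
print(isinstance(iter(concrete), Iterator))  # True: same data, now wrapped in an iterator
```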
The end goal is something like this:
def foo(self, sequence):
    """Store the sequence, making sure it is fully
    evaluated before this function returns."""
    if isinstance(sequence, types.GeneratorType):
        self.sequence = list(sequence)
    else:
        self.sequence = sequence
An example of why I'd want to do this: if evaluating the sequence might raise an exception, I want that exception to be raised from foo() and not from subsequent use of self.sequence.
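To make that failure mode concrete, here is a sketch with a hypothetical might_fail generator (not from the original question) whose exception only surfaces when the values are actually consumed:

```python
def might_fail(n):
    """Hypothetical generator that blows up partway through iteration."""
    for i in range(n):
        if i == 2:
            raise ValueError("bad element")
        yield i

seq = might_fail(5)        # no exception yet: nothing has been evaluated
try:
    stored = list(seq)     # the exception is raised here, during evaluation
except ValueError as e:
    print("caught:", e)
```

Forcing evaluation inside foo() moves the `list()` call, and therefore the exception, into foo() itself.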
I don't like the types.GeneratorType approach because it misclassifies itertools results, and I don't want to unconditionally construct a copy of the input, as it may be large.
I'm willing to ignore "unusual" iterators, meaning if someone implements a custom one that doesn't qualify as a generator, but I'm not as willing to have the wrong behavior for itertools, because they're rather popular.
Think of generators as one of many possible ways to implement an iterator. The itertools functions are all custom iterators written in C. Most of them could have been implemented with slower generator-based code, but they were designed for speed.
The types.GeneratorType is specified to be "The type of generator-iterator objects, produced by calling a generator function." Since the iterator returned by glob.iglob() is produced by calling a generator function, it will match the generator type. However, the iterator returned by itertools.izip() is produced by C code, so it will not match the generator type.
In other words, types.GeneratorType isn't useful for recognizing all lazily evaluated iterators, it is only useful for recognizing actual generator-iterators.
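The same distinction applies to any hand-written iterator class, not just the itertools ones. A small sketch (using a hypothetical Countdown class) shows an object that is lazy, and an Iterator, yet not a generator-iterator:

```python
import types
from collections.abc import Iterator  # collections.Iterator in Python 2

class Countdown:
    """A hand-written iterator: lazy, but not a generator."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):          # next() in Python 2
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n

c = Countdown(3)
print(isinstance(c, types.GeneratorType))  # False, just like itertools objects
print(isinstance(c, Iterator))             # True
print(list(Countdown(3)))                  # [2, 1, 0]
```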
It sounds like the goal is to distinguish between "eagerly evaluated" collections (like list, tuple, dict, and set) versus "lazily evaluated" iterators. Using collections.Iterator is likely the way to go:
>>> isinstance([], collections.Iterator)
False
>>> isinstance((), collections.Iterator)
False
>>> isinstance({}, collections.Iterator)
False
>>> isinstance(set(), collections.Iterator)
False
>>> isinstance(iter([]), collections.Iterator)
True
>>> isinstance(iter(()), collections.Iterator)
True
>>> isinstance(iter({}), collections.Iterator)
True
>>> isinstance(iter(set()), collections.Iterator)
True
>>> isinstance(glob.iglob('.'), collections.Iterator)
True
>>> isinstance(itertools.izip('abc', 'def'), collections.Iterator)
True
>>> isinstance((x**2 for x in range(5)), collections.Iterator)
True
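One way the Iterator check might slot into the original foo() (a sketch using Python 3 names; the Holder class is illustrative only):

```python
from collections.abc import Iterator  # collections.Iterator in Python 2

class Holder:
    def foo(self, sequence):
        """Store the sequence, evaluating it fully if it is lazy."""
        if isinstance(sequence, Iterator):
            self.sequence = list(sequence)   # force evaluation now
        else:
            self.sequence = sequence         # already concrete: no copy

h = Holder()
h.foo(x * x for x in range(3))
print(h.sequence)            # [0, 1, 4]

big = [1, 2, 3]
h.foo(big)
print(h.sequence is big)     # True: concrete inputs are stored without copying
```

This avoids copying large lists while still catching generators and itertools objects.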
If you've already called iter() on any of the "eager" collections, then it is too late to figure out the nature of the upstream iterable without resorting to shenanigans such as type(x) in {type(iter(s)) for s in ([], (), {}, set())}.
The stated goal is "store the sequence, making sure it is fully evaluated before this function returns". The usual way to do this is just list(sequence) with no surrounding checks to see if it is already a list, tuple, deque or some other fully-evaluated sequence. This may seem wasteful, but the list() call is very fast (it just copies the object pointers at C-speed).
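"Copies the object pointers" means the copy is shallow: a new list object is built, but the elements themselves are shared, not duplicated:

```python
data = [object() for _ in range(3)]
copy = list(data)           # new list, same element objects

print(copy == data)         # True
print(copy is data)         # False: a distinct list object
print(copy[0] is data[0])   # True: elements are shared, not duplicated
```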