
Why are Python itertools not classified as generators (GeneratorType)?

I just discovered that the various itertools functions return class types which are not considered generators by the Python type system.

First, the setup:

import collections
import glob  
import itertools
import types

ig = glob.iglob('*')
iz = itertools.izip([1,2], [3,4])

Then:

>>> isinstance(ig, types.GeneratorType) 
True
>>> isinstance(iz, types.GeneratorType)
False

The glob.iglob() result, or any other typical generator, is of type types.GeneratorType. But itertools results are not. This leads to a great deal of confusion if I want to write a function whose input sequence must be eagerly evaluated--I need to know if it's a generator or not.

I found this alternative:

>>> isinstance(ig, collections.Iterator)
True
>>> isinstance(iz, collections.Iterator)
True

But it's not ideal, because iter(x) is an Iterator regardless of whether x was a concrete (eagerly evaluated) sequence, or a generator (lazily evaluated).

The end goal is something like this:

def foo(self, sequence):
    """Store the sequence, making sure it is fully
    evaluated before this function returns."""

    if isinstance(sequence, types.GeneratorType):
        self.sequence = list(sequence)
    else:
        self.sequence = sequence

An example of why I'd want to do this would be if the evaluation of the sequence might raise an exception, and I want that exception to be raised from foo() and not from subsequent use of self.sequence.

I don't like the types.GeneratorType approach because it produces some false positives--I don't want to construct a copy of the input list unnecessarily, as it may be large.

I'm willing to ignore "unusual" iterators, meaning if someone implements a custom one that doesn't qualify as a generator, but I'm not as willing to have the wrong behavior for itertools, because they're rather popular.

John Zwinck asked Jun 24 '16 13:06


1 Answer

Why are Python itertools not classified as generators?

Think of a generator as one of many possible ways to implement an iterator. The itertools are all custom iterators written in C. Most of them could have been implemented with slower code using generators, but they were designed for speed.

The types.GeneratorType is specified to be "The type of generator-iterator objects, produced by calling a generator function." Since the iterator returned by glob.iglob() is produced by calling a generator function, it will match the generator type. However, the iterator returned by itertools.izip() is produced by C code, so it will not match the generator type.

In other words, types.GeneratorType isn't useful for recognizing all lazily evaluated iterators; it is only useful for recognizing actual generator-iterators.
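
To make the distinction concrete, here is a minimal sketch. It assumes Python 3, so collections.abc.Iterator and itertools.chain stand in for the Python 2 names used above; squares and Countdown are hypothetical names made up for this illustration.

```python
import itertools
import types
from collections.abc import Iterator


def squares(n):
    """A generator function: calling it returns a generator-iterator."""
    for i in range(n):
        yield i * i


class Countdown:
    """A hand-written iterator class (hypothetical, for illustration only)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


gen = squares(3)                           # produced by a generator function
custom = Countdown(3)                      # produced by a plain class
chained = itertools.chain([1, 2], [3, 4])  # produced by C code in CPython

# Only the object that came from a generator function is a GeneratorType.
is_generator = [isinstance(x, types.GeneratorType) for x in (gen, custom, chained)]

# All three are lazy iterators as far as the Iterator ABC is concerned.
is_iterator = [isinstance(x, Iterator) for x in (gen, custom, chained)]
```

Here is_generator comes out as [True, False, False] while is_iterator is [True, True, True], which is the same split the question observed between glob.iglob() and itertools.izip().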

How to recognize a fully-evaluated collection?

It sounds like the goal is to distinguish "eagerly evaluated" collections (like list, tuple, dict, and set) from "lazily evaluated" iterators. Using collections.Iterator is likely the way to go:

>>> isinstance([], collections.Iterator)
False
>>> isinstance((), collections.Iterator)
False
>>> isinstance({}, collections.Iterator)
False
>>> isinstance(set(), collections.Iterator)
False

>>> isinstance(iter([]), collections.Iterator)
True
>>> isinstance(iter(()), collections.Iterator)
True
>>> isinstance(iter({}), collections.Iterator)
True
>>> isinstance(iter(set()), collections.Iterator)
True

>>> isinstance(glob.iglob('.'), collections.Iterator)
True
>>> isinstance(itertools.izip('abc', 'def'), collections.Iterator)
True
>>> isinstance((x**2 for x in range(5)), collections.Iterator)
True
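
Applied to the foo() method from the question, the check above might look like the following sketch. It assumes Python 3 (collections.abc.Iterator rather than collections.Iterator, and zip rather than itertools.izip); materialize and Holder are hypothetical names, not part of any library.

```python
from collections.abc import Iterator


def materialize(iterable):
    """Return a fully evaluated object.

    Iterators (generators, itertools objects, iter(x) results) are
    consumed into a list; anything else is returned unchanged, so an
    existing list or tuple is not copied.
    """
    if isinstance(iterable, Iterator):
        return list(iterable)
    return iterable


class Holder:
    """Hypothetical owner of the foo() method from the question."""

    def foo(self, sequence):
        # Any exception raised while evaluating a lazy input surfaces here.
        self.sequence = materialize(sequence)


data = [1, 2, 3]
holder = Holder()

holder.foo(data)
same_object = holder.sequence is data   # the list was stored without a copy

holder.foo(zip([1, 2], [3, 4]))
from_zip = holder.sequence              # the lazy zip object was consumed

holder.foo(x * x for x in range(3))
from_genexp = holder.sequence           # so was the generator expression
```

One caveat of this sketch: it only recognizes iterators. A lazy iterable that is not itself an iterator, such as a Python 3 range object, is stored unchanged.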

What if iter() has already been called?

If you've already called iter() on any of the "eager" collections, then it is too late to figure out the nature of the upstream iterable without resorting to shenanigans such as type(x) in {type(iter(s)) for s in ([], (), {}, set())}.
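
For completeness, the type-comparison trick can be spelled out as below. This is only a sketch of the expression quoted above, written for Python 3; EAGER_ITERATOR_TYPES and came_from_builtin_container are hypothetical names, and the check covers only the four container types listed.

```python
# The concrete iterator types that iter() returns for the four built-in
# containers named in the expression above.
EAGER_ITERATOR_TYPES = {type(iter(s)) for s in ([], (), {}, set())}


def came_from_builtin_container(x):
    """Guess whether x is iter() of a list, tuple, dict, or set."""
    return type(x) in EAGER_ITERATOR_TYPES


from_list = came_from_builtin_container(iter([1, 2]))
from_dict = came_from_builtin_container(iter({"a": 1}))
from_genexp = came_from_builtin_container(x for x in [1, 2])
from_zip = came_from_builtin_container(zip([1], [2]))
```

The first two results are True and the last two are False. The approach is brittle: iterators over other eager containers, such as a str or a collections.deque, have their own types and are not recognized unless they are added to the set.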

End goal

The stated goal is "store the sequence, making sure it is fully evaluated before this function returns". The usual way to do this is just list(sequence) with no surrounding checks to see if it is already a list, tuple, deque or some other fully-evaluated sequence. This may seem wasteful, but the list() call is very fast (it just copies the object pointers at C-speed).
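
A minimal sketch of that unconditional approach, including the exception case the question cares about. Store and parse_all are hypothetical names used only for this illustration.

```python
def parse_all(tokens):
    """Lazily convert tokens to int; bad input fails only when consumed."""
    for token in tokens:
        yield int(token)


class Store:
    """Hypothetical owner of the foo() method from the question."""

    def foo(self, sequence):
        # list() consumes lazy inputs and makes a shallow copy of eager
        # ones, so the stored value is always fully evaluated.
        self.sequence = list(sequence)


store = Store()
original = [1, 2, 3]
store.foo(original)

try:
    store.foo(parse_all(["1", "x"]))
except ValueError:
    failed_inside_foo = True    # the error is raised from foo() itself
else:
    failed_inside_foo = False
```

Because list() makes a shallow copy, store.sequence equals original but is a separate list, so later changes to the caller's list do not affect the stored one. The bad token raises ValueError inside foo() rather than at some later use of store.sequence.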

Raymond Hettinger answered Oct 14 '22 02:10