I assumed that itertools.product
generates elements one at the time. I am now noticing that it is not true.
Simple proof of concept:
Class A:
def __init__(self, n):
self.source = iter(range(n))
def __iter__(self):
return self
def __next__(self):
val = next(self.source)
print("I am at:", val)
return val
Now If I do:
from itertools import product
l = product(A(3), A(3))
print("Here")
next(l)
I expect to have as output:
>'Here'
>'I am at 0'
>'I am at 0'
But I have
>'I am at 0'
>'I am at 1'
>'I am at 2'
>'I am at 0'
>'I am at 1'
>'I am at 2'
>'Here'
Am I missing something?
To answer your question we need to look at the implementation of itertools.product:
def product(*args, repeat=1):
pools = [tuple(pool) for pool in args] * repeat
result = [[]]
for pool in pools:
result = [x+[y] for x in result for y in pool]
for prod in result:
yield tuple(prod)
here you find the real C implementation, but to answer this question, it is enough to refer to python (see the EXTRA paragraph at the bottom).
focus on this line of code:
pools = [tuple(pool) for pool in args] * repeat
in this way all the elements of the two iterators (taken in input) are transformed into a list of tuples (only the first time you call next()
), and at this time they are actually created.
Returning to your code, when you call next(l)
for the first time, all elements of the iterators are created. In your example the list will be created the polls
list with the following elements:
# pools: [(0, 1, 2), (0, 1, 2)]
which is why you got those outputs.
As for the print("Here")
, to understand why it is printed first you need to understand how the generators work:
itertool.product()
returns a generator object. The generator does not execute the function code until it is stimulated by the first next()
. Subsequently, each call next()
allows you to calculate the next element, executing only once the loop containing the keyword yield
.
Here you will find excellent resources to better understand how python generators work.
Because the Cartesian product must evaluate the same element several times, and iterators cannot instead be consumed only once.
in C the list of tuple pools it is created equivalent to python, as you can see from this code, are evaluated eagerly. Each iterable argument is first converted to a tuple:
pools = PyTuple_New(npools);
if (pools == NULL)
goto error;
for (i=0; i < nargs ; ++i) {
PyObject *item = PyTuple_GET_ITEM(args, i);
PyObject *pool = PySequence_Tuple(item);
if (pool == NULL)
goto error;
PyTuple_SET_ITEM(pools, i, pool);
indices[i] = 0;
}
for ( ; i < npools; ++i) {
PyObject *pool = PyTuple_GET_ITEM(pools, i - nargs);
Py_INCREF(pool);
PyTuple_SET_ITEM(pools, i, pool);
indices[i] = 0;
}
I'd like to point out that while for both instances of class A
the __next__
method gets called exhaustively (until StopIteration
is encountered), the itertools.product
iterator is still lazy evaluated with the subsequent calls to next
. Notice that:
> 'I am at 0'
> 'I am at 1'
> 'I am at 2'
> 'I am at 0'
> 'I am at 1'
> 'I am at 2'
> 'Here'
is just a result of calling exhaustively next
first for the first passed instance, and then for the second. This is more readily seen when calling product(A(2), A(3))
, which results in:
> 'I am at 0'
> 'I am at 1'
> 'I am at 0'
> 'I am at 1'
> 'I am at 2'
The same behavior is observed for combinations
and permutations
. In fact searching for so informed question with "Does itertools.product evaluate its arguments lazily?" brought me to this SO question which also answers your question. The arguments are not evaluated lazily:
since
product
sometimes needs to go over an iterable more than once, which is not possible if the arguments were left as iterators that can only be consumed once.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With