Is looping through a generator in a loop over that same generator safe in Python?

From what I understand, a for x in a_generator: foo(x) loop in Python is roughly equivalent to this:

try:
    while True:
        foo(next(a_generator))
except StopIteration:
    pass

That suggests that something like this:

for outer_item in a_generator:
    if should_inner_loop(outer_item):
        for inner_item in a_generator:
            foo(inner_item)
            if stop_inner_loop(inner_item): break
    else:
        bar(outer_item)

would do two things:

  1. Not raise any exceptions, segfault, or anything like that
  2. Iterate over a_generator until it reaches some outer_item for which should_inner_loop(outer_item) returns a truthy value, then keep consuming the same generator in the inner for loop until stop_inner_loop(inner_item) returns true. After that, the outer loop resumes where the inner one left off.
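To make point 2 concrete, here is a runnable version of the loop above. The predicates should_inner_loop and stop_inner_loop are hypothetical stand-ins chosen only for illustration, and foo/bar are replaced by appending to a list so the order of calls is visible.

```python
# Hypothetical predicates, chosen only for illustration.
def should_inner_loop(item):
    return item == 2          # start the inner loop when the outer loop sees 2

def stop_inner_loop(item):
    return item == 4          # leave the inner loop after handling 4

def run(a_generator):
    log = []
    for outer_item in a_generator:
        if should_inner_loop(outer_item):
            for inner_item in a_generator:      # the very same generator object
                log.append(("foo", inner_item))
                if stop_inner_loop(inner_item):
                    break
        else:
            log.append(("bar", outer_item))
    return log

log = run(x for x in range(7))
print(log)
# [('bar', 0), ('bar', 1), ('foo', 3), ('foo', 4), ('bar', 5), ('bar', 6)]
```

Note that the item that triggers the inner loop (2 here) is consumed by the outer loop and is passed to neither foo nor bar.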

From my admittedly limited tests, it seems to behave as described above. However, I couldn't find anything in the spec guaranteeing that this behavior is the same across interpreters. Is there anything that says or implies that it will always work like this? Can it cause errors, or behave in some other way (i.e. do something other than what's described above)?


N.B. The equivalent code above is based on my own experience; I don't know whether it's actually accurate. That's why I'm asking.

asked May 17 '16 18:05 by Nic




1 Answer

TL;DR: it is safe with CPython (but I could not find any specification of this), although it may not do what you want to do.


First, let's talk about your first assumption, the equivalence.

A for loop first calls iter() on the object, then repeatedly calls next() on the result until that raises StopIteration.
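As a rough model (a simplified sketch, not how CPython literally implements it), `for x in iterable: body(x)` behaves like the helper below. The name for_loop is hypothetical. Unlike the equivalence in the question, it calls iter() first and only catches a StopIteration raised by next(), not one raised inside the loop body; it also ignores the loop's else clause and break/continue.

```python
def for_loop(iterable, body):
    """Hypothetical helper: roughly what `for x in iterable: body(x)` does."""
    iterator = iter(iterable)      # ask the object for an iterator (GET_ITER)
    while True:
        try:
            x = next(iterator)     # fetch the next item (FOR_ITER)
        except StopIteration:
            break                  # iterator exhausted: leave the loop
        body(x)                    # exceptions from the body are not swallowed

seen = []
for_loop("abc", seen.append)
print(seen)  # ['a', 'b', 'c']
```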

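Here is the relevant bytecode (a low-level form of Python used by the interpreter itself), as produced by a CPython 3 release older than 3.6; newer versions emit different opcodes (for example, SETUP_LOOP was removed in 3.8):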

>>> import dis
>>> def f():
...  for x in y:
...   print(x)
... 
>>> dis.dis(f)
  2           0 SETUP_LOOP              24 (to 27)
              3 LOAD_GLOBAL              0 (y)
              6 GET_ITER
        >>    7 FOR_ITER                16 (to 26)
             10 STORE_FAST               0 (x)

  3          13 LOAD_GLOBAL              1 (print)
             16 LOAD_FAST                0 (x)
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 JUMP_ABSOLUTE            7
        >>   26 POP_BLOCK
        >>   27 LOAD_CONST               0 (None)
             30 RETURN_VALUE

GET_ITER calls iter(y) (which in turn calls y.__iter__()) and pushes the result onto the stack (think of the stack as a set of unnamed local variables). Execution then enters the loop at FOR_ITER, which calls next(<iterator>) (which in turn calls <iterator>.__next__()) and, if a value comes back, runs the loop body; JUMP_ABSOLUTE then sends execution back to FOR_ITER. Once the iterator is exhausted, FOR_ITER jumps past the loop body (to offset 26).
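The exact listing differs between CPython versions, but on a recent CPython you can check that the two key opcodes are still emitted. This sketch assumes CPython (dis output is an implementation detail) and only looks for GET_ITER and FOR_ITER rather than comparing the full disassembly.

```python
import dis

y = [1, 2, 3]

def f():
    for x in y:
        print(x)

# Opcode names of f's bytecode; the full sequence varies by CPython version.
opnames = [ins.opname for ins in dis.get_instructions(f)]
print("GET_ITER" in opnames, "FOR_ITER" in opnames)
```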


Now, for the safety:

In CPython's source, the methods of a generator are defined at https://hg.python.org/cpython/file/101404/Objects/genobject.c#l589 and, as you can see at line 617, the implementation of __iter__() is PyObject_SelfIter, which simply returns the object (i.e. the generator) itself.
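You can observe this from Python without reading the C source: calling iter() on a generator hands back the same object, so any later loop or next() call simply continues from the generator's current position. The generator numbers is a hypothetical example.

```python
def numbers():
    yield 1
    yield 2
    yield 3

gen = numbers()
same_object = iter(gen) is gen   # __iter__ returns the generator itself
first = next(gen)                # consumes 1
rest = list(gen)                 # continues where next() stopped
print(same_object, first, rest)  # True 1 [2, 3]
```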

So, when you nest the two loops, both iterate over the same iterator. And, as you said, each loop just calls next() on it, so it's safe.

But be cautious: the inner loop will consume items that will not be consumed by the outer loop. Even if that is what you want to do, it may not be very readable.
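The difference is easiest to see by running the same nested loop over a list and over a generator. A list returns a fresh iterator each time iter() is called, so the loops are independent; a generator is shared, so the inner loop drains it and the outer loop then stops. The helper nested_pairs is hypothetical.

```python
def nested_pairs(iterable):
    """Hypothetical helper: collect (outer, inner) pairs from two nested loops."""
    result = []
    for a in iterable:
        for b in iterable:
            result.append((a, b))
    return result

data = [1, 2, 3]
from_list = nested_pairs(data)             # each loop gets its own list iterator
from_gen = nested_pairs(x for x in data)   # both loops share one generator
print(len(from_list))   # 9
print(from_gen)         # [(1, 2), (1, 3)]
```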

If that is not what you want to do, consider itertools.tee(), which buffers the output of an iterator, allowing you to iterate over its output twice (or more). This is only efficient if the tee iterators stay close to each other in the output stream; if one tee iterator will be fully exhausted before the other is used, it's better to just call list on the iterator to materialize a list out of it.
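A small sketch of both options, using a hypothetical generator numbers: itertools.tee gives two iterators that advance independently over the same underlying stream, while list() is the simpler choice when all items are needed more than once anyway.

```python
import itertools

def numbers():
    yield from range(5)

# Two independent iterators fed from one generator; tee buffers what
# one copy has seen and the other has not yet consumed.
first, second = itertools.tee(numbers(), 2)
head = [next(first), next(first)]   # advance only `first`
rest_first = list(first)            # [2, 3, 4]
all_second = list(second)           # still starts at 0: [0, 1, 2, 3, 4]
print(head, rest_first, all_second)

# When one pass would finish before the other starts, a list is simpler.
items = list(numbers())
pairs = [(a, b) for a in items for b in items if a < b]
print(len(pairs))  # 10
```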

answered Dec 22 '22 00:12 by Valentin Lorentz