
yield in list comprehensions and generator expressions

The following behaviour seems rather counterintuitive to me (Python 3.4):

>>> [(yield i) for i in range(3)]
<generator object <listcomp> at 0x0245C148>
>>> list([(yield i) for i in range(3)])
[0, 1, 2]
>>> list((yield i) for i in range(3))
[0, None, 1, None, 2, None]

The intermediate values of the last line are actually not always None, they are whatever we send into the generator, equivalent (I guess) to the following generator:

def f():
    for i in range(3):
        yield (yield i)
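A minimal sketch of how sent values surface in that equivalent generator. The driver code and the sent string are illustrative and not part of the original session; it runs on current Python because the yield sits in an ordinary function.

```python
def f():
    for i in range(3):
        # The inner yield hands out i; whatever is sent back in becomes
        # the value that the outer yield hands out next.
        yield (yield i)

# Plain iteration sends nothing, so every second item is None.
plain = list(f())

# Driving the generator by hand shows the sent value coming back out.
g = f()
first = next(g)          # inner yield produces 0
echoed = g.send("spam")  # outer yield produces the value just sent
second = next(g)         # inner yield of the next iteration produces 1

print(plain)
print(first, echoed, second)
```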

It strikes me as funny that those three lines work at all. The Reference says that yield is only allowed in a function definition (though I may be reading it wrong and/or it may simply have been copied from the older version). The first two lines produce a SyntaxError in Python 2.7, but the third line doesn't.

Also, it seems odd

  • that a list comprehension returns a generator and not a list
  • and that the generator expression converted to a list and the corresponding list comprehension contain different values.

Could someone provide more information?

asked Aug 21 '15 at 12:08 by zabolekar



1 Answer

Note: this was a bug in CPython's handling of yield in comprehensions and generator expressions, fixed in Python 3.8, with a deprecation warning in Python 3.7. See the Python bug report and the What's New entries for Python 3.7 and Python 3.8.

Generator expressions, and set and dict comprehensions are compiled to (generator) function objects. In Python 3, list comprehensions get the same treatment; they are all, in essence, a new nested scope.

You can see this if you try to disassemble a generator expression:

>>> import dis
>>> dis.dis(compile("(i for i in range(3))", '', 'exec'))
  1           0 LOAD_CONST               0 (<code object <genexpr> at 0x10f7530c0, file "", line 1>)
              3 LOAD_CONST               1 ('<genexpr>')
              6 MAKE_FUNCTION            0
              9 LOAD_NAME                0 (range)
             12 LOAD_CONST               2 (3)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 LOAD_CONST               3 (None)
             26 RETURN_VALUE
>>> dis.dis(compile("(i for i in range(3))", '', 'exec').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                11 (to 17)
              6 STORE_FAST               1 (i)
              9 LOAD_FAST                1 (i)
             12 YIELD_VALUE
             13 POP_TOP
             14 JUMP_ABSOLUTE            3
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE

The above shows that a generator expression is compiled to a code object, loaded as a function (MAKE_FUNCTION creates the function object from the code object). The .co_consts[0] reference lets us see the code object generated for the expression, and it uses YIELD_VALUE just like a generator function would.

As such, the yield expression works in that context, as the compiler sees these as functions-in-disguise.
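One way to check the "functions-in-disguise" claim without reading bytecode is to look at the nested code object's name and flags. This is a small sketch, assuming a CPython version that still compiles generator expressions to a separate code object; it uses a generator expression rather than a list comprehension because recent CPython releases inline list comprehensions. The filename "<example>" is arbitrary.

```python
import inspect
import types

module_code = compile("(i for i in range(3))", "<example>", "exec")

# The generator expression body lives in its own nested code object,
# stored among the constants of the enclosing module code.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
genexpr_code = nested[0]

# CO_GENERATOR is the same flag that marks the code of a generator function.
is_generator_code = bool(genexpr_code.co_flags & inspect.CO_GENERATOR)

print(genexpr_code.co_name)
print(is_generator_code)
```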

This is a bug; yield has no place in these expressions. Python versions before 3.8 accept it (which is why the code is compilable; 3.7 only warns), but the yield expression specification shows that using yield here should not actually work:

The yield expression is only used when defining a generator function and thus can only be used in the body of a function definition.

This has been confirmed to be a bug in issue 10544. The resolution of the bug is that using yield and yield from will raise a SyntaxError in Python 3.8; in Python 3.7 it raises a DeprecationWarning to ensure code stops using this construct. You'll see the same warning in Python 2.7.15 and up if you use the -3 command line switch enabling Python 3 compatibility warnings.

The 3.7.0b1 warning looks like this; turning warnings into errors gives you a SyntaxError exception, like you would in 3.8:

>>> [(yield i) for i in range(3)]
<stdin>:1: DeprecationWarning: 'yield' inside list comprehension
<generator object <listcomp> at 0x1092ec7c8>
>>> import warnings
>>> warnings.simplefilter('error')
>>> [(yield i) for i in range(3)]
  File "<stdin>", line 1
SyntaxError: 'yield' inside list comprehension
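On Python 3.8 and later the same rejection can be observed without a REPL by compiling the source strings. The helper name syntax_error_message and the sample list are hypothetical, chosen for illustration; the sketch assumes an interpreter where the fix has landed.

```python
def syntax_error_message(source):
    """Return the SyntaxError message for source, or None if it compiles."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return None

yield_samples = [
    "[(yield i) for i in range(3)]",
    "((yield i) for i in range(3))",
    "{(yield i) for i in range(3)}",
    "{i: (yield i) for i in range(3)}",
]
messages = {source: syntax_error_message(source) for source in yield_samples}

# Control case: an ordinary comprehension still compiles.
control = syntax_error_message("[i for i in range(3)]")

for source, message in messages.items():
    print(source, "->", message)
print("control ->", control)
```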

The differences between how yield in a list comprehension and yield in a generator expression operate stem from the differences in how these two expressions are implemented. In Python 3 a list comprehension uses the LIST_APPEND opcode to add the top of the stack to the list being built, while a generator expression instead yields that value. Adding (yield <expr>) just adds another YIELD_VALUE opcode to either:

>>> dis.dis(compile("[(yield i) for i in range(3)]", '', 'exec').co_consts[0])
  1           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                13 (to 22)
              9 STORE_FAST               1 (i)
             12 LOAD_FAST                1 (i)
             15 YIELD_VALUE
             16 LIST_APPEND              2
             19 JUMP_ABSOLUTE            6
        >>   22 RETURN_VALUE
>>> dis.dis(compile("((yield i) for i in range(3))", '', 'exec').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (i)
              9 LOAD_FAST                1 (i)
             12 YIELD_VALUE
             13 YIELD_VALUE
             14 POP_TOP
             15 JUMP_ABSOLUTE            3
        >>   18 LOAD_CONST               0 (None)
             21 RETURN_VALUE

The YIELD_VALUE opcode at bytecode indexes 15 and 12 respectively is extra, a cuckoo in the nest. So for the list-comprehension-turned-generator you have one yield per iteration: it produces the top of the stack and then replaces it with the yield return value, which LIST_APPEND adds to the list. For the generator expression variant you yield the top of the stack (the integer) and then yield again, but by then the stack holds the return value of the first yield, so you get None that second time.
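The two bytecode listings correspond roughly to the following hand-written generator functions. This is a sketch for comparison only; the names listcomp_like and genexpr_like are made up here, and the functions are an approximation of what the compiler produced, not its actual output.

```python
def listcomp_like():
    result = []
    for i in range(3):
        # One YIELD_VALUE per item; its return value is what gets appended.
        result.append((yield i))
    return result

def genexpr_like():
    for i in range(3):
        # The explicit yield runs first, then the implicit yield of the
        # generator expression hands out the explicit yield's return value.
        yield (yield i)

from_listcomp = list(listcomp_like())
from_genexpr = list(genexpr_like())

print(from_listcomp)
print(from_genexpr)
```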

For the list comprehension then, the intended list object output is still returned, but Python 3 sees this as a generator so the return value is instead attached to the StopIteration exception as the value attribute:

>>> from itertools import islice
>>> listgen = [(yield i) for i in range(3)]
>>> list(islice(listgen, 3))  # avoid exhausting the generator
[0, 1, 2]
>>> try:
...     next(listgen)
... except StopIteration as si:
...     print(si.value)
...
[None, None, None]

Those None objects are the return values from the yield expressions.
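Since the original expression no longer compiles on Python 3.8+, the effect can be reproduced with an ordinary generator function. In this sketch, listcomp_like is a hypothetical stand-in for the compiled list comprehension; sending values in shows that the list elements really are the yield return values.

```python
def listcomp_like():
    result = []
    for i in range(3):
        result.append((yield i))
    return result  # ends up on StopIteration.value

gen = listcomp_like()
yielded = [next(gen)]          # first yield produces 0
yielded.append(gen.send("a"))  # "a" is appended, next yield produces 1
yielded.append(gen.send("b"))  # "b" is appended, next yield produces 2

try:
    gen.send("c")              # "c" is appended, then the function returns
except StopIteration as stop:
    returned = stop.value

print(yielded)
print(returned)
```

With plain next() calls instead of send(), every appended value would be None, matching the output above.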

To reiterate: the same issue applies to dictionary and set comprehensions in both Python 2 and Python 3. In Python 2 the yield return values are still added to the intended dictionary or set object, and that object is 'yielded' last instead of being attached to the StopIteration exception:

>>> list({(yield k): (yield v) for k, v in {'foo': 'bar', 'spam': 'eggs'}.items()})
['bar', 'foo', 'eggs', 'spam', {None: None}]
>>> list({(yield i) for i in range(3)})
[0, 1, 2, set([None])]
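The Python 2 set comprehension output can be imitated on a current interpreter with an explicit generator. This is only an emulation of the behaviour described above, written for Python 3; py2_setcomp_like is a hypothetical name, and the final yield stands in for Python 2 handing out the finished container as the last item.

```python
def py2_setcomp_like():
    result = set()
    for i in range(3):
        # Each yield returns None when nothing is sent, so the set
        # collects a single None.
        result.add((yield i))
    # Emulates the Python 2 behaviour: the container comes out last.
    yield result

items = list(py2_setcomp_like())
print(items)
```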
answered Oct 22 '22 at 04:10 by Martijn Pieters