
Unexpected output from list(generator)

I have a list and a lambda function defined as

In [1]: i = lambda x: a[x]
In [2]: alist = [(1, 2), (3, 4)]

Then I try two different methods to calculate a simple sum

First method.

In [3]: [i(0) + i(1) for a in alist]
Out[3]: [3, 7]

Second method.

In [4]: list(i(0) + i(1) for a in alist)
Out[4]: [7, 7]

The two results differ, which I did not expect. Why is that happening?

Himanshu Mishra asked Jul 04 '15 08:07


5 Answers

This behaviour has been fixed in Python 3. In Python 2, a list comprehension such as [i(0) + i(1) for a in alist] binds a in its surrounding scope, where i can see it. A generator expression does not, so in a new session list(i(0) + i(1) for a in alist) raises an error:

>>> i = lambda x: a[x]
>>> alist = [(1, 2), (3, 4)]
>>> list(i(0) + i(1) for a in alist)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
  File "<stdin>", line 1, in <lambda>
NameError: global name 'a' is not defined

A list comprehension is not a generator expression (see "Generator expressions and list comprehensions"):

Generator expressions are surrounded by parentheses (“()”) and list comprehensions are surrounded by square brackets (“[]”).

In your example, the generator expression passed to list() has its own scope for its loop variable, while the lambda i can see global variables at most. When i runs, it looks for a in the global scope. Try this in a new session:

>>> i = lambda x: a[x]
>>> alist = [(1, 2), (3, 4)]
>>> [i(0) + i(1) for a in alist]
[3, 7]
>>> a
(3, 4)

Compare it to this in another session:

>>> i = lambda x: a[x]
>>> alist = [(1, 2), (3, 4)]
>>> l = (i(0) + i(1) for a in alist)
>>> l
<generator object <genexpr> at 0x10e60db90>
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> [x for x in l]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
  File "<stdin>", line 1, in <lambda>
NameError: global name 'a' is not defined

When you run list(i(0) + i(1) for a in alist), you pass the generator (i(0) + i(1) for a in alist) to list, which consumes it to build the list before returning. The loop variable a is local to the generator expression, so the lambda function, which looks a up in the global scope, cannot see it.

The generator object <generator object <genexpr> at 0x10e60db90> does not expose the name a to the global scope. When list iterates over the generator, the lambda function raises NameError because a is undefined there.
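For readers on Python 3, the scoping described here can be checked with a small sketch. It assumes a fresh module in which no global a exists; raises_name_error is a hypothetical helper introduced only for this illustration.

```python
# Python 3 sketch; assumes no global named `a` has been defined.
i = lambda x: a[x]          # `a` is looked up in the global scope at call time
alist = [(1, 2), (3, 4)]


def raises_name_error(thunk):
    """Hypothetical helper: return True if calling thunk() raises NameError."""
    try:
        thunk()
    except NameError:
        return True
    return False


# In both forms the loop variable `a` stays local to the comprehension,
# so the lambda never sees it and the global lookup fails.
genexpr_fails = raises_name_error(lambda: list(i(0) + i(1) for a in alist))
listcomp_fails = raises_name_error(lambda: [i(0) + i(1) for a in alist])

print(genexpr_fails, listcomp_fails)
```

On Python 3 both flags should come out True, which matches the tracebacks shown in this answer.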

The difference between list comprehensions and generator expressions is also mentioned here:

List comprehensions also "leak" their loop variable into the surrounding scope. This will also change in Python 3.0, so that the semantic definition of a list comprehension in Python 3.0 will be equivalent to list(<generator expression>). Python 2.4 and beyond should issue a deprecation warning if a list comprehension's loop variable has the same name as a variable used in the immediately surrounding scope.

In Python 3:

>>> i = lambda x: a[x]
>>> alist = [(1, 2), (3, 4)]
>>> [i(0) + i(1) for a in alist]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 1, in <lambda>
NameError: name 'a' is not defined
Mehdi answered Nov 14 '22 18:11


The important things to understand here are:

  1. In Python 2, a generator expression creates a function object internally, but a list comprehension does not.

  2. Both bind the loop variable to each value in turn, and that variable is created in whatever scope the loop runs in: the internal function for a generator expression, the current scope for a list comprehension.

Let's look at the bytecode of the generator expression (Python 2):

>>> from dis import dis
>>> dis(compile('(i(0) + i(1) for a in alist)', 'string', 'exec'))
  1           0 LOAD_CONST               0 (<code object <genexpr> at ...>)
              3 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (alist)
              9 GET_ITER            
             10 CALL_FUNCTION            1
             13 POP_TOP             
             14 LOAD_CONST               1 (None)
             17 RETURN_VALUE        

It loads the code object and then makes a function from it. Let's look at the actual code object.

>>> dis(compile('(i(0) + i(1) for a in alist)', 'string', 'exec').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                27 (to 33)
              6 STORE_FAST               1 (a)
              9 LOAD_GLOBAL              0 (i)
             12 LOAD_CONST               0 (0)
             15 CALL_FUNCTION            1
             18 LOAD_GLOBAL              0 (i)
             21 LOAD_CONST               1 (1)
             24 CALL_FUNCTION            1
             27 BINARY_ADD          
             28 YIELD_VALUE         
             29 POP_TOP             
             30 JUMP_ABSOLUTE            3
        >>   33 LOAD_CONST               2 (None)
             36 RETURN_VALUE        

As you can see, the current value from the iterator is stored in the variable a (STORE_FAST). But since the generator expression is turned into a function object, that a is a local variable, visible only within the generator expression.
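One way to confirm this without reading opcodes is to inspect the compiled code objects directly. The following is a sketch for Python 3, where the opcode listing differs from the Python 2 output shown here but the nested code object is still present; the names source, outer and inner are only for illustration.

```python
import types

source = '(i(0) + i(1) for a in alist)'
outer = compile(source, 'string', 'exec')

# The generator expression body is compiled to its own nested code object,
# stored among the constants of the enclosing code.
inner = next(c for c in outer.co_consts if isinstance(c, types.CodeType))

print(inner.co_name)              # name of the nested code object
print('a' in inner.co_varnames)   # is `a` a local of the generator's code?
print('a' in outer.co_names)      # is `a` a name used by the enclosing code?
```

If a appears among the local variable names of the inner code object and not among the names of the outer one, the loop variable never reaches the surrounding scope.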

But in case of list comprehension,

>>> dis(compile('[i(0) + i(1) for a in alist]', 'string', 'exec'))
  1           0 BUILD_LIST               0
              3 LOAD_NAME                0 (alist)
              6 GET_ITER            
        >>    7 FOR_ITER                28 (to 38)
             10 STORE_NAME               1 (a)
             13 LOAD_NAME                2 (i)
             16 LOAD_CONST               0 (0)
             19 CALL_FUNCTION            1
             22 LOAD_NAME                2 (i)
             25 LOAD_CONST               1 (1)
             28 CALL_FUNCTION            1
             31 BINARY_ADD          
             32 LIST_APPEND              2
             35 JUMP_ABSOLUTE            7
        >>   38 POP_TOP             
             39 LOAD_CONST               2 (None)
             42 RETURN_VALUE        

No function is created explicitly, and the variable a is created in the current scope (STORE_NAME). So a leaks into the current scope.


With this understanding, let's approach your problem.

>>> i = lambda x: a[x]
>>> alist = [(1, 2), (3, 4)]

Now, when you create a list with comprehension,

>>> [i(0) + i(1) for a in alist]
[3, 7]
>>> a
(3, 4)

you can see that a has leaked into the current scope and is still bound to the last value from the iteration.

So, when you iterate the generator expression after the list comprehension, the lambda function uses the leaked a. That is why you get [7, 7]: a is still bound to (3, 4).

But if you iterate the generator expression first, its a is bound to the values from alist without leaking into the current scope, because the generator expression becomes a function. So when the lambda function tries to access a, it cannot find it anywhere. That is why it fails with the error.
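The leak can be imitated on Python 3 with an ordinary for loop, which binds its loop variable in the enclosing scope much as a Python 2 list comprehension did. This is only a sketch of the mechanism, not the original Python 2 session; the names first and second are illustrative.

```python
i = lambda x: a[x]          # `a` is looked up in the global scope at call time
alist = [(1, 2), (3, 4)]

# A plain for loop at module level rebinds the global `a` on every iteration,
# so the lambda sees the current element each time.
first = []
for a in alist:
    first.append(i(0) + i(1))

# After the loop, `a` is still bound to the last element.
leaked = a

# The generator expression has its own local `a`; the lambda ignores it and
# keeps reading the leaked global `a` for every element.
second = list(i(0) + i(1) for a in alist)

print(first, leaked, second)
```

Here first should be [3, 7], while second should be [7, 7] because the lambda only ever sees the leaked (3, 4).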

Note: This behaviour cannot be observed in Python 3.x, because the leaking is prevented by creating functions for list comprehensions as well. You might want to read more about this in the History of Python blog post "From List Comprehensions to Generator Expressions", written by Guido himself.

thefourtheye answered Nov 14 '22 17:11


See my other answer for a workaround. But thinking about it a bit more, the problem seems to be more complex. I think there are several issues going on here:

  • When you do i = lambda x: a[x], the variable a is not a parameter of the function. It is a free variable that the function picks up from an outer scope (loosely speaking, the function closes over it). This is the same for both lambda expressions and normal function definitions.

  • Python does 'late binding', which means that the values of such variables are only looked up at the moment you call the function. This can lead to various unexpected results.

  • In Python 2, there is a difference between list comprehensions, which leak their loop variable, and generator expressions, in which the loop variable does not leak (see this PEP for details). This difference has been removed in Python 3, where a list comprehension is a shortcut for list(generator_expression). I am not sure, but this probably means that Python 2 list comprehensions execute in their outer scope, while generator expressions and Python 3 list comprehensions create their own inner scope.
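Late binding of a global name can be seen in a minimal sketch like the following, which should behave the same on Python 2 and Python 3; before and after are illustrative names.

```python
def f():
    # `a` is not a parameter; it is looked up in the global scope
    # each time f is called, not when f is defined.
    return 2 * a


a = 1
before = f()   # uses the value `a` has at this call

a = 10
after = f()    # same function, but `a` has been rebound in the meantime

print(before, after)
```

The function itself never changes; only the global it reads does, which is what makes the leaked loop variable matter.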

Demonstration (in Python 2):

In [1]: def f():  # closes over a from global scope
   ...:     return 2 * a
   ...: 

In [2]: list(f() for a in range(5))  # does not find a in global scope
[...]
NameError: global name 'a' is not defined

In [3]: [f() for a in range(5)]  # executes in global scope, so f finds a; also leaks a=4
Out[3]: [0, 2, 4, 6, 8]

In [4]: list(f() for a in range(5))  # finds the leaked a=4 in global scope
Out[4]: [8, 8, 8, 8, 8]

In Python 3:

In [1]: def f():
   ...:     return 2 * a
   ...: 

In [2]: list(f() for a in range(5))  # does not find a in global scope, does not leak a
[...]
NameError: name 'a' is not defined

In [3]: [f() for a in range(5)]  # does not find a in global scope, does not leak a
[...]
NameError: name 'a' is not defined

In [4]: list(f() for a in range(5))  # a still undefined
[...]
NameError: name 'a' is not defined
Bas Swinckels answered Nov 14 '22 18:11


The lambda looks a up in the global scope, so the call should raise an error when a is not defined there.

The solution is to pass a explicitly:

i = lambda a, x: a[x]
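A sketch of how the two-parameter version might be used, assuming the alist from the question. With a passed explicitly, neither form depends on any global a; from_listcomp and from_genexpr are illustrative names.

```python
i = lambda a, x: a[x]       # `a` is now an ordinary parameter
alist = [(1, 2), (3, 4)]

# The loop variable is handed to the lambda on every call,
# so scoping differences between the two forms no longer matter.
from_listcomp = [i(a, 0) + i(a, 1) for a in alist]
from_genexpr = list(i(a, 0) + i(a, 1) for a in alist)

print(from_listcomp, from_genexpr)
```

Both expressions should now give [3, 7], whichever one runs first.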

sinhayash answered Nov 14 '22 17:11


After [i(0) + i(1) for a in alist] is executed, a is left bound to (3, 4).

Then, when the line below is executed:

list(i(0) + i(1) for a in alist)

the lambda function i uses (3, 4) as the value of a both times, so the result is [7, 7].

Instead, define your lambda function with two parameters, a and x:

i = lambda a,x : a[x]
Rahul Gupta answered Nov 14 '22 19:11