
Generator Comprehension different output from list comprehension?

Tags:

python

I get different output when using a list comprehension versus a generator comprehension. Is this expected behavior or a bug?

Consider the following setup:

all_configs = [
    {'a': 1, 'b':3},
    {'a': 2, 'b':2}
]
unique_keys = ['a','b']

If I then run the following code, I get:

print(list(zip(*( [c[k] for k in unique_keys] for c in all_configs))))
# prints [(1, 2), (3, 2)]

# note the ( instead of [ around the inner comprehension
print(list(zip(*( (c[k] for k in unique_keys) for c in all_configs))))
# prints [(2, 2), (2, 2)]

This is on Python 3.6.0:

Python 3.6.0 (default, Dec 24 2016, 08:01:42)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Bas asked Mar 15 '17 09:03



3 Answers

In a list comprehension, expressions are evaluated eagerly. In a generator expression, they are only looked up as needed.

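Thus, the outer generator expression yields one inner generator per c, and each inner generator refers to c[k] without evaluating it yet. Unpacking with zip(*...) exhausts the outer generator before any inner generator runs, so by the time c is finally looked up it is bound to the last element of all_configs, and both inner generators produce values from that last dict. By contrast, the list comprehension is evaluated immediately, so it builds one list from the first value of c and another list from the second value of c.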

Consider this small example:

>>> r = range(3)
>>> i = 0
>>> a = [i for _ in r]
>>> b = (i for _ in r)
>>> i = 3
>>> print(*a)
0 0 0
>>> print(*b)
3 3 3

When creating a, the interpreter built the list immediately, looking up the value of i at that moment. When creating b, the interpreter only set up the generator; it did not iterate over it or look up i. The print calls then consumed both objects. a already existed in memory as a full list holding the old value of i, but b was only evaluated at that point, and when it looked up i, it found the new value.

TigerhawkT3 answered Oct 07 '22 19:10


To see what's going on, replace c[k] with a function with a side effect:

def f(c,k):
    print(c,k)
    return c[k]
print("listcomp")
print(list(zip(*( [f(c,k) for k in unique_keys] for c in all_configs))))
print("gencomp")
print(list(zip(*( (f(c,k) for k in unique_keys) for c in all_configs))))

output:

listcomp
{'a': 1, 'b': 3} a
{'a': 1, 'b': 3} b
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} b
[(1, 2), (3, 2)]
gencomp
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} b
{'a': 2, 'b': 2} b
[(2, 2), (2, 2)]

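In the generator expression case, c is only looked up after the outer loop has completed, so c holds the last value it took in the outer loop.

In the list comprehension case, c is evaluated immediately, on each iteration of the outer loop.

(Note also the order of the calls: a, a, b, b for the generators versus a, b, a, b for the lists. The inner generators only run when zip pulls one item from each in turn, whereas each list is built in full as soon as it is created.)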
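The timing can also be checked without zip. The sketch below is only an illustration: the helper make_outer and the names immediate, inners and deferred are made up here. It consumes the same nested generator expression in two different orders.

```python
all_configs = [
    {'a': 1, 'b': 3},
    {'a': 2, 'b': 2},
]
unique_keys = ['a', 'b']

def make_outer():
    # Same nested generator expression as in the question.
    return ((c[k] for k in unique_keys) for c in all_configs)

# Consume each inner generator as soon as the outer one yields it:
# c still refers to the dict that inner generator was created for.
immediate = [list(inner) for inner in make_outer()]

# Exhaust the outer generator first (this is what * unpacking does),
# then consume the inner generators: c is already the last dict.
inners = list(make_outer())
deferred = [list(inner) for inner in inners]

print(immediate)  # [[1, 3], [2, 2]]
print(deferred)   # [[2, 2], [2, 2]]
```

So the inner generators are not wrong in themselves; they just read c later than the code suggests.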

Note that you can keep the lazy, generator-style approach (without creating temporary lists) by passing c.get to map, so that the current value of c is bound when map is called:

print(list(zip(*( map(c.get,unique_keys) for c in all_configs))))

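In Python 3, map does not create a list, but the result is still correct: [(1, 2), (3, 2)]. Keep in mind that c.get returns None for a missing key instead of raising KeyError the way c[k] does.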
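If the KeyError behavior of c[k] should be preserved, one possible variant is a small generator function that takes the dict as a parameter. This is a sketch only; values_of and columns are hypothetical names, not part of the original answer.

```python
all_configs = [
    {'a': 1, 'b': 3},
    {'a': 2, 'b': 2},
]
unique_keys = ['a', 'b']

def values_of(config, keys):
    # `config` is a parameter, so each generator keeps its own dict
    # no matter when it is consumed.
    for k in keys:
        yield config[k]  # raises KeyError for a missing key, like c[k]

# Still lazy: no temporary list is built per config.
columns = list(zip(*(values_of(c, unique_keys) for c in all_configs)))
print(columns)  # [(1, 2), (3, 2)]
```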

Jean-François Fabre answered Oct 07 '22 19:10


This happens because unpacking the arguments for the zip(*...) call fully evaluates the outer generator, and this outer generator returns two more generators:

((c[k], print(c)) for k in unique_keys)

Evaluating the outer generator moved c to the second dict: {'a': 2, 'b':2}.

When these inner generators are then evaluated individually, they look up c, and since its value is now {'a': 2, 'b':2}, you get the output [(2, 2), (2, 2)].

Demo:

>>> def my_zip(*args):
...     print(args)
...     for arg in args:
...         print(list(arg))
...
>>> my_zip(*((c[k] for k in unique_keys) for c in all_configs))

Output:

# We have two generators now, which means the outer generator has already looped through `all_configs`.
(<generator object <genexpr>.<genexpr> at 0x104415c50>, <generator object <genexpr>.<genexpr> at 0x10416b1a8>)
[2, 2]
[2, 2]

The list comprehension, on the other hand, is evaluated right away, so it uses the current value of c rather than its last value.


How to force it to use the correct value of c?

Use an inner function inside a generator function. The inner function remembers the current value of c through a default argument.

>>> def solve():
...     for c in all_configs:
...         def func(c=c):
...             return (c[k] for k in unique_keys)
...         yield func()
...
>>> list(zip(*solve()))
[(1, 2), (3, 2)]
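
A more compact alternative, offered only as a sketch, relies on the fact that Python evaluates the iterable of the leftmost for clause when a generator expression is created. Wrapping c in a one-element tuple there captures the current dict. The names cfg and columns are made up for illustration.

```python
all_configs = [
    {'a': 1, 'b': 3},
    {'a': 2, 'b': 2},
]
unique_keys = ['a', 'b']

columns = list(zip(*(
    # (c,) is the outermost iterable of the inner generator expression,
    # so it is evaluated when that generator is created, capturing the
    # current dict as `cfg`.
    (cfg[k] for cfg in (c,) for k in unique_keys)
    for c in all_configs
)))
print(columns)  # [(1, 2), (3, 2)]
```

This stays lazy like solve(), though the default-argument version is arguably easier to read.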
Ashwini Chaudhary answered Oct 07 '22 18:10