I get different output when using a list comprehension versus a generator comprehension. Is this expected behavior or a bug?
Consider the following setup:
all_configs = [
{'a': 1, 'b':3},
{'a': 2, 'b':2}
]
unique_keys = ['a','b']
If I then run the following code, I get:
print(list(zip(*( [c[k] for k in unique_keys] for c in all_configs))))
>>> [(1, 2), (3, 2)]
# note the ( vs [
print(list(zip(*( (c[k] for k in unique_keys) for c in all_configs))))
>>> [(2, 2), (2, 2)]
This is on python 3.6.0:
Python 3.6.0 (default, Dec 24 2016, 08:01:42)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
So what's the difference between generator expressions and list comprehensions? A generator yields one item at a time and produces each item only on demand, whereas a list comprehension builds the whole list in memory up front.
Syntactically, a generator expression differs from a list comprehension only in using parentheses instead of square brackets; the difference in behavior is when the elements are computed.
The main advantage of a generator over a list is that it takes much less memory. You can compare the size of the two objects with sys.getsizeof().
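As a rough illustration of the memory point, here is a minimal sketch. The names n, squares_list and squares_gen are made up for this example, and the exact byte counts depend on the Python version and platform. Note that sys.getsizeof() reports only the size of the container object itself, not of the elements it refers to.

```python
import sys

n = 100_000  # arbitrary size chosen for illustration

squares_list = [i * i for i in range(n)]  # all n elements are built up front
squares_gen = (i * i for i in range(n))   # only a small generator object is built

list_size = sys.getsizeof(squares_list)   # grows with n
gen_size = sys.getsizeof(squares_gen)     # stays small regardless of n

print(list_size, gen_size)
```

The generator's size stays roughly constant however large n gets, while the list's size grows with it.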
List comprehensions are eager, while generator expressions are lazy. A list comprehension creates all of its elements right away, so building and returning the list takes longer. A generator expression creates the generator object immediately, but each element is computed only when next() requests it.
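The eager versus lazy distinction can be made visible with a function that records each call. This is only a sketch; record and calls are hypothetical names introduced for the example.

```python
calls = []

def record(x):
    # Hypothetical helper: remember every value that was actually computed.
    calls.append(x)
    return x

eager = [record(x) for x in range(3)]
calls_after_list = list(calls)   # every element has already been computed

calls.clear()
lazy = (record(x) for x in range(3))
calls_after_gen = list(calls)    # nothing has been computed yet

first = next(lazy)
calls_after_next = list(calls)   # exactly one element has been computed

print(calls_after_list, calls_after_gen, calls_after_next)
```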
In a list comprehension, expressions are evaluated eagerly. In a generator expression, they are only looked up as needed.
Thus, as the outer generator expression iterates over for c in all_configs, each inner generator refers to c[k] but only looks up c after the loop is done, so both inner generators use the latest value of c. By contrast, the inner list comprehension is evaluated immediately, so it creates one list with the first value of c and another list with the second value of c.
Consider this small example:
>>> r = range(3)
>>> i = 0
>>> a = [i for _ in r]
>>> b = (i for _ in r)
>>> i = 3
>>> print(*a)
0 0 0
>>> print(*b)
3 3 3
When creating a, the interpreter built that list immediately, looking up the value of i as soon as it was evaluated. When creating b, the interpreter just set up that generator and didn't actually iterate over it or look up the value of i. The print calls told the interpreter to evaluate those objects. a already existed as a full list in memory with the old value of i, but b was evaluated at that point, and when it looked up the value of i, it found the new value.
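One detail worth adding: in a generator expression, the iterable of the leftmost for clause is evaluated immediately, and only the remaining names are looked up lazily. The sketch below uses made-up names (data, factor) to show both effects side by side.

```python
data = [1, 2, 3]
factor = 10

# `data` is evaluated right here; `factor` is looked up only during iteration.
gen = (x * factor for x in data)

data = [100, 200]  # rebinding the name does not affect the running generator
factor = 2         # but this new value is the one the generator will see

result = list(gen)
print(result)  # [2, 4, 6]
```

This is why, in the question's code, it is the inner reference to c that goes stale rather than the iteration over all_configs itself.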
To see what's going on, replace c[k] with a function that has a side effect:
def f(c,k):
    print(c,k)
    return c[k]
print("listcomp")
print(list(zip(*( [f(c,k) for k in unique_keys] for c in all_configs))))
print("gencomp")
print(list(zip(*( (f(c,k) for k in unique_keys) for c in all_configs))))
output:
listcomp
{'a': 1, 'b': 3} a
{'a': 1, 'b': 3} b
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} b
[(1, 2), (3, 2)]
gencomp
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} b
{'a': 2, 'b': 2} b
[(2, 2), (2, 2)]
c in the generator expressions is evaluated after the outer loop has completed: c holds the last value it took in the outer loop.
In the list comprehension case, c is evaluated at once.
(Note also the a, a, b, b call order versus a, b, a, b: the generators run while zip is consuming them, whereas the lists are built at once.)
Note that you can keep the "generator" way of doing it (not creating the temporary list) by passing c to map, so the current value is stored:
print(list(zip(*( map(c.get,unique_keys) for c in all_configs))))
In Python 3, map does not create a list, but the result is still OK: [(1, 2), (3, 2)]
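One caveat with the map approach, sketched below: c.get returns None for a missing key, whereas c[k] raises KeyError. If the original lookup semantics matter, c.__getitem__ may be a closer substitute. The sparse dict and the key 'z' are invented for this illustration.

```python
all_configs = [{'a': 1, 'b': 3}, {'a': 2, 'b': 2}]
unique_keys = ['a', 'b']

# The bound method (c.get or c.__getitem__) is created while the outer
# generator is on the current c, so each map object keeps its own dict.
with_get = list(zip(*(map(c.get, unique_keys) for c in all_configs)))
with_getitem = list(zip(*(map(c.__getitem__, unique_keys) for c in all_configs)))
print(with_get, with_getitem)

# Behavior differs only when a key is missing (hypothetical data).
sparse = {'a': 1}
missing_via_get = list(map(sparse.get, ['a', 'z']))  # missing key becomes None
try:
    list(map(sparse.__getitem__, ['a', 'z']))
    raised_key_error = False
except KeyError:
    raised_key_error = True
print(missing_via_get, raised_key_error)
```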
This happens because the zip(*...) call evaluates the outer generator first, and the outer generator returns two more generators, each of the form:
(c[k] for k in unique_keys)
Evaluating the outer generator moved c to the second dict: {'a': 2, 'b':2}.
Now when we evaluate these generators individually, they look up c, and as its value is now {'a': 2, 'b':2}, you get the output [(2, 2), (2, 2)].
Demo:
>>> def my_zip(*args):
...     print(args)
...     for arg in args:
...         print(list(arg))
...
>>> my_zip(*((c[k] for k in unique_keys) for c in all_configs))
Output:
# We have two generators now, means it has looped through `all_configs`.
(<generator object <genexpr>.<genexpr> at 0x104415c50>, <generator object <genexpr>.<genexpr> at 0x10416b1a8>)
[2, 2]
[2, 2]
The list comprehension, on the other hand, evaluates right away and so fetches the current value of c, not its last value.
To make each generator remember its own c, use an inner function and a generator function. The inner function can remember c's value through a default argument.
>>> def solve():
...     for c in all_configs:
...         def func(c=c):
...             return (c[k] for k in unique_keys)
...         yield func()
...
>>>
>>> list(zip(*solve()))
[(1, 2), (3, 2)]
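A possibly simpler variant of the same idea is to pass c as an ordinary function argument, so that each inner generator closes over its own parameter instead of the shared loop variable. This is a sketch rather than the answer's own method; values_for is a hypothetical helper name.

```python
all_configs = [{'a': 1, 'b': 3}, {'a': 2, 'b': 2}]
unique_keys = ['a', 'b']

def values_for(config, keys):
    # Hypothetical helper: `config` is local to this call, so the generator
    # returned here keeps referring to the dict it was created for.
    return (config[k] for k in keys)

result = list(zip(*(values_for(c, unique_keys) for c in all_configs)))
print(result)  # [(1, 2), (3, 2)]
```

The inner values are still produced lazily; only the binding of the dict happens at the time of the call.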