Order of for statements in a list comprehension

In Python 2.7, I am trying to insert another item before every item in a list of strings (e.g. add 'a' before every item in the list ['b', 'c']). From the question "How to add list of lists in a list comprehension", I have determined the correct command, which boils down to:

>>> [i for x in ['b', 'c'] for i in ['a', x]]
['a', 'b', 'a', 'c']

Based purely on the temporary i and x variables, the version below seems more readable. However, it gives a completely different result. Why does this not give the same result?

>>> [i for i in ['a', x] for x in ['b', 'c']]
['a', 'a', 'c', 'c']

Even more curious, what happened to the 'b' entry?

petiepooo asked Dec 24 '22 14:12


1 Answer

The for loops in a list comprehension are always listed in nesting order, outermost first. You can write out both of your comprehensions as regular loops nested in that same order; only the expression before the first for produces the final values, so it goes in the body of the innermost loop.

So [i for x in ['b', 'c'] for i in ['a', x]] becomes:

for x in ['b', 'c']:
    for i in ['a', x]:
        i  # added to the final list

and [i for i in ['a', x] for x in ['b', 'c']] becomes:

for i in ['a', x]:
    for x in ['b', 'c']:
        i

As you can see, the second version cannot run unless x is already defined outside of your list comprehension, because otherwise the ['a', x] list could not be created. Also note that the x bound by the inner loop for x in ['b', 'c'] is never used. All you get is i repeated. The values in the inner loop's list don't matter; only its length does, because it determines how many times each i is repeated.
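As a rough, runnable sketch of the two expansions above, the loops can be made to collect their values with append. The function names first_order and second_order are made up for this illustration, and x is passed in as a parameter to stand in for a name that already exists outside the comprehension:

```python
def first_order():
    # Equivalent of [i for x in ['b', 'c'] for i in ['a', x]]
    result = []
    for x in ['b', 'c']:
        for i in ['a', x]:
            result.append(i)  # added to the final list
    return result


def second_order(x):
    # Equivalent of [i for i in ['a', x] for x in ['b', 'c']].
    # x must already exist, so this sketch takes it as a parameter.
    result = []
    for i in ['a', x]:        # list is built once, from the existing x
        for x in ['b', 'c']:  # rebinds x, but these values are never used
            result.append(i)
    return result


print(first_order())        # ['a', 'b', 'a', 'c']
print(second_order('c'))    # ['a', 'a', 'c', 'c']
print(second_order('foo'))  # ['a', 'a', 'foo', 'foo']
```

The ['a', x] list is evaluated once, before the outer loop starts, so rebinding x in the inner loop has no effect on which values i takes.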

In your case, the output is explained by x already being set to 'c'. The outer loop is then for i in ['a', 'c']. For i = 'a' the inner loop iterates twice, so 'a' is added twice; then i = 'c' is set and 'c' is added twice.
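To illustrate that only the length of the inner loop matters, here is a small sketch that swaps the inner list for range(2). The names outer, with_letters and with_range are invented for the example, and outer is simply what ['a', x] evaluates to when x is 'c':

```python
outer = ['a', 'c']  # what ['a', x] evaluates to when x == 'c'

# The inner loop variable is never used in the output expression,
# so any iterable of length 2 produces the same list.
with_letters = [i for i in outer for _ in ['b', 'c']]
with_range = [i for i in outer for _ in range(2)]

print(with_letters)                # ['a', 'a', 'c', 'c']
print(with_letters == with_range)  # True
```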

As it happens, in Python 2, the variables used in a list comprehension 'leak', just like the variables used in a regular for loop leak; after running for x in ['b', 'c']: pass, x would remain available and bound to 'c'. This is where your x = 'c' comes from:

>>> [i for x in ['b', 'c'] for i in ['a', x]]
['a', 'b', 'a', 'c']
>>> i
'c'
>>> x
'c'
>>> [i for i in ['a', x] for x in ['b', 'c']]
['a', 'a', 'c', 'c']

i and x reflect what they were last bound to, so the next list comprehension runs without error: its first (outer) loop iterates over ['a', 'c'].

Remove x from your globals and the second list comprehension simply fails to run:

>>> del x
>>> [i for i in ['a', x] for x in ['b', 'c']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

The same happens with the full regular for loop version above:

>>> for i in ['a', x]:
...     for x in ['b', 'c']:
...         i
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> x = 'foo'
>>> for i in ['a', x]:
...     for x in ['b', 'c']:
...         i
... 
'a'
'a'
'foo'
'foo'

In Python 3, list comprehensions are executed in a new scope (just as generator expressions, dict comprehensions and set comprehensions already are in Python 2), so their loop variables no longer leak.
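The difference can be checked with a short script. This is only a sketch: it assumes it runs as a fresh top-level script with no x defined beforehand, and the names first, second and x_leaked are invented for the illustration:

```python
import sys

first = [i for x in ['b', 'c'] for i in ['a', x]]

try:
    x  # only bound here if the comprehension leaked its loop variable
    x_leaked = True
except NameError:
    x_leaked = False

try:
    second = [i for i in ['a', x] for x in ['b', 'c']]
except NameError:
    second = None  # ['a', x] cannot be built without an existing x

# Python 3 is expected to print: 3 False None
# Python 2 is expected to print: 2 True ['a', 'a', 'c', 'c']
print(sys.version_info[0], x_leaked, second)
```

On Python 3 the reversed comprehension fails right away, even directly after the first one, because x was never bound outside the comprehension's own scope.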

Martijn Pieters answered Jan 10 '23 04:01