
Understanding list comprehension for flattening list of lists in Python

I found this comprehension that works perfectly for flattening a list of lists:

>>> list_of_lists = [(1,2,3),(2,3,4),(3,4,5)]
>>> [item for sublist in list_of_lists for item in sublist]
[1, 2, 3, 2, 3, 4, 3, 4, 5]

I like this better than using itertools.chain(), but I just can't understand it. I've tried surrounding parts with parentheses, to see if I could reduce the complexity, but now I'm just more confused:

>>> [(item for sublist in list_of_lists) for item in sublist]
[<generator object <genexpr> at 0x7ff919fdfd20>, <generator object <genexpr> at 0x7ff919fdfd70>, <generator object <genexpr> at 0x7ff919fdfdc0>]

>>> [item for sublist in (list_of_lists for item in sublist)]
[5, 5, 5]

I get the feeling that I'm having a hard time with this because I don't quite understand how generators work... I mean, I thought I did, but now I'm seriously in doubt. Like I said, I love how compact this idiom is, and it's exactly what I need, but I'm loath to use code that I don't understand.

Can anyone explain what exactly is happening here?

gbromios asked Jul 05 '14 13:07


2 Answers

Read the for loops as if they were nested, from left to right. The expression on the left is the one that produces each value in the final list:

for sublist in list_of_lists:
    for item in sublist:
        item  # added to the list

List comprehensions also support if tests to filter what elements are used; these can also be seen as nested statements, in the same way as the for loops.
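
As a rough illustration, the nested-loop reading above can be written out as a complete function. The name flatten_with_loops is made up for this sketch; it is not part of the original question or of any library.

```python
def flatten_with_loops(list_of_lists):
    """Nested-loop equivalent of
    [item for sublist in list_of_lists for item in sublist]."""
    result = []
    for sublist in list_of_lists:   # first (leftmost) for clause
        for item in sublist:        # second for clause, nested inside the first
            result.append(item)     # the leading expression of the comprehension
    return result


list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
flat = flatten_with_loops(list_of_lists)
print(flat)  # [1, 2, 3, 2, 3, 4, 3, 4, 5]
```

The order of the for clauses in the comprehension matches the order of the loops from the outermost to the innermost.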

By adding parentheses, you changed the expression; everything in the parentheses is now the left-hand expression that gets added:

for item in sublist:
    (item for sublist in list_of_lists)  # added to the list

A parenthesised expression containing a for loop like that is a generator expression. It works much like a list comprehension, except that it doesn't build a list; the elements are produced on demand instead. You can ask a generator expression for the next value, then the next value, and so on.
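
A small sketch of that on-demand behaviour, using the flattening expression from the question as a generator expression. The function name demo_lazy is only for illustration.

```python
def demo_lazy():
    list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
    # Parentheses instead of square brackets: nothing is produced yet.
    gen = (item for sublist in list_of_lists for item in sublist)
    first = next(gen)    # produces only the first value
    second = next(gen)   # resumes where it stopped and produces the second
    rest = list(gen)     # consumes whatever is left
    return first, second, rest


print(demo_lazy())  # (1, 2, [3, 2, 3, 4, 3, 4, 5])
```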

In this case, there must be a pre-existing sublist object for this to work at all; the outer loop is not over list_of_lists anymore, after all.

Your last attempt translates to:

for sublist in (list_of_lists for item in sublist):
    item  # added to the list

Here list_of_lists is the value produced by a generator expression that loops over sublist (for item in sublist), so the generator yields list_of_lists once for every element of sublist. Again, sublist must already exist for this to work. The outer loop then adds a pre-existing item to the final list output.

In your case, apparently sublist is a list with 3 items in it; your final list produced 3 elements. item was bound to 5, so you got 3 times 5 in your output.
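
The [5, 5, 5] result can be reproduced in Python 3 by binding the names explicitly first. This is a sketch under an assumption: that sublist and item were left over from running the first comprehension in Python 2, where list comprehension variables leak into the enclosing scope. The function name reproduce_last_attempt is hypothetical.

```python
def reproduce_last_attempt():
    list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
    # Assumed leftovers from an earlier comprehension (see the note above).
    sublist = (3, 4, 5)
    item = 5
    # The generator yields list_of_lists once per element of the old sublist
    # (3 times); the outer loop ignores each yielded value and adds the
    # pre-existing item every time.
    return [item for sublist in (list_of_lists for item in sublist)]


print(reproduce_last_attempt())  # [5, 5, 5]
```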

Martijn Pieters answered Oct 28 '22 10:10


List Comprehension

When I first started with list comprehensions, I read them like English sentences, and that made them easy to understand. For example,

[item for sublist in list_of_lists for item in sublist]

can be read like

for each sublist in list_of_lists and for each item in sublist add item

Also, the filtering part can be read as

for each sublist in list_of_lists and for each item in sublist add item only if it is valid

And the corresponding comprehension would be

[item for sublist in list_of_lists for item in sublist if valid(item)]
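
To make that concrete, here is a minimal runnable version. The valid function in the answer is only a placeholder, so this sketch passes the test in as a parameter; is_even and flatten_if are made-up names for illustration.

```python
def is_even(item):
    """Hypothetical validity test: keep even numbers only."""
    return item % 2 == 0


def flatten_if(list_of_lists, valid):
    # for each sublist, for each item in it, add item only if it is valid
    return [item for sublist in list_of_lists for item in sublist if valid(item)]


list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
evens = flatten_if(list_of_lists, is_even)
print(evens)  # [2, 2, 4, 4]
```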

Generators

They are like land mines, triggered only when invoked through the next protocol. They are similar to functions, but they are not exhausted until an exception is raised or the end of the function is reached, so they can be invoked again and again. The important thing is that they retain their state between the previous invocation and the current one.

The difference between a generator and a function is that a generator uses the yield keyword to give a value back to the invoker. Generator expressions are similar to list comprehensions: the first expression is the actual value being "yielded".
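
A short sketch of a generator function that does the same flattening with yield, showing that it keeps its position between calls to next. The name flatten_gen is illustrative only.

```python
def flatten_gen(list_of_lists):
    """Generator function version of the flattening loops."""
    for sublist in list_of_lists:
        for item in sublist:
            yield item  # hand one value back, then pause right here


gen = flatten_gen([(1, 2), (3,)])
print(next(gen))          # 1
print(next(gen))          # 2  (resumed inside the same sublist)
print(next(gen))          # 3  (moved on to the next sublist)
print(next(gen, "done"))  # done  (exhausted, so the default is returned)
```

Without the default argument, the final next(gen) would raise StopIteration, which is how a for loop knows when to stop.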

With this basic understanding, if we look at your expressions in the question,

[(item for sublist in list_of_lists) for item in sublist]

You are mixing a list comprehension with a generator expression. This will be read like this

for each item in sublist add a generator expression which is defined as, for every sublist in list_of_lists yield item

which is not what you had in mind. Since the generator expression is never iterated, the generator object itself is added to the list as it is. A generator's body is not evaluated until it is invoked through the next protocol, so errors inside it (other than syntax errors) do not surface when it is created. The outer for item in sublist is evaluated immediately, though, so in a fresh session this produces a runtime error (NameError) because sublist is not defined yet; it only ran in your session because a sublist already existed.
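
The deferred evaluation can be checked with a deliberately broken generator body. This is only a sketch: deferred_error_demo and undefined_sublist are made-up names, and undefined_sublist is intentionally never defined.

```python
def deferred_error_demo():
    """Return (error_at_creation, error_at_next, error_for_bad_iterable)."""
    error_at_creation = False
    try:
        gen = (1 // 0 for _ in range(3))  # the body is not run here
    except ZeroDivisionError:
        error_at_creation = True

    error_at_next = False
    try:
        next(gen)  # the body runs now, so the division fails now
    except ZeroDivisionError:
        error_at_next = True

    error_for_bad_iterable = False
    try:
        # The iterable of the first for clause is evaluated immediately,
        # so an undefined name fails as soon as the expression is created.
        (x for x in undefined_sublist)
    except NameError:
        error_for_bad_iterable = True

    return error_at_creation, error_at_next, error_for_bad_iterable


print(deferred_error_demo())  # (False, True, True)
```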

Also, in the last case,

[item for sublist in (list_of_lists for item in sublist)]

This will be read like this

for each sublist in the generator expression add item, where the generator expression is defined as, for each item in sublist yield list_of_lists

The for loop will iterate over any iterable using the next protocol, so here the generator expression is evaluated. The item being added is not bound by the outer loop; it is whatever item already exists outside the comprehension, so the same value is added on every iteration. In a fresh session this will also produce a runtime error, since sublist is not defined yet.

thefourtheye answered Oct 28 '22 09:10