
Recursive generator for flattening nested lists

I'm a programming newbie and am having some trouble understanding an example from my Python textbook ("Beginning Python" by Magnus Lie Hetland). The example is a recursive generator designed to flatten nested lists of arbitrary depth:

def flatten(nested):
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested

You would then feed in a nested list as follows:

>>> list(flatten([[[1],2],3,4,[5,[6,7]],8]))
[1, 2, 3, 4, 5, 6, 7, 8]

I understand how the recursion within flatten() helps to whittle down to the innermost element of this list, '1', but what I don't understand is what happens when '1' is actually passed back into flatten() as 'nested'. I thought that this would lead to a TypeError (can't iterate over a number), and that the exception handling was what would actually do the heavy lifting for generating output... but testing with modified versions of flatten() has convinced me that this isn't the case. Instead, it seems like the 'yield element' line is responsible.

That said, my question is this... how can 'yield element' ever actually be executed? It seems like 'nested' will either be a list - in which case another layer of recursion is added - or it's a number and you get a TypeError.

Any help with this would be much appreciated... in particular, I'd love to be walked through the chain of events as flatten() handles a simple example like:

list(flatten([[1,2],3]))
WithoutATowel asked Jul 07 '12 17:07


4 Answers

I have added some instrumentation to the function:

def flatten(nested, depth=0):
    try:
        print("{}Iterate on {}".format('  '*depth, nested))
        for sublist in nested:
            for element in flatten(sublist, depth+1):
                print("{}got back {}".format('  '*depth, element))
                yield element
    except TypeError:
        print('{}not iterable - return {}'.format('  '*depth, nested))
        yield nested

Now calling

list(flatten([[1,2],3]))

displays

Iterate on [[1, 2], 3]
  Iterate on [1, 2]
    Iterate on 1
    not iterable - return 1
  got back 1
got back 1
    Iterate on 2
    not iterable - return 2
  got back 2
got back 2
  Iterate on 3
  not iterable - return 3
got back 3
Hugh Bothwell answered Nov 13 '22 21:11


Perhaps part of your confusion is that you're thinking of the final yield statement as though it were a return statement. Indeed, a couple of people have suggested that when a TypeError is raised in this code, the item passed in is "returned". That's not the case!

Remember that any time yield appears in a function, calling that function does not give you a single item; it gives you a generator, which is an iterable, even if it only ever produces one item. So when you pass 1 to flatten, the result is a one-item generator. To get the item out of it, you still need to iterate over it.

Since this one-item generator is iterable, it doesn't throw a TypeError when the inner for loop tries to iterate over it; but the inner for loop only executes once. Then the outer for loop moves on to the next iterable in the nested list.

Another way to think about this would be to say that every time you pass a non-iterable value to flatten, it wraps the value in a one-item iterable and "returns" that.
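
Here is a minimal sketch of that point, using the flatten function from the question. The variable name gen is only for illustration:

```python
def flatten(nested):
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested

gen = flatten(1)           # calling flatten runs none of its body yet
print(type(gen).__name__)  # generator
print(next(gen))           # 1 (the body runs, hits the TypeError, yields nested)

# A second bare next(gen) would raise StopIteration, so pass a default instead.
print(next(gen, "exhausted"))  # exhausted
```

So the caller never sees the TypeError; it just sees a generator that happens to produce exactly one value.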

senderle answered Nov 13 '22 23:11


When you generally understand a function but one little part is stumping you, a great way to break it down is to step through it in the Python debugger (pdb). Here is a session with comments added:

-> def flatten(nested):
(Pdb) l
  1  -> def flatten(nested):
  2         try:
  3             for sublist in nested:
  4                 for element in flatten(sublist):
  5                     yield element
  6         except TypeError:
  7             yield nested
  8     
  9     import pdb; pdb.set_trace()
 10     list(flatten([[1,2],3]))
 11     
(Pdb) a
nested = [[1, 2], 3]

Above, we've just entered the function and the argument is [[1, 2], 3]. Let's use pdb's step command (s) to step through the function and into any recursive calls we encounter:

(Pdb) s
> /Users/michael/foo.py(2)flatten()
-> try:
(Pdb) s
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
> /Users/michael/foo.py(4)flatten()
-> for element in flatten(sublist):
(Pdb) s
--Call--
> /Users/michael/foo.py(1)flatten()
-> def flatten(nested):
(Pdb) a
nested = [1, 2]

We've stepped into one inner frame of flatten, where the argument is [1, 2].

(Pdb) s
> /Users/michael/foo.py(2)flatten()
-> try:
(Pdb) s
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
> /Users/michael/foo.py(4)flatten()
-> for element in flatten(sublist):
(Pdb) s
--Call--
> /Users/michael/foo.py(1)flatten()
-> def flatten(nested):
(Pdb) a
nested = 1

Two frames in, the argument 1 isn't an iterable anymore. This should be interesting…

(Pdb) s
> /Users/michael/foo.py(2)flatten()
-> try:
(Pdb) s
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
TypeError: "'int' object is not iterable"
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
> /Users/michael/foo.py(6)flatten()
-> except TypeError:
(Pdb) s
> /Users/michael/foo.py(7)flatten()
-> yield nested
(Pdb) s
--Return--
> /Users/michael/foo.py(7)flatten()->1
-> yield nested

OK, so because of the except TypeError, we're just yielding the argument itself. Up a frame!

(Pdb) s
> /Users/michael/foo.py(5)flatten()
-> yield element
(Pdb) l
  1     def flatten(nested):
  2         try:
  3             for sublist in nested:
  4                 for element in flatten(sublist):
  5  ->                 yield element
  6         except TypeError:
  7             yield nested
  8     
  9     import pdb; pdb.set_trace()
 10     list(flatten([[1,2],3]))
 11     

yield element will of course yield 1. So once the innermost frame hits a TypeError and yields its argument, each enclosing frame re-yields that value in turn until the outermost frame of flatten yields it to the outside world. Only then does iteration move on to the rest of the outer iterable.
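
For what it's worth, on Python 3.3 and later that relay step (the inner for loop plus yield element) can be written with yield from. This is only a sketch of an equivalent form, not the book's version, and the name flatten_yield_from is made up for illustration:

```python
def flatten_yield_from(nested):
    try:
        for sublist in nested:
            # Re-yield every value produced by the recursive generator.
            yield from flatten_yield_from(sublist)
    except TypeError:
        # nested was not iterable, so it is a leaf value.
        yield nested

print(list(flatten_yield_from([[1, 2], 3])))  # [1, 2, 3]
```

Each level still passes values up one frame at a time; yield from just hides the explicit loop.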

kojiro answered Nov 13 '22 23:11


The try/except construction catches the exception for you and yields nested, which is just the argument that was passed to flatten().

So flatten(1) raises a TypeError at "for sublist in nested:", execution continues in the except block, and that block yields nested, which is 1.
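
A small sketch of where that exception comes from, assuming the flatten function from the question. A for loop calls iter() on the object it is given, and that call is what fails for an int:

```python
def flatten(nested):
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested

try:
    iter(1)  # what "for sublist in nested" does first when nested is 1
except TypeError as exc:
    print(exc)  # 'int' object is not iterable

# Inside flatten the same error is caught, and 1 is yielded instead.
print(list(flatten(1)))  # [1]
```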

Marco de Wit answered Nov 13 '22 22:11