Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Annoying generator bug

The original context of this bug is a piece of code too large to post in a question like this. I had to whittle this code down to a minimal snippet that still exhibits the bug. This is why the code shown below is somewhat bizarre-looking.

In the code below, the class Foo may thought of as a convoluted way to get something like xrange.

class Foo(object):
    def __init__(self, n):
        self.generator = (x for x in range(n))

    def __iter__(self):
        for e in self.generator:
            yield e

Indeed, Foo seems to behave very much like xrange:

for c in Foo(3):
    print c
# 0
# 1
# 2

print list(Foo(3))
# [0, 1, 2]

Now, the subclass Bar of Foo adds only a __len__ method:

class Bar(Foo):
    def __len__(self):
        return sum(1 for _ in self.generator)

Bar behaves just like Foo when used in a for-loop:

for c in Bar(3):
    print c
# 0
# 1
# 2

BUT:

print list(Bar(3))
# []

My guess is that, in the evaluation of list(Bar(3)), the __len__ method of Bar(3) is getting called, thereby using up the generator.

(If this guess is correct, the call to Bar(3).__len__ is unnecessary; after all, list(Foo(3)) produces the correct result even though Foo has no __len__ method.)

This situation is annoying: there's no good reason for list(Foo(3)) and list(Bar(3)) to produce different results.

Is it possible to fix Bar (without, of course, getting rid of its __len__ method) such that list(Bar(3)) returns [0, 1, 2]?

like image 577
kjo Avatar asked Aug 18 '15 17:08

kjo


2 Answers

Your problem is that Foo does not behave the same as xrange: xrange gives you a new iterator each time you asks its iter method, while Foo gives you always the same, meaning that once it is exhausted the object is too:

>>> a = Foo(3)
>>> list(a)
[0, 1, 2]
>>> list(a)
[]
>>> a = range(3)
>>> list(a)
[0, 1, 2]
>>> list(a)
[0, 1, 2]

I could easily confirm that the __len__ method is called by list by adding spys to your methods:

class Bar(Foo):
    def __len__(self):
        print "LEN"
        return sum(1 for _ in self.generator)

(and I added a print "ITERATOR" in Foo.__iter__). It yields:

>>> list(Bar(3))
LEN
ITERATOR
[]

I can only imagine two workarounds:

  1. my preferred one: return a new iterator on each call to __iter__ at Foo level to mimic xrange:

    class Foo(object):
        def __init__(self, n):
            self.n = n
    
        def __iter__(self):
            print "ITERATOR"
            return ( x for x in range(self.n))
    
    class Bar(Foo):
        def __len__(self):
            print "LEN"
            return sum(1 for _ in self.generator)
    

    we get correctly:

    >>> list(Bar(3))
    ITERATOR
    LEN
    ITERATOR
    [0, 1, 2]
    
  2. the alternative: change len to not call the iterator and let Foo untouched:

    class Bar(Foo):
        def __init__(self, n):
            self.len  = n
            super(Bar, self).__init__(n)
        def __len__(self):
            print "LEN"
            return self.len
    

    Here again we get:

    >>> list(Bar(3))
    LEN
    ITERATOR
    [0, 1, 2]
    

    but Foo and Bar objects are exhausted once first iterator reaches its end.

But I must admit that I do not know the context of your real classes...

like image 187
Serge Ballesta Avatar answered Nov 01 '22 01:11

Serge Ballesta


This behaviour might be annoying but it's actually quite understandable. Internally a list is simply an array and an array is a fixed size datastructure. The result of this is that if you have a list that has size n and you want to add an extra item to reach n+1 it will have to create a whole new array and completely copy the old one to the new one. Effectively your list.append(x) is now a O(n) operation instead of the regular O(1).

To prevent this, list() tries to get the size of your input so it can guess what size the array needs to be.

So one solution for this problem is to force it to guess by using iter:

list(iter(Bar(3)))
like image 43
Wolph Avatar answered Nov 01 '22 01:11

Wolph