The original context of this bug is a piece of code too large to post in a question like this. I had to whittle this code down to a minimal snippet that still exhibits the bug. This is why the code shown below is somewhat bizarre-looking.
In the code below, the class Foo may be thought of as a convoluted way to get something like xrange.
class Foo(object):
    def __init__(self, n):
        self.generator = (x for x in range(n))

    def __iter__(self):
        for e in self.generator:
            yield e
Indeed, Foo seems to behave very much like xrange:
for c in Foo(3):
    print c
# 0
# 1
# 2

print list(Foo(3))
# [0, 1, 2]
Now, the subclass Bar of Foo adds only a __len__ method:
class Bar(Foo):
    def __len__(self):
        return sum(1 for _ in self.generator)
Bar behaves just like Foo when used in a for-loop:
for c in Bar(3):
    print c
# 0
# 1
# 2
BUT:
print list(Bar(3))
# []
My guess is that, in the evaluation of list(Bar(3)), the __len__ method of Bar(3) is getting called, thereby using up the generator. (If this guess is correct, the call to Bar(3).__len__ is unnecessary; after all, list(Foo(3)) produces the correct result even though Foo has no __len__ method.)
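The guess can be checked directly. Here is a minimal, self-contained sketch (repeating the definitions above, with assertions in place of prints): calling len() first drains the shared generator, so the subsequent iteration finds nothing.

```python
class Foo(object):
    def __init__(self, n):
        # One generator, created once and shared by every consumer.
        self.generator = (x for x in range(n))

    def __iter__(self):
        for e in self.generator:
            yield e


class Bar(Foo):
    def __len__(self):
        # Counting the items consumes self.generator entirely.
        return sum(1 for _ in self.generator)


b = Bar(3)
assert len(b) == 3      # this sums over self.generator, exhausting it
assert list(b) == []    # nothing left for the iteration to yield
```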
This situation is annoying: there's no good reason for list(Foo(3)) and list(Bar(3)) to produce different results.
Is it possible to fix Bar (without, of course, getting rid of its __len__ method) such that list(Bar(3)) returns [0, 1, 2]?
Your problem is that Foo does not behave the same as xrange: xrange gives you a new iterator each time its __iter__ method is called, while Foo always hands out the same one, meaning that once that iterator is exhausted, the object is too:
>>> a = Foo(3)
>>> list(a)
[0, 1, 2]
>>> list(a)
[]
>>> a = xrange(3)
>>> list(a)
[0, 1, 2]
>>> list(a)
[0, 1, 2]
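The sharing is easy to see: two iterators obtained from the same Foo instance drain one underlying generator between them (a self-contained sketch repeating your definitions, assertions instead of prints):

```python
class Foo(object):
    def __init__(self, n):
        self.generator = (x for x in range(n))

    def __iter__(self):
        # Each call builds a new wrapper generator, but they all
        # pull from the single shared self.generator underneath.
        for e in self.generator:
            yield e


f = Foo(3)
it1, it2 = iter(f), iter(f)
assert next(it1) == 0
assert next(it2) == 1   # it2 sees the *next* value, not a fresh 0
assert next(it1) == 2   # all three iterators share one stream
```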
I could easily confirm that the __len__ method is called by list by adding spies to your methods:
class Bar(Foo):
    def __len__(self):
        print "LEN"
        return sum(1 for _ in self.generator)
(I also added a print "ITERATOR" in Foo.__iter__.) It yields:
>>> list(Bar(3))
LEN
ITERATOR
[]
I can only imagine two workarounds:
My preferred one: return a new iterator on each call to __iter__ at the Foo level, to mimic xrange:
class Foo(object):
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        print "ITERATOR"
        return (x for x in range(self.n))

class Bar(Foo):
    def __len__(self):
        print "LEN"
        return sum(1 for _ in self)  # iterate over self: this Foo no longer stores a generator
We correctly get:
>>> list(Bar(3))
ITERATOR
LEN
ITERATOR
[0, 1, 2]
The alternative: change __len__ so that it does not consume the iterator, and leave Foo untouched:
class Bar(Foo):
    def __init__(self, n):
        self.len = n
        super(Bar, self).__init__(n)

    def __len__(self):
        print "LEN"
        return self.len
Here again we get:
>>> list(Bar(3))
LEN
ITERATOR
[0, 1, 2]
but the Foo and Bar objects are still exhausted once the first iterator reaches its end.
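A print-free, self-contained sketch of this second workaround makes the one-pass limitation explicit:

```python
class Foo(object):
    def __init__(self, n):
        self.generator = (x for x in range(n))

    def __iter__(self):
        for e in self.generator:
            yield e


class Bar(Foo):
    def __init__(self, n):
        # Remember the length separately, so __len__ never
        # touches the generator.
        self.len = n
        super(Bar, self).__init__(n)

    def __len__(self):
        return self.len


b = Bar(3)
assert list(b) == [0, 1, 2]   # first pass now works
assert list(b) == []          # but the object is spent afterwards
```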
But I must admit that I do not know the context of your real classes...
This behaviour might be annoying, but it's actually quite understandable. Internally, a list is simply an array, and an array is a fixed-size data structure. As a result, if you have a list of size n and you want to add an extra item to reach n+1, a whole new array has to be created and the old one copied over completely. Effectively, your list.append(x) becomes an O(n) operation instead of the usual O(1).
To prevent this, list()
tries to get the size of your input so it can guess what size the array needs to be.
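That size probe can be observed directly with Python 3's operator.length_hint, which performs the same check list() does (a sketch: Sized is a made-up class, and exactly when list() consults __len__ is a CPython implementation detail):

```python
import operator


class Sized(object):
    """Made-up class that counts how often its __len__ is probed."""

    def __init__(self):
        self.len_calls = 0

    def __iter__(self):
        return iter([1, 2, 3])

    def __len__(self):
        self.len_calls += 1
        return 3


s = Sized()
assert operator.length_hint(s) == 3   # the probe calls __len__ ...
assert s.len_calls == 1

list(s)                               # ... and so does list() in CPython,
assert s.len_calls >= 2               # before doing the actual iteration
```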
So one solution for this problem is to force list() to guess the size, by handing it a bare iterator via iter:
list(iter(Bar(3)))
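Why this works, as a self-contained sketch: iter(Bar(3)) returns the generator produced by __iter__, and a plain generator has no __len__, so the size probe never reaches Bar and the underlying generator is consumed only once, by the actual iteration.

```python
class Foo(object):
    def __init__(self, n):
        self.generator = (x for x in range(n))

    def __iter__(self):
        for e in self.generator:
            yield e


class Bar(Foo):
    def __len__(self):
        return sum(1 for _ in self.generator)


# list() sees only a generator here, which exposes no __len__:
assert list(iter(Bar(3))) == [0, 1, 2]

# Without the iter() wrapper, __len__ drains the generator first:
assert list(Bar(3)) == []
```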