I am trying to implement an iterable proxy for a web resource (lazily fetched images).
First, I wrote this (it returns ids; in production those will be image buffers):
def iter(ids=[1,2,3]):
    for id in ids:
        yield id
and that worked nicely, but now I need to keep state.
I read about the four ways to define iterators and judged that the iterator protocol is the way to go. Here is my failed attempt to implement it.
class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        return self
    def __next__(self):
        for id in self.ids:
            yield id
        raise StopIteration

test = Test([1,2,3])
for t in test:
    print('new value', t)
Output:
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
forever.
What's wrong?
Thanks to absolutely everyone! It's all new to me, but I'm learning new cool stuff.
Your __next__ method uses yield, which makes it a generator function. Generator functions return a new iterator when called.
But the __next__ method is part of the iterator interface. It should not itself be an iterator. __next__ should return the next value, not something that returns all values(*).
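To see why the loop in the question never ends, here is a minimal sketch. The class name Broken is made up for illustration; the point is only that every call to a __next__ containing yield hands back a fresh generator object instead of a value, and never raises StopIteration.

```python
import types


class Broken:
    def __iter__(self):
        return self

    def __next__(self):
        # The yield keyword turns __next__ into a generator function, so
        # calling it runs none of this body; it only builds a generator.
        yield 1
        yield 2


broken = Broken()
first = next(broken)   # a generator object, not the value 1
second = next(broken)  # another, separate generator object

print(isinstance(first, types.GeneratorType))  # True
print(first is second)                         # False
print(list(first))                             # [1, 2]
```

Because next(broken) always succeeds, a for loop over such an object has no reason to stop.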
Because you wanted to create an iterable, you can just make __iter__ the generator here:
class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        for id in self.ids:
            yield id
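Note that a generator function should not use raise StopIteration; just returning from the function does that for you. (Since Python 3.7, a StopIteration raised inside a generator body is converted into a RuntimeError.)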
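A small sketch of the difference, assuming Python 3.7 or newer. The function names good and bad are only for illustration.

```python
def good(ids):
    for i in ids:
        yield i
    # Falling off the end (or a bare return) finishes the generator.


def bad(ids):
    for i in ids:
        yield i
    raise StopIteration  # assumption: running on Python 3.7 or newer


good_values = list(good([1, 2, 3]))

try:
    list(bad([1, 2, 3]))
    outcome = "no error"
except RuntimeError as exc:
    # The interpreter replaces the StopIteration with a RuntimeError.
    outcome = f"RuntimeError: {exc}"

print(good_values)  # [1, 2, 3]
print(outcome)
```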
The above class is an iterable. Iterables only have an __iter__ method, and no __next__ method. Iterables produce an iterator when __iter__ is called:
Iterable -> (call __iter__) -> Iterator
In the above example, because Test.__iter__ is a generator function, it creates a new object each time we call it:
>>> test = Test([1,2,3])
>>> test.__iter__() # create an iterator
<generator object Test.__iter__ at 0x111e85660>
>>> test.__iter__()
<generator object Test.__iter__ at 0x111e85740>
A generator object is a specific kind of iterator, one created by calling a generator function or by using a generator expression. Note that the hex values in the representations differ: two different objects were created for the two calls. This is by design! Iterables produce iterators, and can create more at will. This lets you loop over them independently:
>>> test_it1 = test.__iter__()
>>> test_it1.__next__()
1
>>> test_it2 = test.__iter__()
>>> test_it2.__next__()
1
>>> test_it1.__next__()
2
Note that I called __next__() on the object returned by test.__iter__(), the iterator, not on test itself, which doesn't have that method because it is only an iterable, not an iterator.
Iterators also have an __iter__ method, which must always return self, because they are their own iterators. It is the __next__ method that makes them an iterator, and the job of __next__ is to be called repeatedly, until it raises StopIteration. Until StopIteration is raised, each call should return the next value. Once an iterator is done (has raised StopIteration), it is meant to then always raise StopIteration. Iterators can only be used once, unless they are infinite (never raise StopIteration and just keep producing values each time __next__ is called).
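Both behaviours can be observed with ready-made standard library iterators. This is only an illustration using a list iterator and itertools.count, not part of the classes discussed here.

```python
from itertools import count

finite = iter([1, 2])
seen = [next(finite), next(finite)]

# Once exhausted, the iterator keeps raising StopIteration on every call.
stops = 0
for _ in range(3):
    try:
        next(finite)
    except StopIteration:
        stops += 1

# count() never raises StopIteration; it just keeps producing values.
endless = count(10)
first_three = [next(endless) for _ in range(3)]

print(seen)         # [1, 2]
print(stops)        # 3
print(first_three)  # [10, 11, 12]
```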
So this is an iterator:
class IteratorTest:
    def __init__(self, ids):
        self.ids = ids
        self.nextpos = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.ids is None or self.nextpos >= len(self.ids):
            # we are done
            self.ids = None
            raise StopIteration
        value = self.ids[self.nextpos]
        self.nextpos += 1
        return value
This has to do a bit more work; it has to keep track of what the next value to produce would be, and whether we have already raised StopIteration. Other answerers here have used what appear to be simpler ways, but those actually involve letting something else do all the hard work. When you use iter(self.ids) or (i for i in ids) you are creating a different iterator to delegate __next__ calls to. That's cheating a bit, hiding the state of the iterator inside ready-made standard library objects.
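For comparison, a sketch of the delegating approach mentioned above. DelegatingTest is a hypothetical name; the position tracking lives entirely inside the list iterator that iter() returns.

```python
class DelegatingTest:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        # Hand back a fresh list iterator; it tracks the position for us.
        return iter(self.ids)


test = DelegatingTest([1, 2, 3])
it = iter(test)

print(type(it).__name__)       # list_iterator
print(list(test), list(test))  # [1, 2, 3] [1, 2, 3]
```

This is perfectly fine in practice; it just does not show what an iterator has to do internally.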
You don't usually see anything calling __iter__ or __next__ directly in Python code, because those two methods are just the hooks that you can implement in your Python classes; if you were to implement an iterator in the C API, the hook names would be slightly different. Instead, you either use the iter() and next() functions, or just use the object in syntax or a function call that accepts an iterable.
The for loop is such syntax. When you use a for loop, Python does the moral equivalent of calling __iter__() on the object, then __next__() on the resulting iterator object to get each value. You can see this if you disassemble the Python bytecode:
>>> from dis import dis
>>> dis("for t in test: pass")
  1           0 LOAD_NAME                0 (test)
              2 GET_ITER
        >>    4 FOR_ITER                 4 (to 10)
              6 STORE_NAME               1 (t)
              8 JUMP_ABSOLUTE            4
        >>   10 LOAD_CONST               0 (None)
             12 RETURN_VALUE
The GET_ITER opcode at position 2 calls test.__iter__(), and FOR_ITER uses __next__ on the resulting iterator to keep looping (executing STORE_NAME to set t to the next value, then jumping back to position 4), until StopIteration is raised. Once that happens, it'll jump to position 10 to end the loop.
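Written out in plain Python, the loop machinery described above looks roughly like this. It is a sketch of the behaviour, not what the interpreter literally executes, and manual_for is a made-up helper name.

```python
def manual_for(iterable):
    """Collect the values a `for t in iterable:` loop would visit."""
    collected = []
    iterator = iter(iterable)      # what GET_ITER does
    while True:
        try:
            t = next(iterator)     # what FOR_ITER does
        except StopIteration:
            break                  # jump past the end of the loop
        collected.append(t)        # the loop body
    return collected


print(manual_for((42, 81, 17, 111)))  # [42, 81, 17, 111]
```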
If you want to play more with the difference between iterators and iterables, take a look at the Python standard types, such as lists or tuples, and see what happens when you use iter() and next() on them:
>>> foo = (42, 81, 17, 111)
>>> next(foo) # foo is a tuple, not an iterator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not an iterator
>>> t_it = iter(foo) # so use iter() to create one from the tuple
>>> t_it # here is an iterator object for our foo tuple
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) # it returns itself
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) is t_it # really, it returns itself, not a new object
True
>>> next(t_it) # we can get values from it, one by one
42
>>> next(t_it) # another one
81
>>> next(t_it) # yet another one
17
>>> next(t_it) # this is getting boring..
111
>>> next(t_it) # and now we are done
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(t_it) # and *stay* done
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> foo # but foo itself is still there
(42, 81, 17, 111)
You could make Test, the iterable, return a custom iterator class instance too (and not cop out by having a generator function create the iterator for us):
class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        return TestIterator(self)

class TestIterator:
    def __init__(self, test):
        self.test = test
        self.nextpos = 0  # position of the next value to produce
    def __iter__(self):
        return self
    def __next__(self):
        if self.test is None or self.nextpos >= len(self.test.ids):
            # we are done
            self.test = None
            raise StopIteration
        value = self.test.ids[self.nextpos]
        self.nextpos += 1
        return value
That's a lot like the original IteratorTest class above, but TestIterator keeps a reference to the Test instance. That's really how tuple_iterator works too.
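A quick way to check that each loop gets its own iterator is to nest two loops over the same Test instance. The sketch below repeats the two classes so it runs on its own, and assumes the iterator starts with nextpos set to 0.

```python
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        return TestIterator(self)


class TestIterator:
    def __init__(self, test):
        self.test = test
        self.nextpos = 0  # assumed starting position

    def __iter__(self):
        return self

    def __next__(self):
        if self.test is None or self.nextpos >= len(self.test.ids):
            self.test = None
            raise StopIteration
        value = self.test.ids[self.nextpos]
        self.nextpos += 1
        return value


test = Test([1, 2])
# The inner loop asks for a new TestIterator on every outer step.
pairs = [(a, b) for a in test for b in test]
print(pairs)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```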
A brief, final note on naming conventions here: I am sticking with using self for the first argument to methods, that is, the bound instance. Using different names for that argument only serves to make it harder to talk about your code with other, experienced Python developers. Don't use me, however cute or short it may seem.
(*) Unless your goal was to create an iterator of iterators, of course (which is basically what the itertools.groupby() iterator does; it is an iterator producing (object, group_iterator) tuples, but I digress).
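For the curious, a short illustration of that groupby() behaviour, with made-up sample data:

```python
from itertools import groupby

ids = [1, 1, 2, 2, 2, 3]

groups = []
for key, group_iterator in groupby(ids):
    # Each group_iterator is itself an iterator over one run of equal items.
    groups.append((key, list(group_iterator)))

print(groups)  # [(1, [1, 1]), (2, [2, 2, 2]), (3, [3])]
```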
It is unclear to me exactly what you are trying to achieve, but if you really want to use your instance attributes like this, you can convert the input to an iterator with iter() and delegate to it from __next__. But, as I said, this feels odd and I don't think you'd actually want a setup like this.
class Test:
    def __init__(self, ids):
        self.ids = iter(ids)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self.ids)

test = Test([1,2,3])
for t in test:
    print('new value', t)
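One consequence of this setup worth knowing about: since the instance is its own iterator, it can only be looped over once. A small sketch repeating the class above:

```python
class Test:
    def __init__(self, ids):
        self.ids = iter(ids)

    def __iter__(self):
        return self

    def __next__(self):
        # Delegates to the list iterator, which raises StopIteration at the end.
        return next(self.ids)


test = Test([1, 2, 3])
first_pass = list(test)
second_pass = list(test)  # the underlying iterator is already exhausted

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
```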