Iterable class in Python 3

I am trying to implement an iterable proxy for a web resource (lazily fetched images).

Firstly, I did this (returning ids; in production those will be image buffers):

def iter(ids=[1,2,3]):
    for id in ids:
        yield id

and that worked nicely, but now I need to keep state.

I read about the four ways to define iterators and judged that the iterator protocol is the way to go. Below is my attempt, and failure, to implement it.

class Test:
    def __init__(self, ids):
         self.ids = ids
    def __iter__(self):
        return self
    def __next__(self):
        for id in self.ids:
            yield id
        raise StopIteration

test = Test([1,2,3])
for t in test:
    print('new value', t)

Output:

new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>

forever.

What's wrong?


Thanks to absolutely everyone! It's all new to me, but I'm learning new cool stuff.

asked Dec 13 '22 by Vorac

2 Answers

Your __next__ method uses yield, which makes it a generator function. Generator functions return a new iterator when called.

But the __next__ method is part of the iterator interface; it should not itself return a new iterator. __next__ should return the next value, not something that returns all the values(*).

Because you wanted to create an iterable, you can just make __iter__ the generator here:

class Test:
    def __init__(self, ids):
         self.ids = ids
    def __iter__(self):
        for id in self.ids:
            yield id

Note that a generator function should not use raise StopIteration; just returning from the function does that for you.
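
With that change, your original loop works as expected. And because each call to __iter__ creates a fresh generator, the same Test instance can be iterated more than once:

test = Test([1, 2, 3])
for t in test:
    print('new value', t)   # new value 1, new value 2, new value 3
for t in test:
    print('again', t)       # a second pass works; __iter__ makes a new generator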

The above class is an iterable. Iterables only have an __iter__ method, and no __next__ method. Iterables produce an iterator when __iter__ is called:

Iterable -> (call __iter__) -> Iterator

In the above example, because Test.__iter__ is a generator function, it creates a new object each time we call it:

>>> test = Test([1,2,3])
>>> test.__iter__()  # create an iterator
<generator object Test.__iter__ at 0x111e85660>
>>> test.__iter__()
<generator object Test.__iter__ at 0x111e85740>

A generator object is a specific kind of iterator, one created by calling a generator function, or by using a generator expression. Note that the hex values in the representations differ: two different objects were created for the two calls. This is by design! Iterables produce iterators, and can create more at will. This lets you loop over them independently:

>>> test_it1 = test.__iter__()
>>> test_it1.__next__()
1
>>> test_it2 = test.__iter__()
>>> test_it2.__next__()
1
>>> test_it1.__next__()
2

Note that I called __next__() on the object returned by test.__iter__(), the iterator, not on test itself, which doesn't have that method because it is only an iterable, not an iterator.

Iterators also have an __iter__ method, which always must return self, because they are their own iterators. It is the __next__ method that makes them an iterator, and the job of __next__ is to be called repeatedly, until it raises StopIteration. Until StopIteration is raised, each call should return the next value. Once an iterator is done (has raised StopIteration), it is meant to then always raise StopIteration. Iterators can only be used once, unless they are infinite (never raise StopIteration and just keep producing values each time __next__ is called).

So this is an iterator:

class IteratorTest:
    def __init__(self, ids):
        self.ids = ids
        self.nextpos = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.ids is None or self.nextpos >= len(self.ids):
            # we are done
            self.ids = None
            raise StopIteration
        value = self.ids[self.nextpos]
        self.nextpos += 1
        return value

This has to do a bit more work; it has to keep track of what the next value to produce would be, and whether we have already raised StopIteration. Other answerers here have used what appear to be simpler ways, but those actually involve letting something else do all the hard work. When you use iter(self.ids) or (i for i in ids) you are creating a different iterator to delegate __next__ calls to. That's cheating a bit, hiding the state of the iterator inside ready-made standard library objects.
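
As a sketch of that delegation approach (my illustration, not code from the question, and essentially what the second answer below does), the class just wraps a ready-made iterator and forwards __next__ calls to it:

class DelegatingTest:
    def __init__(self, ids):
        # the ready-made iterator returned by iter() keeps the position state for us
        self._it = iter(ids)
    def __iter__(self):
        return self
    def __next__(self):
        # forward to the wrapped iterator; its StopIteration propagates for us
        return next(self._it)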

You don't usually see anything calling __iter__ or __next__ directly in Python code, because those two methods are just the hooks that you implement in your Python classes (if you were to implement an iterator in the C API, the hook names would be slightly different). Instead, you either use the iter() and next() built-in functions, or simply use the object in syntax or pass it to a function that accepts an iterable.
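
For example, with the generator-based Test class from above:

>>> test = Test([1, 2, 3])
>>> it = iter(test)    # calls test.__iter__() for you
>>> next(it)           # calls it.__next__() for you
1
>>> next(it)
2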

The for loop is such syntax. When you use a for loop, Python uses the moral equivalent of calling __iter__() on the object, then __next__() on the resulting iterator object to get each value. You can see this if you disassemble the Python bytecode:

>>> from dis import dis
>>> dis("for t in test: pass")
  1           0 LOAD_NAME                0 (test)
              2 GET_ITER
        >>    4 FOR_ITER                 4 (to 10)
              6 STORE_NAME               1 (t)
              8 JUMP_ABSOLUTE            4
        >>   10 LOAD_CONST               0 (None)
             12 RETURN_VALUE

The GET_ITER opcode at position 2 calls test.__iter__(), and FOR_ITER uses __next__ on the resulting iterator to keep looping (executing STORE_NAME to set t to the next value, then jumping back to position 4), until StopIteration is raised. Once that happens, it'll jump to position 10 to end the loop.
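
In plain Python, that loop is roughly equivalent to the following sketch (not literally what CPython executes, but the same protocol):

it = iter(test)              # GET_ITER
while True:
    try:
        t = next(it)         # FOR_ITER
    except StopIteration:    # iterator exhausted, leave the loop
        break
    pass                     # the loop body would go here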

If you want to play more with the difference between iterators and iterables, take a look at the Python standard types and see what happens when you use iter() and next() on them. Like lists or tuples:

>>> foo = (42, 81, 17, 111)
>>> next(foo)  # foo is a tuple, not an iterator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not an iterator
>>> t_it = iter(foo)  # so use iter() to create one from the tuple
>>> t_it   # here is an iterator object for our foo tuple
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it)  # it returns itself
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) is t_it  # really, it returns itself, not a new object
True
>>> next(t_it)  # we can get values from it, one by one
42
>>> next(t_it)  # another one
81
>>> next(t_it)  # yet another one
17
>>> next(t_it)  # this is getting boring..
111
>>> next(t_it)  # and now we are done
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(t_it)  # and *stay* done
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> foo  # but foo itself is still there
(42, 81, 17, 111)

You could make Test, the iterable, return a custom iterator class instance too (and not cop out by having a generator function create the iterator for us):

class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        return TestIterator(self)

class TestIterator:
    def __init__(self, test):
        self.test = test
        self.nextpos = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.test is None or self.nextpos >= len(self.test.ids):
            # we are done
            self.test = None
            raise StopIteration
        value = self.test.ids[self.nextpos]
        self.nextpos += 1
        return value

That's a lot like the original IteratorTest class above, but TestIterator keeps a reference to the Test instance. That's really how tuple_iterator works too.
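
A quick sanity check (my own transcript, same idea as the tuple example above): each call to iter() hands out a fresh TestIterator with its own position, so independent loops don't interfere:

>>> test = Test([1, 2, 3])
>>> it1 = iter(test)
>>> it2 = iter(test)
>>> next(it1), next(it1)
(1, 2)
>>> next(it2)   # it2 keeps its own position
1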

A brief, final note on naming conventions here: I am sticking with using self for the first argument to methods, i.e. the bound instance. Using a different name for that argument only serves to make it harder to talk about your code with other, experienced Python developers. Don't use me, however cute or short it may seem.


(*) Unless your goal was to create an iterator of iterators, of course (which is basically what the itertools.groupby() iterator does, it is an iterator producing (object, group_iterator) tuples, but I digress).
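
For the curious, a tiny groupby() illustration of that (with the default identity key, so consecutive equal characters are grouped):

>>> from itertools import groupby
>>> [(key, list(group)) for key, group in groupby('aabbbcc')]
[('a', ['a', 'a']), ('b', ['b', 'b', 'b']), ('c', ['c', 'c'])]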

answered Dec 22 '22 by Martijn Pieters


It is unclear to me exactly what you are trying to achieve, but if you really want to use your instance attributes like this, you can convert the input to an iterator and then delegate to it. But this feels odd, and I don't think you'd actually want a setup like this.

class Test:
    def __init__(self, ids):
         self.ids = iter(ids)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self.ids)

test = Test([1,2,3])
for t in test:
    print('new value', t)
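
One caveat that shows why this setup feels odd: because the instance wraps a single iterator, it is exhausted after the first pass:

>>> test = Test([1, 2, 3])
>>> list(test)
[1, 2, 3]
>>> list(test)   # the wrapped iterator is already exhausted
[]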

answered Dec 21 '22 by Bram Vanroy