Why have an __iter__ method? If an object is an iterator, then it is pointless to have a method which returns itself. If it is not an iterator but is instead an iterable, i.e. something with an __iter__ or __getitem__ method, then why would one ever want to define something which returns an iterator but is not an iterator itself? In Python, when would one want to define an iterable that is not itself an iterator? Or, what is an example of something that is an iterable but not an iterator?
Trying to answer your questions one at a time:
Why have an __iter__ method? If an object is an iterator, then it is pointless to have a method which returns itself.
It's not pointless. The iterator protocol demands both an __iter__ and a __next__ (or next in Python 2) method. All sane iterators I have ever seen just return self in their __iter__ method, but it is still crucial to have that method. Not having it would lead to all kinds of weirdness, for example:
somelist = [1, 2, 3]
it = iter(somelist)
Now
iter(it)
or
for x in it: pass
would throw a TypeError and complain that it is not iterable if the iterator had no __iter__ method, because when iter(x) is called (which happens implicitly when you use a for loop), it expects the argument object x to be able to produce an iterator (it just tries to call __iter__ on that object). Concrete example (Python 3):
>>> class A:
...     def __iter__(self):
...         return B()
...
>>> class B:
...     def __next__(self):
...         pass
...
>>> iter(iter(A()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'B' object is not iterable
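For contrast, here is a quick check of how the built-in list iterator behaves. This is only a small sketch using standard built-ins; the variable names same_object, fresh_each_time, first and rest are made up for illustration.

```python
somelist = [1, 2, 3]
it = iter(somelist)

# An iterator's __iter__ returns the iterator itself.
same_object = iter(it) is it

# A list is iterable but not an iterator: each iter() call builds a new iterator.
fresh_each_time = iter(somelist) is not iter(somelist)

# Because iter(it) works, a partially consumed iterator can still drive a loop.
first = next(it)
rest = [x for x in it]

print(same_object, fresh_each_time, first, rest)
```

Since iter(it) simply hands back it, the comprehension picks up where next(it) left off instead of raising a TypeError.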
Consider any function that expects an iterable, especially those from itertools, for example dropwhile. Calling it with any object that has an __iter__ method will be fine, regardless of whether it's an iterable that is not an iterator or an iterator, because in both cases calling iter with that object as an argument gives you an iterator. Making a weird distinction between two kinds of iterables here would go against the principle of duck typing, which Python strongly embraces.
Neat tricks like
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(zip(*[iter(a)]*3))
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
would just stop working if you could not pass iterators to zip.
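The trick relies on zip receiving the same iterator object three times, so every tuple it builds advances that one iterator three times. A minimal sketch of the idea, where chunks_of is a hypothetical helper name and not a standard library function:

```python
def chunks_of(iterable, n):
    """Group items into n-tuples, dropping an incomplete tail (illustrative helper)."""
    it = iter(iterable)          # one iterator...
    return list(zip(*[it] * n))  # ...referenced n times, so zip advances it n times per tuple


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
grouped = chunks_of(a, 3)

# Passing the list itself three times gives zip three independent iterators instead.
not_grouped = list(zip(*[a] * 3))[:2]

print(grouped)
print(not_grouped)
```

With the list passed directly, zip calls iter on it three times and gets three separate positions, so the items are repeated rather than grouped.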
why would one want to ever define something which returns an iterator but is not an iterator itself?
Let's consider this simple list iterator:
>>> class MyList(list):
...     def __iter__(self):
...         return MyListIterator(self)
...
>>> class MyListIterator:
...     def __init__(self, lst):
...         self._lst = lst
...         self.index = 0
...     def __iter__(self):
...         return self
...     def __next__(self):
...         try:
...             n = self._lst[self.index]
...             self.index += 1
...             return n
...         except IndexError:
...             raise StopIteration
...
>>> a = MyList([1,2,3])
>>> for x in a:
...     for x in a:
...         x
...
1
2
3
1
2
3
1
2
3
Remember that iter is called with the iterable in question for both for loops, expecting a fresh iterator each time from the object's __iter__ method.
Now, without an iterator being produced each time a for loop is employed, how would you be able to keep track of the current state of any iteration when a MyList object is iterated over an arbitrary number of times at the same time? Oh, that's right, you can't. :)
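To see what goes wrong when the loops share state, compare nested iteration over a plain list with nested iteration over a single iterator. This is just a sketch with a built-in list; the names pairs_from_list and pairs_from_iterator are illustrative.

```python
a = [1, 2, 3]

# Iterable: each loop calls iter(a) and gets its own independent position.
pairs_from_list = [(x, y) for x in a for y in a]

# Iterator: both loops share one position, so the inner loop drains it.
it = iter(a)
pairs_from_iterator = [(x, y) for x in it for y in it]

print(len(pairs_from_list))
print(pairs_from_iterator)
```

The shared iterator yields 1 to the outer loop, the inner loop consumes 2 and 3, and then nothing is left for the outer loop to continue with.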
edit: Bonus and sort of a reply to Tadhg McDonald-Jensen's comment
A reusable iterator is not unthinkable, but of course a bit weird because it would rely on being initialized with a "non-consumable" iterable (i.e. not a classic iterator):
>>> class riter(object):
...     def __init__(self, iterable):
...         self.iterable = iterable
...         self.it = iter(iterable)
...     def __next__(self): # python 2: next
...         try:
...             return next(self.it)
...         except StopIteration:
...             self.it = iter(self.iterable)
...             raise
...     def __iter__(self):
...         return self
...
>>> a = [1, 2, 3]
>>> it = riter(a)
>>> for x in it:
...     x
...
1
2
3
>>> for x in it:
...     x
...
1
2
3
An iterable is something that can be iterated (looped) over, whereas an iterator is something that is consumed.
what is an example of something that is an iterable but not an iterator?
Simple: a list. Or any sequence, since you can iterate over a list as many times as you want without destroying the list:
>>> a = [1,2,3]
>>> for i in a:
...     print(i,end=" ")
...
1 2 3
>>> for i in a:
...     print(i,end=" ")
...
1 2 3
Whereas an iterator (like a generator) can only be used once:
>>> b = (i for i in range(3))
>>> for i in b:
...     print(i,end=" ")
...
0 1 2
>>> for i in b:
...     print(i,end=" ")
...
>>> #iterator has already been used up, nothing gets printed
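The same distinction can be checked programmatically with the abstract base classes in collections.abc. A small sketch; the checks dictionary is only there for illustration:

```python
from collections.abc import Iterable, Iterator

a = [1, 2, 3]
b = (i for i in range(3))

# Each entry is (is it an Iterable?, is it an Iterator?).
checks = {
    "list": (isinstance(a, Iterable), isinstance(a, Iterator)),
    "generator": (isinstance(b, Iterable), isinstance(b, Iterator)),
    "list iterator": (isinstance(iter(a), Iterable), isinstance(iter(a), Iterator)),
}

for name, (is_iterable, is_iterator) in checks.items():
    print(name, is_iterable, is_iterator)
```

Every iterator is also an iterable, but the list is only an iterable. Note that the Iterable check looks for __iter__ and does not recognize objects that are iterable solely through __getitem__.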
For a list to be consumed like an iterator, you would need to use something like self.pop(0) to remove the first element of the list during iteration:
class IteratorList(list):
    def __iter__(self):
        return self  # since the current mechanics require this
    def __next__(self):
        try:
            return self.pop(0)
        except IndexError:  # we need to raise the expected kind of error
            raise StopIteration
    next = __next__  # for compatibility with Python 2
a = IteratorList([1,2,3,4,5])
for i in a:
    print(i)
    if i==3:   # let's stop at three and
        break  # see what the list is after
print(a)
which gives this output:
1
2
3
[4, 5]
You see? This is what iterators do: once a value is returned from __next__, it has no reason to hang around in the iterator or in memory, so it is removed. That's why we need __iter__: to define iterators that let us iterate over sequences without destroying them in the process.
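One common way to get a fresh iterator per loop without writing a separate iterator class is to make __iter__ a generator function. A minimal sketch, where the Countdown class is invented purely for illustration:

```python
class Countdown:
    """Iterable (not an iterator): every __iter__ call starts a new generator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


c = Countdown(3)
first_pass = list(c)
second_pass = list(c)  # not exhausted: a new generator is created each time

print(first_pass, second_pass)
```

The Countdown object itself holds no iteration state; each generator returned by __iter__ carries its own, so the object survives any number of loops.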
In response to @timgeb's comment, I suppose if you added items to an IteratorList and then iterated over it again, that would make sense:
a = IteratorList([1,2,3,4,5])
for i in a:
    print(i)
a.extend([6,7,8,9])
for i in a:
    print(i)
But iterators really only make sense if they are either consumed or never end (like itertools.repeat).
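To illustrate the never-ending case, here is a short sketch using itertools.repeat together with itertools.islice to take a bounded number of items; the variable names are illustrative.

```python
from itertools import islice, repeat

endless = repeat("x")                 # an iterator that never raises StopIteration
first_three = list(islice(endless, 3))
next_two = list(islice(endless, 2))   # still going; there is nothing to "use up"

print(first_three, next_two)
```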