I wanted to understand a bit more about iterators, so please correct me if I'm wrong.
An iterator is an object which has a pointer to the next object and is read as a buffer or stream (i.e. a linked list). They're particularly efficient because all they do is tell you what is next by reference instead of using indexing.
However, I still don't understand why the following behavior is happening:
    In [1]: iter = (i for i in range(5))

    In [2]: for _ in iter:
       ....:     print _
       ....:
    0
    1
    2
    3
    4

    In [3]: for _ in iter:
       ....:     print _
       ....:

    In [4]:
After a first loop through the iterator (In [2]) it's as if it was consumed and left empty, so the second loop (In [3]) prints nothing.
However, I never assigned a new value to the iter variable.
What is really happening under the hood of the for loop?
When a for loop is executed, the for statement calls iter() on the object it is supposed to loop over. If this call is successful, iter() returns an iterator object that defines the method __next__(), which accesses the elements of the object one at a time.
Under the hood, Python's for loop uses iterators: it keeps calling __next__() and stops when that call raises StopIteration.
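The steps just described can be reproduced by hand with the built-in iter() and next() functions. This is only a small sketch; the list `numbers` and the flag `exhausted` are names made up for the illustration.

```python
numbers = [10, 20, 30]       # example data, not from the question

it = iter(numbers)           # what the for statement does first
print(next(it))              # 10
print(next(it))              # 20
print(next(it))              # 30

try:
    next(it)                 # nothing is left to return
    exhausted = False
except StopIteration:
    exhausted = True         # a for loop would end silently here
    print("StopIteration raised")
```

Note that the list itself is untouched; only the iterator object `it` has been used up.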
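Iterators can be more memory efficient, because they produce items one at a time instead of building the whole collection up front. Whether they are also faster depends on the workload.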
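One rough way to see the memory difference is to compare the size of a list with the size of a generator over the same range. This is a sketch under some assumptions: `n`, `as_list` and `as_generator` are illustrative names, and sys.getsizeof only reports the size of the container object itself (not the integers it refers to), so treat the numbers as an indication rather than a precise measurement.

```python
import sys

n = 100_000                             # arbitrary size chosen for the demo
as_list = [i for i in range(n)]         # every item exists at once
as_generator = (i for i in range(n))    # items are produced on demand

list_size = sys.getsizeof(as_list)
generator_size = sys.getsizeof(as_generator)

# The generator object stays small no matter how large n is
print(list_size, generator_size)
```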
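Note that the following applies to Java, not Python: in a Java for-each loop you can't modify the collection (doing so throws a ConcurrentModificationException), whereas with an explicit Iterator you can remove elements through the iterator itself. Here, modifying a collection means removing an element or changing the content of an item stored in it. Python has no ConcurrentModificationException; changing the size of a dict while iterating over it raises RuntimeError, and changing a list while iterating over it raises nothing but can skip or repeat items.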
Your suspicion is correct: the iterator has been consumed.
In actuality, your iterator is a generator, which is an object that can be iterated through only once.
    type((i for i in range(5)))  # says it's type generator

    def another_generator():
        yield 1  # the yield expression makes it a generator, not a function

    type(another_generator())  # also a generator
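To make the "only once" point concrete, here is a minimal sketch that mirrors the session in the question and shows two common ways around it. The helper `make_gen` and the variable names are hypothetical, chosen only for this example.

```python
gen = (i for i in range(5))

first_pass = list(gen)     # [0, 1, 2, 3, 4]
second_pass = list(gen)    # [] because the generator is already exhausted

# Option 1: build a fresh generator each time one is needed
def make_gen():
    return (i for i in range(5))

# Option 2: store the values in a list, which can be iterated repeatedly
values = list(make_gen())
```

Which option fits depends on the data: a list costs memory but can be reused, while recreating the generator keeps memory use low.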
The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; the items are not all generated at once. In fact, you can have an infinite generator:
    def my_gen():
        while True:
            yield 1  # again: yield means it is a generator, not a function

    for _ in my_gen():
        print(_)  # hit ctl+c to stop this infinite loop!
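If you only want a few items from an infinite generator like this one, a possible approach is itertools.islice from the standard library, which stops after a given number of items. A short sketch (the name `first_three` is just for illustration):

```python
from itertools import islice

def my_gen():
    while True:
        yield 1

# Take only the first three items; the rest are never generated
first_three = list(islice(my_gen(), 3))
print(first_three)  # [1, 1, 1]
```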
Some other corrections to help improve your understanding:
- The for ... in statement accepts an iterable object as its second argument.
- The iterable can be a generator, as in your example, or a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
- The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done; it is not a keyword, but it is the name of a built-in function, and reusing it hides that function). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
- If the call to __iter__ is successful, the function next() is applied to the resulting iterator over and over again, in a loop, and the first variable supplied to for ... in is assigned to the result of the next() function. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
- The for loop ends when next() raises the StopIteration exception (which usually happens when the iterator does not have another object to yield when next() is called).

You can "manually" implement a for loop in Python this way (probably not perfect, but close enough):
    iterable = [1, 2, 3]  # example value; any object you want to loop over

    try:
        temp = iterable.__iter__()
    except AttributeError:  # no parentheses: catch the class, not an instance
        raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
    else:
        while True:
            try:
                _ = temp.__next__()
            except StopIteration:
                break
            except AttributeError:
                raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
            # this is the "body" of the for loop
            continue
There is pretty much no difference between the above and your example code.
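The same idea can be wrapped in a small function to compare it with a real for loop. This is only a sketch: `manual_for` and its `body` callback are hypothetical names, and it uses the built-in iter() and next() rather than calling the magic methods directly.

```python
def manual_for(iterable, body):
    """Call body(item) for each item, roughly as a for loop would."""
    iterator = iter(iterable)      # raises TypeError if not iterable
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break                  # normal end of the loop
        body(item)                 # the "body" of the for loop

collected = []
manual_for(range(3), collected.append)
print(collected)  # [0, 1, 2]
```

Calling `manual_for(5, print)` raises TypeError, just as `for _ in 5:` would.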
Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for ... in, but it is very useful to understand what in does with its arguments, since for ... in implements very similar behavior.
When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for ... in). Using in by itself on a container, you can do things like this:
    1 in [1, 2, 3]         # True
    'He' in 'Hello'        # True
    3 in range(10)         # True
    'eH' in 'Hello'[::-1]  # True
If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), in next tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on [1]. A generator is just one type of iterator.
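One consequence worth seeing in code: because a generator has no __contains__ method, a membership test on it works by iterating, and that consumes items. A minimal sketch (variable names are illustrative only):

```python
gen = (i for i in range(5))

found = 2 in gen         # advances gen until it yields 2
remaining = list(gen)    # only the items after 2 are left

print(found)      # True
print(remaining)  # [3, 4]

other = (i for i in range(5))
missing = 9 in other     # not found, so the whole generator is consumed
leftover = list(other)   # []
```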
- If the call to __iter__ is successful, the in keyword applies the function next() to the resulting iterator over and over again. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator.) Actually, to be more precise: it calls the iterator object's __next__ method.
- If the object doesn't have an __iter__ method to return an iterator, in then falls back on the old-style iteration protocol using the object's __getitem__ method [2].
- If all of the above attempts fail, you'll get a TypeError exception.

If you wish to create your own object type to iterate over (i.e., you can use for ... in, or just in, on it), it's useful to know about the yield keyword, which is used in generators (as mentioned above).
    class MyIterable():
        def __iter__(self):
            yield 1

    m = MyIterable()
    for _ in m:
        print(_)  # 1
    1 in m  # True
The presence of yield turns a function or method into a generator instead of a regular function/method. You don't need the __next__ method if you use a generator (it brings __next__ along with it automatically).
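For contrast, this is roughly what you would have to write without yield: a class that implements both __iter__ and __next__ itself, next to a generator function that does the same job. Both `CountUpTo` and `count_up_to` are hypothetical names invented for this sketch.

```python
class CountUpTo:
    """Hand-written iterator producing 0, 1, ..., limit - 1."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self                  # an iterator returns itself

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration      # signals the end of iteration
        value = self.current
        self.current += 1
        return value


def count_up_to(limit):
    """Generator version: yield supplies __iter__ and __next__ for us."""
    current = 0
    while current < limit:
        yield current
        current += 1


print(list(CountUpTo(3)))    # [0, 1, 2]
print(list(count_up_to(3)))  # [0, 1, 2]
```

Like a generator, an instance of the hand-written class can be iterated through only once.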
If you wish to create your own container object type (i.e., you can use in on it by itself, but NOT for ... in), you just need the __contains__ method.
    class MyUselessContainer():
        def __contains__(self, obj):
            return True

    m = MyUselessContainer()
    1 in m          # True
    'Foo' in m      # True
    TypeError in m  # True
    None in m       # True
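As a counterpart to the two classes above, here is a sketch of the old-style __getitem__ fallback mentioned earlier: a class with neither __iter__ nor __contains__ that still works with both for ... in and in. `OldStyleSequence` is a hypothetical name; the key assumption is that __getitem__ accepts integer indexes starting at 0 and raises IndexError past the end.

```python
class OldStyleSequence:
    """Has only __getitem__ (no __iter__, no __contains__)."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # The underlying list raises IndexError past the end,
        # which tells Python that the sequence is finished
        return self._items[index]


seq = OldStyleSequence("abc")
print(list(seq))    # ['a', 'b', 'c']
print('b' in seq)   # True
print('z' in seq)   # False
```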
[1] Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually named next (no underscores) in Python 2.
[2] See this answer for the different ways to create iterable classes.