
Python for loop and iterator behavior

Tags: python, iterator

I wanted to understand a bit more about iterators, so please correct me if I'm wrong.

An iterator is an object which has a pointer to the next object and is read as a buffer or stream (i.e. a linked list). They're particularly efficient because all they do is tell you what is next by reference instead of using indexing.

However, I still don't understand why the following behavior is happening:

```
In [1]: iter = (i for i in range(5))

In [2]: for _ in iter:
   ....:     print _
   ....:
0
1
2
3
4

In [3]: for _ in iter:
   ....:     print _
   ....:

In [4]:
```

After a first loop through the iterator (In [2]) it's as if it was consumed and left empty, so the second loop (In [3]) prints nothing.

However I never assigned a new value to the iter variable.

What is really happening under the hood of the for loop?


asked Apr 02 '15 01:04

Matteo

1 Answer

Your suspicion is correct: the iterator has been consumed.

More precisely, your iterator is a generator, which is an object that can be iterated through only once.

```
type((i for i in range(5)))  # says it's type generator

def another_generator():
    yield 1  # the yield expression makes this a generator function, not a regular function

type(another_generator())  # also a generator
```
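To see the "only once" part directly, here is a minimal Python 3 sketch. The names `gen`, `first_pass`, `second_pass`, and `numbers` are just illustrative and do not come from the question:

```python
# Illustrative names only; Python 3 syntax.
gen = (i for i in range(5))

first_pass = list(gen)   # consumes every item the generator can produce
second_pass = list(gen)  # the generator is already exhausted

print(first_pass)   # [0, 1, 2, 3, 4]
print(second_pass)  # []

# A list, by contrast, hands out a fresh iterator each time it is iterated.
numbers = [0, 1, 2, 3, 4]
print(list(numbers) == list(numbers))  # True
```

If you need to loop more than once, store the items in a list or create a new generator for each pass.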

The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; the items are not all generated at once. In fact, you can have an infinite generator:

```
def my_gen():
    while True:
        yield 1  # again: yield means it is a generator, not a function

for _ in my_gen():
    print(_)  # hit Ctrl+C to stop this infinite loop!
```
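The "only upon request" behavior can be made visible with a small sketch. The `produced` list and `counting_gen` function are hypothetical helpers added purely for illustration; `itertools.islice` is the standard-library way to take a few items from an infinite generator without looping forever:

```python
from itertools import islice

produced = []  # records which values have actually been generated

def counting_gen():
    n = 0
    while True:
        produced.append(n)  # side effect shows when each item is created
        yield n
        n += 1

lazy = counting_gen()
print(produced)                      # [] : nothing runs until an item is requested

first_three = list(islice(lazy, 3))  # request exactly three items
print(first_three)                   # [0, 1, 2]
print(produced)                      # [0, 1, 2] : only the requested items were made
```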

Some other corrections to help improve your understanding:

- The generator is not a pointer, and does not behave like the pointers you might be familiar with from other languages.
- One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
- The keyword combination for in accepts an iterable object as its second argument.
- The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
- The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done; it is the name of a built-in function, and reassigning it shadows that function). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
- If the call to __iter__ is successful, the function next() is applied to the returned iterator over and over again, in a loop, and each result of next() is assigned to the first variable supplied to for in. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator object.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
- The for loop ends when next() raises the StopIteration exception (which usually happens when the iterator does not have another object to yield when next() is called).
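The steps in the list above can be tried by hand with the built-in `iter()` and `next()` functions. This is only a rough sketch; `letters`, `it`, and `finished` are made-up names:

```python
letters = ["a", "b"]

it = iter(letters)   # what a for loop does first: ask for an iterator
print(next(it))      # a
print(next(it))      # b

finished = False
try:
    next(it)         # no items left, so StopIteration is raised
except StopIteration:
    finished = True  # a for loop would simply end here
print(finished)      # True

# Passing a default value to next() suppresses the exception instead.
print(next(it, "done"))  # done
```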

You can "manually" implement a for loop in Python this way (probably not perfect, but close enough):

```
iterable = [1, 2, 3]  # example value so the snippet runs; try any object here

try:
    temp = iterable.__iter__()
except AttributeError:
    raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
    while True:
        try:
            _ = temp.__next__()
        except StopIteration:
            break
        except AttributeError:
            raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
        # this is the "body" of the for loop
        continue
```

There is pretty much no difference between the above and your example code.
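This also explains why the second loop in the question prints nothing. Calling `iter()` on a generator returns the very same generator, so both loops share one already-advanced object, whereas a list builds a new iterator each time. A short sketch (the variable names are illustrative):

```python
gen = (i for i in range(3))
lst = [0, 1, 2]

same_object = iter(gen) is gen           # a generator is its own iterator
list_is_iterator = iter(lst) is lst      # a list returns a separate iterator object
shared_iterator = iter(lst) is iter(lst) # and a new one on every call

print(same_object)       # True
print(list_is_iterator)  # False
print(shared_iterator)   # False
```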

Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for in, but it is very useful to understand what in does with its arguments, since for in implements very similar behavior.

- When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for in). Using in by itself on a container, you can do things like this:

  ```
  1 in [1, 2, 3]  # True
  'He' in 'Hello'  # True
  3 in range(10)  # True
  'eH' in 'Hello'[::-1]  # True
  ```

- If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), in next tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on¹. A generator is just one type of iterator.
- If the call to __iter__ is successful, the in keyword applies the function next() to the returned iterator over and over again, comparing each item until it finds a match or the iterator is exhausted. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator object.) Actually, to be more precise: it calls the iterator object's __next__ method.
- If the object doesn't have an __iter__ method to return an iterator, in then falls back on the old-style iteration protocol using the object's __getitem__ method².
- If all of the above attempts fail, you'll get a TypeError exception.
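The fallback chain described in the list above can be sketched with two toy classes. `OnlyIter` and `OnlyGetItem` are hypothetical names invented for this example; neither defines `__contains__`, so `in` has to fall back to iteration:

```python
class OnlyIter:
    def __iter__(self):
        return iter([1, 2, 3])

class OnlyGetItem:
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # signals the end of the sequence
        return index * 10

print(2 in OnlyIter())      # True, found by iterating
print(20 in OnlyGetItem())  # True, found by trying indices 0, 1, 2
print(5 in OnlyGetItem())   # False, IndexError ends the search

# On a generator, in consumes items up to and including the match.
gen = (i for i in range(5))
print(2 in gen)             # True
leftover = list(gen)
print(leftover)             # [3, 4]

unsupported = False
try:
    1 in object()           # no __contains__, __iter__ or __getitem__
except TypeError:
    unsupported = True
print(unsupported)          # True
```

Note the side effect on the generator: a membership test is not "free" on a one-shot iterator, because the items it inspects are gone afterwards.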

If you wish to create your own object type to iterate over (i.e., you can use for in, or just in, on it), it's useful to know about the yield keyword, which is used in generators (as mentioned above).

```
class MyIterable():
    def __iter__(self):
        yield 1

m = MyIterable()
for _ in m:
    print(_)  # 1
1 in m  # True
```

The presence of yield turns a function or method into a generator function: calling it returns a generator instead of running the body like a regular function/method. You don't need to write the __next__ method yourself if you use a generator (it brings __next__ along with it automatically).
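One practical consequence, sketched below with a made-up `Countdown` class: because `__iter__` is a generator method, every loop over the object gets a brand-new generator, so the object itself can be iterated as often as you like (unlike a single generator).

```python
class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each call creates a new generator, so every loop starts over.
        n = self.start
        while n > 0:
            yield n
            n -= 1

c = Countdown(3)
first = list(c)
second = list(c)
print(first)   # [3, 2, 1]
print(second)  # [3, 2, 1]
```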

If you wish to create your own container object type (i.e., you can use in on it by itself, but NOT for in), you just need the __contains__ method.

```
class MyUselessContainer():
    def __contains__(self, obj):
        return True

m = MyUselessContainer()
1 in m  # True
'Foo' in m  # True
TypeError in m  # True
None in m  # True
```
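To illustrate the "but NOT for in" part, here is a slightly less useless container, `EvenNumbers` (a hypothetical class for this sketch). Membership tests work, but looping over it raises TypeError because it has neither `__iter__` nor `__getitem__`:

```python
class EvenNumbers:
    def __contains__(self, obj):
        return isinstance(obj, int) and obj % 2 == 0

evens = EvenNumbers()
print(4 in evens)  # True
print(7 in evens)  # False

message = ""
try:
    for _ in evens:  # there is no way to obtain an iterator
        pass
except TypeError as exc:
    message = str(exc)
print(message)
```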

¹ Note that, to be an iterator, an object must implement the iterator protocol. This simply means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually next (no underscores) in Python 2.

² See this answer for the different ways to create iterable classes.
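For completeness, the iterator protocol from footnote ¹ can be written out by hand. The following is only a sketch in Python 3 with an invented `CountUpTo` class; in practice a generator is usually less code:

```python
class CountUpTo:
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration  # tells for loops and list() to stop
        self.current += 1
        return self.current

counter = CountUpTo(3)
first = list(counter)
second = list(counter)
print(first)   # [1, 2, 3]
print(second)  # [] : like a generator, it is consumed after one pass
```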

answered Sep 22 '22 21:09

Rick supports Monica
