How does Python manage a 'for' loop internally?

I'm trying to learn Python, and I started to play with some code:

a = [3,4,5,6,7]
for b in a:
    print(a)
    a.pop(0)

And the output is:

[3, 4, 5, 6, 7]
[4, 5, 6, 7]
[5, 6, 7]

I know it's not good practice to change a data structure while I'm looping over it, but I want to understand how Python manages the iterator in this case.

The principal question is: How does it know that it has to finish the loop if I'm changing the state of a?


kjaquier and Felix have talked about the iterator protocol, and we can see it in action in your case:

>>> L = [1, 2, 3]
>>> iterator = iter(L)
>>> iterator
<list_iterator object at 0x101231f28>
>>> next(iterator)
1
>>> L.pop()
3
>>> L
[1, 2]
>>> next(iterator)
2
>>> next(iterator)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
StopIteration
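The iterator protocol exercised in that session can be spelled out by hand. The following is a minimal sketch, not CPython's actual mechanism (the interpreter does this internally rather than with `try`/`except`), of roughly what `for b in a:` does with the loop from the question. The function name `manual_for` is hypothetical.

```python
def manual_for(a):
    """Hand-written equivalent of: for b in a: print(a); a.pop(0)"""
    snapshots = []
    iterator = iter(a)          # 1. ask the list for an iterator
    while True:
        try:
            b = next(iterator)  # 2. fetch the next item (bound to b, as in the for loop)
        except StopIteration:   # 3. StopIteration ends the loop quietly
            break
        print(a)                # loop body from the question
        snapshots.append(list(a))
        a.pop(0)
    return snapshots


result = manual_for([3, 4, 5, 6, 7])
```

It prints the same three lines as the original loop, because each call to `next()` consults the list as it is at that moment.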

From this we can infer that list_iterator.__next__ has code that behaves something like:

if self.i < len(self.list):
    item = self.list[self.i]
    self.i += 1
    return item
raise StopIteration
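To check that this inferred behavior explains the output in the question, here is a small pure-Python imitation of a list iterator. The class name `ListIteratorSketch` is hypothetical, and this is a sketch of the observed behavior, not CPython's code. Dropping the reference once exhausted mirrors the `it->it_seq = NULL;` line in the CPython source quoted below.

```python
class ListIteratorSketch:
    """Hypothetical pure-Python imitation of list_iterator."""

    def __init__(self, seq):
        self.seq = seq  # a reference to the list itself, not a copy
        self.i = 0      # index of the next item to return

    def __iter__(self):
        return self

    def __next__(self):
        # The length is re-checked on every call, so changes made to the
        # list inside the loop body are seen immediately.
        if self.seq is not None and self.i < len(self.seq):
            item = self.seq[self.i]
            self.i += 1
            return item
        self.seq = None  # once exhausted, stay exhausted
        raise StopIteration


a = [3, 4, 5, 6, 7]
seen = []
for b in ListIteratorSketch(a):
    seen.append(b)
    a.pop(0)
print(seen)  # [3, 5, 7]
```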

It does not naively fetch the item and let the lookup fail. A for loop only treats StopIteration as the end of iteration, so an IndexError would bubble up to the top:

class FakeList(object):
    def __iter__(self):
        return self

    def __next__(self):
        raise IndexError

for i in FakeList():  # Raises `IndexError` immediately with a traceback and all
    print(i)
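For contrast, a sketch of the two cases side by side: an iterator that raises `StopIteration` ends a for loop normally, while one that raises `IndexError` (like `FakeList` above) lets the error escape. Both class names are hypothetical.

```python
class StopsCleanly:
    """Hypothetical iterator that ends the way the protocol expects."""

    def __iter__(self):
        return self

    def __next__(self):
        raise StopIteration  # the for loop reads this as "no more items"


class RaisesIndexError:
    """Hypothetical iterator that fails the naive way, like FakeList."""

    def __iter__(self):
        return self

    def __next__(self):
        raise IndexError


clean_items = []
for x in StopsCleanly():
    clean_items.append(x)  # never runs; the loop simply ends
print(clean_items)         # []

escaped = False
try:
    for x in RaisesIndexError():
        print(x)
except IndexError:
    escaped = True         # the loop did not swallow the IndexError
print(escaped)             # True
```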

Indeed, looking at listiter_next in the CPython source (thanks Brian Rodriguez):

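if (it->it_index < PyList_GET_SIZE(seq)) {
    /* index still in range: return the item and advance the index */
    item = PyList_GET_ITEM(seq, it->it_index);
    ++it->it_index;
    Py_INCREF(item);
    return item;
}

/* exhausted: release the list so the iterator stays finished */
Py_DECREF(seq);
it->it_seq = NULL;
return NULL;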

Although I don't know how return NULL; eventually translates into a StopIteration.
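From the Python side, the two outcomes of that C function show up through `next()`: an item, or `StopIteration` once the index reaches the current length. The sketch below also illustrates a consequence of `it->it_seq = NULL;`: items appended before the iterator runs out are still returned, but once it is exhausted it stays exhausted even if the list grows afterwards. (The Python documentation on iterator types requires this: once `__next__()` raises `StopIteration`, it must keep doing so.) The `"exhausted"` string is just an arbitrary default passed to `next()`.

```python
L = [1]
it = iter(L)
first = next(it)                    # 1
L.append(2)                         # the list grows before the iterator is exhausted...
second = next(it)                   # ...so the new item is still returned: 2
done = next(it, "exhausted")        # index 2 == len(L): StopIteration, default returned
L.append(3)                         # growing the list now does not revive the iterator
still_done = next(it, "exhausted")
print(first, second, done, still_done)  # 1 2 exhausted exhausted
```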

The reason why you shouldn't do that is precisely so you don't have to rely on how the iteration is implemented.
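If the intent is to run the body once per original element while still removing items from `a`, one common way to avoid depending on the iterator's index is to loop over a shallow copy (`a[:]`), so the list being mutated is not the one being iterated. A minimal sketch, assuming that intent:

```python
a = [3, 4, 5, 6, 7]
seen = []
for b in a[:]:   # iterate over a snapshot; a itself can change freely
    seen.append(b)
    a.pop(0)
print(seen, a)   # [3, 4, 5, 6, 7] []
```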

But back to the question. Lists in Python are array lists: in CPython, a list is backed by a contiguous block of memory holding references to its elements, as opposed to a linked list, in which each element is allocated independently. Thus, Python's lists, like arrays in C, are optimized for random access. In other words, the most efficient way to get from element n to element n+1 is to access element n+1 directly (by calling mylist.__getitem__(n+1) or mylist[n+1]).

So the implementation of __next__ for the list iterator (the method called on each iteration) is just what you would expect: the index of the current element starts at 0 and is incremented after each iteration.

In your code, if you also print b, you will see that happening:

a = [3,4,5,6,7]
for b in a:
    print(a, b)
    a.pop(0)

Result:

[3, 4, 5, 6, 7] 3
[4, 5, 6, 7] 5
[5, 6, 7] 7

Because:

At iteration 0, a[0] == 3.
At iteration 1, a[1] == 5.
At iteration 2, a[2] == 7.
At iteration 3, the loop is over: the index 3 is no longer less than len(a), which is now 2.
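The steps above can be reproduced by writing the index-based loop out explicitly. This sketch assumes the list-iterator behavior described in this answer (compare the index with the list's current length, then increment) and records the index, the list, and `b` at each step, plus the state when the check fails.

```python
a = [3, 4, 5, 6, 7]
i = 0                    # the iterator's hidden index
trace = []
while i < len(a):        # compared against the *current* length each time
    b = a[i]
    trace.append((i, list(a), b))
    i += 1
    a.pop(0)
print(*trace, sep="\n")
print("stopped at index", i, "with len(a) ==", len(a))  # stopped at index 3 with len(a) == 2
```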

How does Python manage a 'for' loop internally?

Tags:

python

for-loop

data-structures

Pau Trepat


2 Answers

Alex Hall

kjaquier

