
How does Python manage a 'for' loop internally?

I'm trying to learn Python, and I started to play with some code:

a = [3,4,5,6,7]
for b in a:
    print(a)
    a.pop(0)

And the output is:

[3, 4, 5, 6, 7]
[4, 5, 6, 7]
[5, 6, 7]

I know it's not good practice to change a data structure while looping over it, but I want to understand how Python manages the iterator in this case.

The main question is: how does it know that it has to finish the loop if I'm changing the state of a?

Pau Trepat asked Apr 04 '17 11:04



2 Answers

kjaquier and Felix have talked about the iterator protocol, and we can see it in action in your case:

>>> L = [1, 2, 3]
>>> iterator = iter(L)
>>> iterator
<list_iterator object at 0x101231f28>
>>> next(iterator)
1
>>> L.pop()
3
>>> L
[1, 2]
>>> next(iterator)
2
>>> next(iterator)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
StopIteration

From this we can infer that list_iterator.__next__ has code that behaves something like:

if self.i < len(self.list):
    item = self.list[self.i]
    self.i += 1
    return item
raise StopIteration

It does not naively get the item. That would raise an IndexError which would bubble to the top:

class FakeList(object):
    def __iter__(self):
        return self

    def __next__(self):
        raise IndexError

for i in FakeList():  # Raises `IndexError` immediately with a traceback and all
    print(i)

Indeed, looking at listiter_next in the CPython source (thanks Brian Rodriguez):

if (it->it_index < PyList_GET_SIZE(seq)) {
    item = PyList_GET_ITEM(seq, it->it_index);
    ++it->it_index;
    Py_INCREF(item);
    return item;
}

Py_DECREF(seq);
it->it_seq = NULL;
return NULL;

When return NULL; is reached without an exception set, CPython treats the iterator as exhausted: the built-in next() raises StopIteration in that case, and a for loop simply ends.
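
As a rough sketch of the protocol in pure Python (the function name run_loop is made up for illustration, and this is an approximation rather than what the interpreter literally executes), the for statement from the question behaves much like this explicit while loop:

```python
def run_loop(a):
    """Mimic `for b in a: ...; a.pop(0)` using the iterator protocol directly."""
    seen = []
    iterator = iter(a)          # the for statement asks for an iterator once
    while True:
        try:
            b = next(iterator)  # fetch the next item
        except StopIteration:   # the iterator reports that it is exhausted
            break
        seen.append(b)
        a.pop(0)                # same mutation as in the question
    return seen


print(run_loop([3, 4, 5, 6, 7]))  # [3, 5, 7]
```

The loop never looks at the list itself to decide when to stop; it only reacts to the iterator signalling exhaustion.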

Alex Hall answered Sep 30 '22 07:09


The reason why you shouldn't do that is precisely so you don't have to rely on how the iteration is implemented.
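
For comparison, here is a minimal sketch of two ways to get predictable behaviour, assuming the goal is to visit every item while emptying the list (the variable names are only illustrative):

```python
a = [3, 4, 5, 6, 7]

# Option 1: iterate over a shallow copy, so popping from `a` cannot
# affect which items the loop visits.
visited = []
for b in list(a):
    visited.append(b)
    a.pop(0)
print(visited, a)  # [3, 4, 5, 6, 7] []

# Option 2: make the consumption explicit with a while loop.
queue = [3, 4, 5, 6, 7]
drained = []
while queue:
    drained.append(queue.pop(0))
print(drained, queue)  # [3, 4, 5, 6, 7] []
```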

But back to the question. Lists in Python are array lists. They occupy a contiguous chunk of allocated memory, as opposed to linked lists, in which each element is allocated independently. Thus, Python's lists, like arrays in C, are optimized for random access. In other words, the most efficient way to get from element n to element n+1 is to access element n+1 directly (by calling mylist.__getitem__(n+1) or mylist[n+1]).

So, the implementation of __next__ (the method called on each iteration) for lists is just what you would expect: the index of the current element is first set to 0 and then incremented after each iteration.
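
A pure-Python stand-in for that behaviour might look like the following. The class name ListIteratorSketch and its attributes are hypothetical; the real list_iterator is written in C, so treat this only as a model of the index bookkeeping:

```python
class ListIteratorSketch:
    """Illustrative stand-in for list_iterator; not CPython's real code."""

    def __init__(self, seq):
        self._seq = seq    # the list being iterated (shared, not copied)
        self._index = 0    # position of the next item to hand out

    def __iter__(self):
        return self

    def __next__(self):
        # The length is re-read on every call, so mutations are noticed.
        if self._seq is not None and self._index < len(self._seq):
            item = self._seq[self._index]
            self._index += 1
            return item
        self._seq = None   # drop the list once exhausted
        raise StopIteration


a = [3, 4, 5, 6, 7]
visited = []
for b in ListIteratorSketch(a):
    visited.append(b)
    a.pop(0)
print(visited)  # [3, 5, 7]
```

Because the iterator stores only a reference to the list and an index, it has no way to notice that items were removed from the front; it just compares its index with the current length each time.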

In your code, if you also print b, you will see that happening:

a = [3,4,5,6,7]
for b in a:
    print(a, b)
    a.pop(0)

Result:

[3, 4, 5, 6, 7] 3
[4, 5, 6, 7] 5
[5, 6, 7] 7

Because:

  • At iteration 0, a[0] == 3.
  • At iteration 1, a[1] == 5.
  • At iteration 2, a[2] == 7.
  • At iteration 3, the loop is over (index 3 is not less than len(a), which is now 2).
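
The steps above can be replayed with an explicit counter. This is a sketch that assumes the iterator's internal index behaves like the hypothetical index variable below:

```python
a = [3, 4, 5, 6, 7]
index = 0    # hypothetical counter mirroring the iterator's internal index
trace = []
while index < len(a):    # the check made before each step
    trace.append((index, list(a), a[index]))
    index += 1
    a.pop(0)             # same mutation as in the question

for step, snapshot, value in trace:
    print(step, snapshot, value)
print("stopped at index", index, "with len(a) ==", len(a))
```

Each pass moves the index forward by one while the list shrinks by one, so the two meet after three iterations instead of five.
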
kjaquier answered Sep 30 '22 06:09