Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Modifying a dict during iteration

What's going on below

>>> d = {0:0}
>>> for i in d:
...     del d[i]
...     d[i+1] = 0
...     print(i)
...     
0
1
2
3
4
5
6
7
>>> 

Why does the iteration stop at 8 without any error?

Reproducible on both python2.7 and python 3.5.

like image 898
wim Avatar asked Jun 06 '16 21:06

wim


2 Answers

The initial size of the key table in a dict is 8 elements. So 0...7 sets the 1st to 8th element and 8 sets the 1st element again, ending the loop.

Source: Objects/dictobject.c

/* PyDict_MINSIZE_COMBINED is the starting size for any new, non-split dict. 8 allows dicts with no more than 5 active entries; experiments suggested this suffices for the majority of dicts (consisting mostly of usually-small dicts created to pass keyword arguments). Making this 8, rather than 4 reduces the number of resizes for most dictionaries, without any significant extra memory use. */

#define PyDict_MINSIZE_COMBINED 8

like image 62
Daniel Avatar answered Oct 19 '22 07:10

Daniel


This behavior originates from the key look up algorithm in cpython static PyDictKeyEntry * lookdict(...) as written in the document:

The basic lookup function used by all operations. This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4. ... The initial probe index is computed as hash mod the table size (which initially equals to 8).

At the beginning of every for loop, the dict_next function is called internally to resolve the address of the next element. The core of this function reads:

value_ptr = &mp->ma_keys->dk_entries[i].me_value;
mask = DK_MASK(mp->ma_keys); // size of the array which stores the key values (ma_keys)
while (i <= mask && *value_ptr == NULL) { // skip NULL elements ahead 
    value_ptr = (PyObject **)(((char *)value_ptr) + offset);
    i++;
}
if (i > mask)
    return -1; // raise StopIteration 

where i is the index of the C array which actually stores the values. As written above, the initial index of a key is calculated from hash(key)%table_size. The other element in the array is all set to NULL since the dict contains only one element in your test case.

Given the fact that hash(i)==i if i is an int, the memory layout of the dict in your example will be:

1st iter: [0,   NULL,NULL,NULL,NULL,NULL,NULL,NULL]; i=0
2nd iter: [NULL,1   ,NULL,NULL,NULL,NULL,NULL,NULL]; i=1
...
8th iter: [NULL,NULL,NULL,NULL,NULL,NULL,NULL,7   ]; i=7

A more interesting test case would be:

def f(d):
  for i in d:
    del d[i]
    print hash(i)%8
    d[str(hash(i))]=0
f({0:0})       # outputs 0,1,6
f({'hello':0}) # outputs 5,7
f({'world':0}) # outputs 1

To conclude, the exiting condition of such loop is

hash(new_key)%8<=hash(old_key)%8
like image 27
gdlmx Avatar answered Oct 19 '22 05:10

gdlmx