Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

RuntimeError: dictionary changed size during iteration - During Iteration with iteritems on a defaultdict

While answering a particular question here in SO I stumbled upon a peculiar issue which I couldn't explain. Unfortunately the first two Google Search page returned one SO Page which was also not helpful.

The Problem Code

>>> somedata=[random.randint(1,1000) for i in xrange(1,10000)]
>>> somehash=collections.defaultdict(int)
>>> for d in somedata:
    somehash[d]+=1      
>>> maxkey=0
>>> for k,v in somehash.iteritems():
    if somehash[maxkey] > v:
        maxkey=k            

Traceback (most recent call last):
  File "<pyshell#700>", line 1, in <module>
    for k,v in somehash.iteritems():
RuntimeError: dictionary changed size during iteration
>>> for k,v in somehash.iteritems():
    if somehash[maxkey] > v:
        maxkey=k
>>>

And due to some odd reason, the first time I am iterating over the dictionary, Python is creating tantrums but the subsequent executions are fine as you can see in the example, the first time I iterated over the dictionary, it gave the Run Time Error but the next Time it didn't complain.

Any Idea what might be going wrong?

Just in case if required

>>> sys.version_info
sys.version_info(major=2, minor=7, micro=0, releaselevel='final', serial=0)
>>> sys.version
'2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]'

OS: Microsoft Windows [Version 6.1.7601] (Windows 7)
like image 250
Abhijit Avatar asked Jan 06 '12 18:01

Abhijit


3 Answers

Adding or deleting items of a dictionary while iterating over it is an error. Since somehash is a defaultdict, even what seems like a read-only access in the line

if somehash[maxkey] > k:

might add a new key -- resulting in the error you encountered.

like image 196
Sven Marnach Avatar answered Nov 03 '22 07:11

Sven Marnach


As Sven explained, the error you're running into is due to the way defaultdict works. When performing a lookup in a defaultdict, if the key doesn't already exist a default value is retrieved (hence the name) and the key is added to the dictionary (with the default value). This is the source of your RuntimeError.

You could do the following to avoid this problem:

for k, v in somehash.items():
    if somehash[maxkey] > v:
        maxkey = k

The main difference being that somehash.items() returns a list of (key, value) tuples, so you are actually iterating over that list and not somehash itself. The same goes for .keys() vs .iterkeys().

like image 27
FastTurtle Avatar answered Nov 03 '22 06:11

FastTurtle


You are generating 9999 (somewhat) random integers between 1 and 1000, stored in somedata, which is used as a key for somehash, to store the occurrence of the numbers in somedata.

As maxkey=0, that key will never exist. As you are using a defaultdict, when each key is encountered for the first time, an entry is automatically created using the default_factory function which returns an empty list, thus correctly throwing an error during iteration, in a manner that * FastTurtle* already pointed out.

Use the get method to retrieve an item safely.

import random
import collections

somedata=[random.randint(1,1000) for i in xrange(1,10000)]
somehash=collections.defaultdict(int)
for d in somedata:
   somehash[d]+=1 

maxkey=0
for k,v in somehash.iteritems():
   if somehash.get(maxkey) > v:
       maxkey=k 
       print k,v  

I see you are using Python 2.7, which has a new collection class called Counter , for counting hashable objects. Using Counter should be faster than the code above, and reduces your code to:

somedata=[random.randint(1,1000) for i in xrange(1,10000)]
somehash=collections.Counter(somedata)
like image 1
Lorenz Lo Sauer Avatar answered Nov 03 '22 06:11

Lorenz Lo Sauer