While answering a particular question here in SO I stumbled upon a peculiar issue which I couldn't explain. Unfortunately the first two Google Search page returned one SO Page which was also not helpful.
The Problem Code
>>> somedata=[random.randint(1,1000) for i in xrange(1,10000)]
>>> somehash=collections.defaultdict(int)
>>> for d in somedata:
somehash[d]+=1
>>> maxkey=0
>>> for k,v in somehash.iteritems():
if somehash[maxkey] > v:
maxkey=k
Traceback (most recent call last):
File "<pyshell#700>", line 1, in <module>
for k,v in somehash.iteritems():
RuntimeError: dictionary changed size during iteration
>>> for k,v in somehash.iteritems():
if somehash[maxkey] > v:
maxkey=k
>>>
And due to some odd reason, the first time I am iterating over the dictionary, Python is creating tantrums but the subsequent executions are fine as you can see in the example, the first time I iterated over the dictionary, it gave the Run Time Error but the next Time it didn't complain.
Any Idea what might be going wrong?
Just in case if required
>>> sys.version_info
sys.version_info(major=2, minor=7, micro=0, releaselevel='final', serial=0)
>>> sys.version
'2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]'
OS: Microsoft Windows [Version 6.1.7601] (Windows 7)
Adding or deleting items of a dictionary while iterating over it is an error. Since somehash
is a defaultdict
, even what seems like a read-only access in the line
if somehash[maxkey] > k:
might add a new key -- resulting in the error you encountered.
As Sven explained, the error you're running into is due to the way defaultdict
works. When performing a lookup in a defaultdict
, if the key doesn't already exist a default value is retrieved (hence the name) and the key is added to the dictionary (with the default value). This is the source of your RuntimeError
.
You could do the following to avoid this problem:
for k, v in somehash.items():
if somehash[maxkey] > v:
maxkey = k
The main difference being that somehash.items()
returns a list of (key, value) tuples, so you are actually iterating over that list and not somehash
itself. The same goes for .keys()
vs .iterkeys()
.
You are generating 9999 (somewhat) random integers between 1 and 1000, stored in somedata
, which is used as a key for somehash
, to store the occurrence of the numbers in somedata.
As maxkey=0
, that key will never exist. As you are using a defaultdict, when each key is encountered for the first time, an entry is automatically created using the default_factory
function which returns an empty list, thus correctly throwing an error during iteration, in a manner that * FastTurtle* already pointed out.
Use the get
method to retrieve an item safely.
import random
import collections
somedata=[random.randint(1,1000) for i in xrange(1,10000)]
somehash=collections.defaultdict(int)
for d in somedata:
somehash[d]+=1
maxkey=0
for k,v in somehash.iteritems():
if somehash.get(maxkey) > v:
maxkey=k
print k,v
I see you are using Python 2.7, which has a new collection class called Counter
, for counting hashable objects. Using Counter
should be faster than the code above, and reduces your code to:
somedata=[random.randint(1,1000) for i in xrange(1,10000)]
somehash=collections.Counter(somedata)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With