python key in dict.keys() performance for large dictionaries

Tags: python

I was wondering if you might be able to give me some advice on improving the performance of my code.

I have a set of for loops which check whether a key is in a dictionary whose values are lists. If the key exists, the loop appends to that key's list, and if it doesn't, it adds a new list for that key:

dict = {}
for value in value_list:
    if value.key in dict.keys():
        temp_list = dict[value.key]
        temp_list.append(value.val)
        dict[value.key] = temp_list
    else:
        dict[value.key] = [value.val]

Now this code works fine, but eventually, as the dictionary starts to fill up, the line value.key in dict.keys() becomes more and more expensive.

Is there a better way of doing this?

Thanks,

Mike

asked Jan 19 '11 by Werda
2 Answers

Don't do this:

value.key in dict.keys()

In Python 2, at least, that creates a new list containing every key. Building that list gets more and more expensive as the dictionary grows, and then an O(n) linear search is performed over it to find the key, which defeats the purpose of using a dict.

Instead, just do:

value.key in dict

which doesn't create a temporary list, and does a hash table lookup for the key rather than a linear search.
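A rough way to see the difference (a sketch using timeit; the dictionary size and repeat count are arbitrary, and list(d.keys()) stands in for the Python 2 behaviour when run on Python 3):

import timeit

# Build a reasonably large dictionary for both membership tests.
setup = "d = {i: None for i in range(100000)}"

# Hash lookup directly on the dict: O(1) on average.
print(timeit.timeit("99999 in d", setup=setup, number=100))

# Python 2's d.keys() builds a list of every key and scans it: O(n).
# list(d.keys()) reproduces that cost on Python 3 for comparison.
print(timeit.timeit("99999 in list(d.keys())", setup=setup, number=100))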

setdefault, as mentioned elsewhere, is the cleaner way to do this, but it's very important to understand the above.
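For completeness, a sketch of the setdefault version (assuming the same value_list of objects with .key and .val attributes as in the question):

d = {}
for value in value_list:
    # setdefault returns the existing list for value.key, or inserts
    # and returns a new empty list if the key is not there yet.
    d.setdefault(value.key, []).append(value.val)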

answered by Glenn Maynard

Using collections.defaultdict, this can be simplified to

import collections

d = collections.defaultdict(list)
for value in value_list:
    d[value.key].append(value.val)
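A self-contained run, using a hypothetical Item namedtuple to stand in for the objects in value_list:

import collections

# Hypothetical stand-in for the question's objects with .key and .val attributes
Item = collections.namedtuple("Item", ["key", "val"])
value_list = [Item("a", 1), Item("b", 2), Item("a", 3)]

d = collections.defaultdict(list)
for value in value_list:
    d[value.key].append(value.val)

print(dict(d))  # {'a': [1, 3], 'b': [2]}

If a plain dict is needed afterwards (so that looking up a missing key raises KeyError instead of silently creating an empty list), dict(d) converts it back.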
answered by Sven Marnach