Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient Dictionary Searching?

Tags:

I had a quick question about efficiency of searching through large dictionaries in Python. I am reading a large comma-separated file and getting a key and value from each line. If my key is already in the dictionary, I'm adding the value to the value listed in the dictionary, if the key doesn't exist in the dictionary, I simply add the value. Previously I was using this:

if key in data_dict.keys():     add values else:     data_dict[key] = value 

This starts pretty fast, but as the dictionary grows it becomes slower and slower, to the point where I can't use it at all. I changed the way I search for the key in the dictionary to this:

try:     # This will fail if key not present     data_dict[keyStr] = input_data[keyStr] + load_val except:     data_dict[keyStr] = load_val 

This is infinitely faster, and can read / write over 350,000 lines of code in 3 seconds.

My question was why does the if key in data_dict.keys(): command take sooo much longer than the call to try: data_dict[keyStr]? And why wouldn't Python utilize the try statement when searching for a key in a dictionary?

like image 964
Brumdog22 Avatar asked Sep 30 '13 21:09

Brumdog22


People also ask

Is it fast to search for a key in a dictionary?

The two times above for 100 and 10000000 are almost the same for a dictionary, which is because a dictionary can almost instantly jump to the key it is asked for thanks to the lookups. Lookups are faster in dictionaries because Python implements them using hash tables.

Are lookups faster with dictionaries or lists in Python?

The list is an ordered collection of data, whereas the dictionaries store the data in the form of key-value pairs using the hashtable structure. Due to this, fetching the elements from the list data structure is quite complex compared to dictionaries in Python. Therefore, the dictionary is faster than a list in Python.

Why are dictionaries faster than lists?

The reason is because a dictionary is a lookup, while a list is an iteration. Dictionary uses a hash lookup, while your list requires walking through the list until it finds the result from beginning to the result each time.

How fast is a dictionary lookup?

Speed. Lookups in lists are O(n), lookups in dictionaries are amortized O(1), with regard to the number of items in the data structure. If you don't need to associate values, use sets.


2 Answers

The problem is that for every test you're generating a new list of keys with .keys(). As the list of keys gets longer, the time required goes up. Also as noted by dckrooney, the search for the key becomes linear instead of taking advantage of the hash-table structure of the dictionary.

Replace with:

if key in data_dict: 
like image 168
Mark Ransom Avatar answered Sep 25 '22 20:09

Mark Ransom


data_dict.keys() returns an unsorted list of keys in the dictionary. Thus each time you check to see if a given key is in the dictionary, you're doing a linear search across the list of keys (an O(n) operation). The longer your list, the longer it takes to search for a given key.

Contrast this to data_dict[keyStr]. This performs a hash lookup, which is an O(1) operation. It doesn't (directly) depend on the number of keys in the dictionary; even as you add more keys, the time to check if a given key is in the dictionary stays constant.

like image 21
dckrooney Avatar answered Sep 21 '22 20:09

dckrooney