Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

More efficient way to look up dictionary values whose keys start with same prefix

I have a dictionary whose keys come in sets that share the same prefix, like this:

d = { "key1":"valA", "key123":"valB", "key1XY":"valC",
      "key2":"valD", "key2-22":"valE" }

Given a query string, I need to look up all the values associated with keys that start with that prefix, e.g. for query="key1" I need to get ["valA", "valB", "valC"]

My implementation below works but is too slow for a large number of queries since the dictionary d has about 30,000 keys and most of the keys are more than 20 characters long:

result = [d[s] for s in d.keys() if s.startswith(query)]

Is there a faster/more efficient way to implement this?

like image 873
I Z Avatar asked Aug 05 '15 19:08

I Z


People also ask

Can dictionary have same keys with different values?

Duplicate keys are not allowed. A dictionary maps each key to a corresponding value, so it doesn't make sense to map a particular key more than once.

Can a dictionary have two keys with the same value two values with the same key?

No, each key in a dictionary should be unique. You can't have two keys with the same value. Attempting to use the same key again will just overwrite the previous value stored. If a key needs to store multiple values, then the value associated with the key should be a list or another dictionary.

How do you compare two keys in a dictionary?

The compare method cmp() is used in Python to compare values and keys of two dictionaries. If method returns 0 if both dictionaries are equal, 1 if dic1 > dict2 and -1 if dict1 < dict2.

What happens if you look for a key-value pair in a dictionary and that key-value does not exist Python?

If we try to access the value of key that does not exist in the dictionary, then it will raise KeyError. This can also be a way to check if exist in dict or not i.e. Here it confirms that the key 'test' exist in the dictionary.


1 Answers

You can avoid producing the intermediate list generated by dict.keys() (in python 2.x):

result = [d[key] for key in d if key.startswith(query)]

But you most likely want to use a trie instead of a dictionary, so you can find all the values associated with a key with a common prefix (a trie is similar to a tree based on prefixes).

Here you can find some different implementation of tries.

A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn".

A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn". (source wikipedia)


Let's compare the timings for the different solutions:

# create a dictionary with 30k entries
d = {str(x):str(x) for x in xrange(1, 30001)}
query = '108'

# dict with keys()
%timeit [d[s] for s in d.keys() if s.startswith(query)]

    100 loops, best of 3: 8.87 ms per loop

# dict without keys()
%timeit [d[s] for s in d if s.startswith(query)]

    100 loops, best of 3: 7.83 ms per loop

# 11.72% improvement

# PyTrie (https://pypi.python.org/pypi/PyTrie/0.2)
import pytrie
pt = pytrie.Trie(d)

%timeit [pt[s] for s in pt.iterkeys(query)]

    1000 loops, best of 3: 320 µs per loop

# 96.36% improvement

# datrie (https://pypi.python.org/pypi/datrie/0.7)
import datrie
dt = datrie.Trie('0123456789')
for key, val in d.iteritems():
    dt[unicode(key)] = val

%timeit [dt[s] for s in dt.keys(unicode(query))]

    10000 loops, best of 3: 162 µs per loop

# 98.17% improvement
like image 175
enrico.bacis Avatar answered Nov 09 '22 05:11

enrico.bacis