I have a dictionary whose keys come in sets that share the same prefix, like this:
d = { "key1":"valA", "key123":"valB", "key1XY":"valC",
"key2":"valD", "key2-22":"valE" }
Given a query string, I need to look up all the values associated with keys that start with that prefix, e.g. for query="key1"
I need to get ["valA", "valB", "valC"]
My implementation below works but is too slow for a large number of queries since the dictionary d
has about 30,000 keys and most of the keys are more than 20 characters long:
result = [d[s] for s in d.keys() if s.startswith(query)]
Is there a faster/more efficient way to implement this?
Duplicate keys are not allowed. A dictionary maps each key to a corresponding value, so it doesn't make sense to map a particular key more than once.
No, each key in a dictionary should be unique. You can't have two keys with the same value. Attempting to use the same key again will just overwrite the previous value stored. If a key needs to store multiple values, then the value associated with the key should be a list or another dictionary.
The compare method cmp() is used in Python to compare values and keys of two dictionaries. If method returns 0 if both dictionaries are equal, 1 if dic1 > dict2 and -1 if dict1 < dict2.
If we try to access the value of key that does not exist in the dictionary, then it will raise KeyError. This can also be a way to check if exist in dict or not i.e. Here it confirms that the key 'test' exist in the dictionary.
You can avoid producing the intermediate list generated by dict.keys()
(in python 2.x):
result = [d[key] for key in d if key.startswith(query)]
But you most likely want to use a trie instead of a dictionary, so you can find all the values associated with a key with a common prefix (a trie is similar to a tree based on prefixes).
Here you can find some different implementation of tries.
A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn". (source wikipedia)
Let's compare the timings for the different solutions:
# create a dictionary with 30k entries
d = {str(x):str(x) for x in xrange(1, 30001)}
query = '108'
# dict with keys()
%timeit [d[s] for s in d.keys() if s.startswith(query)]
100 loops, best of 3: 8.87 ms per loop
# dict without keys()
%timeit [d[s] for s in d if s.startswith(query)]
100 loops, best of 3: 7.83 ms per loop
# 11.72% improvement
# PyTrie (https://pypi.python.org/pypi/PyTrie/0.2)
import pytrie
pt = pytrie.Trie(d)
%timeit [pt[s] for s in pt.iterkeys(query)]
1000 loops, best of 3: 320 µs per loop
# 96.36% improvement
# datrie (https://pypi.python.org/pypi/datrie/0.7)
import datrie
dt = datrie.Trie('0123456789')
for key, val in d.iteritems():
dt[unicode(key)] = val
%timeit [dt[s] for s in dt.keys(unicode(query))]
10000 loops, best of 3: 162 µs per loop
# 98.17% improvement
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With