I am trying to optimize the performance of a script that looks up similar words in a lexicon for each word given.
Each unique word is to be split into letter n-grams and for each n-gram, the lexicon returns a list of words that contain the same letter n-gram. Each word from this list is then added to a dictionary as a key and it's value is incremented by one. This gives me a dictionary of similar words with corresponding frequency scores.
word_dict = {}
get = word_dict.get
for letter_n_gram in word:
for entry in lexicon[n_gram]:
word_dict[entry] = get(entry, 0) + 1
This implementation works, but the script could supposedly run faster by switching the dict
for collections.defaultdict
.
word_dd = defaultdict(int)
for letter_n_gram in word:
for entry in lexicon[n_gram]:
word_dd[entry] += 1
No other code has been changed.
I was under the impression that both code snippets (most importantly the score adding) should work in the exact same way, i.e. if the key exists, increase its value by 1, if it does not exist, create the key and set the value to 1.
After running the new code, however, some of the keys had values of 0, which I find logically impossible.
Is my logic or knowledge of defaultdict
functionality flawed? If not, how can any value in word_dd
be set to 0?
edit: I am also very sure that no other part of the script skews these results, as I test the dictionary immediately after shown code by using:
for item in word_dd.iteritems():
if item[1] == 0:
print "Found zero value element"
break
When you access a key in a defaultdict
, if it is not there, it will be created automatically. Since we have int
as the default factory function, it creates the key and gives the default value 0.
from collections import defaultdict
d = defaultdict(int)
print d["a"]
# 0
print d
# defaultdict(<type 'int'>, {'a': 0})
So, before accessing the key, you should make sure that it exists in the defaultdict
instance, like this
print "a" in d
# False
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With