Consider the following Python code, which iterates over a list of words and counts them into the dictionary a['words']:
a['words'] = {}
for word in words:
    if word not in a['words']:
        a['words'][word] = 0
    a['words'][word] += 1
The question is whether the repeated access to a['words'] is optimized by Python, i.e. whether the reference to a['words'] is automatically cached somewhere until it changes, or whether I should write the "optimized" version myself, like this:
a['words'] = {}
words_dict = a['words']
for word in words:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1
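One way to settle this empirically is to time both variants; here is a minimal sketch using timeit (the helper names and the sample data are made up for illustration):

# Illustrative timing sketch: compare direct subscripting against hoisting
# the inner dict into a local variable. Names and data are hypothetical.
import timeit

words = ['aaa', 'bbb', 'ccc', 'ddd', 'aaa', 'bbb', 'eee'] * 1000

def count_direct(words):
    a = {'words': {}}
    for word in words:
        if word not in a['words']:
            a['words'][word] = 0
        a['words'][word] += 1
    return a

def count_hoisted(words):
    a = {'words': {}}
    words_dict = a['words']
    for word in words:
        if word not in words_dict:
            words_dict[word] = 0
        words_dict[word] += 1
    return a

print(timeit.timeit(lambda: count_direct(words), number=100))
print(timeit.timeit(lambda: count_hoisted(words), number=100))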
A good solution is collections.Counter, a high-performance container designed for exactly this kind of counting:
from collections import Counter
words = ['aaa', 'bbb', 'ccc', 'ddd', 'aaa', 'bbb', 'eee']
a = {'words': dict(Counter(words))}
a
#{'words': {'aaa': 2, 'bbb': 2, 'ccc': 1, 'ddd': 1, 'eee': 1}}
For comparison, the manual loop
for word in words:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1
performs up to 3 dict accesses per iteration. Even if each access is O(1), hashing is far from free, especially on string objects.
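To see that CPython re-evaluates a subscript like a['words'] on every access rather than caching it, you can inspect the generated bytecode; a small sketch using the dis module (the exact opcode names vary between Python versions):

# Disassemble the counting function; inside the loop each a['words']
# subscript shows up as its own load/subscript opcode sequence,
# i.e. the lookup is not hoisted automatically.
import dis

def count(a, words):
    a['words'] = {}
    for word in words:
        if word not in a['words']:
            a['words'][word] = 0
        a['words'][word] += 1

dis.dis(count)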
In that particular case collections.Counter is perfectly suited. For other cases (such as building a list of values per key), collections.defaultdict is a good alternative, and it is faster than the manual check-then-insert pattern. A contrived example:
import collections

c = collections.defaultdict(list)
for i, word in enumerate(words):
    c[word].append(i)  # map each word to the list of indices where it appears
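For the word-counting case from the question, a defaultdict(int) works the same way; a minimal sketch:

import collections

counts = collections.defaultdict(int)  # missing keys start at 0
for word in words:
    counts[word] += 1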
There is also the dict.setdefault() solution, if you want to avoid the collections module.
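A minimal sketch of that approach (the variable names are just for illustration):

words_dict = {}
for word in words:
    # setdefault returns the existing count, or inserts 0 for a missing key
    words_dict[word] = words_dict.setdefault(word, 0) + 1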