Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is repeated dictionary access optimized in Python

Tags:

python

Consider the following Python code, that iterates over an array of words and counts them into the dictionary a['words']

a['words'] = {}
for word in words:
    if word not in a['words']:
        a['words'][word] = 0
    a['words'][word] += 1

The question is, whether the repeated access to a['words'] is optimized in Python, in a way that the reference of a['words'] is automatically saved somewhere until changed, OR should I write myself the "optimized" code, this way:

a['words'] = {}
words_dict = a['words']
for word in words:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1
like image 428
SomethingSomething Avatar asked Nov 18 '25 20:11

SomethingSomething


2 Answers

Good solution is collections.Counter as it is high-performance container:

from collections import Counter
words = ['aaa', 'bbb', 'ccc', 'ddd', 'aaa', 'bbb', 'eee']
a = {'words' : dict(Counter(words))}
a
#{'words': {'aaa': 2, 'bbb': 2, 'ccc': 1, 'ddd': 1, 'eee': 1}}
like image 185
zipa Avatar answered Nov 21 '25 10:11

zipa


for word in words:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1

performs up to 3 dict accesses per loop. Even if access is O(1), hashing is far from free, specially on string objects.

In that particular case collections.Counter is perfectly suited. For other cases (like creating a list or appending to it), collections.defaultdict is a good alternative, and it's faster. Fictive alternate example:

c = collections.defaultdict(list)
for i,word in enumerate(words):
    c[word].append(i)

there's also the dict.setdefault() solution, if you want to avoid collections module.

like image 34
Jean-François Fabre Avatar answered Nov 21 '25 08:11

Jean-François Fabre



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!