Consider the following Python code, which iterates over a list of words and counts them into the dictionary a['words']:
a['words'] = {}
for word in words:
    if word not in a['words']:
        a['words'][word] = 0
    a['words'][word] += 1
The question is whether the repeated access to a['words'] is optimized by Python, i.e. whether the reference to a['words'] is automatically cached somewhere until it changes, or whether I should write the "optimized" version myself, like this:
a['words'] = {}
words_dict = a['words']
for word in words:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1
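One way to settle this empirically is to time both variants; here is a minimal sketch using timeit (the helper names and the sample data are made up for illustration):

# Illustrative timing sketch: compare direct subscripting against hoisting
# the inner dict into a local variable. Names and data are hypothetical.
import timeit

words = ['aaa', 'bbb', 'ccc', 'ddd', 'aaa', 'bbb', 'eee'] * 1000

def count_direct(words):
    a = {'words': {}}
    for word in words:
        if word not in a['words']:
            a['words'][word] = 0
        a['words'][word] += 1
    return a

def count_hoisted(words):
    a = {'words': {}}
    words_dict = a['words']
    for word in words:
        if word not in words_dict:
            words_dict[word] = 0
        words_dict[word] += 1
    return a

print(timeit.timeit(lambda: count_direct(words), number=100))
print(timeit.timeit(lambda: count_hoisted(words), number=100))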
A good solution is collections.Counter, a high-performance container designed for exactly this kind of counting:
from collections import Counter
words = ['aaa', 'bbb', 'ccc', 'ddd', 'aaa', 'bbb', 'eee']
a = {'words': dict(Counter(words))}
a
#{'words': {'aaa': 2, 'bbb': 2, 'ccc': 1, 'ddd': 1, 'eee': 1}}
For comparison, the manual loop
for word in words:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1
performs up to 3 dict accesses per iteration. Even if each access is O(1), hashing is far from free, especially on string objects.
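To see that CPython re-evaluates a subscript like a['words'] on every access rather than caching it, you can inspect the generated bytecode; a small sketch using the dis module (the exact opcode names vary between Python versions):

# Disassemble the counting function; inside the loop each a['words']
# subscript shows up as its own load/subscript opcode sequence,
# i.e. the lookup is not hoisted automatically.
import dis

def count(a, words):
    a['words'] = {}
    for word in words:
        if word not in a['words']:
            a['words'][word] = 0
        a['words'][word] += 1

dis.dis(count)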
In that particular case collections.Counter is perfectly suited. For other cases (such as building a list of values per key), collections.defaultdict is a good alternative, and it is faster than the manual check-then-insert pattern. A contrived example:
import collections

c = collections.defaultdict(list)
for i, word in enumerate(words):
    c[word].append(i)  # map each word to the list of indices where it appears
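For the word-counting case from the question, a defaultdict(int) works the same way; a minimal sketch:

import collections

counts = collections.defaultdict(int)  # missing keys start at 0
for word in words:
    counts[word] += 1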
There is also the dict.setdefault() solution, if you want to avoid the collections module.
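A minimal sketch of that approach (the variable names are just for illustration):

words_dict = {}
for word in words:
    # setdefault returns the existing count, or inserts 0 for a missing key
    words_dict[word] = words_dict.setdefault(word, 0) + 1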