 

Efficient keys for dictionaries

I'm new to posting here, so I hope I provide enough detail. I'm trying to find out whether key selection affects the efficiency of dictionaries in Python. Some comparisons I'm thinking of are:

  • numbers vs strings (e.g. would my_dict[20] be faster than my_dict['twenty'])
  • length of strings (e.g. would my_dict['a'] be faster than my_dict['abcdefg'])
  • mixing key types within a dictionary, for instance using numbers, strings, and/or tuples (e.g. would my_dict = {0: 'zero', 2: 'two'} perform faster than {0: 'zero', 'two': 2})

I haven't been able to find this topic from a google search, so I thought maybe someone here might know.

ThatNewGuy asked Apr 29 '26 06:04

1 Answer

First of all, I'd recommend that you read How are Python's Built-In Dictionaries Implemented.
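The short version, sketched below (this is an illustration, not CPython's actual implementation), is that every lookup starts by hashing the key: hashing is O(1) for ints, and effectively O(1) for strings too, because CPython caches a string's hash after the first computation. That is why the key type barely matters.

```python
key_int = 20
key_str = 'twenty'

# Both kinds of keys reduce to an integer hash before the table probe.
print(hash(key_int))   # for small positive ints, hash(n) == n in CPython
print(hash(key_str))   # computed once, then cached on the str object
```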

Now, let's run a little random experiment to test the theory (at least partially):

import timeit  # used by the commented-out variant below
import string
import random
import time


def random_str(N):
    """Return a random string of N uppercase letters and digits."""
    return ''.join(
        random.choice(string.ascii_uppercase + string.digits) for _ in range(N)
    )


def experiment1(dct, keys):
    """Time one full pass of lookups over every key in the dictionary."""
    s = time.time()
    {dct[k] for k in keys}
    return time.time() - s


if __name__ == "__main__":
    N = 10000   # number of entries per dictionary
    K = 200     # length of each random string key
    S = 50000   # number of sampling passes
    dct1 = {random_str(K): None for _ in range(N)}  # string keys
    dct2 = {i: None for i in range(N)}              # integer keys
    keys1 = list(dct1.keys())
    keys2 = list(dct2.keys())

    samples1 = []
    samples2 = []
    for _ in range(S):
        samples1.append(experiment1(dct1, keys1))
        samples2.append(experiment1(dct2, keys2))

    print(sum(samples1), sum(samples2))

    # print(
    #     timeit.timeit('{dct1[k] for k in keys1}', setup='from __main__ import dct1, keys1')
    # )

    # print(
    #     timeit.timeit('{dct2[k] for k in keys2}', setup='from __main__ import dct2, keys2')
    # )

The results I've got with different sampling sizes on my box were:

  • N=10000, K=200, S=100 => 0.08300423622131348 0.07200479507446289
  • N=10000, K=200, S=1000 => 0.885051965713501 0.7120392322540283
  • N=10000, K=200, S=10000 => 8.88549256324768 7.005417346954346
  • N=10000, K=200, S=50000 => 43.57453536987305 34.82594871520996

So as you can see, whether you look up the dictionary with big random strings or with integers, the performance remains almost the same. The only "real" difference worth considering is the memory consumption of the two dictionaries. That may be relevant when dumping/loading huge dictionaries to/from disk; in that case a more compact data structure can shave off a few seconds when caching/reading them.
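To make the memory point concrete, here is a rough, CPython-specific sketch (exact byte counts vary by version and platform, and the key construction is made up for illustration): the dict containers end up similar in size, but long string keys cost far more memory than small integer keys.

```python
import sys

N = 10000

dct_int = {i: None for i in range(N)}
dct_str = {('k' * 50) + str(i): None for i in range(N)}  # 50+ char keys


def key_footprint(d):
    # container itself + every key; the values are all the shared None
    # singleton, so they add nothing per entry
    return sys.getsizeof(d) + sum(sys.getsizeof(k) for k in d)


print(key_footprint(dct_int))
print(key_footprint(dct_str))
```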

NB: If anyone can explain why I was getting really huge times when using timeit (the commented-out parts), please let me know... even with small experiment constants I'd get really high values... that's why I left it commented out. Add a comment if you know the reason ;D
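A likely explanation: timeit.timeit defaults to number=1000000, so each of those commented-out calls would run the whole set comprehension a million times. Passing an explicit, small number (a sketch with made-up sizes below) brings the figures back in line:

```python
import timeit

# timeit.timeit executes the statement number=1_000_000 times by default,
# so timing a comprehension over thousands of keys balloons quickly.
dct = {i: None for i in range(10_000)}
keys = list(dct)

total = timeit.timeit('{dct[k] for k in keys}',
                      globals={'dct': dct, 'keys': keys},
                      number=100)
print(total)  # seconds for 100 passes, not 1,000,000
```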

BPL answered Apr 30 '26 20:04


