I'm new to posting here, so I hope I provide enough detail. I'm trying to find out if key selection effects the efficiency of dictionaries in Python. Some comparisons I'm thinking of are:
my_dict[20] be faster than my_dict['twenty'])len of strings (e.g. would my_dict['a'] be faster than my_dict['abcdefg'])my_dict = {0: 'zero', 2: 'two'} perform faster than {0: 'zero', 'two': 2})I haven't been able to find this topic from a google search, so I thought maybe someone here might know.
First of all, I'd recommend you to understand How are Python's Built In Dictionaries Implemented.
Now, let's make a little random experiment to prove the theory (at least partially):
import timeit
import string
import random
import time
def random_str(N):
return ''.join(
random.choice(string.ascii_uppercase + string.digits) for _ in range(N)
)
def experiment1(dct, keys):
s = time.time()
{dct[k] for k in keys}
return time.time() - s
if __name__ == "__main__":
N = 10000
K = 200
S = 50000
dct1 = {random_str(K): None for k in range(N)}
dct2 = {i: None for i in range(N)}
keys1 = list(dct1.keys())
keys2 = list(dct2.keys())
samples1 = []
samples2 = []
for i in range(S):
samples1.append(experiment1(dct1, keys1))
samples2.append(experiment1(dct2, keys2))
print(sum(samples1), sum(samples2))
# print(
# timeit.timeit('{dct1[k] for k in keys1}', setup='from __main__ import dct1, keys1')
# )
# print(
# timeit.timeit('{dct2[k] for k in keys2}', setup='from __main__ import dct2, keys2')
# )
The results I've got with different sampling sizes on my box were:
So as you can see, no matter whether you use big random strings to lookup dictionaries or integers, the performance will remain almost the same. The only "real" difference you'd like to consider would be in terms of memory consumption of both dictionaries. That may be relevant when dumping/loading huge dictionaries to/from disk, in that case it may be worth to have compact form of your data structures so you'll be able to shave off few seconds when caching/reading them.
NS: If anyone is able to explain why I was getting really huge times when using timeit (commented parts) please let me... with little experiment constants I'd get really high values... that's why I left it uncommented. Add a comment if you know the reason ;D
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With