I read in the Python 3 documentation that Python uses a hash table for dict(), so the search time complexity should be O(1), with O(N) as the worst case. However, in a course I recently took, the teacher said that this only holds when you use an int as the key. If you use a string of length L as the key, the search time complexity is O(L).
I wrote a code snippet to test his claim:
import random
import string
from time import time
import matplotlib.pyplot as plt

def randomString(stringLength=10):
    """Generate a random string of fixed length."""
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for i in range(stringLength))

def test(L):
    # L: int, length of the keys
    N = 1000  # number of keys
    d = dict()
    for i in range(N):
        d[randomString(L)] = None

    tic = time()
    for key in d.keys():
        d[key]
    toc = time() - tic

    tic = time()
    for key in d.keys():
        pass
    t_idle = time() - tic

    t_total = toc - t_idle
    return t_total

L = [i * 10000 for i in range(5, 15)]
ans = [test(l) for l in L]

plt.figure()
plt.plot(L, ans)
plt.show()
The result is very interesting. In the resulting plot, the x-axis is the length of the strings used as keys and the y-axis is the total time to query all 1000 keys in the dictionary.
Can anyone explain this result?
Please be gentle with me. As you can see from the fact that I'm asking such a basic question, I don't have the ability to read the Python source code or equally complex internal documentation.
Since a dictionary is a hash table, and looking up a key in a hash table requires computing the key's hash, the time complexity of looking up a key in the dictionary cannot be less than the time complexity of the hash function.
In current versions of CPython, computing the hash of a string of length L takes O(L) time if it's the first time you've hashed that particular string object, and O(1) time if the hash for that string object has already been computed (since the hash is cached on the object):
>>> from timeit import timeit
>>> s = 'b' * (10**9) # string of length 1 billion
>>> timeit(lambda: hash(s), number=1)
0.48574538500002973 # half a second
>>> timeit(lambda: hash(s), number=1)
5.301000044255488e-06 # 5 microseconds
So that's also how long it takes when you look up the key in a dictionary:
>>> s = 'c' * (10**9) # string of length 1 billion
>>> d = dict()
>>> timeit(lambda: s in d, number=1)
0.48521506899999167 # half a second
>>> timeit(lambda: s in d, number=1)
4.491000026973779e-06 # 5 microseconds
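This lower bound isn't specific to strings: a dict lookup has to call the key's __hash__, so whatever that costs is paid whenever the hash isn't already cached. Here is a minimal sketch of my own to make that concrete (SlowHashKey is a made-up illustration, not anything from the standard library), using a key type that recomputes an O(L) hash on every call:

from timeit import timeit

class SlowHashKey:
    """Key whose hash is recomputed from scratch on every call,
    unlike str, which caches its hash after the first computation."""
    def __init__(self, s):
        self.s = s
    def __hash__(self):
        # O(L): walks the whole string every time it is called
        return sum(map(ord, self.s))
    def __eq__(self, other):
        return isinstance(other, SlowHashKey) and self.s == other.s

key = SlowHashKey('x' * 10**6)
d = {key: None}
# Every lookup pays the O(L) __hash__ cost, because nothing caches it:
print(timeit(lambda: key in d, number=10))

With str keys you only pay that cost the first time, precisely because of the caching shown above.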
You also need to be aware that a key in a dictionary is not looked up only by its hash: when the hashes match, it still needs to test that the key you looked up is equal to the key stored in the dictionary, in case the hash match is a false positive. Testing equality of strings takes O(L) time in the worst case:
>>> s1 = 'a'*(10**9)
>>> s2 = 'a'*(10**9)
>>> timeit(lambda: s1 == s2, number=1)
0.2006020820001595
So for a key of length L and a dictionary of n keys:

- Computing the key's hash costs O(L) the first time and O(1) afterwards, because the string object caches its hash.
- Each candidate entry whose hash matches still has to be compared for equality, which costs O(L) in the worst case.
- So a lookup is O(L) on average (hashing plus an equality comparison), and O(nL) in the worst case, when many keys collide.
- A lookup is effectively O(1) only when the hash is already cached and the equality comparison is cheap, for example because the key you look up is the very same string object stored in the dictionary.
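To see both costs together, here is a small sketch of my own combining the measurements above: looking up a fresh, equal copy of a long string pays the O(L) hash plus the O(L) equality comparison, while looking up the stored key object itself is cheap, because its hash is cached and CPython compares keys by identity before falling back to ==:

from timeit import timeit

stored = 'd' * (10**8)   # 100-million-character key
d = {stored: None}       # inserting it computes and caches its hash

copy = 'd' * (10**8)     # equal string, but a distinct object with no cached hash

# First lookup of the copy: O(L) to hash it + O(L) to compare it with `stored`.
print(timeit(lambda: copy in d, number=1))

# Second lookup of the copy: its hash is now cached, but the O(L)
# equality comparison against `stored` still has to happen.
print(timeit(lambda: copy in d, number=1))

# Looking up the stored key object itself: cached hash, identity
# short-circuit in the comparison, so roughly O(1).
print(timeit(lambda: stored in d, number=1))

That also suggests why a benchmark like the one in the question, which iterates over d.keys() and looks up the exact key objects stored in the dictionary, mostly measures the cheap cached path rather than the O(L) one.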