In the code below, rel_column is a string and id_map is a Python dict with 2.2 million key-value pairs.
if rel_column in id_map:
    node_ids = id_map[rel_column]
While debugging a speed bottleneck, my colleague said that this piece of code is the cause of the slowness. He claimed that the 'if' statement takes O(log n) time, but my impression has been that looking up a key in a dict or set takes constant time, O(1), on average. Given that the dict holds only 2.2 million entries, this should not be the speed bottleneck.
I am confused. Is my colleague right?
When in doubt, benchmark. Arbitrarily choosing random strings of length 8 for keys and values, and using approximately the size specified in the question:
import random
import string
import timeit
def randstr(N):
    return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))
id_map = {randstr(8): randstr(8) for _ in range(int(2e6))}
# key not present
x = randstr(8)
print(timeit.timeit(stmt=lambda: id_map[x] if x in id_map else None, number=1000000))
# key present, first with in/[], then with .get()
x = random.choice(list(id_map.keys()))
print(timeit.timeit(stmt=lambda: id_map[x] if x in id_map else None, number=1000000))
print(timeit.timeit(stmt=lambda: id_map.get(x, None), number=1000000))
Results:
0.11468726862221956
0.13692271895706654
0.13748164474964142
Conclusions (Python 3.9, well-distributed short keys, 2M entries):
in followed by [] is about the same speed as .get(): roughly 0.14 microseconds per lookup either way. Dict lookup is average-case O(1), not O(log n), so this code is not your bottleneck and your colleague is mistaken.
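To see the O(1) behaviour directly, here is a small sketch (reusing a randstr helper like the one above; the sizes and repeat counts are arbitrary choices) that measures per-lookup time at several dict sizes. The per-lookup time should stay roughly flat as the dict grows, which is what average-case constant time means in practice:

```python
import random
import string
import timeit

def randstr(n):
    # random uppercase/digit string of length n
    return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(n))

# Measure membership-test time for dicts of increasing size.
for size in (1_000, 100_000, 2_000_000):
    d = {randstr(8): randstr(8) for _ in range(size)}
    key = random.choice(list(d))          # a key that is present
    reps = 100_000
    t = timeit.timeit(lambda: key in d, number=reps)
    print(f'{size:>9,} entries: {t / reps * 1e9:6.0f} ns per lookup')
```

If the lookup were O(log n), the 2,000,000-entry dict would be roughly 11x slower per lookup than the 1,000-entry one; in practice the times are of the same order of magnitude.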