Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is python dict very slow for searching key?

In the code below, rel_column is a string, and 'id_map' is a python dict and it has 2.2 million key-value pair.

if rel_column in id_map:
   node_ids = id_map[rel_column]

In debugging a speed bottleneck issue, my colleague said that this piece of code is the cause of the slowness. He said that the 'if' statement takes log(n) time, but my impression has been that for searching a key in a map or set, the complexity has been constant time O(1). Considering that the the total size is only 2.2 millions of the dict, this should not be the speed bottleneck.

I am confused. Is my colleague right?

like image 384
marlon Avatar asked Mar 16 '26 22:03

marlon


1 Answers

When it doubt, benchmark. Arbitrarily choosing random strings of length 8 for key and value, and using the size specified in the question.

import random
import string
import timeit

def randstr(N):
    return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))

id_map = {randstr(8): randstr(8) for _ in range(int(2e6))}

# key not present
x = randstr(8)
print(timeit.timeit(stmt=lambda: id_map[x] if x in id_map else None, number=1000000))

# key present, first with in/[], then with .get()
x = random.choice(list(id_map.keys()))
print(timeit.timeit(stmt=lambda: id_map[x] if x in id_map else None, number=1000000))
print(timeit.timeit(stmt=lambda: id_map.get(x, None), number=1000000))

Results:

0.11468726862221956
0.13692271895706654
0.13748164474964142

Conclusions (Python 3.9, well-distributed keys, small data, 2M entries):

  • key not found is slightly faster than key found
  • in followed by [] is about the same speed as .get()
  • Both are on the order of 100ns per lookup
like image 194
Peter Avatar answered Mar 21 '26 03:03

Peter