Intuitively I think key in dict is faster than key in dict.keys(), since .keys() creates a list of keys. This question is to confirm whether that is true.
Just wondering: does key in dict internally create/use a list to find whether the key is present? And is one method faster than the other?
Short answer: in Python 2, key in dict.keys() slows you down, because .keys() builds a list; in Python 3, in dict.keys() performs like in dict. Details for py2 and py3 follow below.
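You can see the underlying difference directly by inspecting what .keys() returns in each version (a minimal illustration; the dict d is just a throwaway example):

Python 2.7:
d = {'a': 1, 'b': 2}
print type(d.keys())    # <type 'list'> -- a new list of all keys is built

Python 3:
d = {'a': 1, 'b': 2}
print(type(d.keys()))   # <class 'dict_keys'> -- a view object, no list is built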
Python 2.7 answer:
Your assumption is correct, for the following reasons:
- dict.keys() involves an additional function call (stack overhead).
- dict.keys() returns a list, which holds all keys in memory (as opposed to a lazy iterator), so it has to allocate that memory on every call.
- key in dict uses the dict's internal hash table, which is a constant-time indexed lookup, whereas key in dict.keys() is a linear search through the list.
I created a small benchmark script to show my point:
#! /usr/bin/python2.7
import datetime as dt
import random

dict_size = 1000000
num_iterations = 100

d = {i: i for i in xrange(dict_size)}

def f():
    # direct membership test on the dict (hash table lookup)
    k = random.randint(0, dict_size - 1)
    return k in d

def g():
    # membership test on the list built by d.keys() (linear search)
    k = random.randint(0, dict_size - 1)
    return k in d.keys()

def test(func):
    t = dt.datetime.utcnow()
    for i in xrange(num_iterations):
        func()
    print "%s --> %1.6f s" % (func, (dt.datetime.utcnow() - t).total_seconds())

test(f)
test(g)
Output (Python 2.7.6, Ubuntu 14.04):
<function f at 0x7ff2e0126d70> --> 0.000598 s
<function g at 0x7ff2e0126de8> --> 5.191553 s
I also tested with the number of iterations and the dict size swapped (a dict of only 100 items, 1M iterations):
<function f at 0x7f94cb5e6d70> --> 3.614162 s
<function g at 0x7f94cb5e6de8> --> 7.007922 s
Here the results are much closer together: with only 100 keys, the linear search in g is cheap, so the per-call overhead dominates the timing. The performance difference therefore really grows with the size of the dict, as expected for an O(n) list search versus an O(1) hash lookup.
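As a side note, the standard library's timeit module is the usual way to run this kind of micro-benchmark without hand-rolling the timing loop; a minimal sketch of the same comparison (Python 3 syntax, names and sizes are illustrative):

import timeit

# build the test dict once in timeit's setup, not in the timed statement
setup = "import random; d = {i: i for i in range(1000000)}"

t_dict = timeit.timeit("random.randint(0, 999999) in d", setup=setup, number=100)
t_keys = timeit.timeit("random.randint(0, 999999) in d.keys()", setup=setup, number=100)

print("in d        --> %1.6f s" % t_dict)
print("in d.keys() --> %1.6f s" % t_keys)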
Python 3 answer:
I adapted the script for Python 3:
- xrange is gone; use range instead (not in the performance-critical inner work of f and g, so limited performance impact).
- print is now a function.
- the shebang becomes #!/usr/bin/python3.
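For completeness, here is the full adapted script (a sketch assuming only the changes listed above):

#!/usr/bin/python3
import datetime as dt
import random

dict_size = 1000000
num_iterations = 100

d = {i: i for i in range(dict_size)}

def f():
    # direct membership test on the dict (hash table lookup)
    k = random.randint(0, dict_size - 1)
    return k in d

def g():
    # membership test on the dict_keys view returned by d.keys()
    k = random.randint(0, dict_size - 1)
    return k in d.keys()

def test(func):
    t = dt.datetime.utcnow()
    for i in range(num_iterations):
        func()
    print("%s --> %1.6f s" % (func, (dt.datetime.utcnow() - t).total_seconds()))

test(f)
test(g)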
I tested with Python 3.4.3 on the same machine:
dict_size = 1000000; num_iterations = 100
f --> 0.000590 s g --> 0.000565 s
dict_size = 100; num_iterations = 1000000
f --> 4.525487 s g --> 4.837232 s
So in Python 3, the performance difference is gone: dict.keys() now returns a view object instead of a list, and membership tests on the view are delegated to the dict's hash table, so they cost the same as key in dict.
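To see that the view is just a window onto the dict rather than a snapshot list, note that it reflects later changes (a minimal Python 3 illustration):

d = {'a': 1}
ks = d.keys()      # dict_keys view, no list allocated
d['b'] = 2         # mutate the dict after taking the view
print('b' in ks)   # True -- the view sees the new key
print(ks)          # dict_keys(['a', 'b'])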