While reading Python 101 by Michael Driscoll, I got to his explanation of how to check whether a key exists in a dictionary. I tried it on my machine with a dictionary mapping the keys 'a' to 'z' to their position in the alphabet, and a function that measures the time taken to look up the key 't' (picked at random).
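The dictionary was built along these lines (a sketch; the exact construction may have differed, and judging by the session below I bound it to the name dict, shadowing the built-in):

from string import ascii_lowercase

# keys 'a'..'z', values 0..25 (each letter's position in the alphabet)
dict = {letter: index for index, letter in enumerate(ascii_lowercase)}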
Here's my function:
import time

def f(d, flag):
    start_time = time.time()
    if flag:
        print("t" in d)
    else:
        print("t" in d.keys())
    print("--- %s seconds ---" % (time.time() - start_time))
And here are the results:
>>> f(dict,True)
True
--- 0.03937530517578125 seconds ---
>>> f(dict,False)
True
--- 0.05114388465881348 seconds ---
BUT, I still don't get it. I thought that key in dict.keys() would mean iterating over a much smaller collection (just the keys), which should be faster. Is there something special in the implementation of in or keys() that causes this?
Using dictionary.keys() is slower because it does more work:

- the expression dictionary.keys has to look up the keys attribute on the dictionary, producing a bound method object
- the call (the () part after keys) has to create a new dictionary view object, requiring the current call frame to be pushed onto the stack, and popped afterwards

None of this is needed, because a containment test against the dictionary and a containment test against the keys view test the exact same thing. Containment testing directly on a dictionary does not include the values; you are testing for keys only, in both cases.
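You can see the extra steps in the bytecode. Here is a minimal sketch using the dis module (the exact opcode names differ between Python versions, but the extra attribute load and call show up in all of them):

import dis

d = {'a': 0}

# "'t' in d": load the dict and perform one containment test
dis.dis(compile("'t' in d", '<demo>', 'eval'))

# "'t' in d.keys()": additionally look up the keys attribute and call it
# to build the view object before the same containment test
dis.dis(compile("'t' in d.keys()", '<demo>', 'eval'))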
From the dict() object documentation:

key in d
    Return True if d has a key key, else False.
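To make the "keys only" point concrete, a small illustrative snippet:

d = {'a': 0, 'b': 1}

print('a' in d)         # True  -- keys are tested
print(0 in d)           # False -- values are never searched
print('a' in d.keys())  # True  -- the exact same test, via a view object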
Note that measuring wall-clock time like this is not a great way of testing for performance differences. Use the timeit module instead, which picks the best-performing timer for your platform, disables the garbage collector to eliminate one source of skew, and runs the test many times to minimise system noise.
You can reproduce the time difference by timing the additional steps above separately (combining the call and the view-object creation into one step). By default, timeit.timeit() runs the statement 1 million times and returns the total time taken:
>>> import timeit
>>> from string import ascii_lowercase
>>> d = {l: i for i, l in enumerate(ascii_lowercase)}
>>> 't' in d
True
>>> timeit.timeit('d.keys', globals={'d': d})
0.0439452639548108
>>> timeit.timeit('keys()', globals={'keys': d.keys})
0.06267352704890072
So merely looking up the .keys attribute 1 million times already takes 44 milliseconds, while calling the pre-bound method (no attribute lookup involved) takes another 63 ms. Both timings include the cost of looking up a global name, however:
>>> timeit.timeit('d', globals={'d': d})
0.027833244064822793
So one would expect a difference of roughly 107 - 28 == 79 ms between the two methods.
And indeed, the time difference between using 't' in d and 't' in d.keys() is about that much:
>>> timeit.timeit('"t" in d.keys()', globals={'d': d})
0.11647015693597496
>>> timeit.timeit('"t" in d', globals={'d': d})
0.0370339349610731
116 - 37 is 79 milliseconds, as predicted.
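As a side note, if you want less noisy numbers than a single timeit() run gives, timeit.repeat() performs the whole measurement several times; taking the minimum is the usual way to estimate the best case. A sketch (the numbers will differ per machine):

import timeit
from string import ascii_lowercase

d = {l: i for i, l in enumerate(ascii_lowercase)}

for stmt in ('"t" in d', '"t" in d.keys()'):
    # min() of the repeats is the least-noisy estimate of the true cost
    best = min(timeit.repeat(stmt, globals={'d': d}, repeat=5))
    print('%-20s %.4f seconds per 1,000,000 runs' % (stmt, best))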