
Python: is 'key in dict' different/faster than 'key in dict.keys()'? [duplicate]

Intuitively I think key in dict is faster than key in dict.keys(), since .keys() creates a list of keys. This question is to confirm whether that is true.

Just wondering if key in dict internally creates/uses a list to find if the key is present?

Also, is one method faster than the other?

Yogesh Mangaj asked Jul 31 '15


1 Answer

Short answer:

  • In Python 2: your assumption is correct: key in dict.keys() is noticeably slower than key in dict.
  • In Python 3: your assumption is not correct: key in dict.keys() performs about the same as key in dict.

Details for Python 2 and Python 3 follow below.

Python 2.7 answer:

Your assumption is correct, for the following reasons:

  • dict.keys() involves an additional function call (stack overhead).
  • dict.keys() returns a list, which contains all keys in memory (as opposed to a lazy generator object). Hence it needs to allocate memory.
  • key in dict is a lookup in the dict's internal hash table, which is O(1) on average. key in dict.keys() is a linear search through a list (see the interpreter session below).
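
A quick Python 2.7 interpreter session makes the difference concrete (the dictionary contents here are just illustrative):

>>> d = {'a': 1, 'b': 2}
>>> type(d.keys())   # keys() builds and returns a brand-new list
<type 'list'>
>>> 'a' in d         # hash-table lookup, O(1) on average
True
>>> 'a' in d.keys()  # linear scan of the freshly built list
True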

I created a small benchmark script to illustrate the point:

#! /usr/bin/python2.7

import datetime as dt
import random

dict_size = 1000000
num_iterations = 100

# build a large dict {0: 0, 1: 1, ...}
d = {i: i for i in xrange(dict_size)}

def f():
    # membership test against the dict itself
    k = random.randint(0, dict_size - 1)
    return (k in d)

def g():
    # membership test against the list returned by keys()
    k = random.randint(0, dict_size - 1)
    return (k in d.keys())

def test(func):
    # time num_iterations calls of func
    t = dt.datetime.utcnow()
    for i in xrange(num_iterations):
        func()
    print "%s --> %1.6f s" % (func, (dt.datetime.utcnow() - t).total_seconds())

test(f)
test(g)

Output (Python 2.7.6, Ubuntu 14.04):

<function f at 0x7ff2e0126d70> --> 0.000598 s
<function g at 0x7ff2e0126de8> --> 5.191553 s

I also tested with the number of iterations and the dict size swapped (a dict of only 100 items, 1M iterations):

<function f at 0x7f94cb5e6d70> --> 3.614162 s
<function g at 0x7f94cb5e6de8> --> 7.007922 s

Here the results are much closer together: with only 100 keys, the list that keys() builds is tiny, so the fixed per-call overhead (the random number generation and the function calls themselves) dominates.

So the performance difference really grows with the size of the dict.
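
The same comparison can also be made with the standard-library timeit module; the snippet below is my own sketch (not part of the original benchmark), written in Python 2 syntax to match the script above:

#! /usr/bin/python2.7
# Alternative measurement with timeit (illustrative sketch)
import timeit

setup = "import random; d = {i: i for i in xrange(1000000)}"

# 100 random membership tests against the dict vs. against the list from d.keys()
print timeit.timeit("random.randint(0, 999999) in d", setup=setup, number=100)
print timeit.timeit("random.randint(0, 999999) in d.keys()", setup=setup, number=100)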

Python 3 answer:

I adapted the script for Python 3 (the adapted version is sketched after this list):

  • xrange is gone, use range instead. (not in the performance-critical inner loop of the test function, so limited performance impact)
  • use parentheses with print (print is a function in Python 3)
  • change the shebang line to #!/usr/bin/python3
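
Applying those changes gives roughly the following script (a sketch; I also print func.__name__ instead of the full function repr, which is why the output lines below read "f -->" and "g -->"):

#!/usr/bin/python3

import datetime as dt
import random

dict_size = 1000000
num_iterations = 100

# build a large dict {0: 0, 1: 1, ...}
d = {i: i for i in range(dict_size)}

def f():
    # membership test against the dict itself
    k = random.randint(0, dict_size - 1)
    return k in d

def g():
    # membership test against the keys view returned by keys()
    k = random.randint(0, dict_size - 1)
    return k in d.keys()

def test(func):
    # time num_iterations calls of func
    t = dt.datetime.utcnow()
    for i in range(num_iterations):
        func()
    print("%s --> %1.6f s" % (func.__name__, (dt.datetime.utcnow() - t).total_seconds()))

test(f)
test(g)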

I tested with Python 3.4.3 on the same machine:

  • dict_size = 1000000; num_iterations = 100

    f --> 0.000590 s
    g --> 0.000565 s

  • dict_size = 100; num_iterations = 1000000

    f --> 4.525487 s
    g --> 4.837232 s

So in Python 3, the performance difference is gone: dict.keys() now returns a lightweight view object instead of a list, and membership tests on the view are delegated straight to the dict's hash table.
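
You can see the view object directly in a Python 3 interpreter (again, the dictionary contents are just illustrative):

>>> d = {'a': 1, 'b': 2}
>>> d.keys()         # a dict_keys view; no list is built
dict_keys(['a', 'b'])
>>> 'a' in d.keys()  # delegates to the dict's hash table, O(1) on average
True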

5 revs answered Oct 23 '22