This isn't premature optimization. My use case has this dict check right in the innermost of inner loops, running all the time. It's also intellectually irksome (see the results below).
Which of these approaches is faster?
mydict = {'hello': 'yes', 'goodbye': 'no'}
key = 'hello'

# (A)
if key in mydict:
    a = mydict[key]
    do_things(a)
else:
    handle_an_error()

# vs (B)
a = mydict.get(key, None)
if a is not None:
    do_things(a)
else:
    handle_an_error()
Edit: these run at essentially the same speed. Common sense tells me that (B) should be noticeably faster, since it does one dict lookup instead of two, but the benchmark says otherwise. I'm scratching my head.
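(For completeness, there is a third common idiom that isn't part of the original comparison: EAFP with try/except, which does a single lookup and only pays extra on a miss. A sketch, reusing the same names as the snippet above:)

# (C) -- hypothetical alternative, not benchmarked below
try:
    a = mydict[key]
except KeyError:
    handle_an_error()
else:
    do_things(a)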
Results of a benchmark averaged over 12 runs, half of which are hits and the other half misses:
doing in
switching to get
total time for IN: 0.532250006994
total time for GET: 0.480916659037
times found: 12000000
times not found: 12000000
And when a similar benchmark is run (with 10x more loops) without ever finding the key:
doing in
switching to get
total time for IN: 2.35899998744
total time for GET: 4.13858334223
Why!?
(correct) code:
import time

smalldict = {}
for i in range(10):
    smalldict[str(i*4)] = str(i*18)
smalldict["8"] = "hello"

bigdict = {}
for i in range(10000):
    bigdict[str(i*100)] = str(i*4123)
bigdict["hello"] = "yes!"

timetotal = 0
totalin = 0
totalget = 0
key = "hello"
found = 0
notfound = 0
ddo = bigdict  # change to smalldict for small-dict gets

print 'doing in'
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(1000000):
        if a == 0:
            # note: str(key) adds a function call per iteration
            # that the .get() loop below does not pay
            if str(key) in ddo:
                found = found + 1
                foo = ddo[str(key)]
            else:
                notfound = notfound + 1
                foo = "nooo"
        else:
            if 'yo' in ddo:
                found = found + 1
                foo = ddo['yo']
            else:
                notfound = notfound + 1
                foo = "nooo"
    timetotal = timetotal + (time.time() - start)
totalin = timetotal / 12.0

print 'switching to get'
timetotal = 0
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(1000000):
        if a == 0:
            foo = ddo.get(key, None)
            if foo is not None:
                found = found + 1
            else:
                notfound = notfound + 1
                foo = "nooo"
        else:
            foo = ddo.get('yo', None)
            if foo is not None:
                found = found + 1
            else:
                notfound = notfound + 1
                foo = "oooo"
    timetotal = timetotal + (time.time() - start)
totalget = timetotal / 12.0

print "total time for IN: ", totalin
print 'total time for GET: ', totalget
print 'times found:', found
print 'times not found:', notfound
(original) code:

import time

smalldict = {}
for i in range(10):
    smalldict[str(i*4)] = str(i*18)
smalldict["8"] = "hello"

bigdict = {}
for i in range(10000):
    bigdict[str(i*100)] = str(i*4123)
bigdict["8000"] = "hello"

timetotal = 0
totalin = 0
totalget = 0
key = "hello"
found = 0
notfound = 0
ddo = bigdict  # change to smalldict for small-dict gets

print 'doing in'
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(10000000):
        if a == 0:
            if key in ddo:
                foo = ddo[key]
            else:
                foo = "nooo"
        else:
            if 'yo' in ddo:
                foo = ddo['yo']
            else:
                foo = "nooo"
    timetotal = timetotal + (time.time() - start)
totalin = timetotal / 12.0

print 'switching to get'
timetotal = 0
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(10000000):
        if a == 0:
            foo = ddo.get(key, None)
            if foo is not None:
                # yaaay
                pass
            else:
                foo = "nooo"
        else:
            foo = ddo.get('yo', None)
            if foo is not None:
                # yaaay
                pass
            else:
                foo = "oooo"
    timetotal = timetotal + (time.time() - start)
totalget = timetotal / 12.0

print "total time for IN: ", totalin
print 'total time for GET: ', totalget
We can do some better timings:
import timeit

d = dict.fromkeys(range(10000))

def d_get_has(d):
    return d.get(1)

def d_get_not_has(d):
    return d.get(-1)

def d_in_has(d):
    if 1 in d:
        return d[1]

def d_in_not_has(d):
    if -1 in d:
        return d[-1]

print timeit.timeit('d_get_has(d)', 'from __main__ import d, d_get_has')
print timeit.timeit('d_get_not_has(d)', 'from __main__ import d, d_get_not_has')
print timeit.timeit('d_in_has(d)', 'from __main__ import d, d_in_has')
print timeit.timeit('d_in_not_has(d)', 'from __main__ import d, d_in_not_has')
On my computer, the "in" variants are faster than the .get variants. This is probably because .get is an attribute lookup on the dict, and an attribute lookup is likely to be about as expensive as a membership test on the dict. Note that in and item lookup using dict[x] can be done directly in bytecode, so the normal method lookups can be bypassed...
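You can see that bytecode difference with the dis module. This is a minimal sketch (use_in and use_get are names made up for illustration): in CPython 2 the membership test and the subscript compile to dedicated opcodes (COMPARE_OP with 'in', and BINARY_SUBSCR), while .get goes through LOAD_ATTR plus CALL_FUNCTION:

import dis

def use_in(d, key):
    if key in d:
        return d[key]

def use_get(d, key):
    return d.get(key)

dis.dis(use_in)   # COMPARE_OP (in), then BINARY_SUBSCR: no method lookup
dis.dis(use_get)  # LOAD_ATTR (get), then CALL_FUNCTION: extra attribute lookup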
It also might be worth pointing out that I get a HUGE speedup if I just use PyPy :-):
$ python ~/sandbox/test.py
0.169840812683
0.1732609272
0.122044086456
0.0991759300232
$ pypy ~/sandbox/test.py
0.00974893569946
0.00752687454224
0.00812077522278
0.00686597824097
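If the attribute lookup really is the cost, one common CPython micro-optimization (my suggestion, not something from the post) is to hoist the bound method out of the hot loop. A sketch:

ddo = {'hello': 'yes'}
dget = ddo.get                 # pay the attribute lookup once, outside the loop
for i in range(1000000):
    foo = dget('hello', None)  # each iteration is now just a call, no LOAD_ATTR

Whether this actually beats a plain "in" test still depends on the interpreter, so it is worth re-running the timings with it.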